Extracting specific columns in numpy array

Posted on 2024-12-19 19:55:55, word count 197, views 0, comments 0

This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another numpy array, but I get invalid syntax errors.
Here is the code:

extractedData = data[[:,1],[:,9]]

It seems like the above line should suffice, but I guess not. I looked around but couldn't find anything syntax-wise regarding this specific scenario.

Comments (11)

贩梦商人 2024-12-26 19:55:55

I assume you wanted columns 1 and 9?

To select multiple columns at once, use

X = data[:, [1, 9]]

To select one at a time, use

x, y = data[:, 1], data[:, 9]

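With names (this applies to structured arrays, which are indexed by a list of field names directly rather than with a leading slice):

data[['Column Name1','Column Name2']]

You can get the names from data.dtype.names.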
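
For the named case, here is a minimal sketch assuming data is a NumPy structured array. The field names and values are made up for illustration and are not from the question. Since a structured array like this is 1-D, the list of field names is the only index.

```python
import numpy as np

# Hypothetical structured array; field names and values are illustrative only.
data = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("id", "i4"), ("height", "f8"), ("weight", "f8")],
)

# The available field names.
print(data.dtype.names)

# Select several fields at once with a list of names.
subset = data[["height", "weight"]]
print(subset.dtype.names)

# A single field name returns an ordinary 1-D array of that field's values.
heights = data["height"]
print(heights)
```

If the data is an ordinary 2-D float array with no field names, integer column indices as shown earlier are the way to go.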

平生欢 2024-12-26 19:55:55

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

extractedData = data[:,[1,9]]
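
As for why the original line fails: a bare ":" is only valid directly inside the subscript brackets, not inside a nested list such as [:,1], so Python rejects it before NumPy ever sees it. A small sketch with a made-up 3x10 array (the array contents are an assumption for illustration):

```python
import numpy as np

# Made-up 3x10 array standing in for the MxN matrix from the question.
data = np.arange(30).reshape(3, 10)

# ":" selects every row; the list [1, 9] selects columns 1 and 9.
extractedData = data[:, [1, 9]]

print(extractedData.shape)
print(extractedData)
```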
最佳男配角 2024-12-26 19:55:55

If you want to extract only some columns:

idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]

If you want to exclude specific columns:

idx_OUT_columns = [1, 9]
# range replaces Python 2's xrange; data.shape[1] is the number of columns
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
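
Two other ways to exclude columns, sketched here as alternatives rather than as part of the answer above: np.delete, or a boolean mask over the column axis. The 2x10 array is made up for illustration.

```python
import numpy as np

# Made-up 2x10 array for illustration.
data = np.arange(20).reshape(2, 10)
idx_OUT_columns = [1, 9]

# Option 1: np.delete returns a new array without the listed columns.
kept_a = np.delete(data, idx_OUT_columns, axis=1)

# Option 2: build a boolean mask that is False for the excluded columns.
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_OUT_columns] = False
kept_b = data[:, mask]

print(kept_a.shape)
print(np.array_equal(kept_a, kept_b))
```

Both return copies, so the original data is left untouched.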
天暗了我发光 2024-12-26 19:55:55

Just:

>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])

The columns need not be in order:

>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])
过期情话 2024-12-26 19:55:55

One thing I would like to point out: if you extract a single column of a plain array with an integer index (e.g. data[:, 1]), the result is not an Mx1 matrix as you might expect, but a 1-D array of length M containing the elements of that column.

To turn it into an Mx1 array, call reshape(M, 1) on the result.
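
A short sketch of the shapes involved, assuming data is a plain 2-D ndarray (the 3x4 array is made up for illustration). Besides reshape, indexing with a one-element list or a length-one slice also keeps the column dimension.

```python
import numpy as np

# Made-up 3x4 array for illustration.
data = np.arange(12).reshape(3, 4)

col_1d = data[:, 1]                        # integer index drops the axis
col_list = data[:, [1]]                    # one-element list keeps it
col_slice = data[:, 1:2]                   # length-one slice keeps it
col_reshaped = data[:, 1].reshape(-1, 1)   # -1 lets NumPy infer M

print(col_1d.shape)
print(col_list.shape)
print(col_slice.shape)
print(col_reshaped.shape)
```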

爱你不解释 2024-12-26 19:55:55

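One more thing you should pay attention to when selecting columns from an N-D array using a list like this:

data[:,:,[1,9]]

If you are removing a dimension (by selecting only one row with an integer, for example), the axes of the resulting array are reordered. This happens because the integer and the list both count as advanced indices and are separated by a slice, so NumPy places the dimension produced by the advanced indices first. So:

print(data.shape)            # gives (10, 20, 30)
selection = data[1,:,[1,9]]
print(selection.shape)       # gives (2, 20) instead of (20, 2)!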
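
A sketch of this behavior and two possible workarounds, assuming a made-up array of shape (10, 20, 30): index in two steps so the integer and the list never appear in the same subscript, or transpose the result.

```python
import numpy as np

# Made-up 3-D array with the shape used in the comment above.
data = np.arange(10 * 20 * 30).reshape(10, 20, 30)

# Integer and list in one subscript, separated by a slice.
mixed = data[1, :, [1, 9]]
print(mixed.shape)

# Workaround 1: take the row first, then select the columns.
two_step = data[1][:, [1, 9]]
print(two_step.shape)

# Workaround 2: transpose the reordered result.
transposed = mixed.T
print(transposed.shape)
```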
山川志 2024-12-26 19:55:55

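If data is a pandas DataFrame rather than a numpy array, you can use the following (.ix was removed in pandas 1.0; .loc is the label-based replacement):

extracted_data = data.loc[:,['Column1','Column2']]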
错々过的事 2024-12-26 19:55:55

Here is yet another example that some may find useful when you need specific columns and ranges from your data. It takes a few seconds to run on millions of rows, and you can add more columns by concatenating additional lists (e.g., columns = ... + [1] + [5], etc.):

columns = [0] + list(range(4, 62 - 3))  # column 0 plus columns 4 through 58
print(columns)
selected_data = train_data[:,columns]
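
The same mix of single columns and ranges can also be written with np.r_, which concatenates integers and slices into one index array. This is a sketch of an alternative, not the method used above. The shape of train_data here is an assumption for illustration.

```python
import numpy as np

# Made-up stand-in for train_data with 62 columns.
train_data = np.arange(3 * 62).reshape(3, 62)

# Column 0 plus columns 4 through 58, same as [0] + list(range(4, 59)).
columns = np.r_[0, 4:59]
selected_data = train_data[:, columns]

print(columns)
print(selected_data.shape)
```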
向日葵 2024-12-26 19:55:55

I think the solutions here no longer work with updated versions. If data is a pandas DataFrame, one way to do it with the newer to_numpy method is:

extracted_data = data[['Column Name1','Column Name2']].to_numpy()

which gives you the desired outcome.

You can find the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

也只是曾经 2024-12-26 19:55:55

I could not edit the chosen answer, so I'm adding an answer to clarify that indexing with an integer seems to return a view (not a copy), while indexing with a list returns a copy:

>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]

>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]

>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
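
The same distinction can be checked without modifying anything by using np.shares_memory. A small sketch on the same kind of array; the variable names are just for illustration.

```python
import numpy as np

x = np.zeros((2, 3))

by_list = x[:, [0, 1]]   # list index (advanced indexing)
by_int = x[:, 0]         # integer index (basic indexing)
by_slice = x[:, 0:2]     # slice (basic indexing)

# True means the result shares x's buffer, so writes to it show up in x.
print(np.shares_memory(x, by_list))
print(np.shares_memory(x, by_int))
print(np.shares_memory(x, by_slice))
```

So if you need several adjacent columns and want a view, a slice such as x[:, 0:2] avoids the copy that a list index makes.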
凹づ凸ル 2024-12-26 19:55:55

You can also use extractedData = np.column_stack((data[:, 1], data[:, 9]))
