Extracting specific columns from a numpy array

Posted on 2024-12-19 19:55:55

This is an easy question but say I have an MxN matrix. All I want to do is extract specific columns and store them in another numpy array but I get invalid syntax errors.
Here is the code:

extractedData = data[[:,1],[:,9]]

It seems like the above line should suffice but I guess not. I looked around but couldn't find anything syntax wise regarding this specific scenario.

贩梦商人 2024-12-26 19:55:55

I assume you wanted columns 1 and 9?

To select multiple columns at once, use

X = data[:, [1, 9]]

To select one at a time, use

x, y = data[:, 1], data[:, 9]

With named fields (a structured array), index with the list of field names directly, without the leading slice:

data[['Column Name1','Column Name2']]

You can get the names from data.dtype.names…
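
As a minimal sketch of selecting by name, assuming data is a numpy structured array (the field names "id", "x" and "y" below are hypothetical and only serve the illustration):

```python
import numpy as np

# Hypothetical structured array with three named fields.
data = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("id", "i4"), ("x", "f8"), ("y", "f8")],
)

print(data.dtype.names)      # the field names of the structured array

# Indexing with a list of names keeps only those fields;
# the result is still a structured array.
subset = data[["x", "y"]]

# Indexing with a single name returns an ordinary 1-D array.
x = data["x"]
```

Note that a structured array is one-dimensional, with the "columns" living in the dtype, which is why no `:,` slice appears in the index.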

平生欢 2024-12-26 19:55:55

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

extractedData = data[:,[1,9]]

最佳男配角 2024-12-26 19:55:55

If you want to extract only some columns:

idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]

If you want to exclude specific columns:

idx_OUT_columns = [1, 9]
# range replaces xrange, which exists only in Python 2
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
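
Excluding columns can also be sketched without building the index list by hand. The example below is only an illustration on a small hypothetical 3x4 array; it shows two alternatives, np.delete and a boolean mask over the column axis:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # hypothetical 3x4 example array
idx_out = [1, 3]                     # columns to exclude

# Option 1: np.delete returns a copy without the listed columns.
kept_a = np.delete(data, idx_out, axis=1)

# Option 2: a boolean mask that is False for the excluded columns.
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_out] = False
kept_b = data[:, mask]

print(kept_a)
```

Both options leave data itself unchanged and return a new array.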

天暗了我发光 2024-12-26 19:55:55

Just:

>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])

The columns need not be in order:

>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])

过期情话 2024-12-26 19:55:55

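One thing I would like to point out: if you extract a single column with an integer index, the result will not be the Mx1 matrix you might expect, but a 1-D array containing the elements of the column you extracted.

To turn it into a column matrix, call reshape(M, 1) on the resulting array.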
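
A short sketch of the shapes involved, using a hypothetical 3x4 array (so M = 3). Besides reshape, a one-element list or a length-one slice also keeps the second dimension:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # hypothetical 3x4 array, so M = 3

col = data[:, 1]                     # integer index: 1-D result
col_2d = col.reshape(-1, 1)          # reshape into an M x 1 column
col_list = data[:, [1]]              # a one-element list keeps both dimensions
col_slice = data[:, 1:2]             # so does a length-one slice

print(col.shape, col_2d.shape, col_list.shape, col_slice.shape)
```

Passing -1 to reshape lets numpy infer M, so the call works without knowing the row count in advance.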

爱你不解释 2024-12-26 19:55:55

One more thing you should pay attention to when selecting columns from N-D array using a list like this:

data[:,:,[1,9]]

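If you are removing a dimension (by selecting only one row, for example), the axes of the resulting array will be permuted. This happens because the integer and the list are both treated as advanced indices, and when advanced indices are separated by a slice, numpy puts the dimension they produce first. So:

print(data.shape)           # gives (10, 20, 30)
selection = data[1,:,[1,9]]
print(selection.shape)      # gives (2, 20) instead of (20, 2)!!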
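
One way to sidestep the permutation, sketched on a hypothetical array with the same shape as above, is to index in two steps or to transpose the result:

```python
import numpy as np

# Hypothetical array with the shape used in the example above.
data = np.arange(10 * 20 * 30).reshape(10, 20, 30)

a = data[1, :, [1, 9]]    # integer and list separated by a slice
b = data[1][:, [1, 9]]    # select the row first, then the columns
c = data[1, :, [1, 9]].T  # or transpose the permuted result

print(a.shape, b.shape, c.shape)
```

The two-step form costs nothing extra for the first step, since data[1] is a view and only the column selection makes a copy.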

山川志 2024-12-26 19:55:55

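If data is a pandas DataFrame rather than a numpy array, you can select columns by label. The .ix indexer was removed in pandas 1.0, so use .loc instead:

extracted_data = data.loc[:,['Column1','Column2']]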

错々过的事 2024-12-26 19:55:55

Here is yet another example that some may find useful when you need specific columns and ranges from your data. It takes a few seconds to run on millions of rows, and you can add more columns by adding further lists (e.g., columns = ... + [1] + [5]):

columns = [0] + list(range(4, 62 - 3))
print(columns)
selected_data = train_data[:,columns]
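
The same mix of single columns and ranges can also be written with np.r_, which concatenates scalars and slices into one index array. This is only a sketch; the 3x62 train_data below is a hypothetical stand-in for the real data:

```python
import numpy as np

# Hypothetical stand-in for the real training data: 3 rows, 62 columns.
train_data = np.arange(3 * 62).reshape(3, 62)

# Column 0 plus columns 4 through 58 (the slice end, 62 - 3 = 59, is exclusive).
columns = np.r_[0, 4:62 - 3]
selected_data = train_data[:, columns]

print(selected_data.shape)
```

Further pieces can be appended inside the brackets, for example np.r_[0, 4:59, 61].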

向日葵 2024-12-26 19:55:55

I think the solutions here may no longer work in some setups. If data is a pandas DataFrame (not a numpy array), one way to do it is with the DataFrame.to_numpy method:

extracted_data = data[['Column Name1','Column Name2']].to_numpy()

which gives you the desired outcome.

You can find the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

也只是曾经 2024-12-26 19:55:55

I could not edit the chosen answer, so I'm adding an answer to clarify that indexing with an integer returns a view (not a copy), while indexing with a list returns a copy:

>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]

>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]

>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
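
The view/copy distinction can also be checked directly with np.shares_memory. A small sketch along the lines of the session above (the variable names are only illustrative):

```python
import numpy as np

x = np.zeros((2, 3))

view = x[:, 0]         # basic indexing (integer and slice): a view
copy = x[:, [0, 1]]    # advanced indexing (list of indices): a copy

print(np.shares_memory(x, view))   # True
print(np.shares_memory(x, copy))   # False

# If an independent single column is needed, copy it explicitly.
independent = x[:, 0].copy()
independent[0] = 5     # x is left untouched
```

So writing into a column taken with an integer index modifies the original array, unless .copy() is called first.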
