Building up an array by iteration in numpy/scipy in Python?

Posted 2024-08-28 18:00:22

Often, I am building an array by iterating through some data, e.g.:

my_array = []
for n in range(1000):
  # do operation, get value 
  my_array.append(value)
# cast to array
my_array = array(my_array)

I find that I have to first build a list and then cast it (using "array") to an array. Is there a way around this? All these casting calls clutter the code... how can I iteratively build up "my_array", with it being an array from the start?


4 Answers

寻找我们的幸福 2024-09-04 18:00:22

NumPy provides a 'fromiter' method:

import numpy as np

def myfunc(n):
    for i in range(n):
        yield i**2

np.fromiter(myfunc(5), dtype=int)

which yields

array([ 0,  1,  4,  9, 16])
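
As a small addendum to the answer above: when the number of elements is known ahead of time, `fromiter` also accepts an optional `count` argument, which lets it allocate the output array once up front instead of growing it as the iterator is consumed. A minimal sketch:

```python
import numpy as np

def squares(n):
    for i in range(n):
        yield i**2

# count tells fromiter how many items to expect, so the
# output array can be allocated in a single step.
arr = np.fromiter(squares(5), dtype=int, count=5)
# → array([ 0,  1,  4,  9, 16])
```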

清晨说晚安 2024-09-04 18:00:22

The recommended way to do this is to preallocate before the loop and use slicing and indexing to insert the values:

import numpy

my_array = numpy.zeros((1000, 1000))
for i in range(1000):
    # for a 1D array
    my_array[i] = functionToGetValue(i)
    # OR to fill an entire row
    my_array[i, :] = functionToGetValue(i)
    # or to fill an entire column
    my_array[:, i] = functionToGetValue(i)

numpy does provide an array.resize() method, but this will be far slower due to the cost of reallocating memory inside a loop. If you must have flexibility, then I'm afraid the only way is to create an array from a list.

EDIT: If you are worried that you're allocating too much memory for your data, I'd use the method above to over-allocate and then when the loop is done, lop off the unused bits of the array using array.resize(). This will be far, far faster than constantly reallocating the array inside the loop.

EDIT: In response to @user248237's comment, assuming you know any one dimension of the array (for simplicity's sake):

my_array = numpy.empty((10000, SOMECONSTANT))

for i in range(someVariable):
    if i >= my_array.shape[0]:
        my_array.resize((my_array.shape[0]*2, SOMECONSTANT))

    my_array[i, :] = someFunction()

# lop off extra bits with my_array.resize() here

The general principle is "allocate more than you think you'll need, and if things change, resize the array as few times as possible". Doubling the size could be thought of as excessive, but it is in fact the strategy used by data structures in the standard libraries of several other languages (java.util.Vector does this by default, for example, and several implementations of std::vector in C++ do as well).
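
To make the over-allocate-then-trim pattern described above concrete, here is a minimal self-contained sketch. The helper name `collect` is illustrative, not part of NumPy; it doubles capacity with `np.resize` (which returns a new array) and trims the unused tail at the end:

```python
import numpy as np

def collect(values, initial_capacity=8):
    """Gather values from an iterable into a 1D array,
    doubling capacity as needed, then trimming the excess."""
    arr = np.empty(initial_capacity)
    n = 0
    for v in values:
        if n == arr.shape[0]:
            # out of room: double the capacity
            arr = np.resize(arr, arr.shape[0] * 2)
        arr[n] = v
        n += 1
    return arr[:n].copy()  # drop the unused tail

result = collect(x**2 for x in range(10))
```

With `initial_capacity=8` and 10 values, the array is resized exactly once, illustrating the "resize as few times as possible" principle.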

牵你手 2024-09-04 18:00:22

Building up the array using list.append() seems to be much faster than any kind of dynamic resizing of a Numpy array:

import numpy as np
import timeit

class ndarray_builder:
  
  def __init__(self, capacity_step, column_count):
    self.capacity_step = capacity_step
    self.column_count = column_count
    self.arr = np.empty((self.capacity_step, self.column_count))
    self.row_pointer = 0

  def __enter__(self):
    return self

  def __exit__(self, type, value, traceback):
    self.close()
  
  def append(self, row):
    if self.row_pointer == self.arr.shape[0]:
      self.arr.resize((self.arr.shape[0] + self.capacity_step, self.column_count))
    self.arr[self.row_pointer] = row
    self.row_pointer += 1
  
  def close(self):
    self.arr.resize((self.row_pointer, self.column_count))

def with_builder():
  with ndarray_builder(1000, 2) as b:
    for i in range(10000):
      b.append((1, 2))
      b.append((3, 4))
  return b.arr

def without_builder():
  b = []
  for i in range(10000):
    b.append((1, 2))
    b.append((3, 4))
  return np.array(b)

print(f'without_builder: {timeit.timeit(without_builder, number=1000)}')
print(f'with_builder: {timeit.timeit(with_builder, number=1000)}')

without_builder: 3.4763141250000444
with_builder: 7.960973499999909

清秋悲枫 2024-09-04 18:00:22

If I understand your question correctly, this should do what you want:

import numpy as NP

# the array passed into your function
ax = NP.random.randint(10, 99, 20).reshape(5, 4)

# just define a function to operate on some data
fnx = lambda x : NP.sum(x)**2

# apply the function directly to the numpy array
new_row = NP.apply_along_axis(func1d=fnx, axis=0, arr=ax)

# 'append' the new values to the original array
new_row = new_row.reshape(1,4)
ax = NP.vstack((ax, new_row))