numpy recarray strings of variable length
Is it possible to initialise a numpy recarray that will hold strings, without knowing the length of the strings beforehand?
As a (contrived) example:
mydf = np.empty( (numrows,), dtype=[ ('file_name','STRING'), ('file_size_MB',float) ] )
The problem is that I'm constructing my recarray in advance of populating it with information, and I don't necessarily know the maximum length of file_name in advance.
All my attempts result in the string field being truncated:
>>> mydf = np.empty( (2,), dtype=[('file_name',str),('file_size_mb',float)] )
>>> mydf['file_name'][0]='foobarasdf.tif'
>>> mydf['file_name'][1]='arghtidlsarbda.jpg'
>>> mydf
array([('', 6.9164002347457e-310), ('', 9.9413127e-317)],
dtype=[('file_name', 'S'), ('file_size_mb', '<f8')])
>>> mydf['file_name']
array(['f', 'a'],
dtype='|S1')
(As an aside, why does mydf['file_name'] show 'f' and 'a' whilst mydf shows '' and ''?)
Similarly, if I initialise with type (say) |S10 for file_name, then things get truncated at length 10.
The only similar question I could find is this one, but this calculates the appropriate string length a priori and hence is not quite the same as mine (as I know nothing in advance).
Is there any alternative other than initialising file_name with (e.g.) |S9999999999999 (i.e. some ridiculous upper limit)?
Comments (1)
Instead of using the STRING dtype, one can always use object as the dtype. That will allow any object to be assigned to an array element, including Python variable-length strings.
It is against the spirit of the array concept to have variable-length elements, but this is as close as one can get. The idea of an array is that elements are stored in memory at well-defined and regularly spaced addresses, which prohibits variable-length elements. By storing pointers to the strings in the array, one can circumvent this limitation. (This is basically what the object dtype does.)
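A minimal sketch of the object-dtype approach, reusing the field names and file names from the question. The file sizes and the max_len helper variable are made up for illustration and are not part of the original post.

```python
import numpy as np

# An object field stores references to Python objects, so the strings
# it holds are not limited to a fixed width.
mydf = np.empty((2,), dtype=[('file_name', object), ('file_size_mb', float)])

mydf['file_name'][0] = 'foobarasdf.tif'
mydf['file_name'][1] = 'arghtidlsarbda.jpg'
mydf['file_size_mb'] = [1.5, 20.25]  # illustrative values only

# Both names survive untruncated.
print(mydf['file_name'][0])
print(mydf['file_name'][1])

# Once the array is filled, the longest name is known, should a
# fixed-width string dtype be wanted afterwards.
max_len = max(len(name) for name in mydf['file_name'])
print(max_len)
```

The trade-off is that object fields hold ordinary Python objects, so operations on that column run at Python speed rather than as contiguous fixed-width data.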