Data cleaning: extracting numbers from a string array by removing the '.' and ',' characters
I have a big data set that is messed up, and I am trying to clean it. The data looks like this:
data = np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36])  # ...
My goal is to extract the raw numbers:
numbers = [51, 64, 76, 84, 100, 136]
What I tried works, but I do not think it is very elegant. Is there a better way to do it?
import numpy as np
import re
clean = np.array([])
for i in data:
    i = str(i)
    if ',' in i:
        without = i.replace(',', '')
        clean = np.append(clean, without)
    elif '.' in i:
        without = i.replace('.', '')
        clean = np.append(clean, without)
# detect all numbers
numbers = np.array([])
for i in clean:
    if type(i) == np.str_:
        a = re.findall(r'\b\d+\b', i)
        numbers = np.append(numbers, a)
Generally, you should never use np.append in a loop, since it creates a new array on every call, which results in inefficient quadratic complexity. Besides this, you can solve your problem with a one-liner.
The idea is to replace ',' with '.' and then parse the string as a float, so as to produce the right integer from it. You can convert the result to a NumPy array with np.fromiter(result, dtype=int).
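A minimal sketch of the idea described in this answer, not necessarily the answer's original one-liner. It assumes every value has exactly two decimal places, so that multiplying the parsed float by 100 and rounding gives the intended integer. The names entry and token are illustrative only.

```python
import numpy as np

data = np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36])

# For each array entry: split on whitespace (the '\n' separators),
# turn the decimal comma into a dot, parse as float, and scale by 100.
# Assumption: two decimal places per value; round() guards against
# floating-point error such as 135.99999999999997.
result = (round(float(token.replace(',', '.')) * 100)
          for entry in data
          for token in str(entry).split())

# Build the array once from the generator instead of calling np.append in a loop.
numbers = np.fromiter(result, dtype=int)
print(numbers.tolist())  # [51, 64, 76, 84, 100, 136]
```

Because the values are produced by a generator and the array is built in a single pass, this avoids the repeated reallocation that np.append causes inside a loop.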