Slicing specific characters out of a CSV with Python
I have data in tab-delimited format that looks like:
0/0:23:-1.03,-7.94,-83.75:69.15 0/1:34:-1.01,-11.24,-127.51:99.00 0/0:74:-1.02,-23.28,-301.81:99.00
I am only interested in the first 3 characters of each entry (i.e. 0/0 and 0/1). I figured the best way to do this would be to use match and the genfromtxt in numpy. This example is as far as I have gotten:
    import re
    from numpy import genfromtxt

    csvfile = 'home/python/batch1.hg19.table'
    data = genfromtxt(csvfile, delimiter="\t", dtype=None)

    # Python 2 print syntax; only one row (data[1]) is processed
    for i in data[1]:
        m = re.match('[0-9]/[0-9]', i)
        if m:
            print m.group(0),
        else:
            print "NA",
This works for the first row of the data, but I am having a hard time figuring out how to expand it to every row of the input file.
Should I make it a function and apply it to each row separately, or is there a more Pythonic way to do this?
Answers (4)
Unless you really want to use NumPy, you can simply iterate through each line of the file, tokenize the line using the tab character as the delimiter, and print the slice of the text you are looking for.
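The code from this answer did not survive in this copy. A minimal sketch of the approach it describes, written for Python 3, might look like the following. The function name `first_three` and the in-memory sample (the line from the question, assuming tabs between entries) are illustrative assumptions, not the answerer's original code.

```python
import io


def first_three(lines):
    """Yield, for each line, the first three characters of every tab-separated field."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield [field[:3] for field in fields]


# Stand-in for the real file (assumption); with a file on disk,
# pass an open file object instead of this StringIO.
sample = io.StringIO(
    "0/0:23:-1.03,-7.94,-83.75:69.15\t"
    "0/1:34:-1.01,-11.24,-127.51:99.00\t"
    "0/0:74:-1.02,-23.28,-301.81:99.00\n"
)

for genotypes in first_three(sample):
    print(" ".join(genotypes))  # prints: 0/0 0/1 0/0
```

Because a file object is itself an iterable of lines, the same function handles every row of the input without any extra looping logic.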
Numpy is great when you want to load in an array of numbers.
The format you have here is too complicated for numpy to recognize, so you just get an array of strings. That's not really playing to numpy's strength.
It is simple to do this without numpy.
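This answer's code, output, and sample data are missing from this copy. As one possible numpy-free sketch (not necessarily what the answerer wrote), the standard-library `csv` module can do the tab splitting. The helper name `read_genotypes` is hypothetical, and the sample is the line from the question with tabs assumed between entries.

```python
import csv
import io


def read_genotypes(handle):
    """Return one list per row, holding the first three characters of each field."""
    reader = csv.reader(handle, delimiter="\t")
    return [[field[:3] for field in row] for row in reader]


# In-memory stand-in for the real table (assumption). For a file on disk,
# open it with open(path, newline="") as the csv documentation recommends.
sample = io.StringIO(
    "0/0:23:-1.03,-7.94,-83.75:69.15\t"
    "0/1:34:-1.01,-11.24,-127.51:99.00\t"
    "0/0:74:-1.02,-23.28,-301.81:99.00\n"
)

rows = read_genotypes(sample)
print(rows)  # [['0/0', '0/1', '0/0']]
```

The result is a plain list of lists of strings, which is about all numpy would have given for this format anyway.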
It's pretty easy to parse the whole file without regular expressions.
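The answer's code is not preserved here. One regex-free way to parse a whole table, sketched under the assumption that the wanted value is always the part of each entry before the first colon, is a nested list comprehension. The function name `parse_table` is made up for illustration.

```python
def parse_table(text):
    """Parse tab-delimited text into rows of leading sub-fields, without regex."""
    return [
        # Keep only what precedes the first ":" in each tab-separated entry.
        [entry.split(":", 1)[0] for entry in line.split("\t")]
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]


text = (
    "0/0:23:-1.03,-7.94,-83.75:69.15\t0/1:34:-1.01,-11.24,-127.51:99.00\n"
    "0/1:12:-5.00,-1.00,-9.00:40.00\t1/1:8:-9.00,-2.00,-0.10:20.00\n"
)
table = parse_table(text)
print(table)  # [['0/0', '0/1'], ['0/1', '1/1']]
```

Splitting on the colon rather than slicing three characters means an entry with a multi-digit allele would still be captured whole; whether that matters depends on the data.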
I haven't written Python in a while, but I would probably write it as such.
Of course, if you need to validate the first three characters, you'll still need the regex.
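The code for this answer is also missing. For the validating case it mentions, a sketch that keeps the question's pattern and its "NA" fallback but applies them to every row could look like this. The helper names and the deliberately malformed sample entry are assumptions for illustration.

```python
import re

# Same pattern as in the question: one digit, a slash, one digit.
GENOTYPE = re.compile(r"[0-9]/[0-9]")


def genotype_or_na(entry):
    """Return the leading genotype if the entry starts with one, else "NA"."""
    m = GENOTYPE.match(entry)
    return m.group(0) if m else "NA"


def genotypes_by_row(lines):
    """Apply genotype_or_na to every tab-separated entry of every line."""
    return [
        [genotype_or_na(entry) for entry in line.rstrip("\n").split("\t")]
        for line in lines
    ]


# The second entry is intentionally malformed to show the fallback.
lines = ["0/0:23:-1.03,-7.94,-83.75:69.15\tbad_entry\t0/1:34:-1.01,-11.24,-127.51:99.00\n"]
print(genotypes_by_row(lines))  # [['0/0', 'NA', '0/1']]
```

Wrapping the per-entry check in a small function and mapping it over rows is one way to answer the question's "function or something more Pythonic" dilemma: it is both.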