Parsing a CSV / tab-delimited txt file with Python
I currently have a CSV file which, when opened in Excel, has a total of 5 columns. Only columns A and C are of any significance to me; the data in the remaining columns is irrelevant.
Starting on line 8 and then working in steps of 7 (i.e. lines 8, 15, 22, 29, 36, etc.), I am looking to create a dictionary with Python 2.7 from the information in these fields. The data in column A will be the key (a 6-digit integer) and the data in column C will be the respective value for that key. I've tried to illustrate this below, but the formatting isn't the best:
A B C D
1 CDCDCDCD
2 VDDBDDB
3
4
5
6
7 DDEFEEF FEFEFEFE
8 123456 JONES
9
10
11
12
13
14
15 293849 SMITH
As per the above, I am looking to extract the value from A7 (DDEFEEF) as a key in my dictionary, with "FEFEFEFE" as the respective data, and then add another entry to my dictionary by jumping to line 15, with "293849" as the key and "SMITH" as the respective value.
Any suggestions? The source file is a .txt file with entries being tab-delimited.
Thanks
Clarification:
To clarify, this is what I have tried so far:
import csv

mydict = {}
f = open("myfile", 'rt')
reader = csv.reader(f)
for row in reader:
    print row
The above simply prints out all of the content, one row at a time. I did try "for row(7) in reader", but this returned an error. I then researched it and had a go at the below, but it didn't work either:
import csv
from itertools import islice

entries = csv.reader(open("myfile", 'rb'))
mydict = {'key' : 'value'}

for i in xrange(6):
    mydict['i(0)] = 'I(2) # integers representing columns
    range = islice(entries,6)
    for entry in range:
        mydict[entries(0) = entries(2)] # integers representing columns
Comments (3)
Start by turning the text into a list of lists. That will take care of the parsing part:
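The answer's original snippet is missing here. A minimal sketch of the parsing step, assuming a tab-delimited file with 5 columns, could look like this. The `read_rows` helper and the in-memory `sample` lines are hypothetical stand-ins for illustration; only `csv.reader` and its `delimiter` argument come from the standard library.

```python
import csv

def read_rows(lines, delimiter="\t"):
    """Parse an iterable of text lines into a list of lists, one per row."""
    return list(csv.reader(lines, delimiter=delimiter))

# Hypothetical stand-in for the file: 15 tab-delimited lines, 5 columns each.
sample = ["\t\t\t\t\n"] * 15
sample[7] = "123456\t\tJONES\t\t\n"    # line 8 of the file
sample[14] = "293849\t\tSMITH\t\t\n"   # line 15 of the file

rows = read_rows(sample)

# With a real file, pass the open file object instead:
#     with open("myfile") as f:
#         rows = read_rows(f)
```

Because `csv.reader` accepts any iterable of lines, the same helper works on an open file or on a list of strings, which makes it easy to try out on a small sample first.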
The rest can be done with indexed lookups:
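The lookup code is also missing from this copy. One possible sketch, assuming the rule stated in the question (start on line 8, then every 7th line, column A as key and column C as value), is shown below. The `rows` list here is hypothetical sample data shaped like the output of `csv.reader`.

```python
# Hypothetical parsed rows: 15 rows of 5 columns, as csv.reader would return them.
rows = [["", "", "", "", ""] for _ in range(15)]
rows[7] = ["123456", "", "JONES", "", ""]    # line 8 of the file
rows[14] = ["293849", "", "SMITH", "", ""]   # line 15 of the file

mydict = {}
# Line 8 is index 7 (lists are zero-based); a step of 7 reaches lines 15, 22, ...
for row in rows[7::7]:
    # Skip rows that are too short or have an empty key cell.
    if len(row) > 2 and row[0]:
        mydict[int(row[0])] = row[2]   # column A -> key, column C -> value
```

The only subtlety is the off-by-one between spreadsheet line numbers and list indexes: line 8 is `rows[7]`, and column C is index 2.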
Although there is nothing wrong with the other solutions presented, you could simplify and greatly scale up your solution by using Python's excellent pandas library.
Pandas is a library for handling data in Python, preferred by many data scientists.
Pandas has a simplified CSV interface for reading and parsing files, which can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the contents of each cell.
In your case:
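The code for this answer did not survive in this copy. A rough sketch of how it might be done with pandas follows, assuming a tab-delimited file with no header row and the line-8, every-7th-line rule from the question. The column names (`key`, `b`, `value`, `d`, `e`) and the in-memory sample are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: 15 tab-delimited lines, 5 columns each.
sample = ["\t\t\t\t\n"] * 15
sample[7] = "123456\t\tJONES\t\t\n"    # line 8 of the file
sample[14] = "293849\t\tSMITH\t\t\n"   # line 15 of the file

df = pd.read_csv(
    io.StringIO("".join(sample)),   # with a real file, pass its path instead
    sep="\t",
    header=None,
    names=["key", "b", "value", "d", "e"],   # made-up names for columns A to E
    usecols=["key", "value"],                # keep only columns A and C
    dtype=str,
    skip_blank_lines=False,                  # keep line numbers aligned with the file
)

# Row index 7 is line 8; take every 7th row and drop rows with missing cells.
picked = df.iloc[7::7].dropna()

# One dictionary per selected line, keyed by column name.
records = picked.to_dict("records")

# The key -> value mapping the question asks for.
mydict = {int(k): v for k, v in zip(picked["key"], picked["value"])}
```

Reading everything as `str` and converting the key with `int` afterwards avoids pandas guessing a float type for a column that is mostly empty.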
If the file is large, you may not want to load it entirely into memory at once. This approach avoids that. (Of course, building a dict from it still takes up some RAM, but it holds only two fields from every seventh row, so it should be much smaller than the original file.)
Edit: Not sure where I got extend from before. I meant update.
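The snippet this answer refers to is not present in this copy. A sketch of a streaming version, assuming the line-8, every-7th-line rule from the question, could use `itertools.islice` to step through the reader and `dict.update` to collect the pairs. The `build_dict` function, its parameters, and the sample lines are hypothetical.

```python
import csv
from itertools import islice

def build_dict(lines, first=8, step=7, delimiter="\t"):
    """Map column A to column C for lines first, first+step, ... of the input."""
    reader = csv.reader(lines, delimiter=delimiter)
    # islice pulls rows lazily, so only one row is held in memory at a time.
    wanted = islice(reader, first - 1, None, step)
    mydict = {}
    mydict.update(
        (int(row[0]), row[2])
        for row in wanted
        if len(row) > 2 and row[0]   # skip short rows and empty keys
    )
    return mydict

# Hypothetical stand-in for the file: 15 tab-delimited lines, 5 columns each.
sample = ["\t\t\t\t\n"] * 15
sample[7] = "123456\t\tJONES\t\t\n"    # line 8 of the file
sample[14] = "293849\t\tSMITH\t\t\n"   # line 15 of the file

result = build_dict(sample)

# With a real file, pass the open file object instead:
#     with open("myfile") as f:
#         result = build_dict(f)
```

Unlike the list-of-lists approach, nothing here calls `list()` on the reader, so memory use depends on the number of selected rows and not on the size of the file.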