Parsing a CSV / tab-delimited txt file with Python

Posted on 2024-12-11 16:55:07

I currently have a CSV file which, when opened in Excel, has a total of 5 columns. Only columns A and C are of any significance to me and the data in the remaining columns is irrelevant.

Starting on line 8 and then working in multiples of 7 (i.e. lines 8, 15, 22, 29, 36, etc.), I am looking to create a dictionary with Python 2.7 from the information in these fields. The data in column A will be the key (a 6-digit integer) and the data in column C will be the corresponding value. I've tried to illustrate this below, but the formatting isn't the best:

     A          B      C        D
1                               CDCDCDCD
2                               VDDBDDB
3
4
5
6
7    DDEFEEF                    FEFEFEFE
8    123456            JONES
9
10
11
12
13
14
15   293849            SMITH

As per the above, I am looking to extract the value from A7 (DDEFEEF) as a key in my dictionary, with "FEFEFEFE" as its value, and then add another entry by jumping to line 15, with "293849" as the key and "SMITH" as the value.

Any suggestions? The source file is a .txt file with tab-delimited entries.
Thanks

Clarification:

Just to clarify, this is what I have tried so far:

import csv

mydict = {}
f = open("myfile", 'rt')
reader = csv.reader(f)
for row in reader:
    print row

The above simply prints out all of the content, one row at a time. I did try "for row(7) in reader" but this returned an error. I then researched it and had a go at the below, but it didn't work either:

import csv
from itertools import islice

entries = csv.reader(open("myfile", 'rb'))
mydict = {'key' : 'value'}

for i in xrange(6):
    mydict['i(0)] = 'I(2)    # integers representing columns
    range = islice(entries,6)
    for entry in range:
        mydict[entries(0) = entries(2)] # integers representing columns

Comments (3)

水染的天色ゝ 2024-12-18 16:55:07

Start by turning the text into a list of lists. That will take care of the parsing part:

import csv

lol = list(csv.reader(open('text.txt', 'rb'), delimiter='\t'))

The rest can be done with indexed lookups:

d = dict()
key = lol[6][0]      # cell A7
value = lol[6][3]    # cell D7
d[key] = value       # add the entry to the dictionary
 ...
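
To collect every entry rather than a single one, the same list of lists can be sliced with a step of 7. The following is only a minimal sketch, assuming the keys sit in column A and the values in column C on lines 8, 15, 22, ... as the question's opening describes. The sample data and the name `SAMPLE` are invented for illustration, and the code is written for Python 3 (on Python 2.7, open the real file with 'rb' as above).

```python
import csv
import io

# Hypothetical stand-in for the tab-delimited file: 15 lines, with
# entries on lines 8 and 15 (key in column A, value in column C).
lines = [""] * 15
lines[7] = "123456\t\tJONES\t\t"
lines[14] = "293849\t\tSMITH\t\t"
SAMPLE = "\n".join(lines) + "\n"

lol = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))

# lol[7] is line 8; a step of 7 then visits lines 15, 22, 29, ...
d = {}
for row in lol[7::7]:
    if len(row) >= 3:          # skip blank or short lines
        d[row[0]] = row[2]     # column A -> column C

print(d)
```

Blank lines come back from `csv.reader` as empty lists, so the list indices still line up with the line numbers in the file.
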
樱花落人离去 2024-12-18 16:55:07

Although there is nothing wrong with the other solutions presented, you could simplify your solution, and make it scale much better, by using Python's excellent pandas library.

Pandas is a library for handling data in Python, preferred by many Data Scientists.

Pandas has a simplified CSV interface to read and parse files, that can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.

In your case:

    import pandas

    def create_dictionary(filename):
        # DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it.
        my_data = pandas.read_csv(filename, sep='\t', index_col=False)
        # Here you can delete the dataframe columns you don't want!
        # (This assumes the file has a header row naming the columns A to D.)
        del my_data['B']
        del my_data['D']
        # ...
        # Now you transform the DataFrame to a list of dictionaries, one per row
        list_of_dicts = my_data.to_dict(orient='records')
        return list_of_dicts

    # Usage:
    x = create_dictionary("myfile.csv")
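
The row-per-dictionary shape described above does not strictly require pandas. For comparison, here is a minimal sketch using the standard library's `csv.DictReader`, which is a different tool from the one this answer uses. It assumes the file begins with a header row naming the columns A to D, which the question does not confirm. The sample text and the helper name `rows_as_dicts` are invented for illustration.

```python
import csv
import io

# Hypothetical sample with a header row; the real file may not have one.
SAMPLE = "A\tB\tC\tD\n123456\tx\tJONES\ty\n293849\tx\tSMITH\ty\n"

def rows_as_dicts(lines, keep=("A", "C")):
    """Return one dict per data row, keeping only the named columns."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [{name: row[name] for name in keep} for row in reader]

records = rows_as_dicts(io.StringIO(SAMPLE))
print(records)
```
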
明月松间行 2024-12-18 16:55:07

If the file is large, you may not want to load it entirely into memory at once. This approach avoids that by reading one line at a time. (Of course, the resulting dict still takes up some RAM, but since it keeps only two columns from every seventh line, it should be far smaller than the original file.)

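my_dict = {}
with open("myfile") as f:
    # Number lines from 1 so that i matches the line numbers in the question.
    for i, line in enumerate(f, 1):
        # Keep only lines 8, 15, 22, ...; skip everything else.
        if i < 8 or (i - 8) % 7:
            continue
        # Fields 0 and 2 are columns A and C.
        k, v = line.rstrip("\n").split("\t")[:3:2]
        my_dict[k] = v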

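
The line-at-a-time idea can also be combined with `csv.reader` and `itertools.islice`, which the question's second attempt was reaching for. What follows is a hedged sketch rather than a definitive implementation: the function name `build_dict` and its parameters are hypothetical, the sample data is invented, and the code targets Python 3.

```python
import csv
import io
from itertools import islice

def build_dict(lines, first=8, step=7, key_col=0, value_col=2):
    """Map column A to column C on lines first, first+step, ... (1-based)."""
    reader = csv.reader(lines, delimiter="\t")
    result = {}
    # islice is 0-based, so line 8 is index 7; rows are consumed lazily.
    for row in islice(reader, first - 1, None, step):
        # Ignore blank or short rows and rows with an empty key.
        if len(row) > max(key_col, value_col) and row[key_col]:
            result[row[key_col]] = row[value_col]
    return result

# Hypothetical sample standing in for the real file.
sample = [""] * 15
sample[7] = "123456\t\tJONES\t\t"
sample[14] = "293849\t\tSMITH\t\t"
my_dict = build_dict(io.StringIO("\n".join(sample) + "\n"))
print(my_dict)
```

For a real file, the same function can be given an open file object, for example one obtained from `open("myfile", newline="")`, so that only one row is held in memory at a time.
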
