如何使用 python 导入标题完整的 csv 文件，其中第一列是非数字

发布于 2024-09-13 19:16:54 字数 586 浏览 4 评论 0原文

这是对上一个问题的详细阐述，但随着我深入研究 python，我对 python 如何处理 csv 文件感到更加困惑。

我有一个 csv 文件，它必须保持这种状态（例如，无法将其转换为文本文件）。它相当于 5 行 x 11 列的数组或矩阵或向量。

我一直在尝试使用我在这里和其他地方（例如python.org）找到的各种方法读取csv，以便它保留列和行之间的关系，其中第一行和第一行列 = 非数字值。其余的都是浮点值，并且包含正浮点和负浮点的混合。

我想要做的是导入 csv 并在 python 中编译它，这样如果我要引用列标题，它将返回存储在行中的关联值。例如：

>>> workers, constant, age
>>> workers
    w0
    w1
    w2
    w3
    constant
    7.334
    5.235
    3.225
    0
    age
    -1.406
    -4.936
    -1.478
    0

等等...

我正在寻找处理这种数据结构的技术。我对 python 很陌生。

原文

This is an elaboration of a previous question, but as I delve deeper into python, I just get more confused as to how python handles csv files.

I have a csv file, and it must stay that way (e.g., cannot convert it to text file). It is the equivalent of a 5 rows by 11 columns array or matrix, or vector.

I have been attempting to read in the csv using various methods I have found here and other places (e.g. python.org) so that it preserves the relationship between columns and rows, where the first row and the first column = non-numerical values. The rest are float values, and contain a mixture of positive and negative floats.

What I wish to do is import the csv and compile it in python so that if I were to reference a column header, it would return its associated values stored in the rows. For example:

>>> workers, constant, age
>>> workers
    w0
    w1
    w2
    w3
    constant
    7.334
    5.235
    3.225
    0
    age
    -1.406
    -4.936
    -1.478
    0

And so forth...

I am looking for techniques for handling this kind of data structure. I am very new to python.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

寂寞笑我太脆弱 2024-09-20 19:16:54

对于 Python 3

删除 rb 参数并使用 r 或不传递参数（默认读取模式）。

with open( <path-to-file>, 'r' ) as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'
        print(line)

对于 Python 2

import csv
with open( <path-to-file>, "rb" ) as theFile:
    reader = csv.DictReader( theFile )
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'

Python 有一个强大的内置 CSV 处理程序。事实上，大多数东西已经内置在标准库中。

For Python 3

Remove the rb argument and use either r or don't pass argument (default read mode).

with open( <path-to-file>, 'r' ) as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'
        print(line)

For Python 2

import csv
with open( <path-to-file>, "rb" ) as theFile:
    reader = csv.DictReader( theFile )
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'

Python has a powerful built-in CSV handler. In fact, most things are already built in to the standard library.

回复收藏 0 原文

相思故 2024-09-20 19:16:54

Python 的 csv 模块按行处理数据，这是查看此类数据的常用方式。您似乎想要一种按列的方法。这是一种方法。

假设您的文件名为 myclone.csv 并包含

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

此代码，应该会给您一个或两个想法：

>>> import csv
>>> f = open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

要将数值转换为浮点数，请

converters = [str.strip] + [float] * (len(headers) - 1)

在前面添加此内容，并对

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

每一行执行此操作，而不是类似的上面两行。

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

To get your numeric values into floats, add this

converters = [str.strip] + [float] * (len(headers) - 1)

up front, and do this

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

for each row instead of the similar two lines above.

回复收藏 0 原文

心安伴我暖 2024-09-20 19:16:54

您可以使用 pandas 库并引用行和列，如下所示：

import pandas as pd

input = pd.read_csv("path_to_file");

#for accessing ith row:
input.iloc[i]

#for accessing column named X
input.X

#for accessing ith row and column named X
input.iloc[i].X

You can use pandas library and reference the rows and columns like this:

import pandas as pd

input = pd.read_csv("path_to_file");

#for accessing ith row:
input.iloc[i]

#for accessing column named X
input.X

#for accessing ith row and column named X
input.iloc[i].X

回复收藏 0 原文

静若繁花 2024-09-20 19:16:54

我最近不得不为相当大的数据文件编写这个方法，我发现使用列表理解效果很好

      import csv
      with open("file.csv",'r') as f:
        reader = csv.reader(f)
        headers = next(reader)
        data = [{h:x for (h,x) in zip(headers,row)} for row in reader]
        #data now contains a list of the rows, with each row containing a dictionary 
        #  in the shape {header: value}. If a row terminates early (e.g. there are 12 columns, 
        #  it only has 11 values) the dictionary will not contain a header value for that row.

I recently had to write this method for quite a large datafile, and i found using list comprehension worked quite well

      import csv
      with open("file.csv",'r') as f:
        reader = csv.reader(f)
        headers = next(reader)
        data = [{h:x for (h,x) in zip(headers,row)} for row in reader]
        #data now contains a list of the rows, with each row containing a dictionary 
        #  in the shape {header: value}. If a row terminates early (e.g. there are 12 columns, 
        #  it only has 11 values) the dictionary will not contain a header value for that row.

回复收藏 0 原文

~没有更多了~