Using CSV arrays as input for Python

Posted on 2024-12-15 10:50:09

I have been given a CSV file containing 100+ arrays that I need to run through my data analysis code, but I am not sure how to read these arrays in Python. Each array is preceded by a line that contains only an integer giving the number of rows in the array, and is followed by a line containing '1234567890', which serves as a separator between arrays.

Here is a snippet of the csv file:

7,,,,,,,  
1,-199.117,-105.4,-4.525,227.5415,225.2925647,-0.0198891,-2.6547518
2,133.0423,55.4573,-48.4174,155.16,144.1380093,-0.322813,0.3949385
3,129.8405,-16.9527,-303.3192,331.0847,130.9425427,-1.5644458,-0.1298311
4,-73.6373,71.4677,151.517,183.9712,102.616198,1.1678785,2.3711453
5,41.2654,10.4196,30.3773,54.0915,42.5605604,0.6351541,0.2473322
6,-20.3159,-32.4484,62.4574,74.8581,38.2836056,1.2022641,-2.1301853
7,-13.2904,22.029,-28.2895,38.5096,25.7276422,-0.9386666,2.1136489  
1234567890,,,,,,,  
5,,,,,,,  
1,-136.0755,-204.2787,-48.2127,259.2592,245.4512762,-0.1881526,-2.158425
2,220.5184,46.9388,-113.6448,265.1745,225.4586784,-0.4581388,0.2097266
3,-45.3132,169.6283,-49.2729,188.9506,175.576326,-0.2669358,1.8318334
4,-40.7141,34.7414,25.5414,60.9535,53.5219844,0.4465159,2.4351851
5,15.3863,-49.6703,17.1692,56.7635,51.9988166,0.312235,-1.2704018  
1234567890,,,,,,,  
6,,,,,,,   
1,-19.3083,295.4128,191.8666,360.3712,296.0431267,0.5935079,1.6360639
2,-169.8708,-128.3904,-1.0052,215.4187,212.9323449,-0.0046663,-2.4943822
3,15.4505,-209.6656,-178.0715,279.4077,210.2341118,-0.7536439,-1.4972381
4,172.4142,13.0485,-63.7912,192.2842,172.9072576,-0.3447988,0.0755371
5,16.7456,24.8768,-46.5025,55.9188,29.9878358,-1.1933262,0.9783247
6,-8.911,4.1138,12.7751,17.7283,9.8147477,0.9089022,2.7090895  
1234567890,,,,,,,

I am certain I could import the data if the CSV were just one big array, but I am stumped when it comes to picking one array out of many. The data analysis needs to run on each temporary array before it is replaced with the next array in the CSV file.

林空鹿饮溪 2024-12-22 10:50:09

You could use itertools.groupby to parse the rows into separate arrays:

import csv
import itertools

# Truncate the error log so each run starts with an empty file.
with open('errors', 'w') as err:
    pass

# newline='' is what the csv module recommends when opening files in Python 3.
with open('data', 'r', newline='') as f:
    for key, group in itertools.groupby(
            csv.reader(f),
            lambda row: row[0].startswith('1234567890')):
        if key:
            continue  # key is True: this group is a separator row, not data
        group = list(group)  # group is an iterator; turn it into a list
        array = group[1:]    # everything but the first row is data
        arr_length = int(group[0][0])  # the first row contains the length
        if arr_length != len(array):  # sanity check
            with open('errors', 'a') as err:
                err.write('''\
Data file claims arr_length = {l}
{a}
{h}
'''.format(l=arr_length, a=str(array), h='-' * 80))
        print(array)  # each row is still a list of strings at this point

itertools.groupby returns an iterator. It loops through the rows in csv.reader(f) and applies the lambda function to each row. The lambda function returns True when the row starts with '1234567890' and False otherwise. This return value is called the key. The important point is that itertools.groupby collects together all contiguous rows that return the same key, so each run of non-separator rows (the count row plus its data rows) arrives as one group.
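To make the grouping concrete, here is a minimal sketch of the same groupby idea wrapped in a generator that converts each block to floats. The names iter_arrays and SEPARATOR are mine, not part of the answer above, and the sample data is a shortened, made-up block in the same layout as the question's file. It assumes every data cell parses as a float and that the separator always sits in the first column.

```python
import csv
import io
import itertools

SEPARATOR = '1234567890'  # sentinel value taken from the question


def iter_arrays(lines):
    """Yield one array (a list of rows of floats) per block in the file.

    `lines` is any iterable of text lines, e.g. an open file object.
    This helper is a hypothetical illustration, not a library function.
    """
    # Skip completely blank rows so row[0] is always safe to read.
    rows = (row for row in csv.reader(lines) if row)
    for is_separator, group in itertools.groupby(
            rows, key=lambda row: row[0].strip() == SEPARATOR):
        if is_separator:
            continue  # separator rows carry no data
        header, *body = group          # first row holds the row count
        expected = int(header[0])
        if expected != len(body):      # same sanity check as in the answer
            raise ValueError(
                'expected %d rows, found %d' % (expected, len(body)))
        # The first column (the row index) is kept as a float as well.
        yield [[float(cell) for cell in row] for row in body]


# Shortened sample in the same layout as the question's file.
sample = io.StringIO(
    '2,,\n'
    '1,1.5,2.5\n'
    '2,3.5,4.5\n'
    '1234567890,,\n'
    '1,,\n'
    '1,-1.0,0.0\n'
    '1234567890,,\n'
)

for array in iter_arrays(sample):
    print(len(array), array)
```

Because it is a generator, only one array is held in memory at a time, which matches the requirement in the question that each array be analysed before the next one replaces it. With a real file you would pass the object returned by open('data', newline='') instead of the StringIO sample.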

不打扰别人 2024-12-22 10:50:09

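This should give you a nicely formatted variable called "data" to work with. It is a list with one entry per array; each entry is a list of rows, and its first row is the count line that precedes the array in the file.

import csv

data = []  # one entry per array
temp = []  # rows of the array currently being read

with open('your_file.csv', newline='') as f:
    for row in csv.reader(f):
        if '1234567890' in row:
            # Separator row: store the finished array and start a new one.
            data.append(temp)
            temp = []
        else:
            temp.append(row)

# Keep a trailing array that is not followed by a separator row.
if temp:
    data.append(temp)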
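If you would rather analyse each array as soon as it is complete instead of collecting everything first, the same loop can be adapted as in the sketch below. The functions analyze and process_stream are hypothetical names of mine; analyze just computes per-column means as a stand-in for the real analysis code. The sketch assumes all data cells are numeric and drops the count row before converting.

```python
import csv
import io

SEPARATOR = '1234567890'  # sentinel value taken from the question


def analyze(array):
    """Hypothetical stand-in for the real analysis: per-column means."""
    return [sum(column) / len(array) for column in zip(*array)]


def process_stream(lines):
    """Run analyze() on each array as soon as its separator row is seen."""
    results = []
    temp = []  # raw rows of the array currently being read

    def flush():
        # temp[0] is the count row; everything after it is data.
        array = [[float(cell) for cell in row] for row in temp[1:]]
        results.append(analyze(array))

    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        if row[0].strip() == SEPARATOR:
            if temp:
                flush()
            temp = []  # the temporary array is replaced by the next one
        else:
            temp.append(row)
    if temp:
        flush()  # trailing array without a closing separator row
    return results


# Shortened, made-up sample in the same layout as the question's file.
sample = io.StringIO(
    '2,,\n'
    '1,1.5,2.5\n'
    '2,3.5,4.5\n'
    '1234567890,,\n'
    '1,,\n'
    '1,-1.0,0.0\n'
    '1234567890,,\n'
)
print(process_stream(sample))
```

The separator is compared against the first cell only, which is slightly stricter than the `in row` test in the answer; either works for the file layout shown in the question.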
