Using CSV arrays as input to Python
I have been given a CSV file containing 100+ arrays that I need to run through my data analysis code, but I am not sure how to read these arrays in Python. Each array is preceded by a line containing only an integer, which gives the number of rows in the array, and is followed by the line '1234567890', which serves as a separator.
Here is a snippet of the csv file:
7,,,,,,,
1,-199.117,-105.4,-4.525,227.5415,225.2925647,-0.0198891,-2.6547518
2,133.0423,55.4573,-48.4174,155.16,144.1380093,-0.322813,0.3949385
3,129.8405,-16.9527,-303.3192,331.0847,130.9425427,-1.5644458,-0.1298311
4,-73.6373,71.4677,151.517,183.9712,102.616198,1.1678785,2.3711453
5,41.2654,10.4196,30.3773,54.0915,42.5605604,0.6351541,0.2473322
6,-20.3159,-32.4484,62.4574,74.8581,38.2836056,1.2022641,-2.1301853
7,-13.2904,22.029,-28.2895,38.5096,25.7276422,-0.9386666,2.1136489
1234567890,,,,,,,
5,,,,,,,
1,-136.0755,-204.2787,-48.2127,259.2592,245.4512762,-0.1881526,-2.158425
2,220.5184,46.9388,-113.6448,265.1745,225.4586784,-0.4581388,0.2097266
3,-45.3132,169.6283,-49.2729,188.9506,175.576326,-0.2669358,1.8318334
4,-40.7141,34.7414,25.5414,60.9535,53.5219844,0.4465159,2.4351851
5,15.3863,-49.6703,17.1692,56.7635,51.9988166,0.312235,-1.2704018
1234567890,,,,,,,
6,,,,,,,
1,-19.3083,295.4128,191.8666,360.3712,296.0431267,0.5935079,1.6360639
2,-169.8708,-128.3904,-1.0052,215.4187,212.9323449,-0.0046663,-2.4943822
3,15.4505,-209.6656,-178.0715,279.4077,210.2341118,-0.7536439,-1.4972381
4,172.4142,13.0485,-63.7912,192.2842,172.9072576,-0.3447988,0.0755371
5,16.7456,24.8768,-46.5025,55.9188,29.9878358,-1.1933262,0.9783247
6,-8.911,4.1138,12.7751,17.7283,9.8147477,0.9089022,2.7090895
1234567890,,,,,,,
I am certain I could import the data if the CSV were just one big array, but I am stumped when it comes to picking one array out of many. The data analysis needs to be run on each temporary array before it is replaced with the next array in the CSV file.
You could use itertools.groupby to parse the rows into separate arrays.

itertools.groupby returns an iterator. It loops through the rows in csv.reader(f) and applies the lambda function to each row. The lambda function returns True when the row starts with '1234567890'. The return value (e.g. True or False) is called the key. The important point is that itertools.groupby collects together all contiguous rows that return the same key.
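The groupby approach described above might look roughly like the sketch below. It is an illustration, not necessarily the exact code this answer had in mind. The function name read_arrays, the SEPARATOR constant, and the inline sample data are all assumptions made for the example, and the sketch drops the leading row-index column and the count line, which may or may not suit your analysis.

```python
import csv
import io
from itertools import groupby

SEPARATOR = "1234567890"  # sentinel line described in the question


def read_arrays(f):
    """Yield each array in the file as a list of rows of floats.

    read_arrays is a hypothetical helper name used for this sketch.
    """
    # Skip completely blank lines so that row[0] is always available.
    rows = (row for row in csv.reader(f) if row)
    # The key is True for separator rows and False for all other rows,
    # so groupby yields alternating runs of array rows and separator rows.
    for is_separator, group in groupby(rows, lambda row: row[0].strip() == SEPARATOR):
        if is_separator:
            continue
        block = list(group)
        # block[0] is the count line (e.g. "7,,,,,,,"); the rest are data rows.
        # Column 0 of each data row is the row index, so it is dropped here.
        yield [[float(x) for x in row[1:]] for row in block[1:]]


# Small made-up sample standing in for the real file.
sample = io.StringIO(
    "2,,,\n"
    "1,1.5,-2.0,3.0\n"
    "2,4.0,5.5,-6.0\n"
    "1234567890,,,\n"
    "1,,,\n"
    "1,7.0,8.0,9.0\n"
    "1234567890,,,\n"
)

arrays = list(read_arrays(sample))
for array in arrays:
    print(len(array), array[0])  # replace with the real analysis step
```

With a real file you would pass an open file object instead of the StringIO sample, for example inside a with open(filename, newline="") as f: block. Looping over read_arrays(f) directly, rather than calling list() on it, keeps only one array in memory at a time, which matches the requirement that each temporary array be analysed before the next one replaces it.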
This should give you a nicely formatted variable called "data" to work with.
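One way to end up with such a variable is sketched below, under the assumption that the count line is used to decide how many rows to read for each array. This is not necessarily the code this answer had in mind. The function name load_data and the sample values are made up for the illustration, and only the name data comes from the answer.

```python
import csv
import io
from itertools import islice

SEPARATOR = "1234567890"  # sentinel line described in the question


def load_data(f):
    """Return a list of arrays; each array is a list of rows of floats.

    load_data is a hypothetical helper name used for this sketch.
    """
    reader = csv.reader(f)
    data = []
    for header in reader:
        if not header or not header[0].strip():
            continue  # skip blank lines between arrays
        n_rows = int(header[0])  # count line, e.g. "7,,,,,,,"
        # Read exactly n_rows data rows, dropping the leading row index.
        block = [[float(x) for x in row[1:]] for row in islice(reader, n_rows)]
        if len(block) != n_rows:
            raise ValueError("file ended in the middle of an array")
        # The line after the data rows should be the separator.
        footer = next(reader, None)
        if not footer or footer[0].strip() != SEPARATOR:
            raise ValueError("expected separator line after array")
        data.append(block)
    return data


# Small made-up sample standing in for the real file.
sample = io.StringIO(
    "2,,,\n"
    "1,1.5,-2.0,3.0\n"
    "2,4.0,5.5,-6.0\n"
    "1234567890,,,\n"
    "1,,,\n"
    "1,7.0,8.0,9.0\n"
    "1234567890,,,\n"
)

data = load_data(sample)
print(len(data))  # number of arrays found
print(data[1])    # rows of the second array
```

Checking the count line against the rows actually read, and checking for the separator afterwards, means a malformed file raises an error early instead of silently shifting rows between arrays.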