How to read fixed-width data?
The data looks like this:
212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117
I need to add separators, like this:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
The CSV file is large, with nearly 20,000 rows. Is there any way to do this?
This question is generally about reading "fixed width data".
If you're stuck with this data, you'll need to parse it line by line, then column by column. I'll show you how to do this with Python.
First off, the columns you counted off in the comment do not match your sample output. You seem to have omitted the last column, which is 2 characters wide.
You'll need accurate column widths to perform the task. I took your sample data, counted the columns for you, and got these widths: 8, 13, 12, 12, 8, 8, and 2 (63 characters per line in total).
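Turning widths into slice boundaries is just a running total. Below is a minimal sketch that assumes the widths 8, 13, 12, 12, 8, 8 and 2 counted from the sample rows. `COLUMN_WIDTHS` and `column_bounds` are illustrative names and do not come from the original answer.

```python
from itertools import accumulate

# Column widths counted from the sample rows (8+13+12+12+8+8+2 = 63)
COLUMN_WIDTHS = [8, 13, 12, 12, 8, 8, 2]


def column_bounds(widths):
    """Return (beg, end) pairs: beg is inclusive, end is exclusive."""
    ends = list(accumulate(widths))  # running totals: 8, 21, 33, ...
    begs = [0] + ends[:-1]           # each column starts where the previous one ended
    return list(zip(begs, ends))


if __name__ == "__main__":
    print(column_bounds(COLUMN_WIDTHS))
    # [(0, 8), (8, 21), (21, 33), (33, 45), (45, 53), (53, 61), (61, 63)]
    print(sum(COLUMN_WIDTHS))  # 63, the length of each sample line
```

If the widths do not add up to the line length, at least one of them was miscounted.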
So, we'll read the input data line by line. For every line, we'll walk through the column widths, use `beg` and `end` to mark where a column begins (inclusive) and where it ends (exclusive), and slice that column's text out of the line.
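The following is a minimal sketch of that loop, using the same widths. `parse_line` and `parse_lines` are hypothetical names, and the answer's own code may differ. The sample rows are embedded as a string so the snippet runs on its own.

```python
# Column widths counted from the question's sample rows
COLUMN_WIDTHS = [8, 13, 12, 12, 8, 8, 2]

SAMPLE = """\
212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117
"""


def parse_line(line, widths):
    """Split one fixed-width line into a list of column strings."""
    row = []
    beg = 0
    for width in widths:
        end = beg + width          # end is exclusive, like a Python slice
        row.append(line[beg:end])
        beg = end                  # the next column begins where this one ended
    return row


def parse_lines(lines, widths):
    """Parse every non-blank line, stripping the line ending first."""
    return [parse_line(line.rstrip("\r\n"), widths) for line in lines if line.strip()]


if __name__ == "__main__":
    all_rows = parse_lines(SAMPLE.splitlines(), COLUMN_WIDTHS)
    for row in all_rows:
        print(row)
```

To read a real file, pass the open file object instead: `with open("input.txt") as f: all_rows = parse_lines(f, COLUMN_WIDTHS)`. The file name here is hypothetical. For the first sample row, the inner list is `['21225382', '0000025000.00', '000002500.00', '000000375.00', '11112021', '11052021', '17']`.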
`all_rows` is just a list of lists of text, with one inner list of column strings per input line. With this approach, if you miscounted the column widths or the number of columns, you can easily modify `column_widths` to match your data. From here, we'll use Python's csv module to make sure the CSV file is written correctly.
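Here is a sketch of the writing step using the standard csv module. `fixed_width_to_csv` is a hypothetical helper that combines the slicing loop with `csv.writer`.

```python
import csv

# Column widths counted from the question's sample rows
COLUMN_WIDTHS = [8, 13, 12, 12, 8, 8, 2]


def parse_line(line, widths):
    """Slice one fixed-width line into its column strings."""
    row, beg = [], 0
    for width in widths:
        row.append(line[beg:beg + width])
        beg += width
    return row


def fixed_width_to_csv(in_path, out_path, widths=COLUMN_WIDTHS):
    """Read a fixed-width text file and write it back out as CSV."""
    with open(in_path) as src:
        all_rows = [parse_line(line.rstrip("\r\n"), widths) for line in src if line.strip()]
    # The csv docs recommend newline="" for files passed to csv.writer
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerows(all_rows)
```

As an example call, `fixed_width_to_csv("input.txt", "data.csv")` uses hypothetical file names. By default, `csv.writer` ends each row with `\r\n`. If you want Unix line endings, pass `lineterminator="\n"`.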
My resulting data.csv file matches the comma-separated output from the question.
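The question mentions nearly 20,000 rows. At 63 characters per line that is only about 1.3 MB, so holding `all_rows` in memory is fine. Still, a streaming variant that also catches miscounted widths is easy to write. The sketch below assumes the same widths, and `convert_streaming` is a hypothetical name.

```python
import csv

# Column widths counted from the question's sample rows
COLUMN_WIDTHS = [8, 13, 12, 12, 8, 8, 2]


def split_fixed(line, widths):
    """Slice one fixed-width line into its columns."""
    fields, beg = [], 0
    for width in widths:
        fields.append(line[beg:beg + width])
        beg += width
    return fields


def convert_streaming(src, dst, widths=COLUMN_WIDTHS):
    """Convert line by line without building all_rows in memory.

    src is any iterable of lines (e.g. an open file); dst is a writable text
    file. Raises ValueError on the first line whose length differs from
    sum(widths), which usually means a width was miscounted.
    Returns the number of rows written.
    """
    expected = sum(widths)
    writer = csv.writer(dst)
    count = 0
    for lineno, line in enumerate(src, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. a trailing empty line
        if len(line) != expected:
            raise ValueError(f"line {lineno}: length {len(line)}, expected {expected}")
        writer.writerow(split_fixed(line, widths))
        count += 1
    return count
```

Usage, with hypothetical file names: `with open("input.txt") as src, open("data.csv", "w", newline="") as dst: convert_streaming(src, dst)`.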
If you have access to the command-line tool awk, you can fix your data by cutting each line into columns with `substr()`.
`substr()` gives a portion of the string `$0`, which is the entire line. You start at 1, then specify the width of your first column, 8. For the next column, you take `$0` again, start at 9 (1+8 from the last `substr`), and give it the second column's width, 13, and so on for the remaining columns.