Get specific data from a txt file into a pandas DataFrame
I have such data in a txt file:
Wed Mar 23 16:59:25 GMT 2022
1 State
1 ESTAB
Wed Mar 23 16:59:26 GMT 2022
1 State
1 ESTAB
1 CLOSE-WAIT
Wed Mar 23 16:59:27 GMT 2022
1 State
1 ESTAB
10 FIN-WAIT
Wed Mar 23 16:59:28 GMT 2022
1 State
1 CLOSE-WAIT
102 ESTAB
I want to get a pandas dataframe looking like this:
timestamp | State | ESTAB | FIN-WAIT | CLOSE-WAIT
Wed Mar 23 16:59:25 GMT 2022 | 1 | 1 | 0 | 0
Wed Mar 23 16:59:26 GMT 2022 | 1 | 1 | 0 | 1
Wed Mar 23 16:59:27 GMT 2022 | 1 | 1 | 10 | 0
Wed Mar 23 16:59:28 GMT 2022 | 1 | 102 | 0 | 1
That means the string in the first line of each paragraph should go into the first column, timestamp. The other columns should be filled with the numbers, using the string that follows each number as the column name. A new row begins with each paragraph.
How can I do this with pandas?
Answers (2)
First, you can process the txt file into a list of lists. Each inner list holds the lines of one hunk (paragraph), and the outer list collects the different hunks.
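A minimal sketch of this first step, assuming that every line which does not start with a digit is a timestamp that opens a new hunk, and that blank lines can be ignored. The names `RAW` and `read_hunks`, and the file name `data.txt`, are hypothetical; the sample text is copied from the question.

```python
import io

# Sample input copied from the question. In practice you would pass
# open("data.txt") instead (hypothetical file name).
RAW = """Wed Mar 23 16:59:25 GMT 2022
1 State
1 ESTAB
Wed Mar 23 16:59:26 GMT 2022
1 State
1 ESTAB
1 CLOSE-WAIT
Wed Mar 23 16:59:27 GMT 2022
1 State
1 ESTAB
10 FIN-WAIT
Wed Mar 23 16:59:28 GMT 2022
1 State
1 CLOSE-WAIT
102 ESTAB
"""


def read_hunks(lines):
    """Group lines into hunks (hypothetical helper).

    Assumption: a line that does not start with a digit is a timestamp
    and opens a new hunk; blank lines are skipped.
    """
    hunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank separator lines
        if not line[0].isdigit():
            hunks.append([line])    # timestamp starts a new hunk
        elif hunks:
            hunks[-1].append(line)  # "count label" line of the current hunk
        # count lines appearing before any timestamp are silently dropped
    return hunks


hunks = read_hunks(io.StringIO(RAW))
print(hunks[1])
# ['Wed Mar 23 16:59:26 GMT 2022', '1 State', '1 ESTAB', '1 CLOSE-WAIT']
```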
Then you can turn the list of lists into a list of dictionaries by manually defining each key and value.
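One way this step might look, assuming each hunk is a list whose first element is the timestamp and whose remaining elements are `"count label"` strings. The helper name `hunks_to_records` is hypothetical, and the short `hunks` list below is a sample taken from the question's data.

```python
def hunks_to_records(hunks):
    """Turn each hunk into one dict (hypothetical helper).

    The timestamp goes under the key "timestamp"; every "count label"
    line becomes label -> int(count). If a label repeats within a hunk,
    the last value wins.
    """
    records = []
    for hunk in hunks:
        record = {"timestamp": hunk[0]}
        for line in hunk[1:]:
            count, label = line.split(maxsplit=1)
            record[label] = int(count)
        records.append(record)
    return records


hunks = [
    ["Wed Mar 23 16:59:25 GMT 2022", "1 State", "1 ESTAB"],
    ["Wed Mar 23 16:59:26 GMT 2022", "1 State", "1 ESTAB", "1 CLOSE-WAIT"],
]
records = hunks_to_records(hunks)
print(records[1])
# {'timestamp': 'Wed Mar 23 16:59:26 GMT 2022', 'State': 1, 'ESTAB': 1, 'CLOSE-WAIT': 1}
```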
Finally, you can feed this list of dictionaries into a DataFrame and fill the NaN cells.
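A sketch of the last step, assuming `records` is a list of dicts like the one produced by the previous step (the values below are the question's data, written out by hand). `pd.DataFrame` builds one column per key; a key missing from a dict becomes NaN, which `fillna(0)` replaces. The final column selection hardcodes the order from the question and would raise a `KeyError` if one of those labels never appeared in the file.

```python
import pandas as pd

# Assumed input: one dict per hunk, as built in the previous step.
records = [
    {"timestamp": "Wed Mar 23 16:59:25 GMT 2022", "State": 1, "ESTAB": 1},
    {"timestamp": "Wed Mar 23 16:59:26 GMT 2022", "State": 1, "ESTAB": 1, "CLOSE-WAIT": 1},
    {"timestamp": "Wed Mar 23 16:59:27 GMT 2022", "State": 1, "ESTAB": 1, "FIN-WAIT": 10},
    {"timestamp": "Wed Mar 23 16:59:28 GMT 2022", "State": 1, "CLOSE-WAIT": 1, "ESTAB": 102},
]

# Missing labels become NaN; replace them with 0.
df = pd.DataFrame(records).fillna(0)

# NaN forces affected columns to float, so cast the counts back to int.
count_cols = [c for c in df.columns if c != "timestamp"]
df[count_cols] = df[count_cols].astype(int)

# Column order as shown in the question (hardcoded assumption).
df = df[["timestamp", "State", "ESTAB", "FIN-WAIT", "CLOSE-WAIT"]]
print(df)
```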
Try:
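A more vectorized alternative, given here as a sketch and not as this answer's original code: read the lines into a Series, split the `"count label"` lines with a regex, forward-fill each timestamp down to its hunk, then reshape from long to wide. It assumes timestamps are unique (duplicates would be summed into one row) and that labels contain no spaces. `RAW` and `data.txt` are hypothetical names.

```python
import io

import pandas as pd

# Sample input from the question; use open("data.txt").read() in practice
# (hypothetical file name).
RAW = """Wed Mar 23 16:59:25 GMT 2022
1 State
1 ESTAB
Wed Mar 23 16:59:26 GMT 2022
1 State
1 ESTAB
1 CLOSE-WAIT
Wed Mar 23 16:59:27 GMT 2022
1 State
1 ESTAB
10 FIN-WAIT
Wed Mar 23 16:59:28 GMT 2022
1 State
1 CLOSE-WAIT
102 ESTAB
"""

# One entry per non-blank line.
lines = pd.Series(io.StringIO(RAW).read().splitlines()).str.strip()
lines = lines[lines != ""]

# "count label" lines match; timestamp lines give NaN in both columns.
parts = lines.str.extract(r"^(?P<count>\d+)\s+(?P<label>\S+)$")
is_count = parts["count"].notna()

# Carry each timestamp down to the count lines that follow it.
timestamp = lines.where(~is_count).ffill()

long = pd.DataFrame({
    "timestamp": timestamp[is_count],
    "label": parts.loc[is_count, "label"],
    "count": parts.loc[is_count, "count"].astype(int),
})

df = (
    long.groupby(["timestamp", "label"])["count"].sum()
    .unstack(fill_value=0)
    # groupby sorts its keys; restore the order of appearance in the file
    .reindex(index=long["timestamp"].unique(),
             columns=long["label"].unique(), fill_value=0)
    .astype(int)
    .rename_axis(index="timestamp", columns=None)
    .reset_index()
)
print(df)
```

Columns come out in order of first appearance (State, ESTAB, CLOSE-WAIT, FIN-WAIT); pass an explicit list to `reindex(columns=...)` if you need the exact order from the question.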