Pandas DataFrame from a TXT file

Posted 2025-01-22 18:55:40

I have a .txt that goes like this:

       USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

    ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
...
...

I want to create a DataFrame in pandas, and a CSV from it, where the first column contains the states and the second column the highway.

      States                    HW_Number
Arizona - New Mexico          Interstate 40
Arizona - New Mexico          Interstate 10
South Dakota - Minneapolis    Interstate 90
Arizona - California          Interstate 40
Arizona - California          Interstate 10
Arizona - California          Interstate 8
State A - State B             Highway 1
State A - State B             Highway 2
State A - State B             Highway 3

How can I do that? The states do not all have the same number of highways, and some have none; those with none should be left out of the DataFrame.

A column with the Country could be integrated as well.

Thank you


烟火散人牵绊 2025-01-29 18:55:40

As I said, a pretty easy file to parse:

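import pandas as pd

rows = []
state = None
with open('x.txt') as f:
    for line in f:
        # Country headings are indented in the file; skip them.
        if line[0] == ' ':
            continue
        line = line.strip()
        if not line:
            continue  # blank separator line
        if '-' in line:
            state = line  # a "State - State" heading
        else:
            rows.append((state, line))  # a highway under the current heading

df = pd.DataFrame(rows, columns=['state', 'road'])
print(df)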

Output:

----------
                        state           road
0        Arizona - New Mexico  Interstate 40
1        Arizona - New Mexico  Interstate 10
2  South Dakota - Minneapolis  Interstate 90
3        Arizona - California  Interstate 40
4        Arizona - California  Interstate 10
5        Arizona - California   Interstate 8
6           State A - State B      Highway 1
7           State A - State B      Highway 2
8           State A - State B      Highway 3
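
The question also asks for a CSV and, optionally, a country column, which the loop above does not produce. A minimal sketch of one way to extend it, assuming (as in the sample) that country headings are the only indented lines and that state headings contain " - ". The function name `parse_highways` and the shortened `sample` text are illustrative, not part of the original answer:

```python
import io
import pandas as pd

# Shortened stand-in for the .txt file from the question.
sample = """\
       USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Carolina - Washington

    ANOTHER COUNTRY

State A - State B
Highway 1
"""

def parse_highways(lines):
    """Collect (country, state, road) rows from an iterable of text lines."""
    rows = []
    country = state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator line
        if raw[0].isspace():
            # Assumption: only country headings are indented.
            country, state = line, None
        elif " - " in line:
            state = line  # a "State - State" heading
        elif state is not None:
            rows.append((country, state, line))  # a highway line
    return pd.DataFrame(rows, columns=["country", "state", "road"])

df = parse_highways(io.StringIO(sample))
csv_text = df.to_csv(index=False)  # no path given, so the CSV is returned as a string
print(df)
```

A state pair with no highways never reaches `rows.append`, so it drops out on its own. To write a file instead of a string, pass a path, for example `df.to_csv('highways.csv', index=False)` (the filename is a placeholder).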


大姐，你呐 2025-01-29 18:55:40

You can iterate through the rows and use characteristics of your structured data to create lists. These lists can be used to make a dataframe or series.

read the lines from the file into a list (f.readlines())
remove empty rows
keep track of current state (doesn't end with a number)
append the states and highways to lists
use lists to make a dataframe or series

import pandas as pd
import io
f = io.StringIO(
    """
USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
    """
)
lines = f.readlines()
states = []
hw_numbers = []
current_state = None
for line in lines:
    line = line.strip()  # removes surrounding whitespace, including \n

    if line == '':  # skip empty rows
        continue
    elif not line[-1].isdigit():  # no trailing digit: a state (or country) heading
        current_state = line
    else:  # trailing digit: a highway
        states.append(current_state)
        hw_numbers.append(line)
df = pd.DataFrame({
    'States': states,
    'HW_number': hw_numbers
})
print(df)
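
The same "ends with a digit" rule can also be applied without an explicit loop. The following is a sketch of an alternative technique (forward-filling headings with pandas), not what the answer above does; it relies on the same assumption that highway lines, and only highway lines, end in a digit. The shortened `text` sample is illustrative:

```python
import pandas as pd

# Shortened stand-in for the file contents.
text = """\
       USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Carolina - Washington

Arizona - California
Interstate 8
"""

# One Series entry per non-empty line, with surrounding whitespace removed.
lines = pd.Series(text.splitlines()).str.strip()
lines = lines[lines != ""].reset_index(drop=True)

# Assumption: a trailing digit marks a highway line.
is_highway = lines.str[-1].str.isdigit()

# Keep heading lines, blank out highway lines, then carry each heading forward.
states = lines.where(~is_highway).ffill()

df = pd.DataFrame({
    "States": states[is_highway],
    "HW_number": lines[is_highway],
}).reset_index(drop=True)
print(df)
```

Headings that are never followed by a highway (the country name, or a state pair with no roads) are overwritten by the next heading before any highway row picks them up, so they do not appear in the result.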
