Pandas DataFrame from TXT

Posted 2025-01-22 18:55:40

I have a .txt that goes like this:

       USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

    ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
...
...

I want to create a DataFrame and a CSV with pandas, where the first column contains the states and the second column the highway.

      States                    HW_Number
Arizona - New Mexico          Interstate 40
Arizona - New Mexico          Interstate 10
South Dakota - Minneapolis    Interstate 90
Arizona - California          Interstate 40
Arizona - California          Interstate 10
Arizona - California          Interstate 8
State A - State B             Highway 1
State A - State B             Highway 2
State A - State B             Highway 3

How can I do that? Not all states have the same number of highways, and some have none. States with no highways should not appear in the DataFrame.

A column with the country could be included as well.

Thank you

Comments (2)

烟火散人牵绊 2025-01-29 18:55:40

As I said, a pretty easy file to parse:

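import pandas as pd

rows = []
state = None
with open('x.txt') as f:
    for line in f:
        # Indented lines are country headers (e.g. "       USA"); skip them.
        if line[0] == ' ':
            continue
        line = line.strip()
        if not line:  # skip blank lines
            continue
        if '-' in line:
            # A line containing a hyphen names a pair of states.
            state = line
        else:
            # Any other line is a highway belonging to the current state.
            rows.append((state, line))

df = pd.DataFrame(rows, columns=['state', 'road'])
print(df)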

Output:

                        state           road
0        Arizona - New Mexico  Interstate 40
1        Arizona - New Mexico  Interstate 10
2  South Dakota - Minneapolis  Interstate 90
3        Arizona - California  Interstate 40
4        Arizona - California  Interstate 10
5        Arizona - California   Interstate 8
6           State A - State B      Highway 1
7           State A - State B      Highway 2
8           State A - State B      Highway 3
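
The question also asks for a CSV and, optionally, a country column, which the loop above does not produce. A minimal sketch of one way to extend it, assuming (as in the sample) that country headers are the only indented lines and that state lines contain " - ". The names `SAMPLE`, `parse_roads`, and `csv_text` are made up for illustration, and the "..." lines from the question are left out:

```python
import io

import pandas as pd

# Sample copied from the question; country headers are the indented lines.
SAMPLE = """\
       USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

    ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
"""


def parse_roads(lines):
    """Collect (country, state, road) tuples from an iterable of raw lines."""
    rows = []
    country = state = None
    for raw in lines:
        text = raw.strip()
        if not text:  # blank line
            continue
        if raw[0].isspace():  # assumption: an indented line is a country header
            country, state = text, None
        elif " - " in text:  # assumption: state pairs contain " - "
            state = text
        else:  # anything else is a highway of the current state
            rows.append((country, state, text))
    return rows


df = pd.DataFrame(parse_roads(io.StringIO(SAMPLE)),
                  columns=["country", "state", "road"])

# With no path, to_csv returns the CSV as a string; pass a path such as
# "roads.csv" (hypothetical name) to write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

States without highways never reach `rows.append`, so they drop out without any extra filtering. For a real file, `parse_roads(open('x.txt'))` should behave the same way as long as the file follows the layout above.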

大姐,你呐 2025-01-29 18:55:40

You can iterate through the lines and use the structure of the data to build lists. These lists can then be used to make a DataFrame or a Series.

  1. read the lines from the file into a list (f.readlines())
  2. skip empty lines
  3. keep track of the current state (a line that doesn't end with a digit)
  4. append the state and the highway to their lists
  5. use the lists to make a DataFrame or a Series

import io

import pandas as pd

# Stand-in for the real file; with a file on disk, use open(...) instead.
f = io.StringIO(
    """
USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
    """
)
lines = f.readlines()
states = []
hw_numbers = []
current_state = None
for line in lines:
    line = line.strip()  # removes surrounding whitespace, including \n

    if line == '':  # skip empty rows
        continue
    elif not line[-1].isdigit():
        # No trailing digit: a state pair (or a country header, which the
        # next state line overwrites before any highway is appended).
        current_state = line
    else:  # trailing digit: a highway
        states.append(current_state)
        hw_numbers.append(line)
df = pd.DataFrame({
    'States': states,
    'HW_number': hw_numbers
})
print(df)
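
The same trailing-digit rule can also be written without an explicit loop. This is only a sketch of an alternative, not part of the answer above: it forward-fills the header lines with `Series.where` and `Series.ffill`, and the names `SAMPLE`, `is_highway`, and `df2` are made up for illustration.

```python
import pandas as pd

# Same sample as above, without the indentation on the country lines.
SAMPLE = """
USA

Arizona - New Mexico
Interstate 40
Interstate 10

South Dakota - Minneapolis
Interstate 90

South Carolina - Washington

Arizona - California
Interstate 40
Interstate 10
Interstate 8

ANOTHER COUNTRY

State A - State B
Highway 1
Highway 2
Highway 3
"""

lines = pd.Series(SAMPLE.splitlines()).str.strip()
lines = lines[lines != ""]  # drop empty rows

# Assumption carried over from the loop: a highway line ends with a digit.
is_highway = lines.str.contains(r"\d$")

# Keep only the header lines, then carry each one forward over the rows below.
states = lines.where(~is_highway).ffill()

df2 = pd.DataFrame({
    "States": states[is_highway],
    "HW_number": lines[is_highway],
}).reset_index(drop=True)
print(df2)
```

As with the loop, a country header is replaced by the next state line before any highway row picks it up, and states without highways contribute no rows. A state name that itself ends in a digit would be misread as a highway under this rule.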