在列中熔化python pandas和输入文件名

发布于 2025-02-13 08:45:45 字数 1411 浏览 1 评论 0原文

我有大约（100个文件 +）XLS文件，其中包含不同列的名称和数据类型

file_1.xls：

Id  test  category
1    ab      4
2    cs      3
3    cs      1

file_2.xls：

index remove  stocks  category
1       dr      4         a
2       as      3         b
3       ae      1         v

file 3：.. 。

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

        filename variable value
    0   File_1     Id      1
    1   File_1     Id      2
    2   File_1     Id      3
    3   File_1     test    ab
    4   File_1     test    cs
    5   File_1     test    cs
    6   File_1  category    4
    7   File_1  category    3
    8   File_1  category    1
    9   File_1   index      1
    10  File_1   index      2
    11  File_1   index      3
    12  FILE_2   remove     dr
    13  FILE_2  remove      as
    14  FILE_2  remove      ae
    15  FILE_2  stocks      4
    16  FILE_2  stocks      3
    17  FILE_2  stocks      1
    18  FILE_2  category    a
    19  FILE_2  category    b
    20  FILE_2  category    v
  1000  FILE_100 ....       ..

文件名“来源的名称文件？

原文

I have about (100 files +) XLS files in a folder with different columns names and data types

File_1.xls:

Id  test  category
1    ab      4
2    cs      3
3    cs      1

FILE_2.xls:

index remove  stocks  category
1       dr      4         a
2       as      3         b
3       ae      1         v

File 3: ....

File 100.....

Thats my code:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

I would like to have the Dataframe in my output like this:

        filename variable value
    0   File_1     Id      1
    1   File_1     Id      2
    2   File_1     Id      3
    3   File_1     test    ab
    4   File_1     test    cs
    5   File_1     test    cs
    6   File_1  category    4
    7   File_1  category    3
    8   File_1  category    1
    9   File_1   index      1
    10  File_1   index      2
    11  File_1   index      3
    12  FILE_2   remove     dr
    13  FILE_2  remove      as
    14  FILE_2  remove      ae
    15  FILE_2  stocks      4
    16  FILE_2  stocks      3
    17  FILE_2  stocks      1
    18  FILE_2  category    a
    19  FILE_2  category    b
    20  FILE_2  category    v
  1000  FILE_100 ....       ..

How I can melt all mu columns and keep in the column "filename" the name of the source file?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

魔 2025-02-20 08:45:45

不要在循环中读取数据，而要收集文件名，然后使用字典理解来添加文件名作为串联键：

import pandas as pd
import pathlib

files = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        files.append(filename)
        
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
        .rename_axis(['filename', None]).reset_index(0)
        .reset_index(drop=True)
     )

输出：

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2
2    File_1        Id     3
3    File_1      test    ab
4    File_1      test    cs
5    File_1      test    cs
6    File_1  category     4
7    File_1  category     3
8    File_1  category     1
9    File_2     index     1
10   File_2     index     2
11   File_2     index     3
12   File_2    remove    dr
13   File_2    remove    as
14   File_2    remove    ae
15   File_2    stocks     4
16   File_2    stocks     3
17   File_2    stocks     1
18   File_2  category     a
19   File_2  category     b
20   File_2  category     v

Don't read your data in the loop but rather collect the filenames, then use a dictionary comprehension to add the filenames as concatenation keys:

import pandas as pd
import pathlib

files = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        files.append(filename)
        
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
        .rename_axis(['filename', None]).reset_index(0)
        .reset_index(drop=True)
     )

output:

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2
2    File_1        Id     3
3    File_1      test    ab
4    File_1      test    cs
5    File_1      test    cs
6    File_1  category     4
7    File_1  category     3
8    File_1  category     1
9    File_2     index     1
10   File_2     index     2
11   File_2     index     3
12   File_2    remove    dr
13   File_2    remove    as
14   File_2    remove    ae
15   File_2    stocks     4
16   File_2    stocks     3
17   File_2    stocks     1
18   File_2  category     a
19   File_2  category     b
20   File_2  category     v

回复收藏 0 原文

~没有更多了~