Append column values from one DataFrame to another DataFrame as lists

Posted on 2025-01-11 17:32:05

I have dozens of very similar DataFrames. What I want is to combine the 'VALUE' column values from each of them into lists and return a DataFrame whose 'VALUE' column consists of these lists. I only want to do this for rows where 'PV' contains a substring from a list of substrings.

I came up with one approach I thought would work, but it's really ugly and doesn't work anyway (I stopped it after 3 minutes). There has to be a better way of doing this. Does anyone here have any ideas? Thanks for any and all help.

import pandas as pd
# Example dataFrames
df0 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
df2 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [10, 11, 12, 13]})

DataFrames:

df0 dataFrame          df1 dataFrame      df2 dataFrame

PV   VALUE             PV   VALUE         PV   VALUE
pv1    1               pv1    5           pv1    10
pv2    2               pv2    6           pv2    11
pv3    3               pv3    7           pv3    12
pv4    4               pv4    8           pv4    13

# Nasty code I thought might work
strings = ['v2', 'v4']
for i, row0 in df0.iterrows():
    for j, row1 in df1.iterrows():
        if (row0['PV']==row1['PV']) & any(substring in row0['PV'] for substring in strings):
            df0.at[i,'VALUE'] = [row0['VALUE'], row1['VALUE']]

Desired result:

PV   VALUE
pv1    1
pv2  [2,6]
pv3    3
pv4  [4,8]

@enke thank you for your help! I had to play with it a bit to figure out how to keep nested lists from occurring, and ended up using the following commented function/code/output:

def appendValues(df0, df1, pvStrings=['v2','v4']):
    # Note: both df0 and df1 are modified in place
    # Turn values in VALUE column into list objects (existing lists are left as they are)
    df0['VALUE'] = df0['VALUE'].apply(lambda x: x if isinstance(x, list) else [x])
    # For rows where the PV string DOESN'T contain a substring, set the value to max()+1
    #    apply then turns those sentinel values into empty lists [], everything else into [x]
    vmax = df1['VALUE'].max()
    df1['VALUE'] = (df1['VALUE']
                    .where(df1['PV'].str.contains('|'.join(pvStrings)), vmax + 1)
                    .apply(lambda x: [x] if x <= vmax else []))
    # merge df1's VALUE column onto df0
    #    set the indexing column to 'PV'
    #    sum all row values (axis=1) into one list
    data = (df0.merge(df1, on='PV')
               .set_index('PV')
               .sum(axis=1))
    # restore singleton lists to their original type; reset_index moves the current
    #    'PV' index back to a column and creates a new sequential index
    data = data.mask(data.str.len().eq(1), data.str[0]).reset_index(name='VALUE')
    return data

data = appendValues(df0, df1, pvStrings=['v2','v4'])
data = appendValues(data, df2, pvStrings=['v1','v4'])
data

Output:

    PV    VALUE
0   pv1   [1,10]
1   pv2   [2,6]
2   pv3     3
3   pv4  [4,8,13]
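
Since the question mentions dozens of DataFrames, the chained calls above could probably also be expressed as one concat + groupby over all frames. The following is only a sketch of that idea, not the function above: `combine_values` and its list of `(frame, substrings)` pairs are hypothetical names made up for illustration, and it assumes every 'PV' of interest already appears in the first frame. `re.escape` is used so that each substring is matched literally rather than as a regular expression.

```python
import re
import pandas as pd

def combine_values(base, others):
    """Hypothetical helper: gather VALUEs per PV into lists.

    base   : DataFrame with 'PV' and 'VALUE' columns; all of its rows are kept.
    others : list of (DataFrame, substrings) pairs; only rows whose 'PV'
             contains one of the substrings contribute a value.
    """
    parts = [base[['PV', 'VALUE']]]
    for frame, substrings in others:
        # Escape each substring so it is matched literally inside the regex
        pattern = '|'.join(re.escape(s) for s in substrings)
        parts.append(frame.loc[frame['PV'].str.contains(pattern), ['PV', 'VALUE']])
    # Stack all pieces, then collect each PV's values in order of appearance
    out = (pd.concat(parts, ignore_index=True)
             .groupby('PV', as_index=False, sort=False)['VALUE'].agg(list))
    # Unwrap one-element lists back to plain scalars
    out['VALUE'] = out['VALUE'].map(lambda v: v[0] if len(v) == 1 else v)
    return out

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
df2 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [10, 11, 12, 13]})

result = combine_values(df0, [(df1, ['v2', 'v4']), (df2, ['v1', 'v4'])])
print(result)
```

Unlike `appendValues`, this sketch leaves the input frames unmodified, and it does not need the max()+1 sentinel because non-matching rows are simply never added.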

Comments (1)

拔了角的鹿 2025-01-18 17:32:05

You could filter df1 for the rows whose "PV" contains any of the substrings in strings; concatenate the result with df0; then groupby + agg(list) can aggregate the "VALUE"s for each "PV".

Finally, you could use mask to take out the elements from the singleton lists.

out = (pd.concat([df0, df1[df1['PV'].str.contains('|'.join(strings))]])
       .groupby('PV', as_index=False)['VALUE'].agg(list))
out['VALUE'] = out['VALUE'].mask(out['VALUE'].str.len().eq(1), out['VALUE'].str[0])
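
To see what each stage of this chain produces, it can be split into named steps. This is just an illustrative breakdown of the same calls on the question's example frames; the intermediate names (`keep`, `stacked`, `grouped`, `is_single`) are mine and not part of the answer.

```python
import pandas as pd

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
strings = ['v2', 'v4']

# 1. Boolean mask: True where 'PV' matches the regex 'v2|v4'
keep = df1['PV'].str.contains('|'.join(strings))
print(keep.tolist())               # [False, True, False, True]

# 2. Stack df0 on top of the matching df1 rows
stacked = pd.concat([df0, df1[keep]])
print(stacked['VALUE'].tolist())   # [1, 2, 3, 4, 6, 8]

# 3. Collect the values of each 'PV' into a list
grouped = stacked.groupby('PV', as_index=False)['VALUE'].agg(list)
print(grouped['VALUE'].tolist())   # [[1], [2, 6], [3], [4, 8]]

# 4. Replace one-element lists by their only element
is_single = grouped['VALUE'].str.len().eq(1)
grouped['VALUE'] = grouped['VALUE'].mask(is_single, grouped['VALUE'].str[0])
print(grouped['VALUE'].tolist())   # [1, [2, 6], 3, [4, 8]]
```

Note that the joined string is treated as a regular expression, so substrings containing characters such as `.` or `(` would need escaping first.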

Alternatively, we could wrap the values in the "VALUE" columns in lists, then merge the frames and concatenate the lists:

df0['VALUE'] = df0['VALUE'].apply(lambda x: [x])
# Non-matching df1 rows get a sentinel (max + 1) that is then mapped to []
vmax = df1['VALUE'].max()
df1['VALUE'] = (df1['VALUE']
                .where(df1['PV'].str.contains('|'.join(strings)), vmax + 1)
                .apply(lambda x: [x] if x <= vmax else []))
out = df0.merge(df1, on='PV').set_index('PV').sum(axis=1)  # row-wise sum concatenates the lists
out = out.mask(out.str.len().eq(1), out.str[0]).reset_index(name='VALUE')

Output:

    PV   VALUE
0  pv1       1
1  pv2  [2, 6]
2  pv3       3
3  pv4  [4, 8]
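
The merge-based variant relies on the fact that adding two Python lists concatenates them. Below is a minimal sketch of the same idea that skips the max()+1 sentinel and adds the two list columns directly. It is a variation for illustration, not the answer's code: the names `matches`, `left`, `right` and the `suffixes=('_0', '_1')` choice are my own assumptions.

```python
import pandas as pd

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
strings = ['v2', 'v4']

matches = df1['PV'].str.contains('|'.join(strings))

# Build new frames so that df0 and df1 themselves stay untouched
left = df0.assign(VALUE=df0['VALUE'].map(lambda x: [x]))
# Matching rows become [x]; non-matching rows become an empty list
right_values = pd.Series(
    [[v] if m else [] for v, m in zip(df1['VALUE'], matches)],
    index=df1.index,
)
right = df1.assign(VALUE=right_values)

merged = left.merge(right, on='PV', suffixes=('_0', '_1'))
# '+' on two object columns is applied element by element, i.e. list + list
combined = merged['VALUE_0'] + merged['VALUE_1']

out = pd.DataFrame({
    'PV': merged['PV'],
    # Unwrap one-element lists back to scalars
    'VALUE': combined.map(lambda v: v[0] if len(v) == 1 else v),
})
print(out)
```

Because the empty list contributes nothing to the concatenation, rows whose 'PV' does not match end up with just their original df0 value.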

If instead you want the rows whose "PV" contains one of the strings kept as separate rows (with all other rows combined into lists), you could concat + groupby first, then filter + explode:

out = pd.concat([df0, df1]).groupby('PV', as_index=False)['VALUE'].agg(list)
msk = out['PV'].str.contains('|'.join(strings))
out = pd.concat((out[msk].explode('VALUE'), out[~msk])).sort_index()

Output:

    PV   VALUE
0  pv1  [1, 5]
1  pv2       2
1  pv2       6
2  pv3  [3, 7]
3  pv4       4
3  pv4       8
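
The repeated index labels in this output come from `explode`, which gives each list element its own row and copies the original index label onto every such row; that is what lets `sort_index()` put the exploded and non-exploded pieces back in their original order. A small sketch on a hand-built frame (the frame `grouped` below is made up to mirror the grouped result, not taken from the answer):

```python
import pandas as pd

# Hand-built stand-in for a grouped result with list-valued cells
grouped = pd.DataFrame({'PV': ['pv1', 'pv2'], 'VALUE': [[1, 5], [2, 6]]})

exploded = grouped.explode('VALUE')
print(exploded.index.tolist())     # [0, 0, 1, 1]
print(exploded['PV'].tolist())     # ['pv1', 'pv1', 'pv2', 'pv2']
print(exploded['VALUE'].tolist())  # [1, 5, 2, 6]
```

If a fresh 0..n-1 index is preferred after exploding, `reset_index(drop=True)` can be appended to the chain.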