Append column values from one dataFrame to another as a list
I have dozens of very similar dataFrames. I want to combine the 'VALUE' column values from each of them into lists and return a dataFrame whose 'VALUE' column is made up of these lists. I only want to do this for rows where 'PV' contains a substring from a given list of substrings.
I came up with one approach I thought would work, but it's really nasty and doesn't work anyway (I stopped it at 3m). There has to be a better way of doing this. Does anyone here have any ideas? Thanks for any and all help.
import pandas as pd

# Example dataFrames
df0 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
df2 = pd.DataFrame(data={'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [10, 11, 12, 13]})
DATAFRAMES
df0 dataFrame    df1 dataFrame    df2 dataFrame
PV   VALUE       PV   VALUE       PV   VALUE
pv1  1           pv1  5           pv1  10
pv2  2           pv2  6           pv2  11
pv3  3           pv3  7           pv3  12
pv4  4           pv4  8           pv4  13
# Nasty code I thought might work
strings = ['v2', 'v4']
for i, row0 in df0.iterrows():
    for j, row1 in df1.iterrows():
        if (row0['PV']==row1['PV']) & any(substring in row0['PV'] for substring in strings):
            df0.at[i,'VALUE'] = [row0['VALUE'], row1['VALUE']]
Desired result:
PV   VALUE
pv1  1
pv2  [2,6]
pv3  3
pv4  [4,8]
@enke thank you for your help! I had to play with it a bit to figure out how to keep nested lists from occurring, and ended up using the following commented function/code/output:
def appendValues(df0, df1, pvStrings=['v2','v4']):
    # Turn values in VALUE column into list objects
    df0['VALUE'] = df0['VALUE'].apply(lambda x: x if isinstance(x,list) else [x])
    # For rows where PV string DOESN'T contain substring, set value to max()+1
    # apply makes lists [x] empty if they were set to max()+1, else [x]
    df1['VALUE'] = (df1['VALUE']
                    .where(df1['PV'].str.contains('|'.join(pvStrings)), df1['VALUE'].max()+1)
                    .apply(lambda x: [x] if x <= df1['VALUE'].max() else []))
    # merge df1's VALUE column onto df0 by 'PV'
    # set the indexing column to 'PV'
    # sum all row values (axis=1) into one list
    data = (df0.merge(df1, on='PV')
               .set_index('PV')
               .sum(axis=1))
    # restore singleton lists to their original type; reset_index moves the current 'PV' index
    # back to a column and creates a new sequential index
    data = data.mask(data.str.len().eq(1), data.str[0]).reset_index(name='VALUE')
    return data

data = appendValues(df0, df1, pvStrings=['v2','v4'])
data = appendValues(data, df2, pvStrings=['v1','v4'])
data
Output:
    PV  VALUE
0  pv1  [1,10]
1  pv2  [2,6]
2  pv3  3
3  pv4  [4,8,13]
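The function above modifies both input dataFrames in place and relies on a max()+1 sentinel to mark non-matching rows. A minimal sketch of the same idea without the sentinel is shown below; the name append_values and its arguments are hypothetical, it assumes each 'PV' appears at most once per dataFrame, and it uses plain substring tests instead of a regular expression.

```python
import pandas as pd

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
df2 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [10, 11, 12, 13]})


def append_values(base, other, pv_strings):
    """Return a copy of `base` with matching VALUEs from `other` appended.

    Hypothetical helper: assumes 'PV' is unique within each dataFrame.
    """
    # Map PV -> VALUE for the rows of `other` whose PV contains any substring
    extra = {pv: val for pv, val in zip(other['PV'], other['VALUE'])
             if any(s in pv for s in pv_strings)}
    combined = []
    for pv, val in zip(base['PV'], base['VALUE']):
        if pv in extra:
            # Wrap scalars in a list so repeated calls never create nested lists
            current = val if isinstance(val, list) else [val]
            combined.append(current + [extra[pv]])
        else:
            combined.append(val)  # leave unmatched rows untouched
    out = base.copy()
    out['VALUE'] = pd.Series(combined, index=out.index, dtype=object)
    return out


data = append_values(df0, df1, pv_strings=['v2', 'v4'])
data = append_values(data, df2, pv_strings=['v1', 'v4'])
print(data)
```

Because the helper returns a new dataFrame each time, it can be called in a loop over many dataFrames without changing the originals.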
Comments (1)
You could filter df1 for rows that contain strings; concatenate it with df0; then groupby + agg(list) can aggregate "VALUE"s for each "PV". Finally, you could use mask to take out the elements from the singleton lists.

Alternatively, we could make the values in the "VALUE" columns lists, then merge and concatenate the lists.
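The two approaches described above could be sketched roughly as follows. This is a reconstruction from the description, not necessarily the code originally posted; the variable names (pattern, filtered, out1, out2 and so on) are illustrative, and the substrings are joined into a regular expression as in the question.

```python
import pandas as pd

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
strings = ['v2', 'v4']
pattern = '|'.join(strings)

# Approach 1: filter df1, concatenate with df0, aggregate VALUE per PV
filtered = df1[df1['PV'].str.contains(pattern)]
out1 = (pd.concat([df0, filtered])
          .groupby('PV', as_index=False)['VALUE']
          .agg(list))
# Replace singleton lists with their only element
out1['VALUE'] = out1['VALUE'].mask(out1['VALUE'].str.len().eq(1),
                                   out1['VALUE'].str[0])

# Approach 2: turn VALUEs into lists, merge on PV, concatenate the lists
left = df0.assign(VALUE=df0['VALUE'].apply(lambda v: [v]))
keep = df1['PV'].str.contains(pattern)
right_lists = [[v] if k else [] for v, k in zip(df1['VALUE'], keep)]
right = df1.assign(VALUE=pd.Series(right_lists, index=df1.index, dtype=object))
merged = left.merge(right, on='PV', suffixes=('_left', '_right'))
joined = pd.Series([a + b for a, b in zip(merged['VALUE_left'], merged['VALUE_right'])],
                   index=merged.index, dtype=object)
out2 = merged[['PV']].copy()
out2['VALUE'] = joined.mask(joined.str.len().eq(1), joined.str[0])

print(out1)
print(out2)
```

Both variants should give the desired result from the question: scalars for pv1 and pv3, and two-element lists for pv2 and pv4.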
If you don't want to filter out the rows that contain "strings" in "PV" but rather keep them as separate rows, then you could concat + groupby first; then filter + explode.
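One possible reading of this last variant, sketched under the assumption that rows whose "PV" matches keep their aggregated list while all other rows are expanded back into one row per value (the names grouped, matches and out are illustrative):

```python
import pandas as pd

df0 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [1, 2, 3, 4]})
df1 = pd.DataFrame({'PV': ['pv1', 'pv2', 'pv3', 'pv4'], 'VALUE': [5, 6, 7, 8]})
strings = ['v2', 'v4']

# Aggregate every PV first, without filtering
grouped = (pd.concat([df0, df1])
             .groupby('PV', as_index=False)['VALUE']
             .agg(list))

# Then split on the substring match and explode only the non-matching rows
matches = grouped['PV'].str.contains('|'.join(strings))
out = pd.concat([grouped[matches], grouped[~matches].explode('VALUE')],
                ignore_index=True)
print(out)
```

Here the matching rows come first because of the order of the final concat; a sort on 'PV' could be added afterwards if the original order matters.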