Pandas: merge with overlapping column names, using the default non-NaN values

Posted on 2025-01-27 11:49:39

There is a list of tables and some configurations, both specified in CSV format.

I want to use the default configuration from configs and override it per table in tables where needed.

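import io
import pandas

# Default settings: one row per target schema
configs = """
schema|new_name
dbo|{table}
qa|test_{table}
"""

# Tables to process; schema and new_name are optional per-table overrides
tables = """
table|schema|new_name
employee|hr|{schema}_{table}
advertisers
users
"""

configs = pandas.read_csv(io.StringIO(configs), sep='|')
tables = pandas.read_csv(io.StringIO(tables), sep='|')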

I want to cross-join/merge/concatenate/combine them to get a dataframe which contains:

final = """
table|schema|new_name

employee|hr|{schema}_{table}
advertisers|dbo|{table}
users|dbo|{table}

employee|hr|{schema}_{table}
advertisers|qa|test_{table}
users|qa|test_{table}
"""

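If a table does not specify a schema (users, advertisers), use each default from configs, for example the 'dbo' schema with the plain table name.
If a table specifies a schema (employee), keep its own values: the 'hr' schema and the 'hr_employee' table name.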
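
To make the two rules concrete, here is a minimal sketch of how the new_name templates would resolve with Python's built-in `str.format`. The `resolve` helper is a hypothetical name used only for illustration; it is not part of pandas.

```python
def resolve(template: str, table: str, schema: str) -> str:
    """Fill the {table} and {schema} placeholders of a new_name template."""
    # str.format ignores keyword arguments the template does not use
    return template.format(table=table, schema=schema)


# Table with its own override: schema 'hr', template '{schema}_{table}'
print(resolve("{schema}_{table}", table="employee", schema="hr"))  # hr_employee

# Tables that fall back to the defaults from configs
print(resolve("{table}", table="users", schema="dbo"))             # users
print(resolve("test_{table}", table="advertisers", schema="qa"))   # test_advertisers
```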

Basically: when two rows with overlapping column names are concatenated horizontally, create a single column from whichever value is not NaN.

What pandas command should I use?

萌辣 2025-02-03 11:49:39

EDIT:

# cross join both DataFrames (how='cross' requires pandas 1.2 or later)
df = tables.merge(configs, suffixes=('', '_'), how='cross')

# columns that came from configs carry the '_' suffix
cols = df.columns[df.columns.str.endswith('_')]
# matching column names without the suffix
new = cols.str[:-1]
# fill missing values in the tables columns from the suffixed configs columns
df[new] = df[new].fillna(df[cols].rename(columns=dict(zip(cols, new))))
# drop the suffixed helper columns
df = df.drop(cols, axis=1)
print(df)
         table schema          new_name
0     employee     hr  {schema}_{table}
1     employee     hr  {schema}_{table}
2  advertisers    dbo           {table}
3  advertisers     qa      test_{table}
4        users    dbo           {table}
5        users     qa      test_{table}
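
The output above is grouped by table, while the desired `final` in the question is grouped by config row. As a possible variation (a sketch, not the answer's own method), the cross join can start from `configs` so the rows come out in config order, and `DataFrame.combine_first` can pick the non-NaN value. The `_default` suffix and the `overrides`/`defaults` names are assumptions chosen for this example.

```python
import io

import pandas as pd

configs = pd.read_csv(
    io.StringIO("schema|new_name\ndbo|{table}\nqa|test_{table}\n"), sep="|"
)
tables = pd.read_csv(
    io.StringIO(
        "table|schema|new_name\n"
        "employee|hr|{schema}_{table}\n"
        "advertisers\n"
        "users\n"
    ),
    sep="|",
)

# Cross join with configs on the left: every config row is paired with every
# table row, in config order. Overlapping config columns get '_default'.
df = configs.merge(tables, how="cross", suffixes=("_default", ""))

# Per-table values (may contain NaN) and the defaults under the same names
overrides = df[["table", "schema", "new_name"]]
defaults = df[["schema_default", "new_name_default"]].rename(
    columns={"schema_default": "schema", "new_name_default": "new_name"}
)

# combine_first keeps the override where present, else takes the default.
# It may reorder columns, so restore the wanted order explicitly.
final = overrides.combine_first(defaults)[["table", "schema", "new_name"]]
print(final)
```

With the sample data this should list employee, advertisers and users for 'dbo' first and then again for 'qa', with employee keeping its 'hr' values in both groups.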
