List of key:value pairs from a DataFrame column

Posted 2025-01-31 00:58:41, 947 words, 4 views, 0 comments

I am looking for a general solution to extract, from a DataFrame column, a list of "key": "value" pairs so that each key becomes a column name and each value becomes the corresponding cell value:

Before:

id, severity, user, events, city

1,Low,test1,[{'type': 'AAA', 'timestamp': 1653135398011, 'agent': None,...}], Athens
2,Medium,test2,[{'type': 'BBB', 'timestamp': 1653135398012, 'agent': STIX,...}], Buffalo
3,,test3,[{'type': 'CCC', 'timestamp': 1653135398013, 'agent': ACQ,...}], Carson
4,Low,test4,[{'type': 'DDD', 'timestamp': 1653135398014, 'agent': VTC,...}], Detroit

After:

id, severity, user, type, timestamp, agent,..., city

1,Low,test1,AAA,1653135398011,None, ..., Athens
2,Medium,test2,BBB,1653135398012,STIX, ..., Buffalo
3,,test3,CCC,1653135398013,ACQ,..., Carson
4,Low,test4,DDD,1653135398014,VTC,..., Detroit

Some Stack Overflow answers extract two or three fields by name, but how can I extract everything when the keys in the list are not known in advance?
I suspect a lambda function and/or a regex would do the job, but my skills are not up to it...

Thanks for your help


Comments (2)

你没皮卡萌 2025-02-07 00:58:41

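You can try something like this:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the first dict
# of every row and build the events frame in a single step.
events_df = pd.DataFrame([row[0] for row in df["events"]])

# Assumes df has the default 0..n-1 index so the rows line up.
pd.concat([df, events_df], axis=1).drop(["events"], axis=1)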

I got it working with a DataFrame that looks like this:

   id severity   user                                             events  \
0   1      Low  test1  [{'type': 'AAA', 'timestamp': 1653135398011, '...   
1   2   Medium  test2  [{'type': 'BBB', 'timestamp': 1653135398012, '...   
2   3      NaN  test3  [{'type': 'CCC', 'timestamp': 1653135398013, '...   
3   4      Low  test4  [{'type': 'DDD', 'timestamp': 1653135398014, '...   

      city  
0   Athens  
1  Buffalo  
2   Carson  
3  Detroit

Edit:

Thanks to @Thyebri for the suggestion. The same result can be obtained without a loop, though I am not qualified to say whether it is more or less efficient.

events_df = pd.DataFrame(iter(df["events"].apply(lambda ls: ls[0])))
pd.concat([df, events_df], axis=1).drop(["events"], axis=1)
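
One thing to watch with this approach: pd.concat(axis=1) aligns rows on the index, so it relies on df having the default 0..n-1 index. The sketch below, which uses a small made-up frame with a non-default index (the values 10 and 20 are purely illustrative), shows one way to keep the rows aligned by building the events frame on the same index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2],
        "events": [
            [{"type": "AAA", "agent": None}],
            [{"type": "BBB", "agent": "STIX"}],
        ],
        "city": ["Athens", "Buffalo"],
    },
    index=[10, 20],  # deliberately not the default 0..n-1 index
)

# Take the first dict of each row and reuse df's index, so that
# concat(axis=1) pairs each event with the row it came from.
events_df = pd.DataFrame(
    [events[0] for events in df["events"]],
    index=df.index,
)

result = pd.concat([df.drop(columns="events"), events_df], axis=1)
print(result)
```

Like the snippets above, this only looks at the first dictionary in each list; rows holding several events would need a different treatment.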

生生漫 2025-02-07 00:58:41

Here is a way to do what your question asks:

from functools import reduce
df = pd.concat([df.drop(columns=['events', 'city']), pd.DataFrame.from_records(reduce(lambda a, b: a + b, df.events)), df['city']], axis=1)

Explanation:

  • Use functools.reduce() to concatenate the per-row lists in the events column into one flat list of dictionary objects
  • Use pd.concat() to glue together (1) the columns preceding events, (2) a dataframe of columns in the dictionaries found in the values in the events column created using from_records(), and (3) the column(s) to the right of events (in this case, just city)
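
The same three-part gluing can be sketched without naming the neighbouring columns, for cases where the layout is not known in advance. The helper below, expand_dict_list_column, is a hypothetical name introduced here for illustration and is not part of pandas. It assumes each cell holds a list of dictionaries, uses DataFrame.explode so that a row with several events yields several output rows, and treats anything that is not a dictionary (for example the NaN produced by an empty list) as an empty record.

```python
import pandas as pd


def expand_dict_list_column(df, column):
    """Replace `column` (lists of dicts) with one column per dict key."""
    pos = df.columns.get_loc(column)
    # One row per list element; ignore_index gives a fresh unique index.
    exploded = df.explode(column, ignore_index=True)
    # Non-dict cells (e.g. NaN from an empty list) become empty records.
    records = [e if isinstance(e, dict) else {} for e in exploded[column]]
    expanded = pd.DataFrame(records, index=exploded.index)
    left = exploded.iloc[:, :pos]        # columns before `column`
    right = exploded.iloc[:, pos + 1:]   # columns after `column`
    return pd.concat([left, expanded, right], axis=1)


df = pd.DataFrame(
    {
        "id": [1, 2],
        "events": [
            [{"type": "AAA", "timestamp": 1}],
            [
                {"type": "BBB", "timestamp": 2, "agent": "STIX"},
                {"type": "CCC", "timestamp": 3},
            ],
        ],
        "city": ["Athens", "Buffalo"],
    }
)

out = expand_dict_list_column(df, "events")
print(out)
```

Keys that are missing from a given dictionary simply come out as NaN in that row, so the set of keys does not have to be known beforehand.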

Full test code:

import pandas as pd
df = pd.DataFrame(columns=['id','severity','user','events','city'], data=[
[1,'Low','test1',[{'type': 'AAA', 'timestamp': 1653135398011, 'agent': None}],'Athens'],
[2,'Medium','test2',[{'type': 'BBB', 'timestamp': 1653135398012, 'agent': 'STIX'}],'Buffalo'],
[3,None,'test3',[{'type': 'CCC', 'timestamp': 1653135398013, 'agent': 'ACQ'}], 'Carson'],
[4,'Low','test4',[{'type': 'DDD', 'timestamp': 1653135398014, 'agent': 'VTC'}], 'Detroit']])

print('Input dataframe:')
print(df)

from functools import reduce
df = pd.concat([df.drop(columns=['events', 'city']), pd.DataFrame.from_records(reduce(lambda a, b: a + b, df.events)), df['city']], axis=1)

print('\nResult:')
print(df)

Output:

Input dataframe:
   id severity   user                                             events     city
0   1      Low  test1  [{'type': 'AAA', 'timestamp': 1653135398011, '...   Athens
1   2   Medium  test2  [{'type': 'BBB', 'timestamp': 1653135398012, '...  Buffalo
2   3     None  test3  [{'type': 'CCC', 'timestamp': 1653135398013, '...   Carson
3   4      Low  test4  [{'type': 'DDD', 'timestamp': 1653135398014, '...  Detroit

Result:
   id severity   user type      timestamp agent     city
0   1      Low  test1  AAA  1653135398011  None   Athens
1   2   Medium  test2  BBB  1653135398012  STIX  Buffalo
2   3     None  test3  CCC  1653135398013   ACQ   Carson
3   4      Low  test4  DDD  1653135398014   VTC  Detroit

NOTE: It was necessary to make slight changes to the dataframe shown in the question, such as putting quotes around the 'agent' values (STIX, ACQ, VTC).
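
The quoting matters most when the events column arrives as text, for example after reading a CSV file. That is an assumption about the asker's data, not something stated in the question. Under that assumption, a minimal sketch is to parse each cell with ast.literal_eval from the standard library before expanding it; literal_eval only accepts Python literals, so an unquoted value such as STIX raises a ValueError.

```python
import ast

import pandas as pd

# Hypothetical CSV-style data: each events cell is a string, not a list.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "events": [
            "[{'type': 'AAA', 'timestamp': 1653135398011, 'agent': None}]",
            "[{'type': 'BBB', 'timestamp': 1653135398012, 'agent': 'STIX'}]",
        ],
    }
)

# Turn every string into a real list of dicts.
df["events"] = df["events"].apply(ast.literal_eval)
print(type(df.loc[0, "events"]), df.loc[1, "events"][0]["agent"])

# An unquoted identifier is not a literal, so parsing is rejected.
try:
    ast.literal_eval("[{'agent': STIX}]")
    unquoted_ok = True
except ValueError:
    unquoted_ok = False
```

Once the column holds real lists, any of the approaches in the answers above can be applied to it.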
