Insert a column into a DataFrame based on values in a list of dictionaries (Not Iterate over rows)
I have a DataFrame and a list of dictionaries. The DataFrame has 84k rows. Each row is an account for a specific client.
Each dict in the list belongs to a specific client. A dict can have as many as 50 keys or as few as 2. The dictionaries also need to be applied in the order they are listed. The first key/value in each dict gives the name of the client the dict belongs to. The second key/value is the name of the rule, i.e. the billing code to assign.
List of Dict Example:
0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*'}
length: 427, dtype: object
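The sample rules above mix plain values ('0', 'BP') with what look like a numeric comparison ('>99999') and a wildcard ('*test*'). The exact rule syntax is not spelled out here, so the following is only a sketch of how one rule value might be checked against one cell, assuming just those three forms. The names value_matches and _COMPARATORS are hypothetical.

```python
from fnmatch import fnmatchcase
import operator

# Assumed mini-syntax, inferred only from the sample rules above:
#   '>99999'  -> numeric comparison
#   '*test*'  -> case-insensitive wildcard match
#   otherwise -> exact string equality
# Two-character operators come first so '>=' is not read as '>'.
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def value_matches(cell, condition):
    """Return True if a single cell satisfies a single rule value."""
    for symbol, compare in _COMPARATORS.items():
        if condition.startswith(symbol):
            try:
                return compare(float(cell), float(condition[len(symbol):]))
            except (TypeError, ValueError):
                # A non-numeric cell cannot satisfy a numeric test.
                return False
    if "*" in condition or "?" in condition:
        return fnmatchcase(str(cell).lower(), condition.lower())
    return str(cell) == condition
```

With this reading, an Account Number of 123123 would satisfy '>99999', and 'Test Account' would satisfy '*test*'.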
The DataFrame has these column names:
df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
'# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
'Net Assets Base','Total Assets Base','Dual OEIC']
Input DataFrame
Dataframe containing data directly from client files
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description |
---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account |
Desired Output
Output should be the dataframe with a new column ['Billing Code'] based on the criteria from one of the 427 dictionaries.
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description | Billing Code |
---|---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account | TNL |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT | MF |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND | HF |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX | Index |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX | Index |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account | HF |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account | TNL |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX | Index |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account | TNL |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX | Index |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account | TNL |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX | Index |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account | TNL |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX | Index |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account | TNL |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account | MF |
Column names match keys.
I basically need to go through each row of the data and determine whether it meets the criteria in the first dict. If it does, then df['Billing Code'] for that row should be set to that dict's 'Billing Code'. If not, move on to the next dict.
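To pin down the intended semantics, the row-by-row logic could be sketched in plain Python as below. This is a slow reference version, not a proposed solution. It assumes that the rule's 'client' value equals the row's 'Client Short Name' exactly and that every other key is an exact-equality test on a column of the same name. The function name first_matching_code is hypothetical.

```python
def first_matching_code(row, rules, default=None):
    """Return the 'Billing Code' of the first rule whose criteria all hold.

    `row` is a plain dict (for example one element of df.to_dict('records')).
    `rules` is the list of dicts, already in priority order.
    """
    for rule in rules:  # list order is the priority order
        if rule["client"] != row["Client Short Name"]:
            continue
        # Every key except the two bookkeeping keys is a criterion.
        criteria = {
            key: value
            for key, value in rule.items()
            if key not in ("client", "Billing Code")
        }
        # all() over an empty dict is True, so a 2-key rule is a catch-all.
        if all(str(row.get(col)) == str(want) for col, want in criteria.items()):
            return rule["Billing Code"]
    return default
```

Because the first match wins, a rule with only the two mandatory keys acts as a per-client fallback and only makes sense as the last rule for that client.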
Iteration could take a very long time to run through all of this, hence the "Not Iterate" in the title. I'm not sure whether this is something a list comprehension can do.
Thank you for any help anyone can provide!
Comments (1)
EDIT:
Based on your comments, I'd first create a mapping ClientID -> List of Dictionaries. Then I'd use df.groupby by Client ID and apply a custom function to each group.
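The mapping and the per-client step might look roughly like the following. It is a sketch under assumptions, not the answer's original code. It assumes the rule's 'client' value equals the 'Client Short Name' column exactly, every other rule key is an existing column compared by exact equality, and the DataFrame index is unique. It also loops over the groups explicitly instead of calling .apply, so it does not depend on how a given pandas version passes the grouping column to the applied function. The names build_rule_map, assign_codes and add_billing_code are hypothetical.

```python
from collections import defaultdict

import pandas as pd


def build_rule_map(rules):
    """Group rules by client while keeping their original list order."""
    rule_map = defaultdict(list)
    for rule in rules:
        rule_map[rule["client"]].append(rule)
    return rule_map


def assign_codes(group, client_rules):
    """Return billing codes for one client's rows; the first match wins."""
    codes = pd.Series(index=group.index, dtype=object)  # starts as all NaN
    for rule in client_rules:
        mask = codes.isna()  # only rows that have no code yet
        for col, expected in rule.items():
            if col in ("client", "Billing Code"):
                continue
            mask &= group[col].astype(str) == str(expected)
        codes[mask] = rule["Billing Code"]
    return codes


def add_billing_code(df, rules, client_col="Client Short Name"):
    """Return a copy of df with a new 'Billing Code' column."""
    rule_map = build_rule_map(rules)
    parts = [
        assign_codes(group, rule_map.get(client, []))
        for client, group in df.groupby(client_col, sort=False)
    ]
    out = df.copy()
    # Assignment aligns on the index, so row order is preserved.
    out["Billing Code"] = pd.concat(parts) if parts else pd.Series(dtype=object)
    return out
```

Called as out = add_billing_code(df, rules), the Python-level loop runs once per rule (427 here) rather than once per row (84k), and each rule is evaluated as a boolean mask over that client's rows. Rows that match no rule are left as NaN.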