根据词典列表中的值，在dataframe和dataframe和插入列中的无迭代行

发布于 2025-02-11 19:28:06 字数 6176 浏览 1 评论 0原文

我有一个数据框和词典列表。数据框有84K行。每行都是特定客户端的帐户。

列表中的每个dict都属于特定客户端。他们最多可以拥有50个钥匙，只有2个键。词典还需要按照列出的顺序应用。每个dict中的第一个键/值显示了命令所属的客户端的名称。第二个键/值是规则的名称。

dict示例列表：


0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*}

length: 427, dtype: object

dataFrame具有这些列名

df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
              'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
              'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
              '# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
              'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
              'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
             'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
             'Net Assets Base','Total Assets Base','Dual OEIC']

输入数据框架 帧

客户	。	端	端	文件	数据
client	从	直接	客户

	＃1.TXT	AC	01	3	超级基金
客户端＃1	C＃1.TXT	AY	01	4	S＆amp; p索引
客户端＃1	C＃1.TXT	AY	01	5	测试帐户
客户clibl＃1	C＃1.TXT	AA	01	6	索引
客户端＃1	C＃1.TXT	AA	01	7	测试帐户
客户端＃1	C＃1.TXT	AA	01	8	Ryan的帐户
客户端＃2	C＃2.TXT	BA	01	1	测试帐户
客户＃2	C＃2.TXT	BB	01	33	INDEX
客户端＃2	C＃2.TXT	BB	01	92	测试帐户
客户端＃2	C＃2.TXT	BZ	01	123123	索引
客户端＃3	C＃3.TXT	BB	01	1657	测试帐户
客户端客户＃3	C＃3.TXT	BP	01	15454	TEST帐户
客户端＃4	C＃4.TXT	GH	01	100	测试帐户
客户端客户＃4	C＃4.TXT	GH	01	19875	索引
客户量＃4	C＃4.TXT	GY	01	13579	测试帐户
客户＃4	C＃4.TXT	GE	01	2 2	索引
客户量＃4	C＃4.TXT	GE	01	72	测试帐户
客户端＃4	C＃4.TXT	GP	01	96	绿色帐户

所需的输出 根据427个字典之一的标准，输出应为具有新列['计费代码']的新列['计费代码']的数据框。

客户端短名称	源。	名称用户银行	银行	帐户帐号	帐户帐户说明	计费代码
客户＃1	C＃1.TXT	AA	01	1	测试帐户	TNL
客户端＃1	C＃1.TXT	AC	01	2	我的帐户	MF
客户量＃1	C＃1。 TXT	AC	01	3	超级基金	HF
客户量＃1	C＃1.TXT	AY	01	4	S＆amp index索引	客户
端＃1	C＃1 C＃1.TXT	AY	01	5	测试帐户	TNL
客户端＃1	C＃1.TXT	AA	01	6	索引	索引
客户端＃1	C＃1.TXT	AA	01	7	测试帐户	TNL
客户端＃1	C＃1.TXT	AA	01	8	RYAN帐户	HF
客户量＃2	C＃2.TXT	BA	01	1	测试帐户	TNL
客户tnl客户端＃2	C＃2.TXT	BB	01	33	索引索引	客户
端＃2	C＃2.TXT	BB	01	92	测试帐户	TNL
客户端＃2	C＃2.TXT	BZ	01	123123	索引	索引
客户端客户＃3	C＃3.TXT	BB	01	1657	测试帐户	TNL TNL TNL
客户＃3	C客户端＃3.TXT	BP	01	15454	测试帐户	TNL
客户端＃4	C＃4.TXT	GH	01	100	测试帐户	TNL
客户端＃4	C＃4.TXT	GH	01	19875	INDEX	索引索引
客户端客户＃4	C＃4.TXT	GY	01	13579	测试帐户	TNL
客户端＃4	C＃4.TXT	GE	01	2	索引索引	客户
端＃4	C＃4.TXT	GE	01	72	测试帐户	TNL
客户端＃4	C＃4.TXT	GP	01	96	绿色帐户	MF

列名称匹配密钥。

我基本上需要迭代数据的每一行，并确定它是否符合第一个dict中的标准。如果确实如此，则DF ['计费代码'] =如果有意义，则该特定的dict ['计费代码']。如果没有，请继续进行下一个计费代码。

迭代可能需要很长时间才能贯穿所有这些，因此标题中的“ 不是迭代”。不确定这是否是列表理解可以做的。

感谢您提供的任何帮助！

原文

I have a DataFrame and a list of dictionaries. The DataFrame has 84k rows. Each row is an account for a specific client.

Each dict in the list belongs to a specific client. They can have up to 50 keys and as few as 2 keys. The dictionaries also need to be applied in the order they are listed. The first key/value in each dict shows the name of the client the dict belongs to. The second key/value is the name of the rule.

List of Dict Example:


0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*}

length: 427, dtype: object

DataFrame has these column names

df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
              'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
              'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
              '# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
              'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
              'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
             'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
             'Net Assets Base','Total Assets Base','Dual OEIC']

Input DataFrame
Dataframe containing data directly from client files

Client Short Name	Source.Name	User Bank	Bank	Account Number	Account Description
Client #1	C#1.txt	AA	01	1	Test Account
Client #1	C#1.txt	AC	01	2	MY ACCOUNT
Client #1	C#1.txt	AC	01	3	SUPER FUND
Client #1	C#1.txt	AY	01	4	S&P INDEX
Client #1	C#1.txt	AY	01	5	Test Account
Client #1	C#1.txt	AA	01	6	INDEX
Client #1	C#1.txt	AA	01	7	Test Account
Client #1	C#1.txt	AA	01	8	RYAN'S Account
Client #2	C#2.txt	BA	01	1	Test Account
Client #2	C#2.txt	BB	01	33	INDEX
Client #2	C#2.txt	BB	01	92	Test Account
Client #2	C#2.txt	BZ	01	123123	INDEX
Client #3	C#3.txt	BB	01	1657	Test Account
Client #3	C#3.txt	BP	01	15454	Test Account
Client #4	C#4.txt	GH	01	100	Test Account
Client #4	C#4.txt	GH	01	19875	INDEX
Client #4	C#4.txt	GY	01	13579	Test Account
Client #4	C#4.txt	GE	01	2	INDEX
Client #4	C#4.txt	GE	01	72	Test Account
Client #4	C#4.txt	GP	01	96	GREEN Account

Desired Output
Output should be the dataframe with a new column ['Billing Code'] based on the criteria from one of the 427 dictionaries.

Client Short Name	Source.Name	User Bank	Bank	Account Number	Account Description	Billing Code
Client #1	C#1.txt	AA	01	1	Test Account	TNL
Client #1	C#1.txt	AC	01	2	MY ACCOUNT	MF
Client #1	C#1.txt	AC	01	3	SUPER FUND	HF
Client #1	C#1.txt	AY	01	4	S&P INDEX	Index
Client #1	C#1.txt	AY	01	5	Test Account	TNL
Client #1	C#1.txt	AA	01	6	INDEX	Index
Client #1	C#1.txt	AA	01	7	Test Account	TNL
Client #1	C#1.txt	AA	01	8	RYAN'S Account	HF
Client #2	C#2.txt	BA	01	1	Test Account	TNL
Client #2	C#2.txt	BB	01	33	INDEX	Index
Client #2	C#2.txt	BB	01	92	Test Account	TNL
Client #2	C#2.txt	BZ	01	123123	INDEX	Index
Client #3	C#3.txt	BB	01	1657	Test Account	TNL
Client #3	C#3.txt	BP	01	15454	Test Account	TNL
Client #4	C#4.txt	GH	01	100	Test Account	TNL
Client #4	C#4.txt	GH	01	19875	INDEX	Index
Client #4	C#4.txt	GY	01	13579	Test Account	TNL
Client #4	C#4.txt	GE	01	2	INDEX	Index
Client #4	C#4.txt	GE	01	72	Test Account	TNL
Client #4	C#4.txt	GP	01	96	GREEN Account	MF

Column names match keys.

I basically need to iterate through each row of the data and determine if it meets the criteria in the first dict. If it does then df['Billing Code'] = that specific dict['Billing Code'] if that makes sense. If not then move on to the next billing code.

Iteration could take a very long time to run through all of this hence the "Not Iterate" in the title. Not sure if this is something list comprehension can do.

Thank you for any help anyone can provide!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

阳光的暖冬 2025-02-18 19:28:06

编辑：

根据您的评论，我首先创建一个映射clientId-＆gt;字典列表：

lst = [
    {
        "client": "Client #1",
        "Billing Code": "TNL",
        "Bank": 1,
        "Account Number": 1,
    },
    {
        "client": "Client #1",
        "Billing Code": "MF",
        "User Bank": "AY",
        "Bank": 1,
    },
    {
        "Billing Code": "TNL",
        "client": "Client #2",
        "User Bank": "BB",
    },
]

# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
    m.setdefault(d["client"], []).append(d)
    d.pop("client")

然后，我将使用df.groupby by Client ID并应用自定义功能：

def fn(x):
    dictionaries = m.get(x.name, [])

    out = []
    for _, row in x.iterrows():
        for d in dictionaries:
            if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
                out.append(d["Billing Code"])
                break
        else:
            out.append("Unclassified")

    x["Billng Code"] = out
    return x


df = df.groupby("Client Short Name").apply(fn)
print(df)

结果是：

   Client Short Name Source.Name User Bank  Bank  Account Number Account Description   Billng Code
0          Client #1     C#1.txt        AA     1               1        Test Account           TNL
1          Client #1     C#1.txt        AC     1               2          MY ACCOUNT  Unclassified
2          Client #1     C#1.txt        AC     1               3          SUPER FUND  Unclassified
3          Client #1     C#1.txt        AY     1               4           S&P INDEX            MF
4          Client #1     C#1.txt        AY     1               5        Test Account            MF
5          Client #1     C#1.txt        AA     1               6               INDEX  Unclassified
6          Client #1     C#1.txt        AA     1               7        Test Account  Unclassified
7          Client #1     C#1.txt        AA     1               8      RYAN'S Account  Unclassified
8          Client #2     C#2.txt        BA     1               1        Test Account  Unclassified
9          Client #2     C#2.txt        BB     1              33               INDEX           TNL
10         Client #2     C#2.txt        BB     1              92        Test Account           TNL
11         Client #2     C#2.txt        BZ     1          123123               INDEX  Unclassified
12         Client #3     C#3.txt        BB     1            1657        Test Account  Unclassified
13         Client #3     C#3.txt        BP     1           15454        Test Account  Unclassified
14         Client #4     C#4.txt        GH     1             100        Test Account  Unclassified
15         Client #4     C#4.txt        GH     1           19875               INDEX  Unclassified
16         Client #4     C#4.txt        GY     1           13579        Test Account  Unclassified
17         Client #4     C#4.txt        GE     1               2               INDEX  Unclassified
18         Client #4     C#4.txt        GE     1              72        Test Account  Unclassified
19         Client #4     C#4.txt        GP     1              96       GREEN Account  Unclassified

EDIT:

Based on your comments, I'd first create a mapping ClientID -> List of Dictionaries:

lst = [
    {
        "client": "Client #1",
        "Billing Code": "TNL",
        "Bank": 1,
        "Account Number": 1,
    },
    {
        "client": "Client #1",
        "Billing Code": "MF",
        "User Bank": "AY",
        "Bank": 1,
    },
    {
        "Billing Code": "TNL",
        "client": "Client #2",
        "User Bank": "BB",
    },
]

# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
    m.setdefault(d["client"], []).append(d)
    d.pop("client")

Then I'd use df.groupby by Client ID and apply custom function:

def fn(x):
    dictionaries = m.get(x.name, [])

    out = []
    for _, row in x.iterrows():
        for d in dictionaries:
            if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
                out.append(d["Billing Code"])
                break
        else:
            out.append("Unclassified")

    x["Billng Code"] = out
    return x


df = df.groupby("Client Short Name").apply(fn)
print(df)

The result is:

   Client Short Name Source.Name User Bank  Bank  Account Number Account Description   Billng Code
0          Client #1     C#1.txt        AA     1               1        Test Account           TNL
1          Client #1     C#1.txt        AC     1               2          MY ACCOUNT  Unclassified
2          Client #1     C#1.txt        AC     1               3          SUPER FUND  Unclassified
3          Client #1     C#1.txt        AY     1               4           S&P INDEX            MF
4          Client #1     C#1.txt        AY     1               5        Test Account            MF
5          Client #1     C#1.txt        AA     1               6               INDEX  Unclassified
6          Client #1     C#1.txt        AA     1               7        Test Account  Unclassified
7          Client #1     C#1.txt        AA     1               8      RYAN'S Account  Unclassified
8          Client #2     C#2.txt        BA     1               1        Test Account  Unclassified
9          Client #2     C#2.txt        BB     1              33               INDEX           TNL
10         Client #2     C#2.txt        BB     1              92        Test Account           TNL
11         Client #2     C#2.txt        BZ     1          123123               INDEX  Unclassified
12         Client #3     C#3.txt        BB     1            1657        Test Account  Unclassified
13         Client #3     C#3.txt        BP     1           15454        Test Account  Unclassified
14         Client #4     C#4.txt        GH     1             100        Test Account  Unclassified
15         Client #4     C#4.txt        GH     1           19875               INDEX  Unclassified
16         Client #4     C#4.txt        GY     1           13579        Test Account  Unclassified
17         Client #4     C#4.txt        GE     1               2               INDEX  Unclassified
18         Client #4     C#4.txt        GE     1              72        Test Account  Unclassified
19         Client #4     C#4.txt        GP     1              96       GREEN Account  Unclassified

回复收藏 0 原文

~没有更多了~