*根据词典列表中的值,在dataframe和dataframe和插入列中的无迭代*行

发布于 2025-02-11 19:28:06 字数 6176 浏览 1 评论 0原文

我有一个数据框和词典列表。数据框有84K行。每行都是特定客户端的帐户。

列表中的每个dict都属于特定客户端。他们最多可以拥有50个钥匙,只有2个键。词典还需要按照列出的顺序应用。每个dict中的第一个键/值显示了命令所属的客户端的名称。第二个键/值是规则的名称。

dict示例列表:


0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*}

length: 427, dtype: object

dataFrame具有这些列名

df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
              'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
              'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
              '# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
              'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
              'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
             'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
             'Net Assets Base','Total Assets Base','Dual OEIC']

输入数据框架

客户文件数据
client直接客户
​#1.TXTAC013超级基金
客户端#1C#1.TXTAY014S& p索引
客户端#1C#1.TXTAY015测试帐户
客户clibl#1C#1.TXTAA016索引
客户端#1C#1.TXTAA017测试帐户
客户端#1C#1.TXTAA018Ryan的帐户
客户端#2C#2.TXTBA011测试帐户
客户#2C#2.TXTBB0133INDEX
客户端#2C#2.TXTBB0192测试帐户
客户端#2C#2.TXTBZ01123123索引
客户端#3C#3.TXTBB011657测试帐户
客户端客户#3C#3.TXTBP0115454TEST帐户
客户端#4C#4.TXTGH01100测试帐户
客户端客户#4C#4.TXTGH0119875索引
客户量#4C#4.TXTGY0113579测试帐户
客户#4C#4.TXTGE012 2索引
客户量#4C#4.TXTGE0172测试帐户
客户端#4C#4.TXTGP0196绿色帐户

所需的输出 根据427个字典之一的标准,输出应为具有新列['计费代码']的新列['计费代码']的数据框。

客户端短名称源。名称用户银行银行帐户帐号帐户帐户说明计费代码
客户#1C#1.TXTAA011测试帐户TNL
客户端#1C#1.TXTAC012我的帐户MF
客户量#1C#1。 TXTAC013超级基金HF
客户量#1C#1.TXTAY014S&amp index索引客户
端#1C#1 C#1.TXTAY015测试帐户TNL
客户端#1C#1.TXTAA016索引索引
客户端#1C#1.TXTAA017测试帐户TNL
客户端#1C#1.TXTAA018RYAN帐户HF
客户量#2C#2.TXTBA011测试帐户TNL
客户tnl客户端#2C#2.TXTBB0133索引索引客户
端#2C#2.TXTBB0192测试帐户TNL
客户端#2C#2.TXTBZ01123123索引索引
客户端客户#3C#3.TXTBB011657测试帐户TNL TNL TNL
客户#3C客户端#3.TXTBP0115454测试帐户TNL
客户端#4C#4.TXTGH01100测试帐户TNL
客户端#4C#4.TXTGH0119875INDEX索引索引
客户端客户#4C#4.TXTGY0113579测试帐户TNL
客户端#4C#4.TXTGE012索引索引客户
端#4C#4.TXTGE0172测试帐户TNL
客户端#4C#4.TXTGP0196绿色帐户MF

列名称匹配密钥。

我基本上需要迭代数据的每一行,并确定它是否符合第一个dict中的标准。如果确实如此,则DF ['计费代码'] =如果有意义,则该特定的dict ['计费代码']。如果没有,请继续进行下一个计费代码。

迭代可能需要很长时间才能贯穿所有这些,因此标题中的“ 不是迭代”。不确定这是否是列表理解可以做的。

感谢您提供的任何帮助!

I have a DataFrame and a list of dictionaries. The DataFrame has 84k rows. Each row is an account for a specific client.

Each dict in the list belongs to a specific client. They can have up to 50 keys and as few as 2 keys. The dictionaries also need to be applied in the order they are listed. The first key/value in each dict shows the name of the client the dict belongs to. The second key/value is the name of the rule.

List of Dict Example:


0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*}

length: 427, dtype: object

DataFrame has these column names

df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
              'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
              'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
              '# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
              'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
              'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
             'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
             'Net Assets Base','Total Assets Base','Dual OEIC']

Input DataFrame
Dataframe containing data directly from client files

Client Short NameSource.NameUser BankBankAccount NumberAccount Description
Client #1C#1.txtAA011Test Account
Client #1C#1.txtAC012MY ACCOUNT
Client #1C#1.txtAC013SUPER FUND
Client #1C#1.txtAY014S&P INDEX
Client #1C#1.txtAY015Test Account
Client #1C#1.txtAA016INDEX
Client #1C#1.txtAA017Test Account
Client #1C#1.txtAA018RYAN'S Account
Client #2C#2.txtBA011Test Account
Client #2C#2.txtBB0133INDEX
Client #2C#2.txtBB0192Test Account
Client #2C#2.txtBZ01123123INDEX
Client #3C#3.txtBB011657Test Account
Client #3C#3.txtBP0115454Test Account
Client #4C#4.txtGH01100Test Account
Client #4C#4.txtGH0119875INDEX
Client #4C#4.txtGY0113579Test Account
Client #4C#4.txtGE012INDEX
Client #4C#4.txtGE0172Test Account
Client #4C#4.txtGP0196GREEN Account

Desired Output
Output should be the dataframe with a new column ['Billing Code'] based on the criteria from one of the 427 dictionaries.

Client Short NameSource.NameUser BankBankAccount NumberAccount DescriptionBilling Code
Client #1C#1.txtAA011Test AccountTNL
Client #1C#1.txtAC012MY ACCOUNTMF
Client #1C#1.txtAC013SUPER FUNDHF
Client #1C#1.txtAY014S&P INDEXIndex
Client #1C#1.txtAY015Test AccountTNL
Client #1C#1.txtAA016INDEXIndex
Client #1C#1.txtAA017Test AccountTNL
Client #1C#1.txtAA018RYAN'S AccountHF
Client #2C#2.txtBA011Test AccountTNL
Client #2C#2.txtBB0133INDEXIndex
Client #2C#2.txtBB0192Test AccountTNL
Client #2C#2.txtBZ01123123INDEXIndex
Client #3C#3.txtBB011657Test AccountTNL
Client #3C#3.txtBP0115454Test AccountTNL
Client #4C#4.txtGH01100Test AccountTNL
Client #4C#4.txtGH0119875INDEXIndex
Client #4C#4.txtGY0113579Test AccountTNL
Client #4C#4.txtGE012INDEXIndex
Client #4C#4.txtGE0172Test AccountTNL
Client #4C#4.txtGP0196GREEN AccountMF

Column names match keys.

I basically need to iterate through each row of the data and determine if it meets the criteria in the first dict. If it does then df['Billing Code'] = that specific dict['Billing Code'] if that makes sense. If not then move on to the next billing code.

Iteration could take a very long time to run through all of this hence the "Not Iterate" in the title. Not sure if this is something list comprehension can do.

Thank you for any help anyone can provide!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

阳光的暖冬 2025-02-18 19:28:06

编辑:

根据您的评论,我首先创建一个映射clientId->字典列表

lst = [
    {
        "client": "Client #1",
        "Billing Code": "TNL",
        "Bank": 1,
        "Account Number": 1,
    },
    {
        "client": "Client #1",
        "Billing Code": "MF",
        "User Bank": "AY",
        "Bank": 1,
    },
    {
        "Billing Code": "TNL",
        "client": "Client #2",
        "User Bank": "BB",
    },
]

# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
    m.setdefault(d["client"], []).append(d)
    d.pop("client")

然后,我将使用df.groupby by Client ID并应用自定义功能:

def fn(x):
    dictionaries = m.get(x.name, [])

    out = []
    for _, row in x.iterrows():
        for d in dictionaries:
            if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
                out.append(d["Billing Code"])
                break
        else:
            out.append("Unclassified")

    x["Billng Code"] = out
    return x


df = df.groupby("Client Short Name").apply(fn)
print(df)

结果是:

   Client Short Name Source.Name User Bank  Bank  Account Number Account Description   Billng Code
0          Client #1     C#1.txt        AA     1               1        Test Account           TNL
1          Client #1     C#1.txt        AC     1               2          MY ACCOUNT  Unclassified
2          Client #1     C#1.txt        AC     1               3          SUPER FUND  Unclassified
3          Client #1     C#1.txt        AY     1               4           S&P INDEX            MF
4          Client #1     C#1.txt        AY     1               5        Test Account            MF
5          Client #1     C#1.txt        AA     1               6               INDEX  Unclassified
6          Client #1     C#1.txt        AA     1               7        Test Account  Unclassified
7          Client #1     C#1.txt        AA     1               8      RYAN'S Account  Unclassified
8          Client #2     C#2.txt        BA     1               1        Test Account  Unclassified
9          Client #2     C#2.txt        BB     1              33               INDEX           TNL
10         Client #2     C#2.txt        BB     1              92        Test Account           TNL
11         Client #2     C#2.txt        BZ     1          123123               INDEX  Unclassified
12         Client #3     C#3.txt        BB     1            1657        Test Account  Unclassified
13         Client #3     C#3.txt        BP     1           15454        Test Account  Unclassified
14         Client #4     C#4.txt        GH     1             100        Test Account  Unclassified
15         Client #4     C#4.txt        GH     1           19875               INDEX  Unclassified
16         Client #4     C#4.txt        GY     1           13579        Test Account  Unclassified
17         Client #4     C#4.txt        GE     1               2               INDEX  Unclassified
18         Client #4     C#4.txt        GE     1              72        Test Account  Unclassified
19         Client #4     C#4.txt        GP     1              96       GREEN Account  Unclassified

EDIT:

Based on your comments, I'd first create a mapping ClientID -> List of Dictionaries:

lst = [
    {
        "client": "Client #1",
        "Billing Code": "TNL",
        "Bank": 1,
        "Account Number": 1,
    },
    {
        "client": "Client #1",
        "Billing Code": "MF",
        "User Bank": "AY",
        "Bank": 1,
    },
    {
        "Billing Code": "TNL",
        "client": "Client #2",
        "User Bank": "BB",
    },
]

# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
    m.setdefault(d["client"], []).append(d)
    d.pop("client")

Then I'd use df.groupby by Client ID and apply custom function:

def fn(x):
    dictionaries = m.get(x.name, [])

    out = []
    for _, row in x.iterrows():
        for d in dictionaries:
            if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
                out.append(d["Billing Code"])
                break
        else:
            out.append("Unclassified")

    x["Billng Code"] = out
    return x


df = df.groupby("Client Short Name").apply(fn)
print(df)

The result is:

   Client Short Name Source.Name User Bank  Bank  Account Number Account Description   Billng Code
0          Client #1     C#1.txt        AA     1               1        Test Account           TNL
1          Client #1     C#1.txt        AC     1               2          MY ACCOUNT  Unclassified
2          Client #1     C#1.txt        AC     1               3          SUPER FUND  Unclassified
3          Client #1     C#1.txt        AY     1               4           S&P INDEX            MF
4          Client #1     C#1.txt        AY     1               5        Test Account            MF
5          Client #1     C#1.txt        AA     1               6               INDEX  Unclassified
6          Client #1     C#1.txt        AA     1               7        Test Account  Unclassified
7          Client #1     C#1.txt        AA     1               8      RYAN'S Account  Unclassified
8          Client #2     C#2.txt        BA     1               1        Test Account  Unclassified
9          Client #2     C#2.txt        BB     1              33               INDEX           TNL
10         Client #2     C#2.txt        BB     1              92        Test Account           TNL
11         Client #2     C#2.txt        BZ     1          123123               INDEX  Unclassified
12         Client #3     C#3.txt        BB     1            1657        Test Account  Unclassified
13         Client #3     C#3.txt        BP     1           15454        Test Account  Unclassified
14         Client #4     C#4.txt        GH     1             100        Test Account  Unclassified
15         Client #4     C#4.txt        GH     1           19875               INDEX  Unclassified
16         Client #4     C#4.txt        GY     1           13579        Test Account  Unclassified
17         Client #4     C#4.txt        GE     1               2               INDEX  Unclassified
18         Client #4     C#4.txt        GE     1              72        Test Account  Unclassified
19         Client #4     C#4.txt        GP     1              96       GREEN Account  Unclassified
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文