通过数据合并 /循环

发布于 2025-02-10 02:50:26 字数 2864 浏览 2 评论 0原文

您好需要从数据中内部获取内部，Desired_result是从VBA代码中进行的，该代码比较了两张desired_result且100％准确，如果某人可以帮助我获得所需的输出，则条件是：

err['p'] == scr['p'] & err['errd'] >= scr['scrd'] & err['errq'] - scr['scrq'] >= 0

检查所有scr scr有多少个条件['n'] genh thru try错误，但是如果一个人通过错误，则err [errq'] - = scr ['scrq']，然后跳到下一个项目scr [ 'p']，scr ['n']是唯一的，请参阅下面的示例代码：

import pandas as pd

err = pd.DataFrame({
    'p' : ['10100.A','10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10103.A','10147.A','10147.A'],
    'errd' : ['18-5-2022','16-5-2022','4-5-2022','13-5-2022','9-5-2022','2-5-2022','29-5-2022','6-5-2022','11-5-2022','25-5-2022','6-5-2022'],
    'errq' : [1, 1, 1, 1, 1, 2, 46, 1, 4, 1, 5]})
err = err.sort_values('errd')
scr = pd.DataFrame({
    'p' : ['10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10147.A','10147.A','10147.A','10147.A','10147.A'],
    'scrd' : ['10-5-2022','10-5-2022','9-5-2022','13-5-2022','9-5-2022','9-5-2022','25-5-2022','6-5-2022','6-5-2022','6-5-2022','6-5-2022','11-5-2022'],
    'scrq' : [1,1,1,1,1,1,1,1,1,1,1,1],
    'n' : ['7000000051481339','7000000051481342','7000000051722237','7000000052018581','7000000051721987','7000000051721990','7000000052725251','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150']})

desired_result = pd.DataFrame({
    'report' : ['7000000051722237','7000000051481339','7000000051721987','7000000051721990','7000000052018581','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150'],
    'match_err_scr' : ['10101.A','10101.A','10102.A','10102.A','10102.A','10147.A','10147.A','10147.A','10147.A','10147.A']})

到目前为止我尝试过的：

match = []
#Iterating scr rows
for i, row in scr.iterrows():
    #Checking for match row now is full row in scr 
        if row['scrq'] <= err[(err['p'] == row['p']) & (err['errd'] >= row['scrd'])]['errq'].sum():
            r = row.to_dict()
            match.append(r)
#Creating new data frame
report = pd.DataFrame(match)
report

合并左滤波器后来

report1 = scr.merge(err, how = 'left', on = 'p')
flt = (report1['errd'] >= report1['scrd']) & (report1['errq'] - report1['scrq'] >= 0)

report1 = report1.loc[flt]
report1 = report1.drop_duplicates(subset = ['n'])
report1

嵌套循环逐渐慢速，然后再次不正确

match = []

for i, row in scr.iterrows():
    for e, erow in err.iterrows():
        if (row['p'] == erow['p']) & (erow['errd'] >= row['scrd']) & (erow['errq'] - row['scrq'] >= 0):
            err['errq'][e]-= row['scrq']
            row_to_dict = row.to_dict()
            match.append(row_to_dict)
            break
report2 = pd.DataFrame(match)
report2

原文

Hello need to get insides from data, the desired_result is from VBA code that compare two sheets desired_result is checked and 100% accurate, If someone can assist me get the desired output, conditions are:

err['p'] == scr['p'] & err['errd'] >= scr['scrd'] & err['errq'] - scr['scrq'] >= 0

Its all about checking how many of scr['n'] wenth thru err but if one passes thru err then err['errq'] -= scr['scrq'] and jump to next item scr['p'], scr['n'] are unique, please see the sample code below:

import pandas as pd

err = pd.DataFrame({
    'p' : ['10100.A','10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10103.A','10147.A','10147.A'],
    'errd' : ['18-5-2022','16-5-2022','4-5-2022','13-5-2022','9-5-2022','2-5-2022','29-5-2022','6-5-2022','11-5-2022','25-5-2022','6-5-2022'],
    'errq' : [1, 1, 1, 1, 1, 2, 46, 1, 4, 1, 5]})
err = err.sort_values('errd')
scr = pd.DataFrame({
    'p' : ['10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10147.A','10147.A','10147.A','10147.A','10147.A'],
    'scrd' : ['10-5-2022','10-5-2022','9-5-2022','13-5-2022','9-5-2022','9-5-2022','25-5-2022','6-5-2022','6-5-2022','6-5-2022','6-5-2022','11-5-2022'],
    'scrq' : [1,1,1,1,1,1,1,1,1,1,1,1],
    'n' : ['7000000051481339','7000000051481342','7000000051722237','7000000052018581','7000000051721987','7000000051721990','7000000052725251','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150']})

desired_result = pd.DataFrame({
    'report' : ['7000000051722237','7000000051481339','7000000051721987','7000000051721990','7000000052018581','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150'],
    'match_err_scr' : ['10101.A','10101.A','10102.A','10102.A','10102.A','10147.A','10147.A','10147.A','10147.A','10147.A']})

What i have tried so far:

match = []
#Iterating scr rows
for i, row in scr.iterrows():
    #Checking for match row now is full row in scr 
        if row['scrq'] <= err[(err['p'] == row['p']) & (err['errd'] >= row['scrd'])]['errq'].sum():
            r = row.to_dict()
            match.append(r)
#Creating new data frame
report = pd.DataFrame(match)
report

Merge left filter later

report1 = scr.merge(err, how = 'left', on = 'p')
flt = (report1['errd'] >= report1['scrd']) & (report1['errq'] - report1['scrq'] >= 0)

report1 = report1.loc[flt]
report1 = report1.drop_duplicates(subset = ['n'])
report1

Nested loop way to slow and again not correct

match = []

for i, row in scr.iterrows():
    for e, erow in err.iterrows():
        if (row['p'] == erow['p']) & (erow['errd'] >= row['scrd']) & (erow['errq'] - row['scrq'] >= 0):
            err['errq'][e]-= row['scrq']
            row_to_dict = row.to_dict()
            match.append(row_to_dict)
            break
report2 = pd.DataFrame(match)
report2

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半枫 2025-02-17 02:50:26

不是答案，而是需要帮助理解问题的必要条件。
@b02t，这就是我认为的数据片。

因此，我是正确的，因为您仅将scr.loc [0]与err.loc [3]，scr.loc [1]与err.loc [1]和scr.loc [2]与err.loc [2]进行比较。 ]？还是您将SCR中的每一行与ERR中的每一行进行比较？

看Desired_result，我不明白SCR.LOC [2]如何在Desired_Result中，因为，使用Err.loc [2]，（errd [errd']＆gt; = scr ['srcd']）evals错误。并且，遵循相同的方法，SCR.LOC [1]应在Desired_result中。

>>> err[err['p'] == '10101.A']
         p       errd  errq   
3  10101.A  13-5-2022     1   
1  10101.A  16-5-2022     1   
2  10101.A   4-5-2022     1   
>>> scr[scr['p'] == '10101.A']
         p       scrd  scrq                 n
0  10101.A  10-5-2022     1  7000000051481339
1  10101.A  10-5-2022     1  7000000051481342
2  10101.A   9-5-2022     1  7000000051722237
>>> desired_result
             report match_err_scr
0  7000000051722237       10101.A
1  7000000051481339       10101.A
2  7000000051721987       10102.A
3  7000000051721990       10102.A
4  7000000052018581       10102.A
5  7000000051530150       10147.A
6  7000000051530152       10147.A
7  7000000051530157       10147.A
8  7000000051546193       10147.A
9  7000000051761150       10147.A

Not an answer, but required to help understand the question.
@B02T, this is what I am seeing as a slice of the data.

So I am correct in that you are only comparing scr.loc[0] to err.loc[3], scr.loc[1] to err.loc[1] and scr.loc[2] to err.loc[2] ? Or are you comparing each row in scr to each row in err?

Looking at the desired_result, I don't understand how scr.loc[2] could be in the desired_result since, using err.loc[2], (err['errd'] >= scr['srcd']) evals to False. And, following the same methodology, scr.loc[1] should be in desired_result.

>>> err[err['p'] == '10101.A']
         p       errd  errq   
3  10101.A  13-5-2022     1   
1  10101.A  16-5-2022     1   
2  10101.A   4-5-2022     1   
>>> scr[scr['p'] == '10101.A']
         p       scrd  scrq                 n
0  10101.A  10-5-2022     1  7000000051481339
1  10101.A  10-5-2022     1  7000000051481342
2  10101.A   9-5-2022     1  7000000051722237
>>> desired_result
             report match_err_scr
0  7000000051722237       10101.A
1  7000000051481339       10101.A
2  7000000051721987       10102.A
3  7000000051721990       10102.A
4  7000000052018581       10102.A
5  7000000051530150       10147.A
6  7000000051530152       10147.A
7  7000000051530157       10147.A
8  7000000051546193       10147.A
9  7000000051761150       10147.A

回复收藏 0 原文

~没有更多了~