通过数据合并 /循环
您好需要从数据中内部获取内部,Desired_result是从VBA代码中进行的,该代码比较了两张desired_result且100%准确,如果某人可以帮助我获得所需的输出,则条件是:
err['p'] == scr['p'] & err['errd'] >= scr['scrd'] & err['errq'] - scr['scrq'] >= 0
检查所有scr scr有多少个条件['n']
genh thru try错误,但是如果一个人通过错误,则err [errq'] - = scr ['scrq']
,然后跳到下一个项目scr [ 'p']
,scr ['n']
是唯一的,请参阅下面的示例代码:
import pandas as pd
err = pd.DataFrame({
'p' : ['10100.A','10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10103.A','10147.A','10147.A'],
'errd' : ['18-5-2022','16-5-2022','4-5-2022','13-5-2022','9-5-2022','2-5-2022','29-5-2022','6-5-2022','11-5-2022','25-5-2022','6-5-2022'],
'errq' : [1, 1, 1, 1, 1, 2, 46, 1, 4, 1, 5]})
err = err.sort_values('errd')
scr = pd.DataFrame({
'p' : ['10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10147.A','10147.A','10147.A','10147.A','10147.A'],
'scrd' : ['10-5-2022','10-5-2022','9-5-2022','13-5-2022','9-5-2022','9-5-2022','25-5-2022','6-5-2022','6-5-2022','6-5-2022','6-5-2022','11-5-2022'],
'scrq' : [1,1,1,1,1,1,1,1,1,1,1,1],
'n' : ['7000000051481339','7000000051481342','7000000051722237','7000000052018581','7000000051721987','7000000051721990','7000000052725251','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150']})
desired_result = pd.DataFrame({
'report' : ['7000000051722237','7000000051481339','7000000051721987','7000000051721990','7000000052018581','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150'],
'match_err_scr' : ['10101.A','10101.A','10102.A','10102.A','10102.A','10147.A','10147.A','10147.A','10147.A','10147.A']})
到目前为止我尝试过的:
match = []
#Iterating scr rows
for i, row in scr.iterrows():
#Checking for match row now is full row in scr
if row['scrq'] <= err[(err['p'] == row['p']) & (err['errd'] >= row['scrd'])]['errq'].sum():
r = row.to_dict()
match.append(r)
#Creating new data frame
report = pd.DataFrame(match)
report
合并左滤波器后来
report1 = scr.merge(err, how = 'left', on = 'p')
flt = (report1['errd'] >= report1['scrd']) & (report1['errq'] - report1['scrq'] >= 0)
report1 = report1.loc[flt]
report1 = report1.drop_duplicates(subset = ['n'])
report1
嵌套循环逐渐慢速,然后再次不正确
match = []
for i, row in scr.iterrows():
for e, erow in err.iterrows():
if (row['p'] == erow['p']) & (erow['errd'] >= row['scrd']) & (erow['errq'] - row['scrq'] >= 0):
err['errq'][e]-= row['scrq']
row_to_dict = row.to_dict()
match.append(row_to_dict)
break
report2 = pd.DataFrame(match)
report2
Hello need to get insides from data, the desired_result is from VBA code that compare two sheets desired_result is checked and 100% accurate, If someone can assist me get the desired output, conditions are:
err['p'] == scr['p'] & err['errd'] >= scr['scrd'] & err['errq'] - scr['scrq'] >= 0
Its all about checking how many of scr['n']
wenth thru err but if one passes thru err then err['errq'] -= scr['scrq']
and jump to next item scr['p']
, scr['n']
are unique, please see the sample code below:
import pandas as pd
err = pd.DataFrame({
'p' : ['10100.A','10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10103.A','10147.A','10147.A'],
'errd' : ['18-5-2022','16-5-2022','4-5-2022','13-5-2022','9-5-2022','2-5-2022','29-5-2022','6-5-2022','11-5-2022','25-5-2022','6-5-2022'],
'errq' : [1, 1, 1, 1, 1, 2, 46, 1, 4, 1, 5]})
err = err.sort_values('errd')
scr = pd.DataFrame({
'p' : ['10101.A','10101.A','10101.A','10102.A','10102.A','10102.A','10103.A','10147.A','10147.A','10147.A','10147.A','10147.A'],
'scrd' : ['10-5-2022','10-5-2022','9-5-2022','13-5-2022','9-5-2022','9-5-2022','25-5-2022','6-5-2022','6-5-2022','6-5-2022','6-5-2022','11-5-2022'],
'scrq' : [1,1,1,1,1,1,1,1,1,1,1,1],
'n' : ['7000000051481339','7000000051481342','7000000051722237','7000000052018581','7000000051721987','7000000051721990','7000000052725251','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150']})
desired_result = pd.DataFrame({
'report' : ['7000000051722237','7000000051481339','7000000051721987','7000000051721990','7000000052018581','7000000051530150','7000000051530152','7000000051530157','7000000051546193','7000000051761150'],
'match_err_scr' : ['10101.A','10101.A','10102.A','10102.A','10102.A','10147.A','10147.A','10147.A','10147.A','10147.A']})
What i have tried so far:
match = []
#Iterating scr rows
for i, row in scr.iterrows():
#Checking for match row now is full row in scr
if row['scrq'] <= err[(err['p'] == row['p']) & (err['errd'] >= row['scrd'])]['errq'].sum():
r = row.to_dict()
match.append(r)
#Creating new data frame
report = pd.DataFrame(match)
report
Merge left filter later
report1 = scr.merge(err, how = 'left', on = 'p')
flt = (report1['errd'] >= report1['scrd']) & (report1['errq'] - report1['scrq'] >= 0)
report1 = report1.loc[flt]
report1 = report1.drop_duplicates(subset = ['n'])
report1
Nested loop way to slow and again not correct
match = []
for i, row in scr.iterrows():
for e, erow in err.iterrows():
if (row['p'] == erow['p']) & (erow['errd'] >= row['scrd']) & (erow['errq'] - row['scrq'] >= 0):
err['errq'][e]-= row['scrq']
row_to_dict = row.to_dict()
match.append(row_to_dict)
break
report2 = pd.DataFrame(match)
report2
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
不是答案,而是需要帮助理解问题的必要条件。
@b02t,这就是我认为的数据片。
因此,我是正确的,因为您仅将scr.loc [0]与err.loc [3],scr.loc [1]与err.loc [1]和scr.loc [2]与err.loc [2]进行比较。 ]?还是您将SCR中的每一行与ERR中的每一行进行比较?
看Desired_result,我不明白SCR.LOC [2]如何在Desired_Result中,因为,使用Err.loc [2],(errd [errd']&gt; = scr ['srcd'])evals错误。并且,遵循相同的方法,SCR.LOC [1]应在Desired_result中。
Not an answer, but required to help understand the question.
@B02T, this is what I am seeing as a slice of the data.
So I am correct in that you are only comparing scr.loc[0] to err.loc[3], scr.loc[1] to err.loc[1] and scr.loc[2] to err.loc[2] ? Or are you comparing each row in scr to each row in err?
Looking at the desired_result, I don't understand how scr.loc[2] could be in the desired_result since, using err.loc[2], (err['errd'] >= scr['srcd']) evals to False. And, following the same methodology, scr.loc[1] should be in desired_result.