When 3 rows are similar
I'm trying to identify duplicate invoices. In my dataset, there are corrections that cause me to flag false positives. I would like to find a way to net out the corrections and return only the final invoice.
In my example, the first 3 transactions are all related. I would like to write something using pandas that recognizes that the first two lines net out, leaving only the 3rd line.
Code to create the table:
import pandas as pd

df = pd.DataFrame({'Reference Number': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
                   'InvoiceNumber': {0: 'A123', 1: 'A123', 2: 'A123', 3: 'A342', 4: 'A444'},
                   'InvoiceAmount': {0: 100, 1: -100, 2: 100, 3: 123, 4: 345},
                   'DocType': {0: 'IN', 1: 'AD', 2: 'IN', 3: 'IN', 4: 'IN'},
                   'Date': {0: '1/1/2022',
                            1: '1/2/2022',
                            2: '1/3/2022',
                            3: '1/3/2022',
                            4: '1/3/2022'}})
What I want the table to look like when finished:
Comments (1)
The relationship between the different invoices is not clear.
It could be as simple as:
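
As a minimal sketch of two possible readings (both are assumptions, since the question does not fully specify how invoices relate): the simple reading keeps only the last row per `InvoiceNumber`, assuming the rows are already in chronological order; the stricter reading pairs each negative amount with a positive amount of the same size within the same invoice and drops both. The helper name `net_corrections` and the temporary `_abs`, `_sign`, and `_k` columns are hypothetical, introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Reference Number': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
                   'InvoiceNumber': {0: 'A123', 1: 'A123', 2: 'A123', 3: 'A342', 4: 'A444'},
                   'InvoiceAmount': {0: 100, 1: -100, 2: 100, 3: 123, 4: 345},
                   'DocType': {0: 'IN', 1: 'AD', 2: 'IN', 3: 'IN', 4: 'IN'},
                   'Date': {0: '1/1/2022', 1: '1/2/2022', 2: '1/3/2022',
                            3: '1/3/2022', 4: '1/3/2022'}})

# Reading 1 (assumption): rows are in chronological order and the last
# row for each InvoiceNumber is the final invoice.
simple = df.drop_duplicates(subset='InvoiceNumber', keep='last')


def net_corrections(frame):
    """Drop rows that cancel out within an invoice (hypothetical helper).

    Assumption: a correction cancels an invoice line when both share the
    same InvoiceNumber and the same absolute amount with opposite signs.
    The k-th positive row is paired with the k-th negative row; dates and
    DocType are ignored.
    """
    out = frame.copy()
    out['_abs'] = out['InvoiceAmount'].abs()
    out['_sign'] = out['InvoiceAmount'] > 0
    # Number the rows within each (invoice, amount, sign) group: 0, 1, 2, ...
    out['_k'] = out.groupby(['InvoiceNumber', '_abs', '_sign']).cumcount()
    # A row has a cancelling partner when both signs occur for the same k.
    signs_seen = out.groupby(['InvoiceNumber', '_abs', '_k'])['_sign'].transform('nunique')
    return out.loc[signs_seen < 2, list(frame.columns)]


netted = net_corrections(df)
print(netted)
```

On the sample data both readings leave the rows with `Reference Number` 3, 4, and 5. They diverge on other data: for example, a lone correction with no later re-issue would survive under the first reading but cancel the original invoice entirely under the second.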