When 3 rows are similar

Posted on 2025-02-11 03:55:33

I'm trying to identify duplicate invoices. In my dataset, there are corrections that cause me to flag false positives. I would like to find a way to net out the corrections and return only the final invoice.

In my example, the first 3 transactions are all related. I would like to write something using pandas that identifies that the first two lines net out, leaving only the third line.

Code to create the table:

import pandas as pd

df = pd.DataFrame({'Reference Number': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 'InvoiceNumber': {0: 'A123', 1: 'A123', 2: 'A123', 3: 'A342', 4: 'A444'},
 'InvoiceAmount': {0: 100, 1: -100, 2: 100, 3: 123, 4: 345},
 'DocType': {0: 'IN', 1: 'AD', 2: 'IN', 3: 'IN', 4: 'IN'},
 'Date': {0: '1/1/2022',
  1: '1/2/2022',
  2: '1/3/2022',
  3: '1/3/2022',
  4: '1/3/2022'}})

What I would want the table to look like when finished:

做个少女永远怀春 2025-02-18 03:55:33

The relation between the different invoices is not clear.

It could be as simple as:

>>> df.drop_duplicates('InvoiceNumber', keep='last')
   Reference Number InvoiceNumber  InvoiceAmount DocType      Date
2                 3          A123            100      IN  1/3/2022
3                 4          A342            123      IN  1/3/2022
4                 5          A444            345      IN  1/3/2022
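If the goal is to actually net corrections out rather than just keep the last row per invoice, something along these lines might work. This is only a sketch under assumptions the question does not confirm: a correction is taken to be a row whose `InvoiceAmount` exactly cancels an earlier, still unmatched row with the same `InvoiceNumber`, and the rows are assumed to already be in chronological order. The helper name `net_out_corrections` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Reference Number': [1, 2, 3, 4, 5],
    'InvoiceNumber': ['A123', 'A123', 'A123', 'A342', 'A444'],
    'InvoiceAmount': [100, -100, 100, 123, 345],
    'DocType': ['IN', 'AD', 'IN', 'IN', 'IN'],
    'Date': ['1/1/2022', '1/2/2022', '1/3/2022', '1/3/2022', '1/3/2022'],
})


def net_out_corrections(frame):
    """Drop pairs of rows that cancel each other within one invoice number.

    Assumption (not stated in the question): a row cancels the most recent
    earlier unmatched row of the same InvoiceNumber with the opposite amount.
    """
    to_drop = set()
    for _, group in frame.groupby('InvoiceNumber', sort=False):
        unmatched = {}  # amount -> index labels of rows not yet cancelled
        for label, amount in group['InvoiceAmount'].items():
            candidates = unmatched.get(-amount)
            if candidates:
                # This row cancels an earlier one: drop both of them.
                to_drop.update([candidates.pop(), label])
            else:
                unmatched.setdefault(amount, []).append(label)
    return frame.drop(index=list(to_drop))


result = net_out_corrections(df)
print(result)
```

On the sample data this keeps reference numbers 3, 4 and 5, the same rows as `drop_duplicates` above. The two approaches differ when the last row for an invoice is itself a correction: `keep='last'` would return the correction row, while the pairing approach drops it together with the row it cancels.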
