When 3 rows are similar

Posted on 2025-02-11 03:55:33

I'm trying to identify duplicate invoices. In my dataset, there are corrections that cause me to flag false positives. I would like to find a way to net out the corrections and return only the final invoice.

In my example, the first 3 transactions are all related. I would like to write something using pandas that identifies that the first two lines net out, leaving only the 3rd line.

Code to create the table:

import pandas as pd

df = pd.DataFrame({'Reference Number': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 'InvoiceNumber': {0: 'A123', 1: 'A123', 2: 'A123', 3: 'A342', 4: 'A444'},
 'InvoiceAmount': {0: 100, 1: -100, 2: 100, 3: 123, 4: 345},
 'DocType': {0: 'IN', 1: 'AD', 2: 'IN', 3: 'IN', 4: 'IN'},
 'Date': {0: '1/1/2022',
  1: '1/2/2022',
  2: '1/3/2022',
  3: '1/3/2022',
  4: '1/3/2022'}})

What I would want the table to look like when finished:

[image: expected result table]

Comments (1)

做个少女永远怀春 2025-02-18 03:55:33

The relationship between the different invoices is not clear.

It could be as simple as:

>>> df.drop_duplicates('InvoiceNumber', keep='last')
   Reference Number InvoiceNumber  InvoiceAmount DocType      Date
2                 3          A123            100      IN  1/3/2022
3                 4          A342            123      IN  1/3/2022
4                 5          A444            345      IN  1/3/2022
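
Note that drop_duplicates with keep='last' keeps the last row per InvoiceNumber regardless of the amounts, so it does not actually check that earlier rows net out. If explicit netting is wanted, one possible sketch is below. It assumes that a correction cancels an invoice row with the same InvoiceNumber and the same absolute InvoiceAmount, and that rows are already in chronological order. The function name net_corrections and the helper columns _abs, _neg and _neg_count are hypothetical names introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Reference Number": [1, 2, 3, 4, 5],
    "InvoiceNumber": ["A123", "A123", "A123", "A342", "A444"],
    "InvoiceAmount": [100, -100, 100, 123, 345],
    "DocType": ["IN", "AD", "IN", "IN", "IN"],
    "Date": ["1/1/2022", "1/2/2022", "1/3/2022", "1/3/2022", "1/3/2022"],
})


def net_corrections(frame):
    """Drop offsetting positive/negative pairs within each invoice number.

    Assumption (not from the original post): a negative row cancels the
    earliest remaining positive row with the same InvoiceNumber and the
    same absolute InvoiceAmount.
    """
    is_neg = frame["InvoiceAmount"] < 0
    work = frame.assign(
        _abs=frame["InvoiceAmount"].abs(),
        _neg=is_neg,
        _neg_count=is_neg.astype(int),
    )
    keys = ["InvoiceNumber", "_abs"]

    # Per (invoice, absolute amount) group: how many negative and
    # non-negative rows exist, broadcast back to every row of the group.
    grouped = work.groupby(keys)["_neg_count"]
    n_neg = grouped.transform("sum")
    n_pos = grouped.transform("count") - n_neg

    # 0-based position of each row among rows of the same sign in its group.
    rank = work.groupby(keys + ["_neg"]).cumcount()

    # The first n_neg positive rows are cancelled by the negatives, and the
    # first n_pos negative rows are cancelled by the positives.
    cancelled = (is_neg & (rank < n_pos)) | (~is_neg & (rank < n_neg))
    return frame[~cancelled]


result = net_corrections(df)
print(result)
```

On the sample data this leaves the rows with Reference Number 3, 4 and 5. If the real data is not sorted by date, it would need to be sorted first (for example after parsing Date with pd.to_datetime), since the pairing depends on row order.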