Select rows with the smallest difference
I'm pretty strong with SQL, but I can't think of a good solution to this "look-alike" data analysis problem:
Given a table with a set of integers, I need to match each integer with the integer in a second table that is most similar (smallest absolute difference). Normally I'd do a Cartesian join and order by the difference in numbers, but I need to only get one pairing for each row from each table so no value from either table can be used twice.
Any idea how to accomplish this?
EDIT: Example:
TABLE_A
34
57
108
219
380
5000
TABLE_B
4000
200
400
99
40
33
34
4600
The pairing would be one row from table_a and the closest row from table_b:
RESULT
34,34
57,40
108,99
219,200
380,400
5000,4600
So no row from either table appears twice.
EDIT: more clarification: I'm trying to solve this problem where given 1 row from table_a, we find the 1 row from table_b that's closest. That becomes a pair and is removed. Then take the next row from table_a and repeat. So we're trying to find the best match for each row and optimize that pairing, not trying to optimize total differences.
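The procedure described in that last edit (take a row from table_a, grab the closest unused row from table_b, remove it, repeat) can be sketched outside SQL to pin down the expected behaviour. This is only a minimal Python illustration, not a SQL solution; the function name `greedy_pairs` is made up for the example, and it assumes table_a is processed in the order listed and that ties go to whichever table_b value comes first.

```python
def greedy_pairs(table_a, table_b):
    """Pair each value in table_a with the closest still-unused value in table_b."""
    remaining = list(table_b)  # copy, so the caller's list is left untouched
    pairs = []
    for a in table_a:
        if not remaining:
            break  # table_b is exhausted; later table_a rows stay unmatched
        # Closest remaining value; on a tie, min() keeps the first one found.
        best = min(remaining, key=lambda b: abs(b - a))
        remaining.remove(best)  # a table_b value can only be used once
        pairs.append((a, best))
    return pairs


table_a = [34, 57, 108, 219, 380, 5000]
table_b = [4000, 200, 400, 99, 40, 33, 34, 4600]

for a, b in greedy_pairs(table_a, table_b):
    print(f"{a},{b}")
```

With the sample data from the question this prints the same six pairs as the RESULT listing, and 4000 and 33 from table_b are left unused.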
Assuming
This also assumes rows in b can be repeated. Try it and see if it does what you want. However, this should fit your sample data, so it would answer your question...
You will probably need to use a cursor to handle this. Copy the data from each table into its own temp table and apply your logic one row at a time.
What makes this difficult, if not impossible without a cursor, is the fact that the order in which you handle each number from the first table will affect the end result.
If your first table looks like this
And your second table looks like this
Then your result will look like this if you process the 9 first
And the result would look like this if you processed the 10 first
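To make the order dependence concrete, here is a rough sketch of the row-at-a-time approach using Python's built-in `sqlite3` module as a stand-in for a database cursor. Everything in it is an assumption for illustration: the temp table names `tmp_a` and `tmp_b`, the helper `match_in_order`, and in particular the second-table values 10 and 12, which are hypothetical numbers chosen to show the effect and are not taken from the answer above.

```python
import sqlite3


def match_in_order(a_values, b_values, descending=False):
    """Walk tmp_a one row at a time, pairing each value with the closest
    remaining row in tmp_b and deleting that row so it cannot be reused."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMP TABLE tmp_a (val INTEGER)")
    conn.execute("CREATE TEMP TABLE tmp_b (val INTEGER)")
    conn.executemany("INSERT INTO tmp_a VALUES (?)", [(v,) for v in a_values])
    conn.executemany("INSERT INTO tmp_b VALUES (?)", [(v,) for v in b_values])

    # The processing order is the part that changes the outcome.
    order = "DESC" if descending else "ASC"
    a_rows = conn.execute(f"SELECT val FROM tmp_a ORDER BY val {order}").fetchall()

    pairs = []
    for (a,) in a_rows:
        row = conn.execute(
            "SELECT rowid, val FROM tmp_b ORDER BY ABS(val - ?), val LIMIT 1",
            (a,),
        ).fetchone()
        if row is None:
            break  # tmp_b is empty; the remaining tmp_a rows stay unmatched
        rowid, b = row
        conn.execute("DELETE FROM tmp_b WHERE rowid = ?", (rowid,))
        pairs.append((a, b))

    conn.close()
    return pairs


print(match_in_order([9, 10], [10, 12]))                   # processes 9 first
print(match_in_order([9, 10], [10, 12], descending=True))  # processes 10 first
```

With these hypothetical values, processing 9 first pairs 9 with 10 and leaves 12 for the 10, while processing 10 first pairs 10 with 10 and leaves 12 for the 9. The same loop written as a real cursor over temp tables would show the same sensitivity to whatever ORDER BY the cursor uses.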