Selecting rows with the smallest difference

Posted on 2024-11-09 00:41:41

I'm pretty strong with SQL, but I can't think of a good solution to this "look-alike" data analysis problem:

Given a table with a set of integers, I need to match each integer with the integer in a second table that is most similar (smallest absolute difference). Normally I'd do a Cartesian join and order by the difference in numbers, but I need to only get one pairing for each row from each table so no value from either table can be used twice.

Any idea how to accomplish this?

EDIT: Example:

TABLE_A

TABLE_B

The pairing would be one row from table_a and the closest row from table_b:

RESULT

So no row from either table appears twice.

EDIT: more clarification: I'm trying to solve this problem where given 1 row from table_a, we find the 1 row from table_b that's closest. That becomes a pair and is removed. Then take the next row from table_a and repeat. So we're trying to find the best match for each row and optimize that pairing, not trying to optimize total differences.
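The procedure described in this clarification (take a row from table_a, find the closest unused row in table_b, remove that pair, repeat) can be sketched outside SQL. The following is only a minimal Python illustration of that logic, not a SQL solution. The function name `greedy_pairs`, the sample numbers, and the tie-break rule (prefer the smaller table_b value) are assumptions made for the example and do not come from the question.

```python
def greedy_pairs(a_values, b_values):
    """Pair each value in a_values with the closest still-unused value in b_values."""
    remaining = list(b_values)  # copy, so the caller's list is left untouched
    pairs = []
    for a in a_values:
        if not remaining:
            break  # table B is exhausted; leftover A rows stay unpaired
        # Closest remaining value; ties go to the smaller B value (an assumption).
        best = min(remaining, key=lambda b: (abs(b - a), b))
        pairs.append((a, best))
        remaining.remove(best)  # a B value may be used only once
    return pairs


# Hypothetical sample data, not taken from the question.
print(greedy_pairs([1, 7, 12], [8, 2, 20, 11]))  # [(1, 2), (7, 8), (12, 11)]
```

Because each pick removes a value from the pool, the result depends on the order in which the table_a rows are visited.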

稀香 2024-11-16 00:41:41

Assuming this requirement from the question:

"given 1 row from table_a, we find the 1 row from table_b that's closest"

-- SQL Server syntax: for each row in TABLE_A, pick the single closest row in TABLE_B.
select
   *
from
   TABLE_A a
   cross apply
   (select top 1 Number from TABLE_B b order by abs(b.Number - a.Number)) b2

This also assumes that rows in b can be reused for more than one row in a: try it and see if it does what you want. It should fit your sample data, though, so it would answer your question...
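To make the "rows in b can be repeated" caveat concrete, here is a small Python sketch of the same per-row lookup. It is an illustration of the logic only, not a translation of the query. The name `nearest_with_reuse` and the numbers are hypothetical, and the tie-break on the smaller b value is an assumption, since the query above does not specify one.

```python
def nearest_with_reuse(a_values, b_values):
    """For each A value independently, pick the closest B value.

    Nothing is ever removed from b_values, so one B value can be
    matched to several A values.
    """
    return [
        (a, min(b_values, key=lambda b: (abs(b - a), b)))
        for a in a_values
    ]


# Hypothetical data: 6 is the closest B value to both 9 and 10.
print(nearest_with_reuse([9, 10], [5, 6]))  # [(9, 6), (10, 6)]
```

If every row in either table must appear at most once, the matched value has to be removed after each pick, which this per-row lookup does not do.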

や莫失莫忘 2024-11-16 00:41:41

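-- For each value in TABLE_A, return the TABLE_B value(s) with the smallest
-- absolute difference. Ties return several rows, and a TABLE_B value can
-- still be matched to more than one TABLE_A value.
select v.*
from

   -- v: every (a, b) combination with its absolute difference
   (select a.value as avalue, b.value as bvalue,
   abs(a.value - b.value) as difference
   from
   TABLE_A a,
   TABLE_B b) v,

   -- m: the smallest difference per a value (grouped by a.value only)
   (select a.value as avalue,
   min(abs(a.value - b.value)) as difference
   from
   TABLE_A a,
   TABLE_B b
   group by a.value) m

where m.avalue = v.avalue and m.difference = v.difference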

别理我 2024-11-16 00:41:41

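You will probably need to use a cursor to handle this. Copy the data from each table to its own temp table and apply your logic one row at a time.

What makes this difficult, if not impossible, without a cursor is that the order in which you handle each number from the first table affects the end result.

If your first table looks like this

9
10

and your second table looks like this

5
6

then if you process 9 first, your result will look like this

9,6
10,5

but if you process 10 first, the result will look like this

10,6
9,5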
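The order dependence in the 9/10 and 5/6 example can be checked with a short Python sketch that runs the row-at-a-time matching for every processing order and collects the distinct outcomes. This is an illustration only, not cursor code. The helper names `greedy_pairs_in_order` and `outcomes` are hypothetical, and ties are broken toward the smaller value of the second table as an assumption.

```python
from itertools import permutations


def greedy_pairs_in_order(a_order, b_values):
    """Match A values one at a time, in the given order, removing each used B value."""
    remaining = list(b_values)
    pairs = []
    for a in a_order:
        if not remaining:
            break  # nothing left in the second table to pair with
        best = min(remaining, key=lambda b: (abs(b - a), b))
        pairs.append((a, best))
        remaining.remove(best)
    return pairs


def outcomes(a_values, b_values):
    """Distinct sets of pairs produced across all processing orders of a_values."""
    return {
        frozenset(greedy_pairs_in_order(order, b_values))
        for order in permutations(a_values)
    }


results = outcomes([9, 10], [5, 6])
print(sorted(sorted(r) for r in results))
# [[(9, 5), (10, 6)], [(9, 6), (10, 5)]]
```

Two different pairings come out of the same data, so any set-based query would also need an explicit rule for which row of the first table is handled first.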

You will probably need to use a cursor to handle this. Copy the data from each table to their own temp table and apply your logic one row at a time.

What makes this difficult, if not impossible without a cursor, is the fact that the order in which you handle each number from the first table will affect the end result.

If your first table looks like this