SQL“连接”关于空值
由于我无法控制的原因,我需要连接两个表,并且需要空值来匹配。我能想到的最好的选择是吐出一个 UUID 并将其用作我的比较值,但这看起来很难看
SELECT * FROM T1 JOIN T2 ON nvl(T1.SOMECOL,'f44087d5935dccbda23f71f3e9beb491') =
nvl(T2.SOMECOL,'f44087d5935dccbda23f71f3e9beb491')
我怎样才能做得更好?如果重要的话,这在 Oracle 上,并且上下文是一个应用程序,其中必须将一批用户上传的数据与一批现有数据进行比较,以查看是否有任何行匹配。回想起来,我们应该阻止任一数据集中的任何连接列包含空值,但我们没有,现在我们必须忍受它。
编辑:需要明确的是,我不仅仅关心空值。如果列不为空,我希望它们与其实际值匹配。
For reasons beyond my control, I need to join two tables and I need null values to match. The best option I could think of was to spit out a UUID and use that as my comparison value but it seems ugly
SELECT * FROM T1 JOIN T2 ON nvl(T1.SOMECOL,'f44087d5935dccbda23f71f3e9beb491') =
nvl(T2.SOMECOL,'f44087d5935dccbda23f71f3e9beb491')
How can I do better? This is on Oracle if it matters, and the context is an application in which a batch of user-uploaded data has to be compared to a batch of existing data to see if any rows match. In retrospect we should have prevented any of the join columns in either data set from containing nulls, but we didn't and now we have to live with it.
Edit: To be clear, I'm not only concerned with nulls. If the columns are not null I want them to match on their actual values.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(14)
也许这会起作用,但我从未真正尝试过:
Maybe this would work, but I've never actually tried it:
在 SQL Server 中,我使用过:
显然效率不高,因为 OR,但除非有保留值,否则您可以将 NULL 映射到两侧,而不会产生歧义或折叠,这大约是您能做的最好的事情(如果有,为什么是 NULL甚至允许在您的设计中......)
In SQL Server I have used:
Obviously not efficient, because of the OR, but unless there's a reserved value you can map NULLs to on both sides without ambiguity or folding that's about the best you can do (and if there was, why was NULL even allowed in your design...)
在 Oracle 中,您可以使用
decode
连接 null 值:decode
将 null 视为相等,因此无需“魔法”数字即可工作。两列必须具有相同的数据类型。它不会生成最具可读性的代码,但可能仍然比 t1.id = t2.id 或(t1.id 为 null,t2.id 为 null)更好
In oracle you can join null values using
decode
:decode
treats nulls as equal, so this works without "magic" numbers. The two columns must have the same data type.It won't make the most readable code, but probably still better than
t1.id = t2.id or (t1.id is null and t2.id is null)
对于此类任务,Oracle 内部使用未记录的函数 sys_op_map_nonnull(),其中您的查询将变为:
未记录,因此如果您采用此方法,请务必小心。
For this sort of task Oracle internally uses an undocumented function sys_op_map_nonnull(), where your query would become:
Undocumented, so be careful if you go this route.
您不能做得更好,但是您拥有的 JOIN 不会以任何方式执行实际的“JOIN”(T1.SOMECOL 和 T2.SOMECOL 之间不会有任何关联,除了它们都有一个 NULL 值)柱子)。基本上,这意味着您将无法在 NULL 上使用 JOIN 来查看行是否匹配。
NULL 永远不会等于另一个 NULL。未知价值的东西怎么能等于其他未知价值的东西呢?
You can't do any better, but the JOIN you have will not do an actual "JOIN" in any way (there won't be any correlation between T1.SOMECOL and T2.SOMECOL other than they both have a NULL value for that column). Basically that means that you won't be able to use a JOIN on NULLs to see if rows match.
NULL is never equal to another NULL. How can something of unknown value be equal to something else of unknown value?
很简单,利用
COALESCE
,它将返回其第一个非空参数:您唯一需要担心的是“魔术字符串”不能是任一表中连接字段的合法值。
Simple, utilize
COALESCE
, which will return its first non-null parameter:The only thing you will have to worry about is that 'magic string' cannot be among the legal values for the join field in either table.
您可以尝试使用以下查询。
You could try using with the below query.
如果值为空,您真的希望能够连接表吗?难道不能排除连接谓词中可能的空值吗?我发现很难理解两个表中的行可以通过空值关联。如果 table1.col_a 中有 100 个 null,table2.col_b 中有 100 个 null,则仅针对包含 null 的行将返回 10000 行。听起来不正确。
但是,您确实说过您需要它。我是否可以建议将空列合并为较小的字符串,因为字符比较相对昂贵。更好的是,如果列中的数据是文本,则将空值合并为整数。然后您就可以非常快速地进行“比较”,并且不太可能与现有数据发生冲突。
Do you really want to be able to join the tables if a value is null? Can't you just exclude the possible null values in the join predicate? I find it hard to grok that rows in two tables can be related by a null value. If you have 100 nulls in table1.col_a and 100 nulls in table2.col_b, you're going to have 10000 rows returned just for the rows with null. It sounds incorrect.
However, you did say you need it. Can I suggest coalescing the null column into a smaller string as character comparisons are relatively expensive. Even better, coalesce the nulls into an integer if the data in the columns is going to be text. Then you have very quick 'comparisons' and you're unlikely to collide with existing data.
只是把它扔在那里——有没有一种方法可以将这些空值合并成一个已知值,比如空字符串?不太了解你的桌子是如何布局的意味着我不能确定你是否会失去这种方式的意义——即有一个空字符串代表“用户拒绝输入电话号码”而 NULL 代表“我们忘记了”来询问一下”,或者类似的事情?
我确信这是不可能的,但如果是的话,您将有已知的值进行比较,并且可以通过这种方式获得合法的连接。
Just throwing this out there -- is there a way you could coalesce those nulls into a known value, like an empty string? Not knowing much about how your table is laid out means that I can't be sure if you'll be losing meaning that way -- i.e. having an empty string represent "user refused to enter a phone number" and NULL being "we forgot to ask about it", or something like that?
Odds are it's not possible, I'm sure, but if it is, you'll have known values to compare and you can get a legit join that way.
这不是与检查两列中是否存在空值相同吗?
或者
Isn't it the same as checking for presence of nulls in both columns?
or
为什么不这样做:
SELECT * FROM T1 JOIN T2 ON nvl(T1.SOMECOL,'null') =
nvl(T2.SOMECOL,'null')
我不知道你为什么使用 UUID。您可以使用列中不存在的任何字符串,例如字符串“null”,以减少内存占用。例如,使用
nvl
的解决方案比 Eric Petroelje 提出的使用or ... is null
的解决方案要快得多。Why not something like that :
SELECT * FROM T1 JOIN T2 ON nvl(T1.SOMECOL,'null') =
nvl(T2.SOMECOL,'null')
I don't know why you are using the UUID. You could use any string not present in the columns, like the string "null", for example, for lower memory footprint. And the solution using
nvl
is much faster than the solution usingor ... is null
proposed by Eric Petroelje, for example.我相信您仍然可以使用 nvl() 进行连接:
但是您需要在列 col1 上添加基于函数的索引 索引
应该会显着提高 NVL(..) 上的连接速度。
I believe you could still could use nvl() for join:
But you will need to add function based indexes on columns col1
Indexes should improve the speed of the join on NVL(..) significantly.
@Sarath Avanavu
这不是最好的方法。如果 TA.COL1 保留值 0 并且 TB.COL2 为 NULL,它将连接这些记录,这是不正确的。
@Sarath Avanavu
This one is not the best approach. If TA.COL1 keeps value 0 and TB.COL2 is NULL it will join those records, which is not correct.
您还可以使用 CASE 替换子查询中的 null 值,然后 JOIN 结果:
You can also use CASE to replace the null value in Subqueries, then JOIN the results: