如何通过正则表达式连接表
假设我有两个表 msg 用于消息,mnc 用于移动网络代码。 他们没有任何关系。但我想加入他们
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code
但查询失败并出现错误:
psql:marketing.sql:28: ERROR: argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...
有没有办法进行这样的加入?还是我走错了路?
Say I have two tables msg for messages and mnc for mobile network codes.
They share no relations. But I want to join them
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code
But query fails with error:
psql:marketing.sql:28: ERROR: argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...
Is there a way to do such join? Or am I moving wrong way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一种非常奇怪的加入方式。一侧的每个匹配项都与另一个表中的每一行相结合...
regexp_matches()
可能是不适合您的目的的函数。您想要一个简单的 正则表达式匹配 (~ )
。实际上,
LIKE
运算符 将更快:使用
LIKE
可能是最快的。此外,您只需要
mnc.code
恰好为 3 个字符。使用 regexp match
您可以使用正则表达式编写相同的内容,但肯定会更慢。这是一个与原始示例接近的工作示例:
这还要求
msg.src_addr
和msg.dst_addr
为NOT NULL
。第二个查询演示了附加检查
length(mnc.code) = 3
如何进入JOIN
条件或WHERE
子句。这里效果一样。使用
regexp_matches()
,您可以使用
regexp_matches()
:但是相比之下会慢一些。
说明:
您的
regexp_matches()
表达式仅返回 first 匹配项的所有捕获子字符串的数组。由于您只捕获一个子字符串(模式中的一对括号),因此您将专门获得具有一个元素的数组。您可以使用附加的“全局”开关
'g'
获得所有匹配 - 但在多行中。因此,您需要一个子选择来测试所有(或聚合)它们。将其放入EXISTS
- 半连接中,您就可以得到您想要的结果。也许您可以通过这三个方面的性能测试来报告?
为此,请使用
EXPLAIN ANALYZE
。A very odd way to join. Every match on one side is combined with every row from the other table ...
regexp_matches()
is probably the wrong function for your purpose. You want a simple regular expression match (~
). Actually, theLIKE
operator will be faster:Presumably fastest with
LIKE
In addition, you only want
mnc.code
of exactly 3 characters.With regexp match
You could write the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:
This also requires
msg.src_addr
andmsg.dst_addr
to beNOT NULL
.The second query demonstrates how the additional check
length(mnc.code) = 3
can go into theJOIN
condition or aWHERE
clause. Same effect here.With
regexp_matches()
You could make this work with
regexp_matches()
:But it will be slow in comparison.
Explanation:
Your
regexp_matches()
expression just returns an array of all captured substrings of the first match. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.You get all matches with the additional "globally" switch
'g'
- but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in anEXISTS
- semi-join and you arrive at what you wanted.Maybe you can report back with a performance test of all three?
Use
EXPLAIN ANALYZE
for that.您眼前的问题是
regexp_matches
可以返回一行或多行。Your immediate problem is that
regexp_matches
could return one or more rows.尝试改用“substring”,它会根据正则表达式模式提取子字符串。
Try using "substring" instead, which extracts a substring given a regex pattern.