如何通过正则表达式连接表

发布于 2024-12-24 03:44:27 字数 492 浏览 2 评论 0原文

假设我有两个表 msg 用于消息，mnc 用于移动网络代码。他们没有任何关系。但我想加入他们

SELECT msg.message,
    msg.src_addr,
    msg.dst_addr,
    mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code

但查询失败并出现错误：

psql:marketing.sql:28: ERROR:  argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...

有没有办法进行这样的加入？还是我走错了路？

原文

Say I have two tables msg for messages and mnc for mobile network codes.
They share no relations. But I want to join them

SELECT msg.message,
    msg.src_addr,
    msg.dst_addr,
    mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code

But query fails with error:

psql:marketing.sql:28: ERROR:  argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...

Is there a way to do such join? Or am I moving wrong way?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

蓦然回首 2024-12-31 03:44:27

一种非常奇怪的加入方式。一侧的每个匹配项都与另一个表中的每一行相结合...

regexp_matches() 可能是不适合您的目的的函数。您想要一个简单的正则表达式匹配 (~ ）。实际上， LIKE 运算符将更快：

使用 `LIKE` 可能是最快的。

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
           OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
WHERE  length(mnc.code) = 3;

此外，您只需要 mnc.code 恰好为 3 个字符。

使用 regexp match

您可以使用正则表达式编写相同的内容，但肯定会更慢。这是一个与原始示例接近的工作示例：

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON (msg.src_addr || '+' || msg.dst_addr) ~ (38 || mnc.code)
           AND length(mnc.code) = 3;

这还要求 msg.src_addr 和 msg.dst_addr 为 NOT NULL。

第二个查询演示了附加检查 length(mnc.code) = 3 如何进入 JOIN 条件或 WHERE 子句。这里效果一样。

使用`regexp_matches()`，

您可以使用regexp_matches()：

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON EXISTS (
    SELECT * 
    FROM   regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
    WHERE  y[1] = mnc.code
    );

但是相比之下会慢一些。

说明：
您的 regexp_matches() 表达式仅返回 first 匹配项的所有捕获子字符串的数组。由于您只捕获一个子字符串（模式中的一对括号），因此您将专门获得具有一个元素的数组。

您可以使用附加的“全局”开关 'g' 获得所有匹配 - 但在多行中。因此，您需要一个子选择来测试所有（或聚合）它们。将其放入 EXISTS - 半连接中，您就可以得到您想要的结果。

也许您可以通过这三个方面的性能测试来报告？
为此，请使用 EXPLAIN ANALYZE 。

A very odd way to join. Every match on one side is combined with every row from the other table ...

regexp_matches() is probably the wrong function for your purpose. You want a simple regular expression match (~). Actually, the LIKE operator will be faster:

Presumably fastest with `LIKE`

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
           OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
WHERE  length(mnc.code) = 3;

In addition, you only want mnc.code of exactly 3 characters.

With regexp match

You could write the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON (msg.src_addr || '+' || msg.dst_addr) ~ (38 || mnc.code)
           AND length(mnc.code) = 3;

This also requires msg.src_addr and msg.dst_addr to be NOT NULL.

The second query demonstrates how the additional check length(mnc.code) = 3 can go into the JOIN condition or a WHERE clause. Same effect here.

With `regexp_matches()`

You could make this work with regexp_matches():

SELECT msg.message
     , msg.src_addr
     , msg.dst_addr
     , mnc.name
FROM   mnc
JOIN   msg ON EXISTS (
    SELECT * 
    FROM   regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
    WHERE  y[1] = mnc.code
    );

But it will be slow in comparison.

Explanation:
Your regexp_matches() expression just returns an array of all captured substrings of the first match. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.

You get all matches with the additional "globally" switch 'g' - but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in an EXISTS - semi-join and you arrive at what you wanted.

Maybe you can report back with a performance test of all three?
Use EXPLAIN ANALYZE for that.

回复收藏 0 原文

坠似风落 2024-12-31 03:44:27

您眼前的问题是 regexp_matches可以返回一行或多行。

回复收藏 0 原文

铜锣湾横着走 2024-12-31 03:44:27

尝试改用“substring”，它会根据正则表达式模式提取子字符串。

SELECT msg.message,
    msg.src_addr,
    msg.dst_addr,
    mnc.name
FROM "msg"
JOIN "mnc"
ON substring(msg.src_addr || '+' || msg.dst_addr from '38(...)') = mnc.code

Try using "substring" instead, which extracts a substring given a regex pattern.

SELECT msg.message,
    msg.src_addr,
    msg.dst_addr,
    mnc.name
FROM "msg"
JOIN "mnc"
ON substring(msg.src_addr || '+' || msg.dst_addr from '38(...)') = mnc.code

回复收藏 0 原文

~没有更多了~