当前位置：文江博客话题详情

正则表达式或 LIKE 模式的转义函数

发布于 2024-10-19 14:27:04 字数 1332 浏览 6 评论 0原文

为了放弃阅读整个问题，我的基本问题是：
PostgreSQL 中是否有一个函数可以转义字符串中的正则表达式字符？

我已经查阅了文档，但无法找到这样的函数。

这是完整的问题：

在 PostgreSQL 数据库中，我有一列具有唯一名称。我还有一个过程，定期将名称插入到该字段中，并且为了防止重复，如果需要输入已经存在的名称，它会在末尾附加一个空格和括号，并带有计数。

即名称、名称 (1)、名称 (2)、名称 (3) 等。

就目前情况而言，我使用以下代码来查找要在系列中添加的下一个数字（用 plpgsql 编写）：

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;

IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

(var_name 包含我尝试插入的名称。）

这目前有效，但问题出在 WHERE 语句中：

WHERE a.name LIKE var_name || ' (%)'

此检查不会验证 % 是一个数字，它不考虑多个括号，如“Name ((1))”之类的内容，如果存在任何一种情况，都会抛出强制转换异常。

WHERE 语句确实需要更像：

WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

但是 var_name 可能包含正则表达式字符，这导致了上面的问题：PostgreSQL 中是否有一个函数可以转义正则表达式字符串中的字符，所以我可以做类似的事情：

WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

非常感谢任何建议，包括可能重新设计我的重复名称解决方案。

原文

To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?

I've probed the documentation but was unable to find such a function.

Here is the full problem:

In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.

i.e. Name, Name (1), Name (2), Name (3), etc.

As it stands, I use the following code to find the next number to add in the series (written in plpgsql):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)
(var_name contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE statement:
WHERE a.name LIKE var_name || ' (%)'

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE statement really needs to be something more like:
WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
) AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)
(var_name contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE statement:

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE statement really needs to be something more like:

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
) AS int) DESC
LIMIT 1;

IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

(var_name contains the name I'm trying to insert.)

This works for now, but the problem lies in the WHERE statement:

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

很酷又爱笑 2024-10-26 14:27:04

要解决顶部的问题：

假设 standard_conforming_strings = on，就像 Postgres 9.1 以来的默认设置一样。

正则表达式转义函数

让我们从 正则表达式 模式：

!$()*+.:<=>?[\]^{|}-

包裹在括号表达式大多数都失去了特殊含义 - 有一些例外：

- 需要位于第一个或最后一个，或者它表示一个范围< /em> 字符数。
] 和 \ 必须用 \ 转义（在替换中也是如此）。

添加捕获后向引用的括号下面我们得到这个正则表达式模式：

([!$()*+.:<=>?[\\\]^{|}-])

使用它，该函数用反斜杠 (\) 转义所有特殊字符 - 从而删除特殊含义：

CREATE OR REPLACE FUNCTION f_regexp_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;

添加 PARALLEL SAFE （因为它is) 在 Postgres 10 或更高版本中，以允许使用它的查询并行化。

演示

SELECT f_regexp_escape('test(1) > Foo*');

test\(1\) \> Foo\*

while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

返回 FALSE，这可能会让天真的用户感到惊讶，

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

而现在则返回 TRUE。

`LIKE` 转义函数

为了完整起见，LIKE 模式，其中只有三个字符是特殊的：

\%_

手册：

默认转义字符是反斜杠，但可以使用 ESCAPE 子句选择不同的转义字符。

该函数采用默认值：

CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;

我们也可以在这里使用更优雅的 regexp_replace() ，但对于少数字符，级联 replace() 函数速度更快。

同样，Postgres 10 或更高版本中的 PARALLEL SAFE。

演示

SELECT f_like_escape('20% \ 50% low_prices');

20\% \\ 50\% low\_prices

To address the question at the top:

Assuming standard_conforming_strings = on, like it's default since Postgres 9.1.

Regular expression escape function

Let's start with a complete list of characters with special meaning in regular expression patterns:

!$()*+.:<=>?[\]^{|}-

Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:

- needs to be first or last or it signifies a range of characters.
] and \ have to be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below we get this regexp pattern:

([!$()*+.:<=>?[\\\]^{|}-])

Using it, this function escapes all special characters with a backslash (\) - thereby removing the special meaning:

CREATE OR REPLACE FUNCTION f_regexp_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;

Add PARALLEL SAFE (because it is) in Postgres 10 or later to allow parallelism for queries using it.

Demo

SELECT f_regexp_escape('test(1) > Foo*');

Returns:

test\(1\) \> Foo\*

And while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

returns FALSE, which may come as a surprise to naive users,

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

Returns TRUE as it should now.

`LIKE` escape function

For completeness, the pendant for LIKE patterns, where only three characters are special:

\%_

The manual:

The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.

This function assumes the default:

CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;

We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.

Again, PARALLEL SAFE in Postgres 10 or later.

Demo

SELECT f_like_escape('20% \ 50% low_prices');

Returns:

20\% \\ 50\% low\_prices

回复收藏 0 原文

秋风の叶未落 2024-10-26 14:27:04

尝试这样的方法如何，用 var_name 替换我的硬编码 'John Bernard'：

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)
一个警告：我假设单用户访问数据库，而此过程是跑步（你也在跑步）。如果情况并非如此，那么 max(n)+1 方法将不是一个好的方法。
;

 max
-----
   4
(1 row)

一个警告：我假设单用户访问数据库，而此过程是跑步（你也在跑步）。如果情况并非如此，那么 max(n)+1 方法将不是一个好的方法。

how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)
one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.
;

 max
-----
   4
(1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.

回复收藏 0 原文

扛刀软妹 2024-10-26 14:27:04

您可以随意更改架构吗？我认为如果您可以使用复合主键，问题就会消失：

name text not null,
number integer not null,
primary key (name, number)

显示层有责任将 Fred #0 显示为“Fred”，将 Fred #1 显示为“Fred (1)”，&c。

如果您愿意，您可以为此任务创建一个视图。这是数据：

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

视图：

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

和结果：

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)

Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:

name text not null,
number integer not null,
primary key (name, number)

It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.

If you like, you can create a view for this duty. Here's the data:

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

The view:

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

And the result:

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)

回复收藏 0 原文

~没有更多了~

关于作者

遗心遗梦遗幸福

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

正则表达式或 LIKE 模式的转义函数

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

正则表达式转义函数

演示

`LIKE` 转义函数

演示

Regular expression escape function

Demo

`LIKE` escape function

Demo

关于作者

相关话题

热门标签

推荐作者

苦中寻乐

lueluelue

嗼ふ静

王权女流氓

与花如笺

残酷

友情链接

正则表达式或 LIKE 模式的转义函数

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

正则表达式转义函数

演示

LIKE 转义函数

演示

Regular expression escape function

Demo

LIKE escape function

Demo

关于作者

相关话题

热门标签

推荐作者

苦中寻乐

lueluelue

嗼ふ静

王权女流氓

与花如笺

残酷

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

`LIKE` 转义函数

`LIKE` escape function