正则表达式或 LIKE 模式的转义函数

发布于 2024-10-19 14:27:04 字数 1332 浏览 6 评论 0原文

为了放弃阅读整个问题,我的基本问题是:
PostgreSQL 中是否有一个函数可以转义字符串中的正则表达式字符?

我已经查阅了文档,但无法找到这样的函数。

这是完整的问题:

在 PostgreSQL 数据库中,我有一列具有唯一名称。我还有一个过程,定期将名称插入到该字段中,并且为了防止重复,如果需要输入已经存在的名称,它会在末尾附加一个空格和括号,并带有计数。

即名称、名称 (1)、名称 (2)、名称 (3) 等。

就目前情况而言,我使用以下代码来查找要在系列中添加的下一个数字(用 plpgsql 编写):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;

IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

(var_name 包含我尝试插入的名称。)

这目前有效,但问题出在 WHERE 语句中:

WHERE a.name LIKE var_name || ' (%)'

此检查不会验证 % 是一个数字,它不考虑多个括号,如“Name ((1))”之类的内容,如果存在任何一种情况,都会抛出强制转换异常。

WHERE 语句确实需要更像:

WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

但是 var_name 可能包含正则表达式字符,这导致了上面的问题:PostgreSQL 中是否有一个函数可以转义正则表达式字符串中的字符,所以我可以做类似的事情:

WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

非常感谢任何建议,包括可能重新设计我的重复名称解决方案。

To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?

I've probed the documentation but was unable to find such a function.

Here is the full problem:

In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.

i.e. Name, Name (1), Name (2), Name (3), etc.

As it stands, I use the following code to find the next number to add in the series (written in plpgsql):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)

(var_name contains the name I'm trying to insert.)

This works for now, but the problem lies in the WHERE statement:

WHERE a.name LIKE var_name || ' (%)'

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

) AS int) INTO var_last_name_id FROM my_table.names a WHERE a.name LIKE var_name || ' (%)' ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)

(var_name contains the name I'm trying to insert.)

This works for now, but the problem lies in the WHERE statement:


This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:


But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:


Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

) AS int) DESC LIMIT 1; IF var_last_name_id IS NOT NULL THEN var_name_id = var_last_name_id + 1; END IF; var_new_name := var_name || ' (' || var_name_id || ')';

(var_name contains the name I'm trying to insert.)

This works for now, but the problem lies in the WHERE statement:

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

很酷又爱笑 2024-10-26 14:27:04

要解决顶部的问题:

假设 standard_conforming_strings = on,就像 Postgres 9.1 以来的默认设置一样。

正则表达式转义函数

让我们从 正则表达式 模式:

!$()*+.:<=>?[\]^{|}-

包裹在 括号表达式 大多数都失去了特殊含义 - 有一些例外:

  • - 需要位于第一个或最后一个,或者它表示一个范围< /em> 字符数。
  • ]\ 必须用 \ 转义(在替换中也是如此)。

添加捕获后向引用的括号 下面我们得到这个正则表达式模式:

([!$()*+.:<=>?[\\\]^{|}-])

使用它,该函数用反斜杠 (\) 转义所有特殊字符 - 从而删除特殊含义:

CREATE OR REPLACE FUNCTION f_regexp_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;

添加 PARALLEL SAFE (因为它is) 在 Postgres 10 或更高版本中,以允许使用它的查询并行化。

演示

SELECT f_regexp_escape('test(1) > Foo*');

返回:

test\(1\) \> Foo\*

while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

返回 FALSE,这可能会让天真的用户感到惊讶,

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

而现在则返回 TRUE

LIKE 转义函数

为了完整起见,LIKE 模式,其中只有三个字符是特殊的:

\%_

手册:

默认转义字符是反斜杠,但可以使用 ESCAPE 子句选择不同的转义字符。

该函数采用默认值:

CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;

我们也可以在这里使用更优雅的 regexp_replace() ,但对于少数字符,级联 replace() 函数速度更快。

同样,Postgres 10 或更高版本中的 PARALLEL SAFE

演示

SELECT f_like_escape('20% \ 50% low_prices');

返回:

20\% \\ 50\% low\_prices

To address the question at the top:

Assuming standard_conforming_strings = on, like it's default since Postgres 9.1.

Regular expression escape function

Let's start with a complete list of characters with special meaning in regular expression patterns:

!$()*+.:<=>?[\]^{|}-

Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:

  • - needs to be first or last or it signifies a range of characters.
  • ] and \ have to be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below we get this regexp pattern:

([!$()*+.:<=>?[\\\]^{|}-])

Using it, this function escapes all special characters with a backslash (\) - thereby removing the special meaning:

CREATE OR REPLACE FUNCTION f_regexp_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;

Add PARALLEL SAFE (because it is) in Postgres 10 or later to allow parallelism for queries using it.

Demo

SELECT f_regexp_escape('test(1) > Foo*');

Returns:

test\(1\) \> Foo\*

And while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

returns FALSE, which may come as a surprise to naive users,

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

Returns TRUE as it should now.

LIKE escape function

For completeness, the pendant for LIKE patterns, where only three characters are special:

\%_

The manual:

The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.

This function assumes the default:

CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;

We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.

Again, PARALLEL SAFE in Postgres 10 or later.

Demo

SELECT f_like_escape('20% \ 50% low_prices');

Returns:

20\% \\ 50\% low\_prices
秋风の叶未落 2024-10-26 14:27:04

尝试这样的方法如何,用 var_name 替换我的硬编码 'John Bernard'

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)

一个警告:我假设单用户访问数据库,而此过程是跑步(你也在跑步)。如果情况并非如此,那么 max(n)+1 方法将不是一个好的方法。

; max ----- 4 (1 row)

一个警告:我假设单用户访问数据库,而此过程是跑步(你也在跑步)。如果情况并非如此,那么 max(n)+1 方法将不是一个好的方法。

how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.

; max ----- 4 (1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.

扛刀软妹 2024-10-26 14:27:04

您可以随意更改架构吗?我认为如果您可以使用复合主键,问题就会消失:

name text not null,
number integer not null,
primary key (name, number)

显示层有责任将 Fred #0 显示为“Fred”,将 Fred #1 显示为“Fred (1)”,&c。

如果您愿意,您可以为此任务创建一个视图。这是数据:

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

视图:

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

和结果:

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)

Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:

name text not null,
number integer not null,
primary key (name, number)

It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.

If you like, you can create a view for this duty. Here's the data:

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

The view:

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

And the result:

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文