正则表达式或 LIKE 模式的转义函数
为了放弃阅读整个问题,我的基本问题是:
PostgreSQL 中是否有一个函数可以转义字符串中的正则表达式字符?
我已经查阅了文档,但无法找到这样的函数。
这是完整的问题:
在 PostgreSQL 数据库中,我有一列具有唯一名称。我还有一个过程,定期将名称插入到该字段中,并且为了防止重复,如果需要输入已经存在的名称,它会在末尾附加一个空格和括号,并带有计数。
即名称、名称 (1)、名称 (2)、名称 (3) 等。
就目前情况而言,我使用以下代码来查找要在系列中添加的下一个数字(用 plpgsql 编写):
var_name_id := 1;
SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;
IF var_last_name_id IS NOT NULL THEN
var_name_id = var_last_name_id + 1;
END IF;
var_new_name := var_name || ' (' || var_name_id || ')';
(var_name
包含我尝试插入的名称。)
这目前有效,但问题出在 WHERE
语句中:
WHERE a.name LIKE var_name || ' (%)'
此检查不会验证 % 是一个数字,它不考虑多个括号,如“Name ((1))”之类的内容,如果存在任何一种情况,都会抛出强制转换异常。
WHERE
语句确实需要更像:
WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'
但是 var_name
可能包含正则表达式字符,这导致了上面的问题:PostgreSQL 中是否有一个函数可以转义正则表达式字符串中的字符,所以我可以做类似的事情:
WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'
非常感谢任何建议,包括可能重新设计我的重复名称解决方案。
To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?
I've probed the documentation but was unable to find such a function.
Here is the full problem:
In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.
i.e. Name, Name (1), Name (2), Name (3), etc.
As it stands, I use the following code to find the next number to add in the series (written in plpgsql):
var_name_id := 1;
SELECT CAST(substring(a.name from E'\\((\\d+)\\)
(var_name
contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE
statement:
WHERE a.name LIKE var_name || ' (%)'
This check doesn't verify that the %
in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE
statement really needs to be something more like:
WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'
But var_name
could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
) AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)
(var_name
contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE
statement:
This check doesn't verify that the %
in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE
statement really needs to be something more like:
But var_name
could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
) AS int) DESC
LIMIT 1;
IF var_last_name_id IS NOT NULL THEN
var_name_id = var_last_name_id + 1;
END IF;
var_new_name := var_name || ' (' || var_name_id || ')';
(var_name
contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE
statement:
This check doesn't verify that the %
in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE
statement really needs to be something more like:
But var_name
could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
要解决顶部的问题:
假设
standard_conforming_strings = on
,就像 Postgres 9.1 以来的默认设置一样。正则表达式转义函数
让我们从 正则表达式 模式:
包裹在 括号表达式 大多数都失去了特殊含义 - 有一些例外:
-
需要位于第一个或最后一个,或者它表示一个范围< /em> 字符数。]
和\
必须用\
转义(在替换中也是如此)。添加捕获后向引用的括号 下面我们得到这个正则表达式模式:
使用它,该函数用反斜杠 (
\
) 转义所有特殊字符 - 从而删除特殊含义:添加
PARALLEL SAFE
(因为它is) 在 Postgres 10 或更高版本中,以允许使用它的查询并行化。演示
返回:
while:
返回
FALSE
,这可能会让天真的用户感到惊讶,而现在则返回
TRUE
。LIKE
转义函数为了完整起见,
LIKE
模式,其中只有三个字符是特殊的:手册:
该函数采用默认值:
我们也可以在这里使用更优雅的
regexp_replace()
,但对于少数字符,级联replace()
函数速度更快。同样,Postgres 10 或更高版本中的
PARALLEL SAFE
。演示
返回:
To address the question at the top:
Assuming
standard_conforming_strings = on
, like it's default since Postgres 9.1.Regular expression escape function
Let's start with a complete list of characters with special meaning in regular expression patterns:
Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:
-
needs to be first or last or it signifies a range of characters.]
and\
have to be escaped with\
(in the replacement, too).After adding capturing parentheses for the back reference below we get this regexp pattern:
Using it, this function escapes all special characters with a backslash (
\
) - thereby removing the special meaning:Add
PARALLEL SAFE
(because it is) in Postgres 10 or later to allow parallelism for queries using it.Demo
Returns:
And while:
returns
FALSE
, which may come as a surprise to naive users,Returns
TRUE
as it should now.LIKE
escape functionFor completeness, the pendant for
LIKE
patterns, where only three characters are special:The manual:
This function assumes the default:
We could use the more elegant
regexp_replace()
here, too, but for the few characters, a cascade ofreplace()
functions is faster.Again,
PARALLEL SAFE
in Postgres 10 or later.Demo
Returns:
尝试这样的方法如何,用
var_name
替换我的硬编码'John Bernard'
:一个警告:我假设单用户访问数据库,而此过程是跑步(你也在跑步)。如果情况并非如此,那么 max(n)+1 方法将不是一个好的方法。
how about trying something like this, substituting
var_name
for my hard-coded'John Bernard'
:one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the
max(n)+1
approach will not be a good one.您可以随意更改架构吗?我认为如果您可以使用复合主键,问题就会消失:
显示层有责任将 Fred #0 显示为“Fred”,将 Fred #1 显示为“Fred (1)”,&c。
如果您愿意,您可以为此任务创建一个视图。这是数据:
视图:
和结果:
Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:
It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.
If you like, you can create a view for this duty. Here's the data:
The view:
And the result: