如何统计 MySQL/正则表达式替换器中的单词数?

发布于 2024-08-11 08:24:14 字数 302 浏览 8 评论 0 原文

如何在 MySQL 查询中具有与 Regex.Replace 函数相同的行为(例如在 .NET/C# 中)?

我需要它,因为和很多人一样,我想计算一个字段中的单词数。但是,我对以下答案不满意(在该网站上多次给出):

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '') +1 FROM table

因为当两个单词之间有超过一个空格时,它不会给出好的结果。

顺便说一句,我认为 Regex.Replace 函数可能很有趣,所以欢迎所有好主意!

How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?

I need that because, as many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on that site):

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '') +1 FROM table

Because it doesn't give good results when there are more that one space between two words.

By the way, I think the Regex.Replace function may be interesting so all the good ideas are welcome !

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

天煞孤星 2024-08-18 08:24:15

REGEXP_REPLACE 可用作 MySQL 用户定义函数

字数统计:如果您可以控制进入数据库的数据,则可以在插入之前删除双空格。此外,如果您必须经常访问字数统计,您可以在代码中计算一次并将计数存储在数据库中。

There's REGEXP_REPLACE available as MySQL user-defined functions.

Word counting: If you can control the data going into the database, you can remove double whitespace before insert. Also if you have to access the word count often, you can compute it once in your code and store the count in the database.

盗琴音 2024-08-18 08:24:15

更新:现已添加一个单独的MySQL 8.0+ 的答案,应优先使用。 (保留此答案,以防仅限于使用早期版本。)

几乎与 这个问题,但这个答案将解决基于来自 这篇博文

演示

Rextester 在线演示

对于示例文本,给出的计数为 61 -与我尝试过的所有在线单词计数器相同(例如 https://wordcounter.net/)。

SQL(为简洁起见,不包括函数代码)

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]
, 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;

UPDATE: Have now added a separate answer for MySQL 8.0+, which should be used in preference. (Retained this answer in case of being constrainted to using an earlier version.)

Almost a duplicate of this question but this answer will address the use case of counting words based on the advanced version of the custom regular expression replacer from this blog post.

Demo

Rextester online demo

For the sample text, this gives a count of 61 - the same as all online word counters I've tried (e.g. https://wordcounter.net/).

SQL (excluding function code for brevity):

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]
, 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;
怀念你的温柔 2024-08-18 08:24:15

MySQL 8.0 现在提供了一个不错的 REGEXP_REPLACE 函数,这使得这变得更简单:

SQL

SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
           CHAR_LENGTH(REGEXP_REPLACE(
               txt,
               '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
               '$1')) -- Discard the first whitespace character and retain the rest
           + 1 -- The word count is 1 more than the number of gaps between words
           - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
           - IF (txt REGEXP '[[:space:]]

Demo

DB-Fiddle 在线演示

, 1, 0) -- Exclude whitespace at the end from count AS `Word count` FROM tbl;

Demo

DB-Fiddle 在线演示

MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:

SQL

SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
           CHAR_LENGTH(REGEXP_REPLACE(
               txt,
               '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
               '$1')) -- Discard the first whitespace character and retain the rest
           + 1 -- The word count is 1 more than the number of gaps between words
           - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
           - IF (txt REGEXP '[[:space:]]

Demo

DB-Fiddle online demo

, 1, 0) -- Exclude whitespace at the end from count AS `Word count` FROM tbl;

Demo

DB-Fiddle online demo

多彩岁月 2024-08-18 08:24:15

答案是否定的,在 MySQL 中不能有相同的行为。

但我建议您早先查看这个问题链接到 UDF 的主题,据称该 UDF 启用了部分此功能。

The answer is no you cannot have the same behaviour in MySQL.

But i recommend you checkout this earlier question on the subject which links to a UDF that supposedly enables some of this functionality.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文