How to extract two consecutive digits from a text field in MySQL?

Posted on 2024-10-22 22:11:45

I have a MySQL database and the following query:

SELECT `id`, `originaltext` FROM `source` WHERE `originaltext` regexp '[0-9][0-9]'

This matches every row whose originaltext contains two consecutive digits.

I need MySQL to return those digits as a field so I can manipulate them further.

Ideally I could also add a criterion that the number must be > 20, but I can do that separately as well.

通知家属抬走 2024-10-29 22:11:45

If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:

PREG_CAPTURE extracts a regex match from a string.
PREG_POSITION returns the position at which a regular expression matches a string.
PREG_REPLACE performs a search-and-replace on a string.
PREG_RLIKE tests whether a regex matches a string.

All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. For example, to test whether a regex matches the subject case-insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.

If you want something simpler, you could alter this function to better suit your needs.

DELIMITER $$

-- Extract the first, longest substring that matches the regular expression.
-- For the string 'ABCD', the candidates are tried in this order:
-- 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D'
-- It's not smart enough to handle things like (A)|(BCD) correctly, in that it
-- will return the whole string, not just the matching token.
CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT)
RETURNS TEXT
DETERMINISTIC
BEGIN
  DECLARE s INT DEFAULT 1;                -- start pointer (1-based)
  DECLARE e INT;                          -- end pointer (1-based, inclusive)
  DECLARE adjustStart TINYINT DEFAULT 1;  -- 0 if exp was already anchored with '^'
  DECLARE adjustEnd TINYINT DEFAULT 1;    -- 0 if exp was already anchored with '$'

  -- Because REGEXP matches anywhere in the string, and we only want the part
  -- that matches, adjust the expression to add '^' and '$'.
  -- Of course, if those are already there, don't add them, but change the
  -- method of extraction accordingly.

  IF LEFT(exp, 1) = '^' THEN
    SET adjustStart = 0;
  ELSE
    SET exp = CONCAT('^', exp);
  END IF;

  IF RIGHT(exp, 1) = '$' THEN
    SET adjustEnd = 0;
  ELSE
    SET exp = CONCAT(exp, '$');
  END IF;

  -- Loop through the string, moving the end pointer back towards the start
  -- pointer, then advance the start pointer and repeat.
  -- Bail out of the loops early if the original expression started with '^'
  -- or ended with '$', since that means the pointers can't move.
  WHILE (s <= CHAR_LENGTH(string)) DO
    SET e = CHAR_LENGTH(string);
    WHILE (e >= s) DO
      -- SUBSTRING takes a length, not an end position, hence e - s + 1
      IF SUBSTRING(string, s, e - s + 1) REGEXP exp THEN
        RETURN SUBSTRING(string, s, e - s + 1);
      END IF;
      IF adjustEnd THEN
        SET e = e - 1;
      ELSE
        SET e = s - 1; -- ugh, such a hack to end it early
      END IF;
    END WHILE;
    IF adjustStart THEN
      SET s = s + 1;
    ELSE
      SET s = CHAR_LENGTH(string) + 1; -- ugh, such a hack to end it early
    END IF;
  END WHILE;

  RETURN NULL;

END$$

DELIMITER ;
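
As a usage sketch only: assuming the function above has been created in the current schema, it could be applied to the question's source table roughly like this. The alias two_digits, the derived-table name extracted, and the numeric cast are illustrative additions, not part of the answer.

```sql
-- Sketch: pull the first two-digit match out of each row, then keep values above 20.
-- Assumes REGEXP_EXTRACT (defined above) exists in the current schema.
SELECT id, two_digits
FROM (
    SELECT `id`,
           CAST(REGEXP_EXTRACT(`originaltext`, '[0-9][0-9]') AS UNSIGNED) AS two_digits
    FROM `source`
    WHERE `originaltext` REGEXP '[0-9][0-9]'   -- skip rows with nothing to extract
) AS extracted
WHERE two_digits > 20;
```

Because the function tests substrings in a nested loop, it is likely to be slow on long text values.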

演多会厌 2024-10-29 22:11:45

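Before version 8.0, MySQL has no syntax for extracting text with a regular expression. You can use REGEXP to identify the rows that contain two consecutive digits, but to extract them you have to fall back on the ordinary string manipulation functions, which is very difficult in this case.

Alternatives:

Select the entire value from the database, then apply a regular expression on the client.
Use a different database that has better support for the SQL standard (this may not be an option, I know). There you can use: SUBSTRING(originaltext from '%#"[0-9]{2}#"%' for '#').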

°如果伤别离去 2024-10-29 22:11:45

I think the cleaner way is to use REGEXP_SUBSTR():

This extracts the first two consecutive digits:

SELECT REGEXP_SUBSTR(`originalText`,'[0-9]{2}') AS `twoDigits` FROM `source`;

This extracts the first two consecutive digits that form a number in the range 20-99 (for example, 1112 returns NULL; 1521 returns 52):

SELECT REGEXP_SUBSTR(`originalText`,'[2-9][0-9]') AS `twoDigits` FROM `source`;

I tested both in MySQL 8.0 and they work. That's all, good luck!
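
For the "> 20" criterion from the question, one possible sketch is to cast the extracted text to a number and filter on it. This assumes a server that provides REGEXP_SUBSTR() (MySQL 8.0 or later); the alias two_digits and the derived-table name extracted are illustrative.

```sql
-- Sketch: extract the first two consecutive digits per row and keep values above 20.
SELECT id, two_digits
FROM (
    SELECT `id`,
           CAST(REGEXP_SUBSTR(`originaltext`, '[0-9]{2}') AS UNSIGNED) AS two_digits
    FROM `source`
    WHERE `originaltext` REGEXP '[0-9]{2}'   -- avoids casting NULL for non-matching rows
) AS extracted
WHERE two_digits > 20;
```

Note that only the first match in each row is considered, so a row such as 'a 05 b 42' is filtered out here even though it also contains 42.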

情痴 2024-10-29 22:11:45

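I had the same problem, and this is the solution I found (it won't work in all cases, though):

Use LOCATE() to find the beginning and the end of the string you want to match.
Use MID() to extract the substring in between.
Keep the regular expression so that you only process the rows where you are sure to find a match.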
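
The steps above can be sketched as follows. The marker 'qty=' is purely hypothetical: it stands for whatever fixed text reliably precedes the digits in your data, and the approach only works when such a marker exists.

```sql
-- Sketch, assuming the two digits always follow the literal marker 'qty=' (hypothetical).
SELECT `id`,
       MID(`originaltext`,
           LOCATE('qty=', `originaltext`) + CHAR_LENGTH('qty='),  -- first character after the marker
           2) AS two_digits                                       -- take exactly two characters
FROM `source`
WHERE `originaltext` REGEXP 'qty=[0-9][0-9]';  -- only rows where the marker is followed by two digits
```

LOCATE() finds only the first occurrence of the marker, so a row in which an earlier 'qty=' is not followed by digits would still give a wrong result, which is one of the cases where this does not work.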

南城追梦 2024-10-29 22:11:45

I use the following code as a stored function. It should extract any number made up of a single run of digits. It is part of a larger library of mine.

DELIMITER $

--  2013.04 [email protected]
--  FindNumberInText("ab 234 95 cd", TRUE) => 234  
--  FindNumberInText("ab 234 95 cd", FALSE) => 95

DROP FUNCTION IF EXISTS FindNumberInText$
CREATE FUNCTION FindNumberInText(_input VARCHAR(64), _fromLeft BOOLEAN) RETURNS VARCHAR(32)
BEGIN
  DECLARE _r              VARCHAR(32) DEFAULT '';
  DECLARE _i              INTEGER DEFAULT 1;
  DECLARE _start          INTEGER DEFAULT 0;
  DECLARE _IsCharNumeric  BOOLEAN;

  IF NOT _fromLeft THEN SET _input = REVERSE(_input); END IF;
  _loop: REPEAT
    SET _IsCharNumeric = LOCATE(MID(_input, _i, 1), "0123456789") > 0;
    IF _IsCharNumeric THEN
      IF _start = 0 THEN SET _start  = _i;  END IF;
    ELSE
      IF _start > 0 THEN LEAVE _loop;       END IF;
    END IF;
    SET _i = _i + 1;
  UNTIL _i > length(_input) END REPEAT;

  IF _start > 0 THEN
    SET _r = MID(_input, _start, _i - _start);
    IF NOT _fromLeft THEN SET _r = REVERSE(_r);  END IF;
  END IF;
  RETURN _r;
END$
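
A possible way to call it against the question's table is sketched below. The LEFT(..., 64) call is an assumption added here because the parameter is declared as VARCHAR(64), so only the first 64 characters are examined; the alias first_number and the derived-table name extracted are illustrative.

```sql
-- Reset the client delimiter first (harmless if it is already ';').
DELIMITER ;

-- Sketch: take the first run of digits from each row and keep values above 20.
SELECT id, first_number
FROM (
    SELECT `id`,
           CAST(FindNumberInText(LEFT(`originaltext`, 64), TRUE) AS UNSIGNED) AS first_number
    FROM `source`
    WHERE LEFT(`originaltext`, 64) REGEXP '[0-9]'   -- skip rows with no digit in range
) AS extracted
WHERE first_number > 20;
```

Unlike the two-digit patterns elsewhere in this thread, this returns the whole run of digits, so 'ab 234 cd' yields 234 and not 23.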

小苏打饼 2024-10-29 22:11:45

If you want to return part of a string:

SELECT id, SUBSTRING(columnName, LOCATE('partOfString', columnName), 10) FROM tableName;

LOCATE() returns the starting position of the matching string, which then becomes the starting position for SUBSTRING().

相守太难 2024-10-29 22:11:45

I know it's been quite a while since this question was asked, but I came across it and thought it would be a good challenge for my custom regex replacer (see this blog post).

The good news is that it can do it, although it needs to be called quite a few times. See this online rextester demo, which shows the workings that led to the SQL below.

SELECT reg_replace(
         reg_replace(
           reg_replace(
             reg_replace(
               reg_replace(
                 reg_replace(
                   reg_replace(txt,
                               '[^0-9]+',
                               ',',
                               TRUE,
                               1, -- Min match length
                               0 -- No max match length
                               ),
                             '([0-9]{3,}|,[0-9],)',
                             '',
                             TRUE,
                             1, -- Min match length
                             0 -- No max match length
                             ),
                           '^[0-9],',
                           '',
                           TRUE,
                           1, -- Min match length
                           0 -- No max match length
                           ),
                         ',[0-9]$',
                         '',
                         TRUE,
                         1, -- Min match length
                         0 -- No max match length
                         ),
                       ',{2,}',
                       ',',
                       TRUE,
                       1, -- Min match length
                       0 -- No max match length
                       ),
                     '^,',
                     '',
                     TRUE,
                     1, -- Min match length
                     0 -- No max match length
                     ),
                   ',$',
                   '',
                   TRUE,
                   1, -- Min match length
                   0 -- No max match length
                   ) AS `csv`
FROM tbl;
