与 PHP strip_tags 等效的 MySQL 查询是什么?

发布于 2024-12-07 23:15:20 字数 166 浏览 0 评论 0原文

I have a large database which contains records that have <a> tags in them and I would like to remove them. Of course there is the method where I create a PHP script that selects all, uses strip_tags and updates the database, but this takes a long time. So how can I do this with a simple (or complicated) MySQL query?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

糖粟与秋泊 2024-12-14 23:15:20

MySQL >= 5.5 提供了 XML 函数来解决您的问题:

SELECT ExtractValue(field, '//text()') FROM table;

参考: https://dev.mysql.com/doc/refman/5.5/en/xml-functions.html

MySQL >= 5.5 provides XML functions to solve your issue:

SELECT ExtractValue(field, '//text()') FROM table;

Reference: https://dev.mysql.com/doc/refman/5.5/en/xml-functions.html

剧终人散尽 2024-12-14 23:15:20

给你:

CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;

我确保它删除了不匹配的左括号,因为它们很危险,尽管它忽略了任何不匹配的右括号,因为它们是无害的。

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set

Here you go:

CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;

I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set
另类 2024-12-14 23:15:20

我不相信有任何有效的方法可以单独在 MySQL 中做到这一点。

MySQL 确实有 REPLACE() 函数,但它只能替换常量字符串,而不能替换模式。您可以编写一个 MySQL 存储函数来搜索和替换标签,但此时您最好编写一个 PHP 脚本来完成这项工作。它可能不会那么快,但写入速度可能会更快。

I don't believe there's any efficient way to do this in MySQL alone.

MySQL does have a REPLACE() function, but it can only replace constant strings, not patterns. You could possibly write a MySQL stored function to search for and replace tags, but at that point you're probably better off writing a PHP script to do the job. It might not be quite as fast, but it will probably be faster to write.

美胚控场 2024-12-14 23:15:20

我正在传递这段代码,看起来与上面的非常相似。为我工作,希望有帮助。

BEGIN
  DECLARE iStart, iEnd, iLength   INT;

  WHILE locate('<', Dirty) > 0 AND locate('>', Dirty, locate('<', Dirty)) > 0
  DO
    BEGIN
      SET iStart = locate('<', Dirty), iEnd = locate('>', Dirty, locate('<', Dirty));
      SET iLength = (iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = insert(Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END

I am passing this code on, seems very similar to the above. Worked for me, hope it helps.

BEGIN
  DECLARE iStart, iEnd, iLength   INT;

  WHILE locate('<', Dirty) > 0 AND locate('>', Dirty, locate('<', Dirty)) > 0
  DO
    BEGIN
      SET iStart = locate('<', Dirty), iEnd = locate('>', Dirty, locate('<', Dirty));
      SET iLength = (iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = insert(Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END
小ぇ时光︴ 2024-12-14 23:15:20

与 MySQL 8+ 和 MariaDB 10.0.5+ 兼容

SELECT REGEXP_REPLACE(body, '<[^>]*>+', '') FROM app_cms_sections

Compatible with MySQL 8+ and MariaDB 10.0.5+

SELECT REGEXP_REPLACE(body, '<[^>]*>+', '') FROM app_cms_sections

还给你自由 2024-12-14 23:15:20

我只是扩展了答案@boann 以允许定位任何特定标签,以便我们可以在每次函数调用时一一替换标签。您只需传递标签参数,例如 'a' 即可替换所有开始/结束锚标记。这回答了OP提出的问题,与接受的答案不同,后者删除了所有标签。

# MySQL function to programmatically replace out specified html tags from text/html fields

# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;

DELIMITER |

# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
END;

| DELIMITER ;

# test select to nuke all opening <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;

# run update query to replace out all <a> tags
UPDATE tmpcat
SET 
    description = STRIP_TAGS(description, 'a');

I just extended the answer @boann to allow targetting of any specific tag so that we can replace out the tags one by one with each function call. You just need pass the tag parameter, e.g. 'a' to replace out all opening/closing anchor tags. This answers the question asked by OP, unlike the accepted answer, which strips out ALL tags.

# MySQL function to programmatically replace out specified html tags from text/html fields

# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;

DELIMITER |

# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
END;

| DELIMITER ;

# test select to nuke all opening <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;

# run update query to replace out all <a> tags
UPDATE tmpcat
SET 
    description = STRIP_TAGS(description, 'a');
回忆那么伤 2024-12-14 23:15:20

一旦我添加了 SET $str = COALESCE($str, '');,Boann 就可以工作了。

来自此帖子

另请注意,您可能需要放置 SET $str = COALESCE($str, '');
就在循环之前,否则空值可能会导致崩溃/永远不会
结束查询。 – Tom C 8 月 17 日 9:51

Boann's works once I added SET $str = COALESCE($str, '');.

from this post:

Also to note, you may want to put a SET $str = COALESCE($str, '');
just before the loop otherwise null values may cause a crash/never
ending query. – Tom C Aug 17 at 9:51

太阳男子 2024-12-14 23:15:20

我正在使用 lib_mysqludf_preg 库和这样的正则表达式:

SELECT PREG_REPLACE('#<[^>]+>#',' ',cell) FROM table;

对于行也这样做带有编码的 html 实体:

SELECT PREG_REPLACE('#<.+?>#',' ',cell) FROM table;

在某些情况下,这些可能会失败,但我还没有遇到过,而且它们相当快。

I'm using the lib_mysqludf_preg library for this and a regex like this:

SELECT PREG_REPLACE('#<[^>]+>#',' ',cell) FROM table;

Also did it like this for rows which with encoded html entities:

SELECT PREG_REPLACE('#<.+?>#',' ',cell) FROM table;

There are probably cases where these might fail but I haven't encountered any and they're reasonably fast.

愚人国度 2024-12-14 23:15:20

REPLACE() 效果很好。

微妙的方法:

 REPLACE(REPLACE(node.body,'<p>',''),'</p>','') as `post_content`

...以及不那么微妙的方法:(将字符串转换为段)

 LOWER(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(node.title), ':', ''), 'é', 'e'), ')', ''), '(', ''), ',', ''), '\\', ''), '\/', ''), '\"', ''), '?', ''), '\'', ''), '&', ''), '!', ''), '.', ''), '–', ''), ' ', '-'), '--', '-'), '--', '-'), '’', '')) as `post_name`

REPLACE() works pretty well.

The subtle approach:

 REPLACE(REPLACE(node.body,'<p>',''),'</p>','') as `post_content`

...and the not so subtle: (Converting strings into slugs)

 LOWER(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(node.title), ':', ''), 'é', 'e'), ')', ''), '(', ''), ',', ''), '\\', ''), '\/', ''), '\"', ''), '?', ''), '\'', ''), '&', ''), '!', ''), '.', ''), '–', ''), ' ', '-'), '--', '-'), '--', '-'), '’', '')) as `post_name`
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文