从 HTML/ MS Word 标签中清理数据库

发布于 2024-11-28 23:01:36 字数 104 浏览 0 评论 0原文

我有非常大的数据库需要清理。在旧系统上,用户被允许使用 HTML(并从 Word 文件粘贴)。现在我在数据库的不同位置有大量的开放标签。

有人知道执行这种清理的应用程序/脚本吗?

I have very large database to cleanup. On the old system user was permitted to use HTML (and paste from Word files). Now I've large amount of open tags in different places in DB.

Anyone know a application/script to perform this kind of cleanup?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

相权↑美人 2024-12-05 23:01:36

您可以使用 php 的一些位(或其他后端语言)来执行此操作,

这是一个在 php 上编写的示例,

$link = mysql_connect($host, $username, $password); //connecting to database 
mysql_select_db($database_name,$link); // select a database
$q = mysql_query('SELECT id_row,cell_with_tags FROM tablename WHERE some_condition_if_it_needed'); 
if(mysql_num_rows($q) !== 0){   //check if result of our query not empty
    while($res = mysql_fetch_assoc($q)){ // gets all rows-cells as associative array
         $result[] = array('id_row'=>$res['id_row'],
                           'cell_with_tags'=>strip_tags($res['cell_with_tags']);
    }
}
if(!empty($result)){ 
    foreach($result as $k=>&$v){ // update our database. Write there values without tags
         mysql_query('UPDATE tablename SET cell_with_tags = '.$result[$k]['cell_with_tags'].' WHERE id='.$result[$k]['id_row']);
    }
}

如果您想删除并非所有 html 标签,您还可以修改 strip_tags 函数的调用。 (http://ru.php.net/manual/en/function. strip-tags.php

如果无法执行php脚本,则只能使用mysql来执行此操作。我从 this 得到它

DROP FUNCTION IF EXISTS strip_tags || 
CREATE FUNCTION strip_tags( x longtext) RETURNS longtext
LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA 
BEGIN
   DECLARE sstart INT UNSIGNED;
   DECLARE ends INT UNSIGNED;
   SET sstart = LOCATE('<', x, 1);
   REPEAT
      SET ends = LOCATE('>', x, sstart);
      SET x = CONCAT(SUBSTRING( x, 1 ,sstart -1) ,SUBSTRING(x, ends +1 )) ; 
      SET sstart = LOCATE('<', x, 1);
   UNTIL sstart < 1 END REPEAT;
return x;
END;
||
delimiter ;

mysql> SELECT strip_tags('<a href="HelloWorld.html"><B>Hi, mate!</B></a>') as strip_tags;

You may do this with some bits of php (or another backend language)

Here is an example written on php

$link = mysql_connect($host, $username, $password); //connecting to database 
mysql_select_db($database_name,$link); // select a database
$q = mysql_query('SELECT id_row,cell_with_tags FROM tablename WHERE some_condition_if_it_needed'); 
if(mysql_num_rows($q) !== 0){   //check if result of our query not empty
    while($res = mysql_fetch_assoc($q)){ // gets all rows-cells as associative array
         $result[] = array('id_row'=>$res['id_row'],
                           'cell_with_tags'=>strip_tags($res['cell_with_tags']);
    }
}
if(!empty($result)){ 
    foreach($result as $k=>&$v){ // update our database. Write there values without tags
         mysql_query('UPDATE tablename SET cell_with_tags = '.$result[$k]['cell_with_tags'].' WHERE id='.$result[$k]['id_row']);
    }
}

You also can modify call of strip_tags function, if you want to remove not all html tags. (http://ru.php.net/manual/en/function.strip-tags.php)

If executing of php script is not possible, you can do this only with mysql. I get it from this

DROP FUNCTION IF EXISTS strip_tags || 
CREATE FUNCTION strip_tags( x longtext) RETURNS longtext
LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA 
BEGIN
   DECLARE sstart INT UNSIGNED;
   DECLARE ends INT UNSIGNED;
   SET sstart = LOCATE('<', x, 1);
   REPEAT
      SET ends = LOCATE('>', x, sstart);
      SET x = CONCAT(SUBSTRING( x, 1 ,sstart -1) ,SUBSTRING(x, ends +1 )) ; 
      SET sstart = LOCATE('<', x, 1);
   UNTIL sstart < 1 END REPEAT;
return x;
END;
||
delimiter ;

mysql> SELECT strip_tags('<a href="HelloWorld.html"><B>Hi, mate!</B></a>') as strip_tags;
怀中猫帐中妖 2024-12-05 23:01:36

如果您有 <和>字符作为文本的一部分,而不是 HTML 的一部分。即使您的字符串末尾包含损坏的 HTML 标记。那么你应该使用这个版本的函数:

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4096) )
RETURNS varchar(4096)
DETERMINISTIC 
BEGIN
  DECLARE iStart, iEnd, iLength, DirtyLength, DirtyTail int;
  DECLARE iNextLessThenAngleBracket, iLengthToNextLessThenAngleBracket int;
    WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
      BEGIN
        SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
        SET iNextLessThenAngleBracket = Locate( '<', Dirty, Locate('<', Dirty ) + 1);
        SET iLength = ( iEnd - iStart) + 1;
        SET iLengthToNextLessThenAngleBracket = ( iNextLessThenAngleBracket - iStart) + 1;

        IF iLength < iLengthToNextLessThenAngleBracket THEN
            BEGIN
            IF iLength > 0 THEN
              BEGIN
                SET Dirty = Insert( Dirty, iStart, iLength, '');
              END;
            END IF;
            END;
        ELSE
            BEGIN
                IF iNextLessThenAngleBracket != 0 THEN
                    BEGIN
                        SET Dirty = Insert( Dirty, iStart, 1, '*/*');
                    END;
                ELSE
                    BEGIN
                        SET Dirty = Insert( Dirty, iStart, iLength, '');
                    END;
                END IF;
            END;
        END IF;

      END;
    END WHILE;
    IF Locate( '<', Dirty ) > 0 THEN
        BEGIN
            SET DirtyLength = CHAR_LENGTH(Dirty);
            SET DirtyTail = DirtyLength - Locate( '<', Dirty ) + 1;
            SET Dirty = Insert( Dirty, Locate( '<', Dirty ), DirtyTail, '');
        END;
    END IF;
    BEGIN
        SET Dirty = REPLACE(Dirty, '*/*', '<');
    END;
    RETURN Dirty;
END;
|
DELIMITER ;

What if you have < and > characters as a part of the text, not a part of HTML. Even if your string contains broken HTML tag at the end of the string. So then you should use this version of the function:

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4096) )
RETURNS varchar(4096)
DETERMINISTIC 
BEGIN
  DECLARE iStart, iEnd, iLength, DirtyLength, DirtyTail int;
  DECLARE iNextLessThenAngleBracket, iLengthToNextLessThenAngleBracket int;
    WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
      BEGIN
        SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
        SET iNextLessThenAngleBracket = Locate( '<', Dirty, Locate('<', Dirty ) + 1);
        SET iLength = ( iEnd - iStart) + 1;
        SET iLengthToNextLessThenAngleBracket = ( iNextLessThenAngleBracket - iStart) + 1;

        IF iLength < iLengthToNextLessThenAngleBracket THEN
            BEGIN
            IF iLength > 0 THEN
              BEGIN
                SET Dirty = Insert( Dirty, iStart, iLength, '');
              END;
            END IF;
            END;
        ELSE
            BEGIN
                IF iNextLessThenAngleBracket != 0 THEN
                    BEGIN
                        SET Dirty = Insert( Dirty, iStart, 1, '*/*');
                    END;
                ELSE
                    BEGIN
                        SET Dirty = Insert( Dirty, iStart, iLength, '');
                    END;
                END IF;
            END;
        END IF;

      END;
    END WHILE;
    IF Locate( '<', Dirty ) > 0 THEN
        BEGIN
            SET DirtyLength = CHAR_LENGTH(Dirty);
            SET DirtyTail = DirtyLength - Locate( '<', Dirty ) + 1;
            SET Dirty = Insert( Dirty, Locate( '<', Dirty ), DirtyTail, '');
        END;
    END IF;
    BEGIN
        SET Dirty = REPLACE(Dirty, '*/*', '<');
    END;
    RETURN Dirty;
END;
|
DELIMITER ;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文