How can I parse content with PHP to replace fake lists with real lists?

Posted 2024-12-02 13:42:22

Possible Duplicate:
Best methods to parse HTML with PHP

So I have a ton of entries in my database where lists were entered, but they're not real lists and I need to convert them to actual lists.

Here's what I have:

Other HTML data here.

<p>ñ Line of data</p>
<p>ñ Another line of data</p>
<p>ñ Yet another line of data</p>
<p>ñ Still more data</p>

More HTML data here.

Needs to change to:

Other HTML data here.

<ul>
    <li>Line of data</li>
    <li>Another line of data</li>
    <li>Yet another line of data</li>
    <li>Still more data</li>
</ul>

More HTML data here.

It doesn't have to be formatted like that, could just be all smashed together. I don't care.

Thanks.


Forgot to mention: there is HTML data on both sides of the would-be list.

Also I've got the SimpleDOM parser. Not really interested in getting another one, but if there's a really easy one to use that would take care of this it would be helpful.

Thanks, again.

Comments (3)

殊姿 2024-12-09 13:42:22

I'm going to get reprimanded for not using a DOM parser, but here goes. This is just a simple string operation; no regex is needed.

You just need to replace the <p> open/close tags with <li> open/close tags, and wrap the result in <ul></ul>.

Update: fixed to account for the updates to the question (content before and after the list):

    $original = "Stuff here
    
    <p>ñ Line of data</p>
    <p>ñ Another line of data</p>
    <p>ñ Yet another line of data</p>
    <p>ñ Still more data</p>
    
    Other stuff";
    
    // Store stuff before & after the list
    $stuffbefore = substr($original, 0, stripos($original, "<p>"));
    $stuffafter = substr($original, strripos($original, "</p>") + strlen("</p>"));
    
    // Cut off the stuff before the list
    $listpart = substr($original, strlen($stuffbefore));
    // Cut off stuff after the list
    $listpart = substr($listpart, 0, strlen($listpart) - strlen($stuffafter));
    
    $fixed = str_replace("<p>ñ ", "<li>", $listpart);
    $fixed = str_replace("</p>", "</li>", $fixed);
    
    // Stick it all back together
    $fixed = "$stuffbefore\n<ul>$fixed</ul>\n$stuffafter";
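
One limitation of the substring approach above is that it treats everything from the first <p> to the last </p> as the list, so it would misfire if the surrounding HTML contained paragraphs of its own. A minimal sketch of one way around that, assuming the stored HTML is valid UTF-8 and uses the literal "ñ " marker (not an entity such as &ntilde;), is to match only runs of consecutive marked paragraphs. The function name convert_fake_lists is hypothetical, not part of the original answer.

```php
<?php
// Hypothetical helper: wraps each run of "<p>ñ ...</p>" paragraphs in <ul>.
function convert_fake_lists(string $html): string
{
    // One or more consecutive marked paragraphs, allowing whitespace
    // between them. /u treats pattern and subject as UTF-8; /s lets
    // "." span line breaks inside a paragraph.
    $run = '/(?:<p>ñ .*?<\/p>\s*)+/us';

    $result = preg_replace_callback($run, function (array $m): string {
        // Turn every paragraph in this run into a list item.
        $items = preg_replace('/<p>ñ (.*?)<\/p>/us', '<li>$1</li>', trim($m[0]));
        return "<ul>\n" . $items . "\n</ul>\n";
    }, $html);

    // preg_replace_callback returns null on error (e.g. invalid UTF-8);
    // in that case hand back the input unchanged.
    return $result ?? $html;
}

// Usage with sample data modelled on the question.
$original = "Other HTML data here.\n\n"
    . "<p>ñ Line of data</p>\n<p>ñ Another line of data</p>\n\n"
    . "More HTML data here.";

echo convert_fake_lists($original);
```

Paragraphs without the marker are left alone, and each separate run of marked paragraphs gets its own <ul>. Whitespace after a run is collapsed to a single newline, which should not matter given that the question does not care about formatting.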
    
半暖夏伤 2024-12-09 13:42:22

You could just use str_replace: replace all the <p> with <li> and all the </p> with </li>.
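
As a rough illustration of that suggestion, here is a minimal sketch. It assumes the list fragment has already been isolated from the surrounding HTML (otherwise unrelated </p> tags would be converted too), and the sample data is hypothetical. Note that str_replace alone does not add the enclosing <ul>, so the sketch wraps the result by hand and also strips the fake "ñ " bullet.

```php
<?php
// Hypothetical sample: just the fake-list fragment, no surrounding HTML.
$fragment = "<p>ñ Line of data</p>\n<p>ñ Another line of data</p>";

// Swap the opening tag plus fake bullet, and the closing tag,
// for list-item tags in a single call.
$items = str_replace(["<p>ñ ", "</p>"], ["<li>", "</li>"], $fragment);

// Wrap the converted items in a real unordered list.
$list = "<ul>\n" . $items . "\n</ul>";

echo $list;
```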

淡淡離愁欲言轉身 2024-12-09 13:42:22

Update:
I've run into this problem before, where a bunch of data contained 'fake' lists that used indentation and various characters as bullets, so I wrote this little function.

    function make_real_list($regex, $content, $type="unordered"){
    
        preg_match_all($regex, $content, $matches);
    
        $matches    = $matches[0];
        $count  = sizeof($matches);
    
        if($type=="unordered"):
            $outer_start    = "<ul>";
            $outer_end      = "</ul>";
    
        else:
            $outer_start    = "<ol>";
            $outer_end      = "</ol>";
    
        endif;
    
        $i = 1;
        foreach($matches as $match):
    
            // Convert this match into a list item
            $replace    = preg_replace($regex, '<li>$1</li>', $match, 1);
    
            // The first item opens the list and the last one closes it;
            // when there is only one match, both apply
            if($i==1):
                $replace    = $outer_start.$replace;
            endif;
            if($i==$count):
                $replace    = $replace.$outer_end;
            endif;
    
            // Replace only the first remaining occurrence of this match
            $match  = preg_quote($match, "/");
            $content    = preg_replace("/$match/", $replace, $content, 1);
            $i++;
    
        endforeach;
    
        return $content;
    
    }
    
    $content = "<p>STUFF BEFORE</p>
    <p>ñ FIRST LIST ITEM</p>
    <p>ñ MIDDLE LIST ITEM</p>
    <p>ñ LAST LIST ITEM</p>
    <p>STUFF AFTER</p>";
    
    echo make_real_list("/\<p\>ñ (.*?)\<\/p\>/", $content);
    
    // OUTPUT (whitespace reformatted for readability)
    <p>STUFF BEFORE</p> 
    <ul>
        <li>FIRST LIST ITEM</li> 
        <li>MIDDLE LIST ITEM</li> 
        <li>LAST LIST ITEM</li>
    </ul> 
    <p>STUFF AFTER</p>
    