How do I wrap all text fragments in paragraph tags?

Posted 2024-11-06 05:37:45


Comments (2)

晒暮凉 2024-11-13 05:37:45

Here are a couple of functions that should help you to do what you want to do:

// nl2p
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs

function nl2p($str) {
    $arr=explode("\n",$str);
    $out='';

    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0)
            $out.='<p>'.trim($arr[$i]).'</p>';
    }
    return $out;
}



// nl2p_html
// This function will add paragraph tags around textual content of an HTML file, leaving
// the HTML itself intact
// This function assumes that the HTML syntax is correct and that the '<' and '>' characters
// are not used in any of the values for any tag attributes. If these assumptions are not met,
// mass paragraph chaos may ensue. Be safe.

function nl2p_html($str) {

    // Start with an empty output string so that it is defined even when the input has no </head>
    $out='';

    // If we find the end of an HTML header, assume that this is part of a standard HTML file. Cut off everything including the
    // end of the head and save it in our output string, then trim the head off of the input. This is mostly because we don't
    // want to surround anything like the HTML title tag or any style or script code in paragraph tags.
    $head_end=strpos($str,'</head>');
    if($head_end!==false) {
        $out=substr($str,0,$head_end+7);
        $str=substr($str,$head_end+7);
    }

    // First, we explode the input string based on wherever we find HTML tags, which start with '<'
    $arr=explode('<',$str);

    // Next, we loop through the array that is broken into HTML tags and look for textual content, or
    // anything after the >
    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0) {
            // Element 0 is whatever came before the first '<', so it has no tag to restore. Every other element
            // starts with a tag, which gets back the '<' that became collateral damage in our explosion
            $tag_end=($i>0)?strpos($arr[$i],'>'):false;
            $html=($tag_end===false)?'':'<'.substr($arr[$i],0,$tag_end+1);

            // Take the portion of the string after the end of the tag and explode that by newline. Since this is after
            // the end of the HTML tag, this must be textual content.
            $sub_arr=explode("\n",($tag_end===false)?$arr[$i]:substr($arr[$i],$tag_end+1));

            // Initialize the output string for this next loop
            $paragraph_text='';

            // Loop through this new array and add paragraph tags (<p>...</p>) around any element that isn't empty
            for($j=0;$j<count($sub_arr);$j++) {
                if(strlen(trim($sub_arr[$j]))>0)
                    $paragraph_text.='<p>'.trim($sub_arr[$j]).'</p>';
            }

            // Put the text back onto the end of the HTML tag and put it in our output string
            $out.=$html.$paragraph_text;
        }

    }

    // Throw it back into our program
    return $out;
}

The first of these, nl2p(), takes a string as input and splits it into an array at every newline ("\n") character. It then goes through each element and, if the element isn't empty once trimmed, wraps it in <p></p> tags and appends it to a string, which is returned at the end of the function.
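As a quick check of that description, here is a small sketch that runs a compact equivalent of nl2p() on a string containing a blank line and some stray whitespace. The name nl2p_demo and the sample text are made up for illustration; the logic is meant to mirror the function above, not replace it.

```php
<?php
// Hypothetical compact equivalent of nl2p(), for illustration only.
function nl2p_demo(string $str): string {
    $out = '';
    foreach (explode("\n", $str) as $line) {
        $line = trim($line);          // also strips the "\r" left over from Windows line endings
        if ($line !== '') {           // skip blank lines so no empty <p></p> is produced
            $out .= '<p>' . $line . '</p>';
        }
    }
    return $out;
}

$result = nl2p_demo("First line\n\n  Second line  \r\nThird line");
echo $result, "\n";
// prints: <p>First line</p><p>Second line</p><p>Third line</p>
```

Note that nothing here escapes the text, so any '<' or '&' in the input ends up in the output unchanged.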

The second, nl2p_html(), is a more complicated version of the former. Pass an HTML file's contents to it as a string and it will wrap <p> and </p> tags around any non-HTML text. It does this by exploding the string into an array where the delimiter is the < character, which is the start of any HTML tag. Then, iterating through each of these elements, the code looks for the end of the HTML tag and takes anything that comes after it into a new string.
This new string is itself exploded into an array where the delimiter is a newline ("\n"). Looping through this new array, the code looks for elements that are not empty. When it finds some data, it wraps it in paragraph tags and adds it to an output string. When this loop is finished, that string is added back onto the HTML tag, and the two together are appended to an output buffer string, which is returned once the function has completed.
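To make the splitting step concrete, the sketch below only traces the pieces that the approach works with, without building any output. The input string and the variable names are assumptions for the example; it skips element 0 (text before the first '<') and any chunk without a '>', which the description above does not cover.

```php
<?php
// Illustration only: show the tag/text pairs produced by splitting on '<'.
$input = "<div>Hello\nworld</div>";

$pieces = [];
foreach (explode('<', $input) as $i => $chunk) {
    if ($i === 0 || trim($chunk) === '') {
        continue;                      // element 0 is the text before the first '<'
    }
    $end = strpos($chunk, '>');        // end of the tag that opens this chunk
    if ($end === false) {
        continue;                      // malformed chunk without '>', ignored in this sketch
    }
    $pieces[] = [
        'tag'  => '<' . substr($chunk, 0, $end + 1),   // restore the '<' consumed by explode()
        'text' => substr($chunk, $end + 1),            // everything after the tag
    ];
}

print_r($pieces);
```

For this input there are two pieces: the tag <div> followed by the text "Hello\nworld", and the tag </div> followed by an empty string. Only the first has text that would be split on newlines and wrapped.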

tl;dr: nl2p() will convert a string to HTML paragraphs without leaving any empty paragraphs and nl2p_html() will wrap paragraph tags around the contents of the body of an HTML document.

I tested this on a couple of small example HTML files to make sure that spacing and other things don't ruin the output. Be aware that the code generated by nl2p_html() may not be W3C-compliant, as it will put paragraphs inside anchors and the like rather than the other way around.
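The nesting problem is easy to reproduce. The sketch below uses a different splitting technique from the functions above (preg_split() with captured delimiters instead of explode()), and the helper name wrap_text_between_tags is invented for this example; it is only meant to show the kind of output to expect, under the same assumption that attribute values contain no '<' or '>'.

```php
<?php
// Hypothetical helper: same idea as nl2p_html(), but splitting with a regex instead of explode().
function wrap_text_between_tags(string $html): string {
    // With PREG_SPLIT_DELIM_CAPTURE the tags are kept, so pieces alternate: text, tag, text, ...
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $k => $part) {
        if ($k % 2 === 1) {            // odd indexes hold the captured tags
            $out .= $part;
            continue;
        }
        foreach (explode("\n", $part) as $line) {
            if (trim($line) !== '') {  // wrap only non-empty lines of text
                $out .= '<p>' . trim($line) . '</p>';
            }
        }
    }
    return $out;
}

$nested = wrap_text_between_tags('<a href="#">Home</a>');
echo $nested, "\n";
// prints: <a href="#"><p>Home</p></a>
```

The paragraph lands inside the anchor because the text is wrapped wherever it is found, with no knowledge of which element contains it.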

Hope this helps.

青萝楚歌 2024-11-13 05:37:45

Because it is hard to tell with a regex what is inside a tag and what is not, I suggest using a DOM parser and working on the resulting DOM object:

$doc = new DOMDocument();
$doc->loadHTML("<body>Test<br><p>Test 2</p>Test 3</body>");
// Direct children of <body>; item(0) is the documented DOMNodeList accessor
$content = $doc->getElementsByTagName('body')->item(0)->childNodes;
for($i = 0; $i < $content->length; $i++) {
    $node = $content->item($i);
    if ($node->nodeType === XML_TEXT_NODE) { // '#text'
        // Put a new <p> where the text node was, then move the text node into it
        $element = $doc->createElement('p');
        $node->parentNode->replaceChild($element, $node);
        $element->appendChild($node);
    }
}
// Serialize the modified document
echo $doc->saveHTML();
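
If you want this as something reusable, a minimal sketch might look like the following. The function name wrap_body_text_nodes is made up, and two details are my own additions rather than part of the snippet above: whitespace-only text nodes are skipped, and the child list is copied to an array before the loop so that replacing nodes cannot affect iteration. Like the snippet above, it only handles text that is a direct child of <body>.

```php
<?php
// Hypothetical wrapper around the DOM approach; the function name is made up for this sketch.
function wrap_body_text_nodes(string $html): DOMDocument {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $doc;                               // nothing to do without a <body>
    }
    // Copy the live child list into an array so replacing nodes cannot disturb the loop.
    foreach (iterator_to_array($body->childNodes) as $node) {
        if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
            $p = $doc->createElement('p');
            $body->replaceChild($p, $node);        // put <p> where the text node was
            $p->appendChild($node);                // then move the text into the new <p>
        }
    }
    return $doc;
}

$doc = wrap_body_text_nodes("<body>Test<br><p>Test 2</p>Test 3</body>");
// Print only the <body> subtree rather than the whole document.
echo $doc->saveHTML($doc->getElementsByTagName('body')->item(0)), "\n";
```

Text nested deeper in the tree (inside a <div>, say) is left alone; handling that would need a recursive walk plus a decision about which parent elements may legally contain a <p>.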