Preserve line breaks - Simple HTML DOM Parser
When using PHP Simple HTML DOM Parser, is it normal that line break tags are stripped out?
Comments (5)
I know this is old, but I was looking for this as well and realized there is actually a built-in option to turn off the removal of line breaks. No need to edit the source.
The PHP Simple HTML DOM Parser's `load` function supports several useful parameters. When calling `load`, simply pass `false` as the third parameter. If using `file_get_html`, it's the ninth parameter.
Edit: For `str_get_html`, it's the fifth parameter (thanks, yitwail).
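
The parameter positions mentioned in this answer can be sketched as follows. This is a minimal example, assuming the v1.5-era `simple_html_dom.php` is on the include path and that its signatures are `load($str, $lowercase, $stripRN, ...)` and `str_get_html($str, $lowercase, $forceTagsClosed, $target_charset, $stripRN, ...)`; other releases may order the arguments differently, so check the copy you have. The `$source` string is made up for illustration.

```php
<?php
// Assumption: simple_html_dom.php (v1.5-era API) is available here.
require_once 'simple_html_dom.php';

// Illustrative input containing line breaks we want to keep.
$source = "<div>\n  <p>one</p>\n  <p>two</p>\n</div>";

// str_get_html: $stripRN is the fifth argument, so the first four
// are passed explicitly with what are assumed to be their defaults.
$html = str_get_html($source, true, true, DEFAULT_TARGET_CHARSET, false);
echo $html->save(), "\n";

// Object API: $stripRN is the third argument of load().
$dom = new simple_html_dom();
$dom->load($source, true, false);
echo $dom->save(), "\n";
```

With `file_get_html` the same flag sits in the ninth position, so the eight arguments before it have to be spelled out as well.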
I was struggling with this as well, since I needed the HTML to be easily editable after processing.
Apparently there's a boolean in the SimpleHTMLDOM script, `$stripRN`, that is set to `true` by default. It strips the `\r`, `\n`, or `\r\n` characters from the HTML. Set the variable to `false` (there are several occurrences in the script) and your problem is solved.
You don't have to change every `$stripRN` to false; the only one that affects this behavior is at line 816. Also consider changing line 988, because multibyte functions are often not installed on machines that do not deal with non-Western-European languages. On such machines the original line in v1.5 breaks the script immediately.
If you are passing by here wondering whether you can do the same thing with DOMDocument, I'm pleased to say you can, but it's a bit dirty :(
I had a snippet of code I wanted to tidy while retaining the exact line breaks (\n) it contained.
This is what I did...
It's important to note that I knew, without a shadow of a doubt, that my input contained only \n. You may want your own variations if \r\n or \t need to be accounted for, e.g. slash.T or slash.RN.
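
The original snippet did not survive in this thread. Judging from the "slash.T" and "slash.RN" remark, the approach was probably a placeholder swap: replace each `\n` with a token before DOMDocument sees the markup, then swap it back afterwards. The sketch below is a guess along those lines; the function name `tidy_keeping_newlines` and the `slash.N` token are assumptions, and the token must not occur naturally in the input.

```php
<?php
// Hypothetical reconstruction of the placeholder approach.
function tidy_keeping_newlines(string $html, string $token = 'slash.N'): string
{
    // Hide the newlines behind a token so they travel as plain text.
    $protected = str_replace("\n", $token, $html);

    $doc = new DOMDocument();
    // NOIMPLIED/NODEFDTD keep libxml from wrapping the fragment in
    // <html><body> and from adding a doctype; this expects a single
    // root element in the fragment.
    $doc->loadHTML($protected, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Serialize only the root element, then restore the newlines.
    $tidied = $doc->saveHTML($doc->documentElement);
    return str_replace($token, "\n", $tidied);
}

echo tidy_keeping_newlines("<p>first line\nsecond line</p>"), "\n";
```

For input that may contain `\r\n` or tabs, the same swap would need one token per character sequence, with `\r\n` replaced before `\n`.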
Another option, should one wish to preserve other formatting such as paragraphs and headings, is to use `innertext` rather than `plaintext` and then perform your own string cleaning on the result. I realise there is a performance hit, but it does allow for more granular control.
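
A minimal sketch of that kind of cleaning, assuming the string comes from an element's `innertext`. The helper `innertext_to_text` and its rules (a `<br>` becomes one newline, a closing paragraph or heading becomes a blank line) are illustrative choices, not anything the library provides.

```php
<?php
// Hypothetical helper: turn an innertext HTML string into plain text
// while keeping line and paragraph breaks.
function innertext_to_text(string $html): string
{
    // <br>, <br/> and <br /> become a single newline.
    $text = preg_replace('/<br\s*\/?>/i', "\n", $html);
    // The end of a paragraph or heading becomes a blank line.
    $text = preg_replace('/<\/(p|h[1-6])>/i', "\n\n", $text);
    // Drop the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    return trim($text);
}

echo innertext_to_text('<p>Hello<br>world</p><h2>Title</h2>'), "\n";
```

With Simple HTML DOM loaded, the input would be something like `$html->find('div', 0)->innertext`, where the `div` selector is only an example.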