Converting <br> and <p> separators to new lines in Java
Basically, I have an HTML fragment with <br> and <p></p> inside. I was able to remove all the HTML tags, but doing so leaves the text badly formatted.
I want something like nl2br() in PHP, except with the input and output reversed, and which also takes <p> tags into account. Is there a library for this in Java?
Comments (3)
You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you succeed in removing them, you need to insert \n and \n\n respectively.
Here's a kickoff example using the Jsoup HTML parser (the HTML example is intentionally written that way so that it's hard, if not nearly impossible, to use regex for this).
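A minimal sketch of how this marker-based idea could look with Jsoup, assuming the library is on the classpath. The class name `Br2NlJsoup` and the `MARKER` constant are illustrative names, not part of Jsoup. A literal backslash-n marker is used because `Document.text()` normalizes real whitespace, so an actual newline inserted into the document would be collapsed:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class Br2NlJsoup {
    // Two-character marker (backslash + 'n'), NOT a real newline.
    // It survives the whitespace normalization done by Document.text().
    private static final String MARKER = "\\n";

    static String br2nl(String html) {
        Document document = Jsoup.parse(html);
        // Put one marker inside every <br> and two at the start of every <p>.
        document.select("br").append(MARKER);
        document.select("p").prepend(MARKER + MARKER);
        // text() drops all tags; a plain (non-regex) replace then turns
        // each marker into a real newline.
        return document.text().replace(MARKER, "\n");
    }
}
```

Calling `Br2NlJsoup.br2nl("<p>Hello<br>world</p><p>Bye</p>")` should give the text with line breaks where the tags were. The exact spaces around each newline depend on how the Jsoup version in use normalizes whitespace, and any literal backslash-n sequence already present in the input text would be converted as well.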
(Note: replaceAll() is unnecessary, as we just want a simple charsequence-by-charsequence replacement here, not a regex-pattern-by-charsequence replacement.)
A bit hacky, but it works.
br2nl and p2nl are not too complicated. Give this a try:
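As a rough sketch, two such helpers could be written with the standard `java.util.regex` package alone. The names `br2nl` and `p2nl` come from the answer; the class `HtmlBreaks`, the helper `toPlainText`, and the patterns themselves are assumptions made for illustration. Regex matching of this kind handles simple fragments only and can be fooled by comments or by attribute values containing `>`:

```java
import java.util.regex.Pattern;

final class HtmlBreaks {
    // <br>, <br/>, <br class="x">, in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\b[^>]*>");
    // Opening <p ...> tags only; \b keeps <pre> and <param> from matching.
    private static final Pattern P_OPEN = Pattern.compile("(?i)<p\\b[^>]*>");
    // Any remaining tag, opening or closing.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Replace every <br> variant with a single newline.
    static String br2nl(String html) {
        return BR.matcher(html).replaceAll("\n");
    }

    // Replace every opening <p> tag with a blank line.
    static String p2nl(String html) {
        return P_OPEN.matcher(html).replaceAll("\n\n");
    }

    // Convert the breaks first, then strip whatever tags are left.
    static String toPlainText(String html) {
        String withBreaks = p2nl(br2nl(html));
        return ANY_TAG.matcher(withBreaks).replaceAll("").trim();
    }
}
```

For example, `HtmlBreaks.toPlainText("<p>Hello<br>world</p><p>Bye</p>")` returns `"Hello\nworld\n\nBye"`. The final `trim()` only removes the blank lines produced by a leading `<p>`.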
You should be able to use replaceAll. See http://www.rgagnon.com/javadetails/java-0454.html for an example. You need just two of those calls, one for p and one for br. The example goes the other way, but you can turn it around to replace the HTML tags with \n.