Converting <br> and <p> tags to newlines in Java

Posted 2024-09-06 22:29:14

Basically I have an HTML fragment with <br> and <p></p> inside. I was able to remove all the HTML tags but doing so leaves the text in a bad format.

I want something like PHP's nl2br(), but in reverse (HTML in, plain text with newlines out), and it should also take <p> tags into account. Is there a Java library for this?

Comments (3)

糖果控 2024-09-13 22:29:14

You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you remove the tags, insert \n and \n\n respectively.

Here's a kickoff example using the Jsoup HTML parser (the HTML sample is deliberately written so that handling it with regex is hard, if not nearly impossible).

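import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Br2Nl {

    public static void main(String[] args) throws Exception {
        String originalHtml = "<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>";
        String text = br2nl(originalHtml);
        String newHtml = nl2br(text);

        System.out.println(originalHtml);
        System.out.println("-------------");
        System.out.println(text);
        System.out.println("-------------");
        System.out.println(newHtml);
    }

    public static String br2nl(String html) {
        Document document = Jsoup.parse(html);
        // document.text() collapses whitespace, so mark the break points with a
        // literal backslash-n placeholder first and turn it into real newlines afterwards.
        document.select("br").append("\\n");
        document.select("p").prepend("\\n\\n");
        return document.text().replace("\\n", "\n");
    }

    public static String nl2br(String text) {
        // Order matters: handle the double newline before the single one.
        return text.replace("\n\n", "<p>").replace("\n", "<br>");
    }
}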

(Note: replaceAll() is unnecessary here. We only want a literal CharSequence-for-CharSequence replacement, which replace() does; replaceAll() interprets its first argument as a regex pattern.)

Output:

<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>
-------------


p1l1 
p1l2 



p2l1 
p2l2
-------------
<p>p1l1 <br>p1l2 <br> <br> <p>p2l1 <br>p2l2

A bit hacky, but it works.
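The nl2br() above emits <p> tags without a closing </p>, as the last line of the output shows. If well-formed paragraphs are wanted, one possible variation is sketched below using only the standard library. The class name Nl2BrParagraphs is made up for illustration, and the sketch assumes the text contains no characters that need HTML escaping (it does not escape <, > or &).

```java
final class Nl2BrParagraphs {

    // Treats runs of two or more newlines as paragraph boundaries and
    // single newlines as line breaks inside a paragraph.
    static String nl2br(String text) {
        StringBuilder html = new StringBuilder();
        for (String paragraph : text.split("\n{2,}")) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue; // skip the empty chunk produced by leading newlines
            }
            html.append("<p>")
                .append(trimmed.replace("\n", "<br>"))
                .append("</p>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(nl2br("\n\np1l1\np1l2\n\np2l1\np2l2"));
    }
}
```

Here split() takes a regex on purpose, so that three or more consecutive newlines still count as a single paragraph boundary.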

风吹短裙飘 2024-09-13 22:29:14

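br2nl and p2nl are not too complicated. Give this a try (using replace() rather than replaceAll(), because replaceAll() treats the backslash in a "\\n" replacement string as an escape and would insert the letter n instead of a newline):

String plain = htmlText.replace("<br>", "\n").replace("<p>", "\n\n").replace("</p>", "");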
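A literal match on <br> and <p> misses variants such as <br/>, <br class=b> or <P id=p>. A slightly more tolerant sketch with java.util.regex might look like the following; the class and method names are hypothetical. It still treats tags inside HTML comments as real tags, which is exactly the case the Jsoup answer's sample was built to expose, so a parser remains the safer choice for arbitrary HTML.

```java
import java.util.regex.Pattern;

final class TagsToNewlines {

    // (?i) makes the match case-insensitive; \b keeps <br...> from matching
    // longer tag names, and [^>]* skips attributes or a trailing slash.
    private static final Pattern BR = Pattern.compile("(?i)<br\\b[^>]*>");
    private static final Pattern P_OPEN = Pattern.compile("(?i)<p\\b[^>]*>");
    private static final Pattern P_CLOSE = Pattern.compile("(?i)</p\\s*>");

    static String tagsToNewlines(String html) {
        String s = BR.matcher(html).replaceAll("\n");
        s = P_OPEN.matcher(s).replaceAll("\n\n");
        return P_CLOSE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(tagsToNewlines("<p id=p>a<br class=b>b<BR/>c</p>"));
    }
}
```

The replacement strings here contain a real newline character rather than a backslash followed by n, so they pass through replaceAll() unchanged.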
ゞ花落谁相伴 2024-09-13 22:29:14

You should be able to use replaceAll. See http://www.rgagnon.com/javadetails/java-0454.html for an example. You need just two calls, one for p and one for br. The example there goes the other way, but you can turn it around to replace the HTML tags with \n.
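One thing to watch for when using replaceAll() this way: the replacement string has its own escaping rules, in which a backslash escapes the character that follows it. The small demo below (class name chosen for illustration) compares three ways of writing the replacement.

```java
final class ReplacementEscapes {

    // Literal replacement: the newline character is inserted as-is.
    static String withReplace(String html) {
        return html.replace("<br>", "\n");
    }

    // Regex replacement with a real newline character: also fine.
    static String withReplaceAll(String html) {
        return html.replaceAll("<br>", "\n");
    }

    // Regex replacement with backslash + 'n': the backslash escapes the 'n',
    // so the tag is replaced by the letter n, not by a newline.
    static String withEscapedReplaceAll(String html) {
        return html.replaceAll("<br>", "\\n");
    }

    public static void main(String[] args) {
        String html = "a<br>b";
        System.out.println(withReplace(html).equals("a\nb"));        // true
        System.out.println(withReplaceAll(html).equals("a\nb"));     // true
        System.out.println(withEscapedReplaceAll(html).equals("anb")); // true
    }
}
```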
