Avoiding concatenation without spaces in JSoup

Posted on 2024-11-30 16:35:46

Suppose I have a div as such:

<div>
This is a paragraph
written by someone
on the internet.
</div>

The problem is that when JSoup parses this, it puts it all on one line, so that when I call text() it reads as such:

This is a paragraphwritten by someoneon the internet.

Now, I realize this isn't really a JSoup problem, in that the actual HTML doesn't contain a space. However, is there any way to use JSoup (perhaps some override, or an option I haven't seen) so that as it parses it adds a space between lines? I imagine it must be possible (I can inspect the element in Chrome, unselect word wrap, and get what I want), but I'm not sure JSoup can do this.

Any thoughts?

Comments (2)

土豪 2024-12-07 16:35:46

Can you provide a full example of your code? What version of jsoup are you using?

In the current version (1.6.1), this code:

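import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parse a div whose lines of text are separated only by newlines.
Document doc = Jsoup.parse("<div>\n" +
    "This is a paragraph\n" +
    "written by someone\n" +
    "on the internet.\n" +
    "</div>");
// Print the combined text of the whole document.
System.out.println(doc.text());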

Produces:

This is a paragraph written by someone on the internet.

That is, \n (and \r\n, etc.) are converted to spaces in the text output.
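
To make the behaviour described here concrete, below is a minimal sketch in plain Java of that kind of whitespace normalization: each run of whitespace, including \n and \r\n, is collapsed into a single space. This is not jsoup's actual implementation; the class name WhitespaceSketch and the method normalize are hypothetical names chosen for illustration only.

```java
class WhitespaceSketch {
    // Hypothetical helper (not part of jsoup): collapse every run of
    // whitespace (spaces, tabs, \n, \r\n) into one space, then trim the ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // The text content of the div from the question, newlines included.
        String raw = "\nThis is a paragraph\r\nwritten by someone\non the internet.\n";
        // Prints: This is a paragraph written by someone on the internet.
        System.out.println(normalize(raw));
    }
}
```

If the real output has no spaces between the lines, the input probably does not contain newlines where you expect them, which is why a complete example of the failing code would help.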

Happy to fix or improve it, if I can replicate :)

以歌曲疗慰 2024-12-07 16:35:46

The following post shows how to get all of the text, including the line breaks:

Removing HTML entities while preserving line breaks with JSoup

The answer to the following question, together with the comment on it, describes another approach:

Remove HTML tags from a String

And this one offers yet another approach if you read through all the answers and comments:

How do I preserve line breaks when using jsoup to convert html to plain text?
