Avoiding concatenation without spaces in JSoup
Suppose I have a div like this:
<div>
This is a paragraph
written by someone
on the internet.
</div>
The problem is that when JSoup parses this, it puts it all on one line, so when I call text() it reads as follows:
This is a paragraphwritten by someoneon the internet.
Now, I realize this isn't really a JSoup problem, in that the actual HTML doesn't contain a space. However, is there any way to use JSoup (perhaps some override, or an option I haven't seen) so that it adds a space between lines as it parses? I imagine it must be possible (when I inspect the element in Chrome and turn off word wrap, I get what I want), but I'm not sure JSoup can do this.
Any thoughts?
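One way to check what jsoup actually received is to read the element's raw text and collapse the whitespace yourself. Here is a minimal sketch. It assumes a jsoup version that provides Element.wholeText(), which returns the text without whitespace normalization. The class and method names are hypothetical. If the string passed to jsoup had no line breaks between the words, wholeText() has none either, and no parser setting can add a space there.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RawTextCheck {
    // Hypothetical helper: returns the first div's text with every run of
    // whitespace (including "\n" and "\r\n") collapsed into a single space.
    static String spacedText(String html) {
        Document doc = Jsoup.parse(html);
        Element div = doc.select("div").first();
        String raw = div.wholeText(); // unnormalized: keeps the original newlines
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<div>\nThis is a paragraph\nwritten by someone\non the internet.\n</div>";
        // Print the raw text first to see whether the newlines survived
        System.out.println(Jsoup.parse(html).select("div").first().wholeText());
        System.out.println(spacedText(html));
    }
}
```

In practice, Element.text() already performs this kind of collapsing. The main value of wholeText() here is diagnostic.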
Comments (2)
Can you provide a full example of your code? What version of jsoup are you using?
In the current version (1.6.1), this code:
Produces:
This is a paragraph written by someone on the internet.
I.e., "\n" (and "\r\n", etc.) are converted to spaces in the text. Happy to fix or improve it, if I can replicate :)
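The code snippet mentioned in this reply is not preserved on this page. The sketch below is a reconstruction of that kind of check, not the original snippet, and the class name is hypothetical. It parses the question's markup and calls text(). The second input has its line breaks removed before parsing, which reproduces the symptom from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TextNormalizationDemo {
    // Parses the HTML and returns the text of the first <div>.
    // text() collapses each run of whitespace (spaces, \n, \r\n, tabs) into one space.
    static String divText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div").first().text();
    }

    public static void main(String[] args) {
        String withNewlines = "<div>\nThis is a paragraph\nwritten by someone\non the internet.\n</div>";
        // Same markup, but the line breaks were lost before jsoup saw it
        // (for example, lines joined without any separator).
        String withoutNewlines = "<div>This is a paragraphwritten by someoneon the internet.</div>";

        System.out.println(divText(withNewlines));    // This is a paragraph written by someone on the internet.
        System.out.println(divText(withoutNewlines)); // This is a paragraphwritten by someoneon the internet.
    }
}
```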
The following post shows how to get everything, including the line breaks:
Removing HTML entities while preserving line breaks with JSoup
The answer and comment in the following post show another way (read the comment in it):
Remove HTML tags from a String
And this one has yet another way, if you check all the answers and comments:
How do I preserve line breaks when using jsoup to convert html to plain text?
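If you want line breaks rather than just spaces, one common approach is to walk the node tree yourself. This is not necessarily the code from the linked answers. The sketch below emits "\n" for each <br> and after each block-level element. It assumes a recent jsoup release in which NodeTraversor.traverse(visitor, root) is a static method, and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class LineBreakText {
    // Hypothetical helper: walks the tree under root and builds plain text in
    // which <br> and the end of every block-level element (p, div, li, ...) become "\n".
    static String toPlainText(Element root) {
        StringBuilder sb = new StringBuilder();
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // TextNode.text() collapses whitespace runs to single spaces
                    sb.append(((TextNode) node).text());
                } else if (node.nodeName().equals("br")) {
                    sb.append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof Element && ((Element) node).isBlock()) {
                    sb.append('\n');
                }
            }
        }, root);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<p>First line<br>second line</p><p>Next paragraph</p>";
        System.out.println(toPlainText(Jsoup.parse(html).body()));
    }
}
```

Newlines inside a text node are still collapsed to spaces, so the question's div comes out as one spaced line. Nested blocks such as <div><p>…</p></div> produce extra blank lines, which you may want to squeeze afterwards.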