How can I create a basic human-readable plain-text representation of XHTML using Java?
Given some simple XHTML, I'd like to create a human-readable plain-text version of it. This would involve removing all HTML tags while adding or preserving some whitespace.
For example, this input:
<div>
<p>This is some text, some is <b>bold</b>.</p>
<ul>
<li>Point one</li>
<li>Point two</li>
</ul>
</div>
would become:
"This is some text, some is bold. Point one Point two"
(commas between the LIs would be ideal... :)
Comments (1)
Try the Jericho HTML Parser. You can either strip all the tags or call on its "renderer" class, which tries to mimic the look of the page (e.g., your bulleted lists would be tabbed).
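
For comparison, here is a minimal sketch that does not use Jericho at all. Because the input is well-formed XHTML, it can be parsed with the JDK's built-in XML DOM parser and walked by hand. The class name `XhtmlToText`, the method `toPlainText`, and the list of block-level tags are all made up for illustration. The sketch assumes a fragment with a single root element, no DOCTYPE, no namespace prefixes, and no HTML-only entities such as `&nbsp;`. It puts a comma between consecutive `li` elements and collapses all other whitespace to single spaces.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XhtmlToText {

    // Hypothetical, incomplete list of tags treated as block-level.
    private static final Set<String> BLOCK_TAGS = new HashSet<>(Arrays.asList(
            "p", "div", "ul", "ol", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"));

    /** Converts a well-formed XHTML fragment into one line of plain text. */
    static String toPlainText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            append(doc.getDocumentElement(), out);
            // Collapse every run of whitespace (including newlines) to one space.
            return out.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        if (type != Node.ELEMENT_NODE) {
            return; // skip comments, processing instructions, etc.
        }
        String name = node.getNodeName().toLowerCase();
        if (name.equals("li") && followsListItem(node)) {
            // Drop trailing whitespace so the comma hugs the previous item.
            int end = out.length();
            while (end > 0 && Character.isWhitespace(out.charAt(end - 1))) {
                end--;
            }
            out.setLength(end);
            out.append(", ");
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            append(c, out);
        }
        if (BLOCK_TAGS.contains(name)) {
            out.append(' '); // keep adjacent blocks from running together
        }
    }

    /** True if the nearest preceding element sibling is also an li. */
    private static boolean followsListItem(Node node) {
        Node prev = node.getPreviousSibling();
        while (prev != null && prev.getNodeType() != Node.ELEMENT_NODE) {
            prev = prev.getPreviousSibling();
        }
        return prev != null && prev.getNodeName().equalsIgnoreCase("li");
    }

    public static void main(String[] args) {
        String xhtml = "<div>\n<p>This is some text, some is <b>bold</b>.</p>\n"
                + "<ul>\n<li>Point one</li>\n<li>Point two</li>\n</ul>\n</div>";
        System.out.println(toPlainText(xhtml));
    }
}
```

Run on the example from the question, this prints `This is some text, some is bold. Point one, Point two`. For real-world HTML that is not strictly well-formed, a tolerant parser such as Jericho is likely the safer choice.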