Preserving line breaks with Jsoup
I am using Jsoup to get some data from HTML. I have this code:
    System.out.println("nie jest");
    StringBuffer url = new StringBuffer("http://www.darklyrics.com/lyrics/");
    url.append(args[0]);
    url.append("/");
    url.append(args[1]);
    url.append(".html");
    // extract the relevant classes from our HTML
    Document doc = Jsoup.connect(url.toString()).get();
    Element lyrics = doc.getElementsByClass("lyrics").first();
    Element tracks = doc.getElementsByClass("albumlyrics").first();
    // list of tracks
    int numberOfTracks = tracks.getElementsByTag("a").size();
Everything works and I extract the data I want, but when I call:
    lyrics.text()
I get the text with no line breaks, so I am wondering how to keep the line breaks in the displayed text. I read other threads on Stack Overflow about this, but they weren't helpful. I tried something like this:
    TextNode tex = TextNode.createFromEncoded(lyrics.text(), lyrics.baseUri());
but I can't get the text I want with line breaks. I also looked at previous threads such as "Removing HTML entities while preserving line breaks with JSoup", but I can't get the effect I want. What should I do?
Edit: I got the effect I wanted, but I don't think it is a very good solution:
    for (Node nn : listOfNodes) {
        String s = Jsoup.parse(nn.toString()).text();
        if (nn.nodeName() == "#text" || nn.nodeName() == "h3") {
            buf.append(s + "\n");
        }
    }
Does anyone have a better idea?
Comments (1)
You could get the text nodes (the text between the <br /> tags) by checking whether the node is an instance of TextNode. That should work for you. Please note that comparing an object's internal value should be done with the equals() method, not ==; strings are objects, not primitives.

Oh, I also suggest reading their privacy policy.
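A minimal sketch of the approach described above, assuming the lyric lines are direct children of the lyrics element and are separated by <br /> tags. The class name LyricsLines, the helper textWithLineBreaks, and the sample markup are made up for illustration and are not taken from the real page. Unlike the loop in the question, this version skips whitespace-only text nodes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LyricsLines {

    // Collects the text of each direct child text node and <h3> heading,
    // one per line. <br> elements and all other nodes are skipped.
    static String textWithLineBreaks(Element container) {
        StringBuilder buf = new StringBuilder();
        for (Node node : container.childNodes()) {
            if (node instanceof TextNode) {
                // text() returns the node's text with whitespace normalised
                String line = ((TextNode) node).text().trim();
                if (!line.isEmpty()) {
                    buf.append(line).append('\n');
                }
            } else if (node instanceof Element && "h3".equals(node.nodeName())) {
                // equals() compares string contents; == would compare references
                buf.append(((Element) node).text()).append('\n');
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real lyrics page.
        String html = "<div class=\"lyrics\"><h3>1. Song</h3>"
                + "first line<br />second line<br /></div>";
        Document doc = Jsoup.parse(html);
        Element lyrics = doc.getElementsByClass("lyrics").first();
        System.out.print(textWithLineBreaks(lyrics));
    }
}
```

With the sample markup, the heading and the two lyric lines are each printed on their own line. For the real page you would pass in the element obtained from Jsoup.connect(...).get() in place of the parsed string.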