Jsoup stops parsing a web page
Jsoup.parse(String html) stops working. I have an application in which I use jsoup several times to parse different pages, but when I try to parse a big page, jsoup just stops, and that is all. Is there a limit on, or a maximum size for, a page?
java.lang.OutOfMemoryError
at java.lang.Object.internalClone(Native Method)
at java.lang.Object.clone(Object.java:82)
at java.lang.AbstractStringBuilder.append0(AbstractStringBuilder.java:172)
at java.lang.StringBuilder.append(StringBuilder.java:224)
at org.jsoup.parser.Tokeniser.emit(Tokeniser.java:76)
at org.jsoup.parser.TokeniserState$1.read(TokeniserState.java:26)
at org.jsoup.parser.Tokeniser.read(Tokeniser.java:42)
at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:101)
at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:53)
at org.jsoup.parser.Parser.parse(Parser.java:24)
at org.jsoup.Jsoup.parse(Jsoup.java:44)
...
EDIT:
I took a substring containing the first few thousand characters of the page, and jsoup managed to parse it.
So it seems that jsoup has a limit on the number of characters it can handle. Perhaps the data type matters here.
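The `String` type itself is unlikely to be the limit: `length()` returns an `int`, so a `String` can in principle hold up to `Integer.MAX_VALUE` characters, far more than any web page. What grows with the page is memory. The sketch below assumes the UTF-16 representation Dalvik uses (2 bytes per `char`; newer HotSpot JVMs with compact strings may store Latin-1 text in 1 byte per char) and estimates only the raw character payload of one copy of a page, ignoring object headers and whatever working copies a parser makes. `StringMemoryEstimate` and `payloadBytes` are hypothetical names, not part of jsoup.

```java
public class StringMemoryEstimate {
    // length() returns an int, so this is the theoretical upper bound on String length.
    static final long MAX_STRING_LENGTH = Integer.MAX_VALUE;

    // Hypothetical helper: raw UTF-16 payload of a string of the given length,
    // assuming 2 bytes per char and ignoring object headers and array overhead.
    static long payloadBytes(long length) {
        return 2L * length;
    }

    public static void main(String[] args) {
        long pageChars = 1_000_000L; // a hypothetical "big page"
        // Prints 2000000: one copy of the page text is about 1.9 MiB, so if anything
        // holds a second copy, the text alone already takes about 4 MB.
        System.out.println(payloadBytes(pageChars));
    }
}
```

On a heap of only a few megabytes, that arithmetic is consistent with a prefix of a few thousand characters parsing fine while the full page does not.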
EDIT 2:
After analysing what the error could be, and trying to write my own HTML parser (which led to a lot of stress), I found out that the Dalvik VM assigns only 4.3 MB to the heap, which I assume differs from device to device. I'm going to try to increase it.
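Before trying to raise the limit, it can help to measure it from inside the app. `Runtime.getRuntime().maxMemory()` reports the heap ceiling the VM will attempt to use and is available on Android as well as desktop JVMs. The sketch below prints that ceiling and offers a rough pre-parse guard; `HeapCheck`, `probablyFits`, and its `factor` parameter are hypothetical, and the factor is a guess to tune, not a figure taken from jsoup.

```java
public class HeapCheck {
    /** Maximum heap the VM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Rough headroom: the heap ceiling minus what is currently in use. */
    static long headroomBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /**
     * Hypothetical guard: assumes parsing needs about `factor` times the
     * UTF-16 size of the HTML (2 bytes per char). The factor is an assumption.
     */
    static boolean probablyFits(int htmlLength, int factor) {
        long estimate = 2L * htmlLength * factor;
        return estimate < headroomBytes();
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %.1f MB%n", maxHeapBytes() / (1024.0 * 1024.0));
        System.out.println("headroom bytes: " + headroomBytes());
        System.out.println("1M-char page fits (x4)? " + probablyFits(1_000_000, 4));
    }
}
```

On Android the app cannot pass `-Xmx` to its own VM. The documented options are reading the per-app limit with `ActivityManager.getMemoryClass()` and requesting a bigger heap with `android:largeHeap="true"` on the `<application>` element of the manifest (API 11 and later), which the system may or may not grant.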
Answer:
Try getting the page content with another method like HttpClient and then call
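The answer is cut off after "and then call", so the parsing step it recommends is not recoverable here. As a hedged sketch of the fetching half only, the code below uses the JDK's `HttpURLConnection` as a stand-in for HttpClient (swapped in to stay dependency-free) and stops reading once a byte cap is exceeded, so an oversized page fails with an `IOException` instead of an `OutOfMemoryError`. `PageFetcher`, `readCapped`, `fetch`, and any cap value are hypothetical, and decoding as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    /**
     * Reads the whole stream into a String, but throws once more than maxBytes
     * would be read. Note that toByteArray() and new String(...) each make a
     * copy, so peak memory is still a few times maxBytes.
     */
    static String readCapped(InputStream in, Charset charset, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("page larger than " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    /** Fetches a URL with HttpURLConnection (a stand-in for HttpClient). */
    static String fetch(String url, int maxBytes) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            // Assumes UTF-8; a real fetcher should honour the Content-Type charset.
            return readCapped(in, StandardCharsets.UTF_8, maxBytes);
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned `String` can then be handed to whichever jsoup parse method the original answer intended; the cap simply makes the failure explicit and catchable.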