I am retrieving the HTML from the web. I get "java.lang.OutOfMemoryError: Java heap space (repl-1:3)"
;; fetch: URL -> String
;; fetch returns the string of the HTML url
(defn fetch [url]
  (with-open [stream (. url openStream)]
    (let [buffer (BufferedReader. (InputStreamReader. stream))]
      (apply str (line-seq buffer)))))
I think the problem is the "apply str". Is there an easier way to
- convert the buffered reader to a string?
- or retrieve the web page?
Edit: I need to retrieve
http://fiji4.ccs.neu.edu/~zerg/lemurcgi/lemur.cgi?g=p&v=or&v=measures&v=being&v=taken&v=against,&v=corrupt&v=public&v=officials&v=of&v=any&v=governmental&v=jurisdiction&v=worldwide.
Comments (4)
Yikes. line-seq is going to create one String object per line, which you then eventually concatenate and discard, which is going to be slow and painful. Using apply like that is going to put all of those Strings into an enormous list and call str on that, which is also going to be painful.

Try slurp* instead. It uses a StringBuilder, which is a better way to build up a large string in Java.

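The StringBuilder approach mentioned above can be sketched roughly as follows. This is not the source of slurp*, only an illustration of the same idea, and it assumes a Clojure version that ships clojure.java.io. The names read-all and fetch-page are made up for this example.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io Reader StringReader))

;; read-all: Reader -> String  (hypothetical helper name)
;; Copies characters from rdr into one StringBuilder in fixed-size chunks,
;; so no per-line String objects or intermediate lists are created.
(defn read-all [^Reader rdr]
  (let [sb  (StringBuilder.)
        buf (char-array 8192)]
    (loop []
      (let [n (.read rdr buf)]          ; number of chars read, -1 at end of stream
        (if (neg? n)
          (str sb)
          (do (.append sb buf 0 n)
              (recur)))))))

;; fetch-page: URL or URL string -> String  (hypothetical helper name)
;; io/reader opens a reader on the URL; with-open closes it afterwards.
(defn fetch-page [url]
  (with-open [rdr (io/reader url)]
    (read-all rdr)))

;; Local check that needs no network access:
(read-all (StringReader. "line one\nline two\n"))
```

Unlike line-seq, this keeps the line terminators in the result. In recent Clojure versions the built-in slurp should also accept a URL directly, which may be all that is needed.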
What do you mean by it "being too slow"? I can't imagine the language would matter much since the bottleneck here would be the internet.

What is the current size of the heap? You can use JVM arguments to specify more heap space with the -X options (for example, -Xmx512m sets the maximum heap size to 512 MB).
See JVM Tuning for more information. If you have more time, try using a Java profiler to see why your application is running out of memory. Although you can resize the heap space, that is only a temporary solution.
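
To answer the question about the current heap size from inside the REPL, something like the following sketch may help. It only uses java.lang.Runtime; the function name heap-mb is made up for this example.

```clojure
;; heap-mb: -> map of heap figures in megabytes  (hypothetical helper name)
(defn heap-mb []
  (let [rt (Runtime/getRuntime)
        mb (fn [n] (quot n (* 1024 1024)))]
    {:max   (mb (.maxMemory rt))      ; upper limit the heap may grow to
     :total (mb (.totalMemory rt))    ; heap currently reserved by the JVM
     :free  (mb (.freeMemory rt))}))  ; unused part of the reserved heap

(heap-mb)
```

If the JVM is restarted with a larger limit, for example with -Xmx512m, the :max figure should rise accordingly.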

There are two possibilities:
- The size of the content that you are fetching is a significant proportion of the available heap space, and your algorithm requires 2 or 3 times that size in working storage during the reading / concatenation process. In this case, increasing the heap space is a reasonable workaround.
- The algorithm is actually using O(N^2) space to do the concatenation using apply. It is not inconceivable that the implementation of apply is recursive and that the Clojure compiler / JIT compiler are producing recursive code with lots of references to intermediate strings. In this case, increasing the heap space is a poor workaround.

Either way, I'd start by replacing (apply str (line-seq buffer)) with a more efficient alternative (see @Brian's answer, and my comment on @tomjen's answer) ... and only worry about the heap usage if it is still an issue. (I suspect that it won't be.)