Help! I'm running out of memory when retrieving a web page

Posted on 2024-08-06 14:40:30

I am retrieving the HTML from the web. I get "java.lang.OutOfMemoryError: Java heap space (repl-1:3)"

;; fetch: URL -> String
;; fetch returns the string of the HTML url
(defn fetch [url]
   (with-open [stream (. url openStream)]
      (let [buffer (BufferedReader. (InputStreamReader. stream))]
        (apply str (line-seq buffer)))))

I think the problem is the "apply str". Is there an easier way to

  • convert the buffered reader to a string?
  • or retrieve the web page?

Edit: I need to retrieve

http://fiji4.ccs.neu.edu/~zerg/lemurcgi/lemur.cgi?g=p&v=or&v=measures&v=being&v=taken&v=against,&v=corrupt&v=public&v=officials&v=of&v=any&v=governmental&v=jurisdiction&v=worldwide.

Comments (4)

烟凡古楼 2024-08-13 14:40:30

Yikes. line-seq is going to create one String object per line, which you then eventually concatenate and discard, which is going to be slow and painful. Using apply like that is going to put all of those Strings into an enormous list and call str on that, which is also going to be painful.

Try this instead:

(use 'clojure.contrib.duck-streams)  ;SO's syntax highlighting sucks
(slurp* (reader url))

slurp* uses a StringBuilder which is a better way to build up a large string in Java.
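
For readers on a newer Clojure, where clojure.contrib is no longer shipped, here is a minimal sketch of the same StringBuilder idea using only clojure.java.io. The names read-all and fetch are made up for illustration, and the 8192-char buffer size is an arbitrary assumption. One side effect worth knowing: line-seq strips line terminators, so the original (apply str (line-seq buffer)) glues all lines together, whereas this version keeps the newlines.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io Reader StringReader))

;; read-all: Reader -> String
;; Hypothetical helper: drains the reader through one reusable char buffer
;; into a single StringBuilder, so no per-line Strings are created.
(defn read-all [^Reader rdr]
  (let [buf (char-array 8192)          ; buffer size is an arbitrary choice
        sb  (StringBuilder.)]
    (loop []
      (let [n (.read rdr buf)]         ; number of chars read, -1 at end of stream
        (if (neg? n)
          (.toString sb)
          (do (.append sb buf 0 n)
              (recur)))))))

;; fetch: URL-or-String -> String
;; io/reader accepts a java.net.URL or a URL string; with-open closes it.
(defn fetch [url]
  (with-open [rdr (io/reader url)]
    (read-all rdr)))
```

Usage would look like (fetch "http://example.com/"). As far as I know, on Clojure 1.2 and later a plain (slurp url) does much the same thing, so the helper is mainly useful to see what is going on underneath.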

再可℃爱ぅ一点好了 2024-08-13 14:40:30

What do you mean by "being too slow"? I can't imagine the language would matter much, since the bottleneck here would be the network.

分分钟 2024-08-13 14:40:30

What is the current size of the heap? You can give the JVM more heap space with an -X argument (for example -Xmx512m for a 512 MB maximum).

See JVM Tuning for more information. If you have more time, try using a Java profiler to see why your application is running out of memory. Although you can resize the heap, that is only a temporary solution.
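
To answer the "current size" question from inside the REPL, something like the following sketch should work. It only uses java.lang.Runtime; the name heap-report and the choice to report megabytes are my own assumptions, not anything from the question.

```clojure
;; heap-report: -> Map
;; Hypothetical helper: returns the JVM's heap figures in whole megabytes.
(defn heap-report []
  (let [rt (Runtime/getRuntime)
        mb (fn [n] (quot n (* 1024 1024)))]
    {:max-mb   (mb (.maxMemory rt))     ; upper limit the heap may grow to
     :total-mb (mb (.totalMemory rt))   ; heap currently reserved by the JVM
     :free-mb  (mb (.freeMemory rt))})) ; unused part of the reserved heap
```

If :max-mb turns out to be small, restarting the JVM with a larger -Xmx value is the quick fix; how that flag gets passed depends on how the REPL is launched.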

浊酒尽余欢 2024-08-13 14:40:30

There are two possibilities:

  1. The size of the content that you are fetching is a significant proportion of the available heap space, and your algorithm requires 2 or 3 times the size in working storage during the reading / concatenation process. In this case, increasing the heap space is a reasonable workaround.

  2. The algorithm is actually using O(N^2) space to do the concatenation using apply. It is not inconceivable that the implementation of apply is recursive and that the Clojure compiler / JIT compiler is producing recursive code with lots of references to intermediate strings. In this case, increasing the heap space is a poor workaround.

Either way, I'd start by replacing (apply str (line-seq buffer)) with a more efficient alternative (see @Brian's answer, and my comment on @tomjen's answer) ... and only worry about the heap usage if it is still an issue. (I suspect that it won't be.)
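
One way to tell the two possibilities apart is to measure how big the content really is without holding it in memory, and compare that with the maximum heap. The sketch below is only an illustration: count-chars and heap-fraction are hypothetical names, and the 2-bytes-per-char figure is a rough estimate (newer JVMs may store Latin-1 text more compactly).

```clojure
(import '(java.io Reader StringReader))

;; count-chars: Reader -> long
;; Streams through the reader, keeping only one 8192-char buffer alive.
(defn count-chars [^Reader rdr]
  (let [buf (char-array 8192)]
    (loop [total 0]
      (let [n (.read rdr buf)]          ; -1 signals end of stream
        (if (neg? n)
          total
          (recur (+ total n)))))))

;; heap-fraction: long -> double
;; Rough share of the maximum heap that one String of n-chars would occupy,
;; assuming 2 bytes per char.
(defn heap-fraction [n-chars]
  (/ (* 2.0 n-chars) (.maxMemory (Runtime/getRuntime))))
```

For a real page, open the reader with (with-open [rdr (clojure.java.io/reader url)] (count-chars rdr)). If the resulting fraction is tiny, the content size is not the problem and case 2 becomes the more likely explanation; if it is a large share of the heap, case 1 applies.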
