同一多方法的不同方法之间的递归

发布于 2024-10-15 19:30:34 字数 2788 浏览 0 评论 0原文

我正在编写一个 Clojure 库来解析 Mac OS X 的基于 XML 的 属性列表文件。该代码工作正常,除非您给它一个大的输入文件,此时您会收到 java.lang.OutOfMemoryError: Java heap space

下面是一个示例输入文件(足够小,可以正常工作):

<plist version="1.0">
<dict>
    <key>Integer example</key>
    <integer>5</integer>
    <key>Array example</key>
    <array>
        <integer>2</integer>
        <real>3.14159</real>
    </array>
    <key>Dictionary example</key>
    <dict>
        <key>Number</key>
        <integer>8675309</integer>
    </dict>
</dict>
</plist>

clojure.xml/parse 将其转换为:

{:tag :plist, :attrs {:version "1.0"}, :content [
    {:tag :dict, :attrs nil, :content [
        {:tag :key, :attrs nil, :content ["Integer example"]}
        {:tag :integer, :attrs nil, :content ["5"]}
        {:tag :key, :attrs nil, :content ["Array example"]}
        {:tag :array, :attrs nil, :content [
            {:tag :integer, :attrs nil, :content ["2"]}
            {:tag :real, :attrs nil, :content ["3.14159"]}
        ]}
        {:tag :key, :attrs nil, :content ["Dictionary example"]}
        {:tag :dict, :attrs nil, :content [
            {:tag :key, :attrs nil, :content ["Number"]}
            {:tag :integer, :attrs nil, :content ["8675309"]}
        ]}
    ]}
]}

我的代码将其转换为 Clojure 数据结构

{"Dictionary example" {"Number" 8675309},
 "Array example" [2 3.14159],
 "Integer example" 5}

我的代码的相关部分看起来

; extract the content contained within e.g. <integer>...</integer>
(defn- first-content
  [c]
  (first (c :content)))

; return a parsed version of the given tag
(defmulti content (fn [c] (c :tag)))

(defmethod content :array
  [c]
  (apply vector (for [item (c :content)] (content item))))

(defmethod content :dict
  [c]
  (apply hash-map (for [item (c :content)] (content item))))

(defmethod content :integer
  [c]
  (Long. (first-content c)))

(defmethod content :key
  [c]
  (first-content c))

(defmethod content :real
  [c]
  (Double. (first-content c)))

; take a java.io.File (or similar) and return the parsed version
(defn parse-plist
  [source]
  (content (first-content (clojure.xml/parse source))))

像code 是 content 函数,这是一种在 :tag(XML 标记的名称)上分派的多方法。我想知道我是否应该做一些不同的事情才能使这个递归工作得更好。我尝试将对 content 的所有三个调用替换为 trampoline content,但这不起作用。我应该做些什么来让这种相互递归更有效地工作吗?或者我采取了根本错误的方法?

编辑:顺便说一句,此代码可在 GitHub 上找到,位于哪种形式可能更容易使用。

I'm writing a Clojure library to parse Mac OS X's XML-based property list files. The code works fine unless you give it a large input file, at which point you get java.lang.OutOfMemoryError: Java heap space.

Here's an example input file (small enough to work fine):

<plist version="1.0">
<dict>
    <key>Integer example</key>
    <integer>5</integer>
    <key>Array example</key>
    <array>
        <integer>2</integer>
        <real>3.14159</real>
    </array>
    <key>Dictionary example</key>
    <dict>
        <key>Number</key>
        <integer>8675309</integer>
    </dict>
</dict>
</plist>

clojure.xml/parse turns this into:

{:tag :plist, :attrs {:version "1.0"}, :content [
    {:tag :dict, :attrs nil, :content [
        {:tag :key, :attrs nil, :content ["Integer example"]}
        {:tag :integer, :attrs nil, :content ["5"]}
        {:tag :key, :attrs nil, :content ["Array example"]}
        {:tag :array, :attrs nil, :content [
            {:tag :integer, :attrs nil, :content ["2"]}
            {:tag :real, :attrs nil, :content ["3.14159"]}
        ]}
        {:tag :key, :attrs nil, :content ["Dictionary example"]}
        {:tag :dict, :attrs nil, :content [
            {:tag :key, :attrs nil, :content ["Number"]}
            {:tag :integer, :attrs nil, :content ["8675309"]}
        ]}
    ]}
]}

My code turns this into the Clojure data structure

{"Dictionary example" {"Number" 8675309},
 "Array example" [2 3.14159],
 "Integer example" 5}

The relevant part of my code looks like

; extract the content contained within e.g. <integer>...</integer>
(defn- first-content
  [c]
  (first (c :content)))

; return a parsed version of the given tag
(defmulti content (fn [c] (c :tag)))

(defmethod content :array
  [c]
  (apply vector (for [item (c :content)] (content item))))

(defmethod content :dict
  [c]
  (apply hash-map (for [item (c :content)] (content item))))

(defmethod content :integer
  [c]
  (Long. (first-content c)))

(defmethod content :key
  [c]
  (first-content c))

(defmethod content :real
  [c]
  (Double. (first-content c)))

; take a java.io.File (or similar) and return the parsed version
(defn parse-plist
  [source]
  (content (first-content (clojure.xml/parse source))))

The meat of the code is the content function, a multimethod that dispatches on the :tag (the name of the XML tag). I'm wondering whether there is something different I should be doing in order to make this recursion work better. I tried replacing all three calls to content with trampoline content, but that didn't work. Is there anything fancy I should do to get this mutual recursion to work more efficiently? Or am I taking a fundamentally wrong approach?

Edit: By the way, this code is available on GitHub, in which form it might be easier to play around with.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

零度° 2024-10-22 19:30:34

您有来自单个方法的多个(每个子级)递归调用,因此您的代码不是(并且不能没有大量的重组)尾递归。 Trampoline 旨在用于相互尾递归函数。

您的大型 XML 文件有多深、多长?我这么问是因为你得到的是 OoM 而不是 SO。

不管怎样,为了解决你的递归问题(这不太可能是导致异常的问题),你必须遍历你的 XML 数据结构(例如使用 xml-zip),同时维护一个堆栈(向量或列表)代表正在构建的结果树。
具有讽刺意味的是,XML 数据结构的遍历在某种程度上等同于用于构建该结构的 sax 事件。

You have multiple (one per child) recursive calls from a single method so your code isn't (and can't be without a heavy reorg)tail-recursive. trampoline is intended for mutual tail-recursive functions.

How deep, how long is your large XML file? I'm asking because you are getting an OoM not a SO.

Anyway, to solve your recursion problem (which is unlikely to be the one causing the exception) you have to walk down your XML datastructure (eg with xml-zip) while maintaining a stack (vector or list) representing your result tree under construction.
It's ironic that the traversal of the XML datastructure is somewhat equivalent to the sax events which were used to build the structure.

╭ゆ眷念 2024-10-22 19:30:34

大量递归将导致 StackOverflowException,而不是 OutOfMemoryError。此外,这里的递归似乎不是很深(根据示例中的 XML 文件,只有 3 个级别)。

我的猜测是,抛出 OutOfMemoryError 是因为大型 XML 文件被解析成的数据结构太大,无法放入 JVM 堆中。您可以尝试使用 -Xms-Xmx 选项增加堆大小。然而,解析大型 XML 文件的正确方法是使用 SAX 事件而不是构建树(DOM 或 Clojure 数据结构)。

Heavy recursion will cause a StackOverflowException, not an OutOfMemoryError. Also the recursion does not seem to be very deep here (only 3 levels as per the XML file in your example).

My guess is, the OutOfMemoryError is being thrown because the data structure your large XML files are being parsed into are too large to fit in the JVM heap. You can try increasing the heap size using -Xms and -Xmx options. However, the correct way to parse huge XML files is to use SAX events rather than building a tree (DOM or Clojure data structure).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文