解组散列数据的技术选择
似乎有相当多的民间传说知识在有限的圈子里流传,关于哈希构造与数据编组-解组相结合的陷阱。我正在寻找这些花絮的可引用参考文献。
例如,有人曾经向我指出图书馆 aterm 并提到作者已经清楚地考虑到了这一点,并且磁盘上的表示是自下而上的(节点的子节点在数据流中位于节点本身之前)。当您需要重新共享每个节点(内存中可能存在相同的节点)时,这确实是正确的做法。这种重新共享过程需要自下而上地完成,因此解组本身也可能是自下而上的,这样就可以在一次过程中完成所有操作。
我正在描述我们自己遇到的困难以及我们找到的解决方案。我将不胜感激任何对上述民俗知识的引用。显然有些人以前遇到过这些问题(aterm 库只是一个例子)。但我没有找到任何书面内容。就连我所掌握的关于 aterm 的一点点信息也都是道听途说。我并不担心它不可靠(你无法弥补),但“个人交流”和“看看它在源代码中是如何完成的”在引用中被认为是糟糕的形式。
仅关于散列consing我就有足够的参考资料。我只对干扰编程其他方面(例如编组或分发)的参考感兴趣。
There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.
For instance, someone once pointed me to library aterm and mentioned that the authors had clearly thought about this and that the representation on disk was bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node (with a possible identical node already in memory). This re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be, too, so that it's possible to do everything in a single pass.
I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to the kind of aforementioned folklore knowledge. Some people obviously have encountered the problems before (the aterm library is only one example). But I didn't find anything in writing. Even the little piece of information I have about aterm is hear-say. I am not worried it's not reliable (you can't make this up), but "personal communication" and "look how it's done in the source code" are considered poor form in citations.
I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
好吧,这并没有多大用处,但是 Andrew Kennedy 写了一个函数式珍珠,简称为“Pickling Combinators”,它出现在《函数式编程杂志》(Journal of Function Program) (2004),14:6:727-739 中。对结构共享以及如何在 pickles 中处理它进行了广泛的讨论,但没有直接讨论这个问题如何与该语言的实现中的哈希构造相关。但这篇文章确实讨论了内存中和 pickle 中的结构共享,所以我希望它总比没有好。
Martin Elsman 于 2005 年在函数式编程趋势中发表了一篇后续论文;标题是具有共享的类型专用序列化。本文主要讨论 unpickler(反序列化器)的哈希处理,而不是实现中的哈希处理,但它可能还是有价值的。
JFP 论文是专有的,但似乎有一个 安德鲁网页上的预印本。
Elsman 的论文似乎可以通过 Google Scholar 获取,网址为 http://tinyurl.com/yd5tw2b。
(在前世,我参与了一个创建人们可以阅读和编辑的 ASCII pickle 的项目。我愚蠢地未能发布它,但我保留了兴趣。)
OK, this is not much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appears in the Journal of Functional Programming, (2004), 14:6:727-739. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. But the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.
Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the impelementation, but again it may be worth something.
The JFP paper is proprietary, but there appears to be a preprint on Andrew's web page.
Elsman's paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.
(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)
我找到了一篇关于函数式语言编组的参考文献;不确定它是否有用,但作者很聪明: http://tinyurl.com/yc3hob9
我相信马蒂亚斯·布鲁姆和/或安德鲁·阿佩尔在这方面做了一些事情,但我找不到这篇论文。我还相信我曾为函数式编程杂志审阅过一些内容,但我不记得这篇论文是否被接受或是谁写的。
我建议您询问 Matthias Blume、Andrew Appel 和 Phil Wadler,他们是否可以提供帮助。
I found one reference on marshalling in functional languages; not sure if it will be useful, but the authors are smart: http://tinyurl.com/yc3hob9
I believe that Matthias Blume and/or Andrew Appel did something on this, but I can't find the paper. I also believe I reviewed something once for the Journal of Functional Programming, but I can't remember if the paper was accepted or who wrote it.
I suggest you ask Matthias Blume, Andrew Appel, and Phil Wadler if they can help.
Coq V5.10 具有散列构造和编组/解组功能。我没有找到任何已发布的形式,但解组步骤将在源代码中被称为“重新驻留”。 Coq 解组值,然后遍历它们以重新创建共享,当所有语言都提供时,显而易见且唯一的解决方案是
int_channel -> 类型的
.unmarshal
函数。 'aCoq V5.10 had hash-consing and marshaling/unmarshaling. I didn't find anything in published form but the unmarshaling steps would be referenced as "reinterning" in the source code. Coq unmarhsaled values and then traversed them in order to re-create sharing, the obvious and only solution when all the language provides is an
unmarshal
function of typeint_channel -> 'a
.