Java 中图的空间有效表示?
我想要一个无向图,其中节点用一对标记(当前使用 String[] )并且可以任意链接到其他节点。我从 Hashtable 类型开始。事实证明,这对我来说空间效率不够——我打算拥有大约 60,000 个节点(最终远远超过这个数字)。
我应该如何实现这种图才能提高内存效率?相反,我应该考虑某种关系数据库吗?
I want to have an undirected graph where the nodes are labelled with a pair (currently using String[] for this) and can be arbitraryly linked to other nodes. I have started with the type Hashtable. It turns out that this is not space efficient enough for me - I intend on having around 60,000 nodes (eventually, well in excess of that number).
How should I implement this kind of graph so as to be more memory efficient? Should I, instead, be considering some kind of Relational Database?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果空间效率是您的首要任务,那么您可以牺牲图形操作的时间效率并取消哈希表(我假设您使用哈希表来存储节点的标记链接)。只需切换到数组并产生比较图操作上的标签值的成本:
如果您希望进一步压缩内存使用量并且标签空间是有限的(即标签对于给定节点不是唯一的;例如“父级”是一次又一次出现的标签)然后考虑使用每个 flyweight 模式的自定义
Label
类 因此您不会重复String
的实例。If space efficiency is your priority, then you can sacrifice time efficiency on graph operations and do away with the Hashtable (which I assume you are using for storing a node's labeled links). Simply switch to an array and incur the cost of comparing label values on graph operations:
If you wish to further squeeze the memory usage and your space of labels is finite (i.e. labels are not unique for a given node; e.g. "parent" is a label that occurs again and again) then consider using a custom
Label
class per flyweight pattern so you do not duplicate instances ofString
.您主要关心的是序列化时磁盘上的大小,还是内存中的大小?
如果您担心内存大小,并且不一定需要同时将每个节点保存在内存中,您可能需要考虑使用某种类型的延迟加载,例如 透明激活 带有 db4o
Is your main concern the size on disk when serialized, or the size in memory?
If you are concerned about size in memory, and if you do not necessarily need to hold each node in memory at the same time, you may want to look into using some type of lazy loading using something like transparent activation with db4o
如果您需要持续的可扩展性,请考虑使用现有的图形数据库,例如 Neo4J,它可以处理您描述的更大的图形(数百万或数十亿的关系)。我已经将它用于大约 2500 万个节点的图表,并取得了良好的结果。
If you need ongoing scalability, consider using an existing Graph Database such as Neo4J, which can handle MUCH larger graphs that you describe (millions or billions of relationships). I have used it for graphs of about 25 million nodes with good results.