如何序列化图表?
这是一个面试问题< /a>: 如何序列化图表?我看到了这个答案,但我不确定这是否足够。
这看起来像是一个非常令人困惑的“开放式问题”,候选人可能会提出更多有关需求的问题:节点和边是什么,它们本身如何序列化,这个图是否加权,有向等,有多少个节点/边在图中。基础设施怎么样?它是一个普通的文件系统还是我们应该/可以使用数据库?
那么,你会如何回答这个问题呢?
This is an interview question: How to serialize a graph ? I saw this answer but I am not sure if this is enough.
It looks like a very confusing "open question" and the candidates are probably expected to ask more questions about the requirements: what the nodes and edges are, how they are serialized themselves, is this graph weighted, directed, etc., how many nodes/edges are in the graph.What about the infrastructure ? Is it a plain file system or we should/can use a database ?
So, how would you answer this question ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为您提供的答案是相当合理的。 IMO,基本上你需要了解应用背景,我至少会问:
最简单的方法是将其存储为边缘列表。
然而,在不同的应用中,有一些经典的方法可以做到这一点。
例如,如果您正在进行电路仿真,那么图形是稀疏的并且
生成的图形/矩阵可以存储为列压缩形式。如果您正在解决(最小成本)最大流量问题,那么已经有一个 DIMACS 格式,以便公共求解器可以读取和写入它。如果你想要人类可读,结构化方式也是一个不错的选择,XML可以提供自我验证(已经有一个 GraphML 作为标准)。顺便说一句, dot 格式非常独立。
I think the answer you provided is quite reasonable. IMO, basically you need to know the application background, I will ask at least:
The simplest way will be storing it as an edge list.
However, in different application there are some classical ways to do it.
For example if you are doing circuit simulation then the graph is sparse and
the resulting graph/matrix can be stored as column-compressed form. If you are solving a (min-cost) max-flow problem then there are already a DIMACS format, such that public solvers can read it and write it. Structured way is also a good choice if you want human readable, XML can provide self-validation (there is already a GraphML as the standard). By the way, the dot format is quite self-contained.
嗯。无论您将其存储在什么位置,它基本上都是:
输出图中的每个顶点。如果您没有首先获得所有顶点,那么当您重新读入图形时,PITA 会重建图形。
现在您可以存储顶点之间的边。希望您的顶点具有某种形式的 ID,以唯一地标识它们。我见过的版本是“在数据库中存储(图|树)”。因此,读取节点,存储在哈希表或类似的表中以进行 O(1) 摊销查找。然后,对于每个边,查找 ID-source 和 ID-dest,并链接。
瞧,你已经反序列化了它。如果它不是数据库,则通常具有相同的想法 - 首先序列化节点,然后序列化边缘。
Meh. Whatever you store it in, it's basically:
Output each vertex in the graph. If you don't have all the vertices first, it's a PITA to rebuild the graph when you're reading it back in.
Now you can store edges between vertices. Hopefully your vertices have some form of ID, to uniquely identify them. The version of this I've seen is "store a (graph|tree) in a database". So, read in the nodes, store in a hashtable or similar for O(1) amortized lookup. Then, foreach edge, lookup ID-source and ID-dest, and link.
Voila, you've deserialized it. If it's not a DB, the same idea generally holds - serialize nodes first, then edges.