Referencing elements in an XML hierarchy

Posted 2024-07-26 10:13:35

I'm looking at possible XML representations, for data exchange purposes, of what can be considered a finite-depth graph. The sticking point is how to reference nodes from edge tags. I see two strategies: (a) unique identifiers or (b) paths.

Unique IDs:

<graph id="g0">
  <node id="n0"/>
  <node id="n1"/>
  <edge from="n1" to="n0"/>
</graph>
<graph id="g1">
  <node id="n2"/>
</graph>
<edge from="n2" to="n1"/>

Paths:

<graph id="0">
  <node id="0"/>
  <node id="1"/>
  <node id="2"/>
  <edge from="1" to="0"/>
  <edge from="2" to="1"/>
</graph>
<graph id="1">
  <node id="0"/>
</graph>
<edge from="1:0" to="0:2"/>
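For illustration, the path scheme above could be resolved roughly as follows. This is only a sketch under assumptions: the <graphs> wrapper element and the resolve and load_edges helpers are hypothetical, added so that the fragment parses as a single document, and the "graph:node" syntax is read off the example rather than taken from any standard.

```python
import xml.etree.ElementTree as ET

# The <graphs> root is an assumption: the snippet in the question has no
# single root element, so one is added here to make it parseable.
DOC = """
<graphs>
  <graph id="0">
    <node id="0"/>
    <node id="1"/>
    <node id="2"/>
    <edge from="1" to="0"/>
    <edge from="2" to="1"/>
  </graph>
  <graph id="1">
    <node id="0"/>
  </graph>
  <edge from="1:0" to="0:2"/>
</graphs>
"""


def resolve(ref, scope):
    """Turn a reference into a (graph_id, node_id) pair.

    A reference containing a colon is treated as a full path; one without
    is taken to be local to the enclosing graph named by `scope`.
    """
    if ":" in ref:
        graph_id, node_id = ref.split(":", 1)
        return (graph_id, node_id)
    if scope is None:
        raise ValueError(f"local reference {ref!r} used outside a graph")
    return (scope, ref)


def load_edges(xml_text):
    """Parse the document and return (nodes, edges) with resolved paths."""
    root = ET.fromstring(xml_text)
    nodes = set()
    edges = []
    for graph in root.findall("graph"):
        gid = graph.get("id")
        for node in graph.findall("node"):
            nodes.add((gid, node.get("id")))
        # Edges inside a graph may use short, graph-local references.
        for edge in graph.findall("edge"):
            edges.append((resolve(edge.get("from"), gid),
                          resolve(edge.get("to"), gid)))
    # Edges at the top level must spell out the full path.
    for edge in root.findall("edge"):
        edges.append((resolve(edge.get("from"), None),
                      resolve(edge.get("to"), None)))
    # Check that every endpoint refers to a declared node.
    for src, dst in edges:
        if src not in nodes or dst not in nodes:
            raise KeyError(f"dangling edge {src} -> {dst}")
    return nodes, edges


nodes, edges = load_edges(DOC)
```

The point of the sketch is that the lookup key is a pair, so node IDs only have to be unique within their own graph.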

What is the standard procedure for this kind of thing? From what I've gathered, the unique-identifier approach seems to be more widespread. My concern is that when graphs become very large:

  • reading and writing edges from/to XML files requires a really large hash table that maps objects to their IDs
  • the file itself is larger than one written using paths, because redundant path components cannot be omitted when an edge is internal to a graph.

Thoughts?

Update 1:

Note that it's not one flat graph; it's one or more interconnected graphs. Each has locally indexed elements, but flattening them all out and keeping track of the edges across them is a bit of a nuisance.

Update 1.1:

I noticed that sub-graphs in GraphML do in fact use complex keys, which make it possible to separate the local node ID from the global one.
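As a rough illustration of such keys: the nested-graph examples in the GraphML primer use IDs of the form "n5::n0". The sketch below assumes the "::" separator is purely a naming convention, with the ID itself remaining an opaque, globally unique string. The split_id helper is hypothetical.

```python
def split_id(node_id, sep="::"):
    """Split a composite ID into (enclosing path, local ID).

    Assumes `sep` is only a naming convention chosen by the writer of the
    file; nothing here is enforced by a schema.
    """
    parts = node_id.split(sep)
    return parts[:-1], parts[-1]


path, local = split_id("n5::n0")
```

Here `path` is the list of enclosing IDs and `local` is the ID within the innermost graph.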

Update 2:

Yes, obviously this is not well-formed XML, and it is missing tags and all sorts of schema declarations.

Comments (2)

浮世清欢 2024-08-02 10:13:35

There is a schema describing such graphs: see GraphML.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <node id="n4"/>
    <node id="n5"/>
    <node id="n6"/>
    <node id="n7"/>
    <node id="n8"/>
    <node id="n9"/>
    <node id="n10"/>
    <edge source="n0" target="n2"/>
    <edge source="n1" target="n2"/>
    <edge source="n2" target="n3"/>
    <edge source="n3" target="n5"/>
    <edge source="n3" target="n4"/>
    <edge source="n4" target="n6"/>
    <edge source="n6" target="n5"/>
    <edge source="n5" target="n7"/>
    <edge source="n6" target="n8"/>
    <edge source="n8" target="n7"/>
    <edge source="n8" target="n9"/>
    <edge source="n8" target="n10"/>
  </graph>
</graphml>
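
A file like this can be read with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree. The read_graphml helper is hypothetical, the sample is shortened, and the XML declaration and schema attributes are left out for brevity. It honours only the graph-level edgedefault attribute and ignores any per-edge overrides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# GraphML elements live in this namespace, so lookups must be qualified.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n2"/>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>
"""


def read_graphml(text):
    """Return an adjacency mapping {node_id: set of neighbour ids}."""
    root = ET.fromstring(text)
    adjacency = defaultdict(set)
    for graph in root.findall("g:graph", NS):
        directed = graph.get("edgedefault") == "directed"
        for node in graph.findall("g:node", NS):
            adjacency[node.get("id")]  # make sure isolated nodes appear
        for edge in graph.findall("g:edge", NS):
            source, target = edge.get("source"), edge.get("target")
            adjacency[source].add(target)
            if not directed:
                adjacency[target].add(source)
    return dict(adjacency)


adjacency = read_graphml(GRAPHML)
```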

凡尘雨 2024-08-02 10:13:35

"the file itself is larger than the one written using paths because you cannot omit redundant path components if edge is internal to the graph."

This point is premature optimization. XML parsers/writers aren't going to choke on large files, and if storage size is a concern, XML usually compresses very well with ZIP.
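
To get a feel for this, one can generate a repetitive graph file and run it through DEFLATE, the algorithm ZIP normally uses. The sketch below uses Python's zlib for that. The build_xml helper and the chain-shaped graph are made up for the experiment, and the ratio will depend on the real data.

```python
import zlib


def build_xml(n):
    """Build a synthetic graph file: n nodes linked in a chain."""
    lines = ['<graph id="g0">']
    lines += [f'  <node id="n{i}"/>' for i in range(n)]
    lines += [f'  <edge from="n{i}" to="n{i + 1}"/>' for i in range(n - 1)]
    lines.append("</graph>")
    return "\n".join(lines).encode("utf-8")


raw = build_xml(10_000)
packed = zlib.compress(raw, 9)  # level 9 = strongest compression
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes (ratio {ratio:.3f})")
```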

"necessity for a really large hash table that maps objects to their IDs for purposes of reading/writing edges from/to XML files"

That's an implementation concern. You can certainly avoid having a large hash table like this if you write your XML read/write routines into the graph, node and edge classes themselves rather than trying to maintain the mapping in a separate structure. Graphs are pretty easy to serialize and deserialize.
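
One way to read this suggestion, sketched under assumptions: each node carries its own ID and each edge holds direct references to its endpoint nodes, so writing needs no object-to-ID table at all. The Node, Edge and Graph classes below are hypothetical, and the element and attribute names follow the question's example. Reading still needs some lookup from ID to node; here it is a dictionary that lives only for the duration of the parse.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str

    def to_xml(self):
        return ET.Element("node", {"id": self.id})


@dataclass
class Edge:
    source: Node
    target: Node

    def to_xml(self):
        # The IDs come straight from the referenced objects.
        return ET.Element("edge", {"from": self.source.id,
                                   "to": self.target.id})


@dataclass
class Graph:
    id: str
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def to_xml(self):
        elem = ET.Element("graph", {"id": self.id})
        for node in self.nodes:
            elem.append(node.to_xml())
        for edge in self.edges:
            elem.append(edge.to_xml())
        return elem

    @classmethod
    def from_xml(cls, elem):
        graph = cls(elem.get("id"))
        by_id = {}  # temporary lookup, discarded once parsing is done
        for child in elem.findall("node"):
            node = Node(child.get("id"))
            by_id[node.id] = node
            graph.nodes.append(node)
        for child in elem.findall("edge"):
            graph.edges.append(Edge(by_id[child.get("from")],
                                    by_id[child.get("to")]))
        return graph


n0, n1 = Node("n0"), Node("n1")
original = Graph("g0", nodes=[n0, n1], edges=[Edge(n1, n0)])
text = ET.tostring(original.to_xml(), encoding="unicode")
restored = Graph.from_xml(ET.fromstring(text))
```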

Unique IDs are probably the way to go. If you structure the IDs in a manner similar to the hierarchical scheme you proposed, the file will be relatively human-readable as well, which is one of XML's goals.
