如何区分分层数据?
有没有可以区分层次结构的工具?
IE,考虑以下层次结构:
A has child B.
B has child C.
它与以下内容进行比较:
A has child B.
A has child C.
我想要一个工具,显示 C 已从 B 的子级移动到 A 的子级。是否存在此类实用程序?如果没有特定的工具,我不反对自己编写,那么有哪些适用于这个问题的好算法呢?
Are there any tools which diff hierarchies?
IE, consider the following hierarchy:
A has child B.
B has child C.
which is compared to:
A has child B.
A has child C.
I would like a tool that shows that C has moved from a child of B to a child of A. Do any such utilities exist? If there are no specific tools, I'm not opposed to writing my own, so what are some good algorithms which are applicable to this problem?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
用于比较层次结构(不是特别是 XML、HTML 等)的一个很好的通用资源是 Hierarchical-Diff 基于达特茅斯研究的 github 项目。他们有一个相当广泛的列表 相关工作,范围从 XML 比较、配置文件比较到 HTML 比较。
一般来说,在树结构上实际执行差异/补丁是一个相当好的解决问题,但以对人类有意义的方式显示这些差异仍然是狂野的西部。当您的数据结构已经具有一些语义(如 HTML)时,这是双重正确的。
A great general resource for diffing hierarchies (not specifically XML, HTML, etc) is the Hierarchical-Diff github project based on a bit of Dartmouth research. They have a pretty extensive list of related work ranging from XML diffing, to configuration file diffing to HTML diffing.
In general, actually performing diffs/patches on tree structures is a fairly well-solved problem, but displaying those diffs in a manner that makes sense to humans is still the wild west. That's double true when your data structure already has some semantic meaning like with HTML.
您可以考虑使用我们的 SmartDifferencer 工具。
这些工具以类似差异的方式比较计算机源代码文件。与面向行的 diff 不同,这些工具根据代码结构(变量名、表达式、语句、块、函数、类等)将更改视为合理的编辑(“移动、插入、删除、替换、复制、重命名”) ),产生对程序员有意义的答案。
这些计算机源代码完全具有您所建议的“层次结构”结构;各种结构筑巢。具体到您的主题,通常代码块可以嵌套在代码块内。 SmartDifferencer 工具使用目标语言精确解析器将源文本“解构”为这些分层实体。我们有一个用于 XML 的智能差异器,您显然可以在其中编写嵌套标签。
答案并未报告为“M 的第 N 个子级已移动”,尽管它实际上是通过对解析器生成的解析树进行操作来计算的。相反,它被报告为“x col y 行到 a col b 行的代码片段已移动/...”
You might consider our SmartDifferencer tools.
These tools compare computer source code files in a diff-like way. Unlike diff, which is line oriented, these tools see changes according to code structure (variable name, expression, statement, block, function, class, etc.) as plausible edits ("move, insert, delete, replace, copy, rename"), producing answers that makes sense to programmers.
These computer source codes have exactly the "hierarchy" structure you are suggesting; the various constructs nest. Specifically to your topic, typically code blocks can nest inside code blocks. The SmartDifferencer tools use target-language accurate parsers to "deconstruct" the source text into these hierarchical entities. We have a Smart Differencer for XML in which you can obviously write nested tags.
The answer isn't reported as "Nth child of M has moved" although it is actually computed that way, by operating on the parse trees produced by the parsers. Rather it is reported as "code fragment of type at line x col y to line a col b has moved/..."
我的好先生的答案是:深度优先搜索,也称为深度优先遍历。您可能会发现访问者模式的一些用途。
在处理比较 XML 树时,如果不使用某种实现方法,就无法摆脱死猫。以 diffxml 为例。
The answer my good sir is: Depth-first search, also known as Depth-first traversal. You might find some use of the Visitor pattern.
You can't swing a dead cat without hitting some sort of implementation for this when dealing with comparing XML trees. Take a gander at diffxml for an example.