Tree matching algorithm?

Posted on 2024-09-08 11:37:25

I am working on a tree library, and part of the required functionality is to be able to search a node for child nodes that match a pattern.

A 'pattern' is a specification (or set of criteria) that lays out the structure, as well as the attributes, of the nodes in the subtree(s) to be matched.

For example, suppose a tree represents data regarding a particular species of bird. Further assume that the nodes of such a tree have the following attributes:

  • location
  • sex
  • wingspan
  • weight
  • brood_size

Given a parent node, I would like to issue a search in plain English thus:

"Fetch me all male birds that are descendants of this bird, and live in XXX city and have a weight > 100g. Any such bird found should also have at least 2 brothers and one sister, and must itself have at least one child"

< note >

Just to clarify, I do not expect to be able to query using plain English as I have done above. I only used the "plain English query" to illustrate the type of matching I would like to be performing on the tree. I fully expect to use symbols for the matching (as opposed to plain text) in practice.

< /note >

I am thinking of possibly using regex-style pattern matching to match trees. One way would be to have a string representation of each node, so that I could use a normal regex. But this is likely to be quite inefficient, as there will be a lot of repeated data: the string representations of child nodes will be supersets of their parent's representation, which will in turn be a superset of its own parent's representation, and so on, recursively, up the tree. This could very easily become unwieldy for even modestly sized trees. There has to be a better way.

Is anyone aware of an algorithm that will allow me to select nodes (subtrees) in a node, based on a pattern?

Although I asked for a general algorithm, I am implementing this in Python. Any snippets that further illustrate such an algorithm (if one can indeed be written) would be immensely useful.

Comments (2)

记忆里有你的影子 2024-09-15 11:37:25

What's wrong with writing a Lisp S-expression with wildcards to describe the tree match? Parentheses group a node. Elements from left to right match the root followed by the children. Subtree matches use nested S-expressions to describe the subtree.

The following would match a tree with an arbitrary root node, the first child being a leaf A, and the third child being a subtree rooted at X with first child 1 and second child A:

(?root A ? (X 1 A))

This idea isn't unique to me; the Lisp guys have been writing such patterns since the early sixties.

Here's a LISP pattern matcher (as an example you wanted) that only goes back 20 years:
http://norvig.com/paip/patmatch.lisp

However, coding this yourself is pretty easy. This is typically assigned as a homework exercise for people learning LISP.
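
Since the question asks for Python, here is a minimal sketch of the same idea, not a port of the linked Lisp code. It assumes a tree is written as a nested tuple `(label, child1, child2, ...)` with leaves as plain values; the names `match` and `find_all` and the wildcard conventions (`"?"` matches anything, `"?name"` matches anything and records it) are choices made for this illustration.

```python
def match(pattern, tree, bindings=None):
    """Return a dict of bindings if tree matches pattern, else None."""
    if bindings is None:
        bindings = {}
    # A bare "?" matches any leaf or subtree without recording it.
    if pattern == "?":
        return bindings
    # "?name" matches anything; a repeated name must match the same value.
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    # A tuple pattern matches a tuple tree element by element:
    # first the root label, then each child in order.
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    # Anything else is a literal and must be equal.
    return bindings if pattern == tree else None


def find_all(pattern, tree):
    """Yield (subtree, bindings) for every subtree that matches pattern."""
    found = match(pattern, tree)
    if found is not None:
        yield tree, found
    if isinstance(tree, tuple):
        for child in tree[1:]:  # tree[0] is the node label
            yield from find_all(pattern, child)


pattern = ("?root", "A", "?", ("X", 1, "A"))
tree = ("R", "A", "B", ("X", 1, "A"))
nested = ("top", tree, ("R2", "A", "C", ("X", 1, "A")))

result = match(pattern, tree)
all_roots = [b["?root"] for _, b in find_all(pattern, nested)]
```

Here `result` holds the binding for `?root`, and `find_all` walks `nested` to collect every subtree that fits the pattern. Matching on node attributes rather than labels would need the literal comparison replaced by a predicate.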

神仙妹妹 2024-09-15 11:37:25

This depends on your tree. If your tree is rooted and ordered, you should be able to check for an exact match in sublinear time, and if not, you should be able to check for a match in linear time. Several faster algorithms also exist for approximate matching.
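
For an attribute query like the one in the question, a single pass over the descendants is enough. The sketch below is only an illustration under stated assumptions: the `Bird` class, its field names, and `find_matches` are hypothetical, and the query conditions are hard-coded rather than parsed from a pattern. Brothers and sisters are counted once per parent, so each node is visited a constant number of times.

```python
from dataclasses import dataclass, field


@dataclass
class Bird:
    # Hypothetical node type; a real tree library would have its own.
    name: str
    sex: str
    location: str
    weight: float
    children: list = field(default_factory=list)


def find_matches(root, city, min_weight):
    """Return descendants of root that are male, live in city, weigh more
    than min_weight, have >= 2 brothers and >= 1 sister, and >= 1 child."""
    found = []
    stack = [root]
    while stack:
        parent = stack.pop()
        # Count the sexes among this parent's children once.
        males = sum(1 for c in parent.children if c.sex == "male")
        females = len(parent.children) - males
        for child in parent.children:
            if (child.sex == "male"
                    and child.location == city
                    and child.weight > min_weight
                    and males - 1 >= 2      # brothers, excluding the bird itself
                    and females >= 1        # sisters
                    and child.children):    # has at least one child
                found.append(child)
            stack.append(child)
    return found


grandchild = Bird("k", "male", "XXX", 200, [Bird("k1", "female", "XXX", 50)])
root = Bird("root", "female", "XXX", 150, [
    Bird("m1", "male", "XXX", 120, [grandchild]),
    Bird("m2", "male", "XXX", 150),                      # no children
    Bird("m3", "male", "YYY", 130, [Bird("m3a", "male", "YYY", 60)]),
    Bird("f1", "female", "XXX", 110),
])

names = [b.name for b in find_matches(root, "XXX", 100)]
```

In this example only `m1` satisfies every condition: `m2` has no children, `m3` lives elsewhere, and `k` has no siblings.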

For finding material and algorithms for topics like this, Google Scholar is your friend. A search for subtree matching or similar should get you there.

EDIT: Judging by your updated entry, I suggest you take a look at how XPath and similar query languages are implemented. XML is a rooted tree, and XPath can search for subtrees in that tree with complex matching operators like the ones in your example.

I also advise you not to implement this on your own, but rather to use an existing library (like PyLucene or some other search engine, which seems appropriate given the example you put out).
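
As a rough illustration of the XPath route in Python, the standard library's `xml.etree.ElementTree` supports a limited subset of XPath. The sketch assumes the birds are stored as nested `<bird>` elements with `sex`, `location`, and `weight` attributes; that layout and the `id` values are made up for the example. The subset handles attribute equality and "has a child with this tag" predicates, but not numeric comparisons, so the weight test is done in Python. The sibling conditions are left out here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data layout: one <bird> element per node, children nested.
XML = """
<bird id="root" sex="female" location="XXX" weight="150">
  <bird id="a" sex="male" location="XXX" weight="120">
    <bird id="a1" sex="female" location="XXX" weight="90"/>
  </bird>
  <bird id="b" sex="male" location="XXX" weight="80">
    <bird id="b1" sex="male" location="YYY" weight="130"/>
  </bird>
  <bird id="c" sex="male" location="YYY" weight="140"/>
</bird>
"""

root = ET.fromstring(XML)

# Male descendants living in XXX that have at least one <bird> child.
candidates = root.findall(".//bird[@sex='male'][@location='XXX'][bird]")

# The weight comparison is applied in Python, not in the path expression.
matches = [b.get("id") for b in candidates if float(b.get("weight")) > 100]
```

A full XPath implementation can express the numeric and sibling conditions inside the path expression itself; whether that is worth the dependency depends on the tree library's needs.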
