出于缓存原因标准​​化布尔表达式。有没有比真值表更有效的方法?

发布于 2024-11-04 08:52:41 字数 702 浏览 3 评论 0原文

我当前的项目是一个具有布尔检索功能的高级标签数据库。正在使用这样的布尔表达式来查询记录(例如在音乐数据库中):

funky-music and not (live or cover)

这应该产生音乐数据库中的所有时髦音乐,但不是歌曲的现场或翻唱版本。

当涉及到缓存时,问题是存在等效但结构不同的查询。例如,应用德摩根规则,上面的查询可以这样写:

funky-music and not live and not cover

这将产生完全相同的记录,但是当通过散列查询字符串来实现缓存时,会导致缓存中断。

因此,我的第一个意图是创建查询的真值表,然后将其用作缓存键,因为等效表达式形成相同的真值表。不幸的是,这是不切实际的,因为真值表随着输入(标签)的数量呈指数增长,并且我不想限制一个查询中使用的标签数量。

另一种方法可能是应用布尔代数定义的规则遍历语法树以形成(最小)标准化表示,这似乎也很棘手。

因此,总体问题是:是否有一种可行的方法来实现等效查询的识别,而不需要电路最小化 或真值表(编辑:或任何其他 NP 难算法)?

ne plus ultra 将识别已经缓存的子查询,但这不是主要目标。

My current project is an advanced tag database with boolean retrieval features. Records are being queried with boolean expressions like such (e.g. in a music database):

funky-music and not (live or cover)

which should yield all funky music in the music database but not live or cover versions of the songs.

When it comes to caching, the problem is that there exist queries which are equivalent but different in structure. For example, applying de Morgan's rule the above query could be written like this:

funky-music and not live and not cover

which would yield exactly the same records but of cause break caching when caching would be implemented by hashing the query string, for example.

Therefore, my first intention was to create a truth table of the query which could then be used as a caching key as equivalent expressions form the same truth table. Unfortunately, this is not practicable as the truth table grows exponentially with the number of inputs (tags) and I do not want to limit the number of tags used in one query.

Another approach could be traversing the syntax tree applying rules defined by the boolean algebra to form a (minimal) normalized representation which seems to be tricky too.

Thus the overall question is: Is there a practicable way to implement recognition of equivalent queries without the need of circuit minimization or truth tables (edit: or any other algorithm which is NP-hard)?

The ne plus ultra would be recognizing already cached subqueries but that is no primary target.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

蓬勃野心 2024-11-11 08:52:41

一种确定查询是否等价于“False”的通用且高效的算法可用于有效地解决 NP 完全问题,因此您不太可能找到一个算法。

您可以尝试将查询转换为规范形式。由于上述原因,总会有一些查询转换为任何给定的形式都非常昂贵,但您可能会发现,在实践中,某些形式在大多数情况下都工作得很好 - 并且您总是可以中途放弃如果转型变得太难。

您可以查看http://en.wikipedia.org/wiki/Conjunctive_normal_formhttp://en.wikipedia.org/wiki/Disjunctive_normal_form, http://en.wikipedia.org/wiki/Binary_decision_diagram

A general and efficient algorithm to determine whether a query is equivalent to "False" could be used to solve NP-complete problems efficiently, so you are unlikely to find one.

You could try transforming your queries into a canonical form. Because of the above, there will be always be queries that are very expensive to transform into any given form, but you might find that, in practice, some form works pretty well most of the time - and you can always give up halfway through a transformation if it is becoming too hard.

You could look at http://en.wikipedia.org/wiki/Conjunctive_normal_form, http://en.wikipedia.org/wiki/Disjunctive_normal_form, http://en.wikipedia.org/wiki/Binary_decision_diagram.

流殇 2024-11-11 08:52:41

您可以将查询转换为连接范式 (CNF)。它是布尔公式的规范、简单表示,通常是 SAT 求解器的基础。

最有可能的“大型”查询将有大量的连词(而不是大量的析取),因此 CNF 应该可以很好地工作。

You can convert the queries into conjunctive normal form (CNF). It is a canonical, simple representation of boolean formulae that is normally the basis for SAT solvers.

Most likely "large" queries are going to have lots of conjunctions (rather than lots of disjunctions) so CNF should work well.

南冥有猫 2024-11-11 08:52:41

Quine-McCluskey 算法应该可以实现您正在寻找的内容。它与卡诺地图类似,但更容易在软件中实现

The Quine-McCluskey algorithm should achieve what you are looking for. It is similiar to Karnaugh's Maps, but easier to implement in software.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文