Normalizing boolean expressions for caching. Is there a more efficient way than truth tables?

Posted on 2024-11-04 08:52:41

My current project is an advanced tag database with boolean retrieval features. Records are queried with boolean expressions such as the following (e.g. in a music database):

funky-music and not (live or cover)

which should yield all funky music in the music database, but no live or cover versions of the songs.
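To make the discussion concrete, here is a minimal recursive-descent parser that turns such a query string into a syntax tree of nested tuples. The grammar is an assumption, since the question does not specify one: `not` binds tighter than `and`, which binds tighter than `or`, and a tag is any run of letters, digits, `_` or `-`. The names `tokenize` and `parse` are hypothetical.

```python
import re

# Assumed grammar (not taken from the question):
#   expr   := term ("or" term)*
#   term   := factor ("and" factor)*
#   factor := "not" factor | "(" expr ")" | TAG
TOKEN = re.compile(r"\(|\)|[\w-]+")
KEYWORDS = {"and", "or", "not"}


def tokenize(text):
    """Split a query into parentheses, keywords and tag names."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'more input'}, got {token!r}")
        self.i += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "or":
            self.take()
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "and":
            self.take()
            node = ("and", node, self.factor())
        return node

    def factor(self):
        token = self.peek()
        if token == "not":
            self.take()
            return ("not", self.factor())
        if token == "(":
            self.take()
            node = self.expr()
            self.take(")")
            return node
        if token is None or token in KEYWORDS or token == ")":
            raise SyntaxError(f"expected a tag, got {token!r}")
        self.take()
        return ("tag", token)


def parse(text):
    """Parse a query string into nested tuples such as ("and", ("tag", "a"), ...)."""
    parser = _Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse("funky-music and not (live or cover)"))
# ('and', ('tag', 'funky-music'), ('not', ('or', ('tag', 'live'), ('tag', 'cover'))))
```

The De Morgan variant `funky-music and not live and not cover` parses into a structurally different tree (a left-nested pair of `and` nodes with two `not` leaves), which is the caching problem in a nutshell.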

When it comes to caching, the problem is that some queries are equivalent but differ in structure. For example, by applying De Morgan's laws, the above query can be rewritten as:

funky-music and not live and not cover

which yields exactly the same records but, of course, breaks caching if the cache is keyed by, for example, a hash of the query string.
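A small sketch of the problem: both queries, written here as hand-built syntax trees, select the same records from a toy data set, yet a cache keyed on a hash of the query string (SHA-256, chosen only for illustration) sees two different keys. The records and the names `matches`, `select` and `naive_key` are invented for the example.

```python
import hashlib

# Invented toy data: record id -> set of tags
RECORDS = {
    1: {"funky-music"},
    2: {"funky-music", "live"},
    3: {"funky-music", "cover"},
    4: {"jazz", "live"},
    5: {"funky-music", "remaster"},
}

# The two queries from the question, as hand-written syntax trees
Q1 = "funky-music and not (live or cover)"
AST1 = ("and", ("tag", "funky-music"), ("not", ("or", ("tag", "live"), ("tag", "cover"))))
Q2 = "funky-music and not live and not cover"
AST2 = ("and", ("tag", "funky-music"), ("not", ("tag", "live")), ("not", ("tag", "cover")))


def matches(node, tags):
    """True if a record carrying `tags` satisfies the query tree `node`."""
    op = node[0]
    if op == "tag":
        return node[1] in tags
    if op == "not":
        return not matches(node[1], tags)
    if op == "and":
        return all(matches(child, tags) for child in node[1:])
    if op == "or":
        return any(matches(child, tags) for child in node[1:])
    raise ValueError(f"unknown operator {op!r}")


def select(node):
    """Ids of all records matching the query."""
    return {rid for rid, tags in RECORDS.items() if matches(node, tags)}


def naive_key(query):
    """Cache key derived from the raw query text (the approach that breaks)."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


print(select(AST1) == select(AST2))    # True: same records (ids 1 and 5)
print(naive_key(Q1) == naive_key(Q2))  # False: two different cache keys
```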

My first idea was therefore to compute the truth table of the query and use it as the cache key, since equivalent expressions have the same truth table. Unfortunately, this is impractical: the truth table grows exponentially with the number of inputs (tags), and I do not want to limit the number of tags used in one query.
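For reference, the truth-table key can be sketched like this: collect the tags, sort them so the variable order does not depend on how the query was written, and evaluate the query under every assignment. The `2 ** n` loop is exactly the exponential cost in question. One further weakness of this sketch: it keys on the tags that appear in the query, so `a` and `a and (b or not b)` still get different keys unless irrelevant tags are detected and dropped first. The function names are hypothetical.

```python
from itertools import product


def tags_of(node):
    """Set of tag names used in a query tree."""
    if node[0] == "tag":
        return {node[1]}
    return set().union(*(tags_of(child) for child in node[1:]))


def evaluate(node, env):
    """Evaluate a query tree under an assignment {tag: bool}."""
    op = node[0]
    if op == "tag":
        return env[node[1]]
    if op == "not":
        return not evaluate(node[1], env)
    if op == "and":
        return all(evaluate(child, env) for child in node[1:])
    if op == "or":
        return any(evaluate(child, env) for child in node[1:])
    raise ValueError(f"unknown operator {op!r}")


def truth_table_key(node):
    """(sorted tags, one output bit per assignment): 2 ** len(tags) rows."""
    names = sorted(tags_of(node))
    bits = "".join(
        "1" if evaluate(node, dict(zip(names, values))) else "0"
        for values in product((False, True), repeat=len(names))
    )
    return tuple(names), bits


AST1 = ("and", ("tag", "funky-music"), ("not", ("or", ("tag", "live"), ("tag", "cover"))))
AST2 = ("and", ("tag", "funky-music"), ("not", ("tag", "live")), ("not", ("tag", "cover")))
print(truth_table_key(AST1) == truth_table_key(AST2))  # True
```

With 30 tags the loop already runs 2 ** 30 (1,073,741,824) times, which is why this key does not scale.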

Another approach would be to traverse the syntax tree, applying the rules of boolean algebra to build a (minimal) normalized representation, but this seems tricky too.
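A cheap, partial version of the syntax-tree approach runs in polynomial time: push every `not` down to the tags using De Morgan's laws and double negation (negation normal form), then flatten nested `and`/`or` nodes, drop duplicate operands and sort the operands. Equal results always mean equivalent queries, so the key is safe to cache on, but the converse does not hold: queries that are equivalent only via distributivity, absorption or complements (e.g. `a and (b or c)` versus `(a and b) or (a and c)`) still get different keys, so some cache hits are missed. The function names are hypothetical.

```python
# Hypothetical helpers: a sound but incomplete normalization of query trees.

def nnf(node, negate=False):
    """Negation normal form: push `not` down to the tags."""
    op = node[0]
    if op == "tag":
        return ("not", node) if negate else node
    if op == "not":
        return nnf(node[1], not negate)        # double negation cancels
    if op in ("and", "or"):
        if negate:                             # De Morgan swaps the operator
            op = "or" if op == "and" else "and"
        return (op, *(nnf(child, negate) for child in node[1:]))
    raise ValueError(f"unknown operator {op!r}")


def _flatten_sort(node):
    op = node[0]
    if op in ("tag", "not"):                   # after NNF, `not` only wraps a tag
        return node
    operands = set()                           # a set gives idempotence: a and a == a
    for child in node[1:]:
        child = _flatten_sort(child)
        if child[0] == op:                     # associativity: inline same-op children
            operands.update(child[1:])
        else:
            operands.add(child)
    if len(operands) == 1:
        return operands.pop()
    return (op, *sorted(operands))             # commutativity: fixed operand order


def canonical(node):
    return _flatten_sort(nnf(node))


def to_key(node):
    """Readable cache key; assumes tag names contain no parentheses or commas."""
    if node[0] == "tag":
        return node[1]
    return node[0] + "(" + ",".join(to_key(child) for child in node[1:]) + ")"


AST1 = ("and", ("tag", "funky-music"), ("not", ("or", ("tag", "live"), ("tag", "cover"))))
AST2 = ("and", ("and", ("tag", "funky-music"), ("not", ("tag", "live"))), ("not", ("tag", "cover")))
print(to_key(canonical(AST1)))  # and(not(cover),not(live),funky-music)
print(to_key(canonical(AST2)))  # and(not(cover),not(live),funky-music)
```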

So the overall question is: is there a practical way to recognize equivalent queries without resorting to circuit minimization or truth tables (edit: or any other NP-hard algorithm)?
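One technique the question does not mention is the reduced ordered binary decision diagram (ROBDD). For a fixed variable order (below: alphabetical tag names), every boolean function has exactly one ROBDD, so with hash-consing two queries are equivalent exactly when they end up as the same node, and tags that cannot influence the result disappear on their own. This does not escape the worst case (deciding equivalence of arbitrary formulas is coNP-complete, and for some functions every variable order yields an exponentially large diagram), but for short tag queries the diagrams are often small. The following is a minimal sketch rather than a tuned implementation; the class and method names are hypothetical.

```python
class BDD:
    """Minimal reduced ordered BDD manager with hash-consing (a sketch).

    Ids 0 and 1 are the constants FALSE and TRUE; every other id names a
    unique (tag, low, high) triple. Tags are tested in alphabetical order.
    """

    OPS = {"and": lambda a, b: a and b, "or": lambda a, b: a or b}

    def __init__(self):
        self.nodes = [None, None]   # slots 0 and 1 are the terminals
        self.unique = {}            # (tag, low, high) -> id
        self.memo = {}              # cached apply/negate results

    def mk(self, tag, low, high):
        if low == high:             # reduction: this test is redundant
            return low
        key = (tag, low, high)
        if key not in self.unique:  # reduction: share identical sub-diagrams
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def _branches(self, u, tag):
        """Successors of u when `tag` is set to false / true."""
        if u > 1 and self.nodes[u][0] == tag:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u                 # u does not test `tag` at its root

    def apply(self, op, u, v):
        if u <= 1 and v <= 1:
            return int(self.OPS[op](bool(u), bool(v)))
        key = (op, u, v)
        if key not in self.memo:
            # Shannon expansion on the smallest tag tested at either root
            top = min(self.nodes[n][0] for n in (u, v) if n > 1)
            u0, u1 = self._branches(u, top)
            v0, v1 = self._branches(v, top)
            self.memo[key] = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        return self.memo[key]

    def negate(self, u):
        if u <= 1:
            return 1 - u
        key = ("not", u)
        if key not in self.memo:
            tag, low, high = self.nodes[u]
            self.memo[key] = self.mk(tag, self.negate(low), self.negate(high))
        return self.memo[key]

    def build(self, node):
        """Translate a query tree into a BDD node id."""
        op = node[0]
        if op == "tag":
            return self.mk(node[1], 0, 1)
        if op == "not":
            return self.negate(self.build(node[1]))
        if op in self.OPS:
            result = 1 if op == "and" else 0  # neutral element
            for child in node[1:]:
                result = self.apply(op, result, self.build(child))
            return result
        raise ValueError(f"unknown operator {op!r}")

    def serialize(self, u):
        """Text form of the diagram rooted at u that does not depend on the
        manager's internal ids, e.g. for a cache that outlives the process."""
        number = {0: 0, 1: 1}
        parts = []

        def visit(n):
            if n not in number:
                tag, low, high = self.nodes[n]
                lo, hi = visit(low), visit(high)
                number[n] = len(number)
                parts.append(f"{number[n]}:{tag}:{lo}:{hi}")
            return number[n]

        visit(u)
        return ";".join(parts) or str(u)


AST1 = ("and", ("tag", "funky-music"), ("not", ("or", ("tag", "live"), ("tag", "cover"))))
AST2 = ("and", ("and", ("tag", "funky-music"), ("not", ("tag", "live"))), ("not", ("tag", "cover")))
bdd = BDD()
print(bdd.build(AST1) == bdd.build(AST2))  # True: same node, same cache key
```

Within one process, the node id returned by `build` can serve directly as the cache key as long as a single shared `BDD` instance is used; `serialize` covers keys that must survive restarts. Note that the `unique` and `memo` tables grow without bound in this sketch.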

The ne plus ultra would be recognizing already cached subqueries, but that is not the primary goal.
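For the secondary goal, one possible direction is sketched below. It assumes queries have already been normalized into flattened, deduplicated tuple trees with sorted operands (the normalization step is not shown; the trees are written by hand). Every evaluated subtree is cached as a set of record ids over a toy inverted index, and for an `and`/`or` node the largest cached node with the same operator whose operands form a subset is reused as a starting point, so `funky-music and not live` helps answer `funky-music and not live and not cover`. The linear scan over the cache is for illustration only; the data and names are invented.

```python
# Invented toy inverted index: tag -> ids of records carrying that tag
INDEX = {
    "funky-music": {1, 2, 3, 5},
    "live": {2, 4},
    "cover": {3},
    "jazz": {4},
}
UNIVERSE = frozenset({1, 2, 3, 4, 5})


class SubqueryCache:
    """Hypothetical result cache; assumes input trees are already normalized."""

    def __init__(self, index, universe):
        self.index = index
        self.universe = universe
        self.results = {}   # normalized subtree -> frozenset of record ids
        self.hits = 0

    def evaluate(self, node):
        if node in self.results:                       # exact subquery hit
            self.hits += 1
            return self.results[node]
        op = node[0]
        if op == "tag":
            result = frozenset(self.index.get(node[1], ()))
        elif op == "not":
            result = self.universe - self.evaluate(node[1])
        elif op in ("and", "or"):
            combine = frozenset.__and__ if op == "and" else frozenset.__or__
            result = self.universe if op == "and" else frozenset()
            pending = set(node[1:])
            # Partial hit: largest cached node with the same operator whose
            # operands are a subset of ours (and/or are associative and commutative).
            best = max(
                (k for k in self.results if k[0] == op and set(k[1:]) <= pending),
                key=len,
                default=None,
            )
            if best is not None:
                self.hits += 1
                result = self.results[best]
                pending -= set(best[1:])
            for child in pending:
                result = combine(result, self.evaluate(child))
        else:
            raise ValueError(f"unknown operator {op!r}")
        self.results[node] = result
        return result


cache = SubqueryCache(INDEX, UNIVERSE)
# funky-music and not live
q_small = ("and", ("not", ("tag", "live")), ("tag", "funky-music"))
# funky-music and not live and not cover
q_big = ("and", ("not", ("tag", "cover")), ("not", ("tag", "live")), ("tag", "funky-music"))
print(sorted(cache.evaluate(q_small)))  # [1, 3, 5]
print(sorted(cache.evaluate(q_big)))    # [1, 5], reusing the cached q_small result
print(cache.hits)                       # 1
```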
