How can I quickly compare one set of booleans against many other sets of booleans (order matters)?

Posted 2024-12-12 10:21:12

I'm running into a problem with a project I'm working on in my spare time. I'm using Google App Engine (Java version), but this question is not specific to that platform, and I would consider other languages/platforms if they could solve the problem.

The following illustrates the problem:

Suppose I have a datastore with thousands of recipes, and the ingredients for each recipe. (For the sake of this illustration, forget about measurements.) I want to be able to enter a list of ingredients that I have on hand, and then quickly retrieve all recipes for which I have at least XX% of the ingredients (let's say 75%). I'm willing to sacrifice some accuracy and some results for speed, but do want a certain degree of accuracy. I can do a more thorough comparison after I get the "quick results."

My attempt at a solution: Analyzing the database of recipes, I compile a list of, say, 200 common food ingredients (eggs, flour, salt, sugar, rosemary, etc). Almost all the ingredients for the recipes are contained within this master list:

Common Food Ingredients: [ eggs , flour , salt , sugar , cinnamon ... ]

Then, I go through each individual recipe and compare the ingredients to this master list, and end up with a set of 200 booleans for each recipe:

Recipe #106: [ T , T , F , T , F ... ]
Recipe #107: [ F , T , T , T , F ... ]

I would store this information with the recipes. (Up to this point, it's all data prep work, which I have all the time in the world to do.)

Now, I enter my list of ingredients on hand. I would do the same comparison with the master list:

My ingredients on hand: [ F , F , T , T , F ... ]

And this is where I'm stuck. How can I quickly compare this set of booleans against the sets for the recipes so I can identify recipes for which I have at least 75% of the ingredients?

Or (and this would be the holy grail), during the data preparation, instead of storing the set of booleans itself with each recipe, is there a calculation I can perform that gives me a single value I can filter on later? (E.g., "SELECT * FROM recipes WHERE master_list_boolean_metric <= 29")

Or am I going about this the wrong way? (Any guidance, general or specific, would be appreciated.) What I want to avoid is doing a slow comparison, ingredient by ingredient, between each recipe and my list of "on-hand" ingredients.

Or... perhaps it isn't possible to do this quickly?

Comments (1)

荒路情人 2024-12-19 10:21:12

Use a BitSet.

Store each ingredient as one bit, AND each recipe's bits with the bits for the ingredients you have, and then filter on cardinality().
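A minimal sketch of how this could look with `java.util.BitSet`. It takes one reading of "filter on cardinality()": the number of bits left after the AND is compared against the recipe's own cardinality, so that the 75% applies to the recipe's ingredients rather than to the whole master list. The class name `RecipeMatcher`, the helpers `bits`, `coverage`, and `match`, the ingredient positions, and the on-hand set are all made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecipeMatcher {

    /** Builds a BitSet with one bit set per master-list position. */
    public static BitSet bits(int... ingredientIndexes) {
        BitSet set = new BitSet();
        for (int i : ingredientIndexes) {
            set.set(i);
        }
        return set;
    }

    /** Fraction of the recipe's ingredients that are on hand (0.0 to 1.0). */
    public static double coverage(BitSet recipe, BitSet onHand) {
        int needed = recipe.cardinality();
        if (needed == 0) {
            return 1.0; // assumption: a recipe with no listed ingredients always matches
        }
        BitSet have = (BitSet) recipe.clone(); // and() mutates its receiver, so work on a copy
        have.and(onHand);
        return (double) have.cardinality() / needed;
    }

    /** Returns the names of recipes whose coverage is at least minCoverage. */
    public static List<String> match(Map<String, BitSet> recipes, BitSet onHand, double minCoverage) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : recipes.entrySet()) {
            if (coverage(e.getValue(), onHand) >= minCoverage) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical master-list positions: 0=eggs, 1=flour, 2=salt, 3=sugar, 4=cinnamon
        Map<String, BitSet> recipes = new LinkedHashMap<>();
        recipes.put("Recipe #106", bits(0, 1, 3)); // [ T , T , F , T , F ]
        recipes.put("Recipe #107", bits(1, 2, 3)); // [ F , T , T , T , F ]

        BitSet onHand = bits(1, 2, 3); // hypothetical: flour, salt, sugar
        System.out.println(match(recipes, onHand, 0.75)); // prints [Recipe #107]
    }
}
```

With a 200-entry master list, each BitSet fits in four 64-bit words, so the AND plus bit count is a handful of machine operations per recipe. A linear scan over a few thousand recipes held in memory should therefore be cheap, and the slower ingredient-by-ingredient comparison can be reserved for the recipes that pass.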
