How can I quickly compare a set of booleans against many other sets of booleans (order matters)?

Posted on 2024-12-12 10:21:12

I'm running into a problem with a project I'm working on in my spare time. I'm using Google App Engine (Java version), but this question is not specific to that platform, and I would consider other languages/platforms if they could solve the problem.

The following illustrates the problem:

Suppose I have a datastore with thousands of recipes, and the ingredients for each recipe. (For the sake of this illustration, forget about measurements.) I want to be able to enter a list of ingredients that I have on hand, and then quickly retrieve all recipes for which I have at least XX% of the ingredients (let's say 75%). I'm willing to sacrifice some accuracy and some results for speed, but do want a certain degree of accuracy. I can do a more thorough comparison after I get the "quick results."

My attempt at a solution: Analyzing the database of recipes, I compile a list of, say, 200 common food ingredients (eggs, flour, salt, sugar, rosemary, etc.). Almost all the ingredients for the recipes are contained within this master list:

Common Food Ingredients: [ eggs , flour , salt , sugar , cinnamon ... ]

Then, I go through each individual recipe and compare the ingredients to this master list, and end up with a set of 200 booleans for each recipe:

Recipe #106: [ T , T , F , T , F ... ]
Recipe #107: [ F , T , T , T , F ... ]

I would store this information with the recipes. (Up to this point, it's all data prep work, which I have all the time in the world to do.)
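To make the data prep step concrete, here is a minimal Java sketch of turning a recipe's ingredient list into its set of booleans. It assumes the booleans are held in a `java.util.BitSet`; the class name `IngredientBits`, the method `encode`, and the five-entry `MASTER` list are illustrative stand-ins, not part of any existing code.

```java
import java.util.BitSet;
import java.util.List;

class IngredientBits {
    // Hypothetical master list; the real one would hold about 200 entries.
    static final List<String> MASTER =
            List.of("eggs", "flour", "salt", "sugar", "cinnamon");

    // Sets bit i when MASTER.get(i) appears in the given ingredient list.
    static BitSet encode(List<String> ingredients) {
        BitSet bits = new BitSet(MASTER.size());
        for (int i = 0; i < MASTER.size(); i++) {
            if (ingredients.contains(MASTER.get(i))) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Same pattern as Recipe #106 above: T, T, F, T, F
        BitSet recipe106 = encode(List.of("eggs", "flour", "sugar"));
        System.out.println(recipe106); // prints {0, 1, 3}
    }
}
```

A `BitSet` can be converted with `toLongArray()` or `toByteArray()` if the datastore needs a plain value to store alongside each recipe.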

Now, I enter my list of ingredients on hand. I would do the same comparison with the master list:

My ingredients on hand: [ F , F , T , T , F ... ]

And this is where I'm stuck. How can I quickly compare this set of booleans against the sets for the recipes so I can identify recipes for which I have at least 75% of the ingredients?
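One way to frame the comparison itself, sketched under the assumption that each set of booleans is a `java.util.BitSet`: AND the recipe's bits with the on-hand bits, count the surviving bits, and divide by the number of bits set in the recipe. The names `RecipeMatch`, `fromFlags`, and `coverage` are hypothetical, and the example uses only the five flags shown above.

```java
import java.util.BitSet;

class RecipeMatch {
    // Builds a BitSet from boolean flags given in master-list order.
    static BitSet fromFlags(boolean... flags) {
        BitSet bits = new BitSet(flags.length);
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Fraction of the recipe's ingredients that are also on hand.
    static double coverage(BitSet recipe, BitSet onHand) {
        int needed = recipe.cardinality();
        if (needed == 0) {
            return 0.0; // assumption: a recipe with no listed ingredients never matches
        }
        BitSet have = (BitSet) recipe.clone(); // and() mutates, so work on a copy
        have.and(onHand);
        return (double) have.cardinality() / needed;
    }

    public static void main(String[] args) {
        boolean T = true, F = false;
        BitSet onHand = fromFlags(F, F, T, T, F);
        BitSet recipe106 = fromFlags(T, T, F, T, F);
        BitSet recipe107 = fromFlags(F, T, T, T, F);
        System.out.println(coverage(recipe106, onHand) >= 0.75); // 1 of 3 on hand: false
        System.out.println(coverage(recipe107, onHand) >= 0.75); // 2 of 3 on hand: false
    }
}
```

With 200 ingredients each set fits in four 64-bit words, so one comparison is a few AND and bit-count operations rather than an ingredient-by-ingredient loop.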

Or (and this would be the holy grail), during the data preparation, instead of storing the set of booleans themselves with each recipe, is there a calculation I can perform that will give me a single value I can later filter on? (E.g., "SELECT * FROM recipes WHERE master_list_boolean_metric <= 29")
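One single value that can be computed during data prep is the number of ingredients in each recipe. It cannot decide a match by itself, but it gives a cheap pre-filter: if I have k ingredients on hand, a recipe with n ingredients can reach the threshold only if n × threshold ≤ k. The sketch below assumes a stored per-recipe count (the column name `ingredient_count` is hypothetical) and a threshold given as a whole-number percentage; the names `SizePrefilter` and `maxRecipeSize` are illustrative.

```java
class SizePrefilter {
    // Largest recipe size that could still reach `percent` coverage
    // when only `onHandCount` ingredients are available.
    // Derived from: n * percent <= onHandCount * 100.
    static int maxRecipeSize(int onHandCount, int percent) {
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        return (onHandCount * 100) / percent; // integer division rounds down
    }

    public static void main(String[] args) {
        int limit = maxRecipeSize(6, 75);
        // A query along the lines of:
        //   SELECT * FROM recipes WHERE ingredient_count <= limit
        // keeps every possible match; the survivors still need the full comparison.
        System.out.println(limit); // prints 8
    }
}
```

This filter only discards recipes that are too large to qualify, so it returns false positives but no false negatives.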

Or am I going about this the wrong way? (Any guidance, general or specific, would be appreciated.) What I want to avoid is doing a slow comparison, ingredient by ingredient, between each recipe and my list of "on-hand" ingredients.

Or... perhaps it isn't possible to do this quickly?
