如何从数组中随机选择一个符合特定条件的元素 O(1)

发布于 2024-12-29 16:52:22 字数 996 浏览 0 评论 0原文

我有一个从数据库获取的食谱列表,如下所示:

List<RecipeNode> _recipeList;

RecipeNode 除其他外,还有一个引用一个或多个标签的属性(例如 Dinner、< em>早餐、配菜素食假期以及大约 60 个其他)。

   public sealed class RecipeNode
   {
      public Guid RecipeId;
      public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
      //... More stuff
   }

在 O(1) 中从 _recipeList 中查找随机食谱当然很容易,但是我需要做的是找到一个随机食谱,例如在 Tags 在 O(1) 内。

现在,我唯一的想法是创建一个由标签键控的 List 数组。例如:

List<RecipeNode>[] _recipeListByTag;

然后,_recipeListByTag[5] 将包含 Tags 数组中具有 5 的所有食谱的列表。然后我可以在 O(1) 中选择一个随机允许的标签和该标签内的一个随机配方。

这种方法的缺点是这个多维数组的大小将是Recipes * Tags(例如,所有食谱中Tags.length的总和),这开始占用大量内存,因为我'我在此数组中存储了潜在的大量食谱。当然,由于 RecipeNode 是一个引用类型,我只是重复指向菜谱的 4 字节指针,所以这仍然可能是最好的方法。

是否有更有效的数据结构或算法可以用来让我找到包含某个允许标签的随机配方?谢谢!

I have a list of recipes obtained from a database that looks like this:

List<RecipeNode> _recipeList;

RecipeNode, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others).

   public sealed class RecipeNode
   {
      public Guid RecipeId;
      public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
      //... More stuff
   }

Finding a random recipe from _recipeList in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the Tags in O(1).

Right now, my only idea is to make an array of List<RecipeNodes>, keyed by tag. For example:

List<RecipeNode>[] _recipeListByTag;

Then, _recipeListByTag[5] would contain a list of all the recipes that have a 5 in the Tags array. I could then choose a random allowed tag and a random recipe within that tag in O(1).

The drawback of this approach is the size of this multidimensional array would be Recipes * Tags (eg, the sum of Tags.length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since RecipeNode is a reference type, I'm only repeating the 4byte pointers to the recipes, so this still might be the best way to go.

Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

十年九夏 2025-01-05 16:52:22

List[] _recipeListByTag 可能是最适合您的方法,其大小不是 Recipes * Tags 因为数组中的每个列表将仅包含与标签匹配的食谱,而不是更多。如果每个食谱都包含每个标签,那么它的大小将变成Recipes * Tags

如果数据结构占用的内存量对您来说非常重要,请不要忘记在填充每个列表后调用 List.TrimExcess()。

List<RecipeNode>[] _recipeListByTag is probably the best approach for you, and its size is not Recipes * Tags because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become Recipes * Tags if every single recipe contained every single tag.

If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.

疯了 2025-01-05 16:52:22

这是作业吗?我怀疑任何现实世界的食谱程序都需要 O(1) 访问标签,并且对于使用数据库来说太慢。我也怀疑任何现实世界的食谱都会有数字标签。了解真实域有助于提供更好的答案。

然而,如果它真的是关于食谱和数字标签,并且如果你只有 256 个标签,为什么不随机选择一个食谱 100 万次呢?找不到具有所需标签的菜谱的可能性很小,并且复杂度仍然是 O(1)。如果您不喜欢这种可能性,请随机选择一个食谱 10^20 次。复杂度仍然是 O(1)。

更新:

由于您担心的不是 O(1),而是选择随机食谱所需的时间,我建议您让数据库为您处理这个问题 - 无论如何,同一个数据库保存所有食谱,并且您将访问以显示随机食谱的同一个数据库。

您可以通过以下方式SELECT SQL Server 中的随机记录:SQL Server 随机排序< /a> .如果您使用其他数据库,还有其他方法:http://www.petefreitag。 com/item/466.cfm。只需确保您的 WHERE 子句中包含 Tag=17 即可。

Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer.

However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1).

UPDATE:

Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe.

You can SELECT a random record in SQL Server this way: SQL Server Random Sort . If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm . Just make sure your WHERE clause has Tag=17 in it.

一页 2025-01-05 16:52:22

如果您想将数据保存在内存中,那么为每个标签提供一个(4 字节)指针列表就再好不过了。如果您可以使用数据库...那么,其他人已经发布了相关内容。根据“巨大”的大小,您可能只需花费一些美元即可将 RAM 添加到目标计算机。

如果您确实想将数据保留在内存中,但又想对内存非常节省,您可以尝试压缩每个标签-配方组合的 4 个字节。例如,将所有食谱保存在一个大数组中,并且(假设不会超过一百万)将数组索引存储在每个 3 个字节中。

更进一步,您可以将具有给定标签的食谱划分为大小相等的“桶”(每个桶占据大数组的连续区域),为每个“桶”存储一个起始索引(3-4 个字节),然后存储具有给定标签的连续配方的索引之间的增量值列表。使用字节数组对增量值进行编码,以便单个增量值可以根据需要使用 1-4 个字节的任何值。

因为“桶”中的菜谱数量将被限制为一个常数,所以使用这种方法的检索仍然是 O(1)。

(我已经在只有 256 字节 RAM 的微控制器上完成了嵌入式编程……当你这样做时,你会开始考虑非常有创意的方法来节省字节甚至位。我确信没有必要达到这样的长度在你的申请中,但我认为这是一个有趣的想法。)

If you want to keep the data in memory, you won't do much better than a list of (4 byte) pointers for each tag. If you can use a DB... well, others have already posted about that. Depending on how huge is "huge", you might just fork out some $$$ to add RAM to the target machine.

If you do want to keep the data in memory, but want to be ridiculously parsimonious with memory, you could try to squeeze down the 4 bytes per tag-recipe combination. For example, keep all the recipes in a big array, and (assuming you won't have more than about a million) store array indexes in 3 bytes each.

To go even further, you could divide the recipes with a given tag into equally-sized "buckets" (each occupying a contiguous area of the big array), store a starting index for each "bucket" (in 3-4 bytes), and then store a list of delta values between indexes of consecutive recipes with the given tag. Encode the delta values using an array of bytes, in such a way that a single delta value can use anything from 1-4 bytes, as required.

Because the number of recipes in a "bucket" will be limited to a constant number, retrieval using this approach is still O(1).

(I have done embedded programming on micros with as little as 256 bytes of RAM... when you do that you start thinking of very creative ways to save bytes or even bits. I'm sure that going to such lengths will not be necessary in your application, but I thought this was an interesting idea.)

め七分饶幸 2025-01-05 16:52:22

我将从源列表导出到另一个,并引用适合您的所有元素。根据参考文献,进行随机选择并从源列表中取出一个元素。
如果您有可能再次使用相同的派生列表,请将此类列表放入更大的列表中。
(当然,所选择的算法取决于列表的实际统计数据。)

如果您仅使用一个参数,您可以通过此参数对列表进行排序,并记住在另一个列表 B 中它引用了具有相同参数值的元素开始。稍后您可以简单地在区间 (B[4];b[5]-1) 中随机取值。这将使随机选择的速度等于 O(const)。

I would make an export from the source list to another with references to all elements that suit you. There make a random choosing and take an element from the source list, according to the reference.
If there is a possibility that you again coud use the same derived list, put such lists into a greater list of them.
(Of cource, the chosen algorithm depends on the real statistics of your list.)

If you use only one parameter, you could order your list by this parameter and remember in another list B of it references to where the elements with the same parameter value start. Later you could simply take random in the interval (B[4];b[5]-1). This would make the speed of a random choosing equal to O(const).

灯角 2025-01-05 16:52:22

在这种情况下,我个人会选择 SQlite 解决方案(因为我个人知道它是比其他人更好)。我看到您担心空间,而不是速度方面的性能,而是在恒定恢复时间方面,您也担心数据访问的灵活性。在我看来,在这种情况下,SQlite 是不错的选择。

以您喜欢的方式设计您的小型数据库,并以您想要的方式执行查询和联接。

这是一个古老但始终有效的示例你能用它吗?
当然,还有 ORM 解决方案(例如 LINQ 驱动程序),但对我个人而言,这似乎有点开销。

希望这有帮助。

In this case, I would, personally, go for SQlite solution (as I, personally, know it's better then others). I see that you worry about a space, and not performance in terms of speed, but in terms of constant recovery time, you worry about flexibility of data access too. Imo, in this case, SQlite is way to go.

Design your small DB in a way you like and execute queries and joins in a way you want.

This is old but always valid example of how can you use it.
There is also, naturally, ORM solution (for example LINQ driver), but to me personally it seems kind of overhead.

Hope this helps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文