用于查找多个集合中哪一个是另一个集合的子集的算法/数据结构

发布于 2024-12-23 16:40:47 字数 1229 浏览 2 评论 0 原文

摘要描述：

我有一组字符串，称之为“活动集”，还有一组字符串——称之为“可能集”。当将新字符串添加到活动集中时，可能集中的集合可能突然成为活动集的子集，因为活动集仅缺少该字符串作为可能之一的超集。当我向活动集中添加新字符串时，我需要一种算法来有效地找到这些字符串。如果相同的数据结构允许我在从活动集中删除字符串时有效地找到哪些可能的集合无效（不再是子集），那就加分了。

（我在上面的摘要中用字符串的集合和子集来构建下面描述的问题的原因是，我用（Io）编写的语言是动态类型的。对象确实有一个“类型”字段，但它是一个字符串，其中包含对象类型的名称。）

背景：

在我的游戏引擎中，我有 GameObjects，它可以添加多种类型的 Representation 对象。例如，如果游戏对象具有物理存在，则可能会添加物理表示（如果它不是固体对象，则不会）。它可能添加了各种类型的 GraphicsRepresentations，例如网格或粒子效果（如果您将多个视觉效果附加到同一游戏对象，则可以有多个视觉效果）。

这样做的目的是分离子系统，但你不能完全分离所有东西：例如，当游戏对象同时具有PhysicsRepresentation和GraphicsRepresentation时，需要创建第三个对象，将GraphicsRepresentation的位置连接到图形表示的位置。物理表示。为了达到这个目的，同时仍然保持所有组件独立，我有 Interaction 对象。 Interaction 对象封装了有关两个系统组件如何交互的横切知识。

但为了保护 GameObject 不必了解太多有关表示和交互的信息，GameObject 仅提供一个通用注册表，其中交互原型对象可以注册，以便在 GameObject 中存在特定的表示组合时调用。当将新的表示添加到游戏对象时，游戏对象应该在其注册表中查找并仅激活那些因新表示的存在而新启用的交互对象，以及现有的表示。

我只是卡在这个注册表应该使用什么数据结构以及如何搜索它。

勘误表：

字符串集不一定是排序的，但我可以选择将它们存储为排序的。

尽管交互最常见的是在两个表示之间，但我不想将其限制于此；我应该能够触发 3 个或更多不同表示的交互，甚至仅基于 1 个表示触发的交互。

我想对此进行优化，以使其尽可能快地添加/删除表示。

我将有许多活动集（每个游戏对象都有一个活动集），但我只有一个可能的集（所有已注册交互类型的集）。所以我不在乎构建表示可能集的数据结构需要多长时间，因为只要比较不同活动集的算法不破坏可能集的数据结构，就只需要完成一次。

原文

Abstract Description:

I have a set of strings, call it the "active set", and a set of sets of strings - call that the "possible set". When a new string is added to the active set, sets from the possible set may suddenly be subsets of the active set because the active set lacked only that string to be a superset of one of the possibles. I need an algorithm to efficiently find these when I add a new string to the active set. Bonus points if the same data structure allows me to efficiently find which of these possible sets are invalidated (no longer a subset) when a string is removed from the active set.

(The reason I framed the problem described below in terms of sets and subsets of strings in the abstract above is that the language I'm writing this in (Io) is dynamically typed. Objects do have a "type" field but it is a string with the name of the object type in it.)

Background:

In my game engine I have GameObjects which can have several types of Representation objects added to them. For instance if a GameObject has physical presence it might have a PhysicsRepresentation added to it (or not if it's not a solid object). It might have various kinds of GraphicsRepresentations added to it, such as a mesh or particle effect (and you can have more than one if you have multiple visual effects attached to the same game object).

The point of this is to separate subsystems, but you can't completely separate everything: for instance when a GameObject has both a PhysicsRepresentation and a GraphicsRepresentation, something needs to create a 3rd object which connects the position of the GraphicsRepresentation to the location of the PhysicsRepresentation. To serve this purpose while still keeping all the components separate, I have Interaction objects. The Interaction object encapsulates the cross-cutting knowledge about how two system components have to interact.

But in order to protect GameObject from having to know too much about Representations and Interactions, GameObject just provides a generic registry where Interaction prototype objects can register to be called when a particular combination of Representations is present in the GameObject. When a new Representation is added to the GameObject, GameObject should look in it's registry and activate just those Interaction objects which are newly enabled by the presence of the new Representation, plus the existing Representations.

I'm just stuck on what data structure should be used for this registry and how to search it.

Errata:

The sets of strings are not necessarily sorted, but I can choose to store them sorted.

Although an Interaction most commonly will be between two Representations, I do not want to limit it to that; I should be able to have Interactions that trigger with 3 or more different representations, or even interactions that trigger based on just 1 representation.

I want to optimize this for the case of making it as fast as possible to add/remove representations.

I will have many active sets (each game object has an active set), but I have only one possible set (the set of all registered interaction types). So I don't care how long it takes to build the data structure that represents the possible set, because it only needs to be done once provided the algorithm for comparing different active sets is non-destructive of the possible set data structure.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

最偏执的依靠 2024-12-30 16:40:47

如果您的集合非常小，最好的表示方法是使用位集。首先，构建从字符串到连续整数 0..N 的映射，其中 N 是不同字符串的数量。然后，通过将 1< 按位或运算转换为数字来构建集合。这使您可以将集合运算转换为按位运算，速度非常快（交集是 &；并集是 |，等等）。

下面是一个示例：假设您有两个集合，A={quick, Brown, Fox} 和 B={brown,lazy,dog}。首先，您构建一个字符串到数字的映射，如下所示：

quick - 0
brown - 1
fox   - 2
lazy  - 3
dog   - 4

然后您的集合将变为 A=00111b 和 B=11010b。它们的交集是 A&B = 00010b，它们的并集是 A|B = 11111b。如果X == X&Y，则您知道集合X 是集合Y 的子集。

If your sets are really small, the best representation is using bit sets. First, you build a map from strings to consecutive integers 0..N, where N is the number of distinct strings. Then you build your sets by bitwise OR-ing of 1<<k into a number. This lets you turn your set operations into bitwise operations, which are extremely fast (an intersection is an &; a union is an |, and so on).

Here is an example: Let's say you have two sets, A={quick, brown, fox} and B={brown, lazy, dog}. First, you build a string-to-number map, like this:

quick - 0
brown - 1
fox   - 2
lazy  - 3
dog   - 4

Then your sets would become A=00111b and B=11010b. Their intersection is A&B = 00010b, and their union is A|B = 11111b. You know a set X is a subset of set Y if X == X&Y.

回复收藏 0 原文

遥远的绿洲 2024-12-30 16:40:47

实现此目的的一种方法是，对于每个子集，保留其字符串数量不在主集中的计数，以及从字符串到包含该字符串的子集列表的映射，以便您可以在以下情况下更新计数：您向活动集中添加或删除一个新字符串，并注意计数何时降至零。

这个问题让我想起当事实变为真时基于规则的系统中的触发规则，这对应于将新字符串添加到活动集中。其中许多系统都使用http://en.wikipedia.org/wiki/Rete_algorithm。 http://www.jboss.org/drools/drools-expert.html是一个基于规则的开源系统 - 尽管现在看起来有很多企业系统围绕它。