C# algorithm performance cartesian-product

高效的笛卡尔积算法

发布于 2024-08-11 10:20:01 字数 1417 浏览 14 评论 0原文

有人可以为我演示一种比我当前使用的算法更有效的笛卡尔积算法吗（假设有一个）。我环顾四周并用谷歌搜索了一下，但看不到任何明显的东西，所以我可能会错过一些东西。

foreach (int i in is) {
   foreach (int j in js) {
      //Pair i and j
   }
}

这是我在代码中所做的高度简化的版本。这两个整数是查找键，用于检索一个/多个对象，并且两次查找中的所有对象都会配对成新对象。

随着数据集规模的扩大，在更大、更复杂的系统中的这一小代码块成为主要的性能瓶颈。其中一些问题可能可以通过改进用于存储对象和所涉及的查找的数据结构来缓解，但我认为主要问题仍然是笛卡尔积本身的计算。

编辑

因此，我有更多关于我对算法的具体使用的背景知识，看看是否有任何技巧可以用来回应 Marc 的评论。整个系统是一个 SPARQL 查询引擎，它处理对图数据集的 SPARQL 查询，SPARQL 是一种基于模式的语言，因此每个查询都包含一系列与图匹配的模式。在两个后续模式没有公共变量（它们不相交）的情况下，有必要计算两个模式产生的解决方案的笛卡尔积，以获得整个查询的可能解决方案集。可能有任意数量的模式，并且我可能必须多次计算笛卡尔积，如果查询由一系列不相交的模式组成，这可能会导致可能的解决方案呈指数级扩展。

不知何故，从现有的答案来看，我怀疑是否有任何技巧可以应用

更新

所以我想我会发布我实现的更新，以最大限度地减少笛卡尔积的需要，从而优化查询引擎一般来说。请注意，并不总是可以完全消除对产品的需求，但几乎总是可以进行优化，以便连接的两个集合的大小要小得多。

由于每个 BGP（基本图形模式）是一组三重模式，都作为一个块（本质上）执行，因此引擎可以自由地对 BGP 内的模式进行重新排序，以获得最佳性能。例如，考虑以下 BGP：

?a :someProperty ?b .
?c :anotherProperty ?d .
?b a :Class .

按原样执行，查询需要笛卡尔积，因为第一个模式的结果与第二个模式不相交，因此前两个模式的结果是其各自结果的笛卡尔积。这个结果将包含比我们实际需要的更多的结果，因为第三个模式限制了第一个模式的可能结果，但我们直到之后才应用此限制。但是，如果我们像这样重新排序：

?b a :Class .
?a :someProperty ?b .
?c :anotherProperty ?d .

我们仍然需要笛卡尔积来获得最终结果，因为第二个和第三个模式仍然不相交，但通过重新排序，我们限制了第二个模式的结果的大小，这意味着我们的笛卡尔积的大小会小很多。

我们还进行了一些其他优化，但我不打算将它们发布在这里，因为它开始对 SPARQL 引擎内部结构进行相当详细的讨论。如果有人对更多细节感兴趣，请发表评论或向我发送推文@RobVesse

原文

Can somebody please demonstrate for me a more efficient Cartesian product algorithm than the one I am using currently (assuming there is one). I've looked around SO and googled a bit but can't see anything obvious so I could be missing something.

foreach (int i in is) {
   foreach (int j in js) {
      //Pair i and j
   }
}

This is a highly simplified version of what I do in my code. The two integers are lookup keys which are used to retrieve one/more objects and all the objects from the two lookups are paired together into new objects.

This small block of code in a much larger more complex system becomes a major performance bottleneck as the dataset it's operating over scales. Some of this could likely be mitigated by improving the data structures used to store the objects and the lookups involved but the main issue I feel is still the computation of the Cartesian product itself.

Edit

So some more background on my specific usage of the algorithm to see if there may be any tricks that I can use in response to Marc's comment. The overall system is a SPARQL query engine which processes SPARQL queries over sets of Graph data, SPARQL is a pattern based language so each query consists of a series of patterns which are matched against the Graph(s). In the case where two subsequent patterns have no common variables (they are disjoint) it is necessary to compute the Cartesian product of the solutions produced by the two patterns to get the set of possible solutions for the overall query. There may be any number of patterns and I may have to compute Cartesian products multiple times which can lead to a fairly exponential expansion in possible solutions if the query is composed of a series of disjoint patterns.

Somehow from the existing answers I doubt whether there any tricks that could apply

Update

So I thought I would post an update on what I implemented in order to minimise the need to do Cartesian products and thus optimise the query engine generally. Note that it's not always possible to completely eliminate the need for products but it's nearly always possible to optimise so the size of the two sets being joined is much smaller.

Since each BGP (Basic Graph Pattern) which is a set of Triple Patterns is executed as a block (in essence) the engine is free to reorder patterns within a BGP for optimal performance. For example consider the following BGP:

?a :someProperty ?b .
?c :anotherProperty ?d .
?b a :Class .

Executed as is the query requires a cartesian product since the results of the first pattern are disjoint from the second pattern so the results of the first two patterns is a cartesian product of their individual results. This result will contain far more results than we actually need since the third pattern restricts the possible results of the first pattern but we don't apply this restriction till afterwards. But if we reorder like so:

?b a :Class .
?a :someProperty ?b .
?c :anotherProperty ?d .

We'll still need a cartesian product to get the final results since the 2nd and 3rd patterns are still disjoint but by reordering we restrict the size of the results of the 2nd pattern meaning the size of our cartesian product will be much smaller.

There are some various other optimisations we make but I'm not going to post them here as it starts to get into fairly detailed discussion of SPARQL engine internals. If anyone is interested in further details just leave a comment or send me a tweet @RobVesse

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

凉城凉梦凉人心 2024-08-18 10:20:01

笛卡尔积的复杂度是O(n²)，没有捷径。

在特定情况下，迭代两个轴的顺序很重要。例如，如果您的代码正在访问数组中的每个槽 - 或图像中的每个像素 - 那么您应该尝试按自然顺序访问槽。图像通常布置在“扫描线”中，因此任何 Y 上的像素都是相邻的。因此，您应该迭代外循环上的 Y 和内循环上的 X。

您是否需要笛卡尔积或是否需要更有效的算法取决于您要解决的问题。

回复收藏 0 原文

冷夜 2024-08-18 10:20:01

如果没有一些额外的知识，您无法真正改变嵌套循环的性能，但这将是特定于用途的。如果您在 is 中有 n 个项目，在 js 中有 m 个项目，那么它总是 O( n*m）。

不过，您可以更改它的外观：

var qry = from i in is
          from j in js
          select /*something involving i/j */;

这仍然是 O(n*m)，但名义上有 LINQ 的额外开销（在正常使用中您不会注意到它），尽管）。

您在您的案例中做什么？可能有技巧......

绝对要避免的一件事是强制交叉连接进行缓冲的任何事情 - foreach 方法很好并且不会缓冲 - 但如果将每个项目添加到 List 中，然后注意内存影响。同上 OrderBy 等（如果使用不当）。

You can't really change the performance of a nested loop without some additional knowledge, but that would be use-specific. If you have got n items in is and m items in js, it is always going to be O(n*m).

You can change the look of it though:

var qry = from i in is
          from j in js
          select /*something involving i/j */;

This is still O(n*m), but has nominal extra overhead of LINQ (you won't notice it in normal usage, though).

What are you doing in your case? There may be tricks...

One thing to definitely avoid is anything that forces a cross-join to buffer - the foreach approach is fine and doesn't buffer - but if you add each item to a List<>, then beware the memory implications. Ditto OrderBy etc (if used inappropriately).

回复收藏 0 原文

悲喜皆因你 2024-08-18 10:20:01

我无法提出比 O(n^2) 更好的方案，因为这是输出的大小，因此该算法不可能更快。

我可以建议的是使用另一种方法来确定是否需要计算乘积。例如，如果您只是要查询某些元素是否属于产品集 P，那么您甚至可能不需要该产品集。您只需要有关其组成的集合的信息。

事实上（伪代码）

bool IsInSet(pair (x,y), CartesianProductSet P)
{
   return IsInHash(x,P.set[1]) && IsInHash(y,P.set[2])
}

，笛卡尔积的计算方式如下：

// Cartesian product of A and B is
P.set[1]=A; P.set[2]=B;

如果您将集合实现为散列，那么在 m 集合的笛卡尔积中查找只是在 m 散列中查找免费获得。构建笛卡尔积和 IsInSet 查找都需要 O(m) 时间，其中 m 是相乘的集合数< /strong>，并且它远小于 n——每组的大小。

I can't propose anything better, than O(n^2), because that's the size of the output, and the algorithm hence can't be any faster.

What I can suggest is using another approach to whether you need to compute product. For example, you may not even need the product set P if only you are going to query whether certain elements belong to it. You only need the information about the sets it's composed of.

Indeed (pseudocode)

bool IsInSet(pair (x,y), CartesianProductSet P)
{
   return IsInHash(x,P.set[1]) && IsInHash(y,P.set[2])
}

where Cartesian product is calculated like this:

// Cartesian product of A and B is
P.set[1]=A; P.set[2]=B;

If you implement sets as hashes, then lookup in a cartesian product of m sets is just a lookup in m hashes you get for free. Construction of the cartesian product and IsInSet lookup each take O(m) time, where m is a number of sets you multiply, and it's much less than n--size of each set.

回复收藏 0 原文

二智少女 2024-08-18 10:20:01

附加信息已添加到问题中。

如果您记录已经计算过的内容，以避免再次重复它们，则可以避免重复项 - 假设这种簿记的成本 - 哈希图或简单列表 - 小于计算重复项的成本。

C# 运行时确实非常快，但对于极其繁重的工作，您可能需要使用本机代码。

您可能还会注意到这个问题的本质相似性。如果一个产品的计算不影响任何其他产品的计算，您可以直接使用多个处理器核心来并行完成工作。查看线程池。< a href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem(VS.80).aspx" rel="nofollow noreferrer">QueueUserWorkItem。

回复收藏 0 原文

不可一世的女人 2024-08-18 10:20:01

如果缓存局部性（或维护 j 所需的本地内存）是一个问题，您可以通过递归地平分输入数组来使算法对缓存更加友好。类似于：

cartprod(is,istart,ilen, js,jstart,jlen) {
  if(ilen <= IMIN && jlen <= JMIN) { // base case
    for(int i in is) {
      for(int j in js) {
        // pair i and j
      }
    }
    return;
  }
  if(ilen > IMIN && jlen > JMIN) { // divide in 4
    ilen2= ilen>>1;
    jlen2= jlen>>1;
    cartprod(is,istart,ilen2,            js,jstart,jlen2);
    cartprod(is,istart+ilen2,ilen-ilen2, js,jstart,jlen2);
    cartprod(is,istart+ilen2,ilen-ilen2, js,jstart+jlen2,jlen-jlen2);
    cartprod(is,istart,ilen2,            js,jstart+jlen2,jlen-jlen2);
    return;
  }
  // handle other cases...
}

请注意，此访问模式将自动充分利用所有级别的自动缓存；这种技术称为cache-oblivious算法设计。

If cache locality (or local memory required to maintain the j's) is a problem, you can make your algorithm more cache-friendly by bisecting the input arrays recursively. Something like:

cartprod(is,istart,ilen, js,jstart,jlen) {
  if(ilen <= IMIN && jlen <= JMIN) { // base case
    for(int i in is) {
      for(int j in js) {
        // pair i and j
      }
    }
    return;
  }
  if(ilen > IMIN && jlen > JMIN) { // divide in 4
    ilen2= ilen>>1;
    jlen2= jlen>>1;
    cartprod(is,istart,ilen2,            js,jstart,jlen2);
    cartprod(is,istart+ilen2,ilen-ilen2, js,jstart,jlen2);
    cartprod(is,istart+ilen2,ilen-ilen2, js,jstart+jlen2,jlen-jlen2);
    cartprod(is,istart,ilen2,            js,jstart+jlen2,jlen-jlen2);
    return;
  }
  // handle other cases...
}

Note that this access pattern will automatically take fairly good advantage of all levels of automatic cache; this kind of technique is called cache-oblivious algorithm design.

回复收藏 0 原文