在列表中查找相同的模式

发布于 2025-01-05 18:26:26 字数 674 浏览 1 评论 0原文

假设我们有以下列表:

    List<int> Journey1 = new List<int>() { 1, 2, 3, 4, 5 };
    List<int> Journey2 = new List<int>() { 2, 3, 4, 6, 7, 3, 4 };
    List<int> Journey3 = new List<int>() { 6, 7, 1 };
    List<int> Journey4 = new List<int>() { 3, 1, 4 };

模式为:

2, 3, 4 -> Journey1, Journey2;
6, 7    -> Journey2, Journey3;
1       -> Journey2, Journey3, Journey4;
5       -> Journey1;
3, 4    -> Journey2;
3       -> Journey4;
4       -> Journey4;

我们有 5000 个列表,每个列表大约有 200 个项目,因此模式可以有 1-200 个项目,并且可以在 1-5000 个列表中看到。

因此我需要非常快速的模式匹配方法。

Assume we have below lists:

    List<int> Journey1 = new List<int>() { 1, 2, 3, 4, 5 };
    List<int> Journey2 = new List<int>() { 2, 3, 4, 6, 7, 3, 4 };
    List<int> Journey3 = new List<int>() { 6, 7, 1 };
    List<int> Journey4 = new List<int>() { 3, 1, 4 };

And the patterns are:

2, 3, 4 -> Journey1, Journey2;
6, 7    -> Journey2, Journey3;
1       -> Journey2, Journey3, Journey4;
5       -> Journey1;
3, 4    -> Journey2;
3       -> Journey4;
4       -> Journey4;

We have 5000 lists, and each has around 200 items, so the patterns can have between 1-200 items and can be seen in 1-5000 lists.

Therefore I need very fast way of pattern matching.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

眼眸里的快感 2025-01-12 18:26:26

无需预先计算并使用简单的即时搜索:

var matchedJourneys = journeys.Where(x => ContainsPattern(x, mypattern));

bool ContainsPattern(List<int> list, List<int> pattern)
{
    for(int i = 0; i < list.Count - (pattern.Count - 1); i++)
    {
        var match = true;
        for(int j = 0; j < pattern.Count; j++)  
            if(list[i + j] != pattern[j])
                     {
                         match = false;
                         break;
                     }
        if(match) return true;
    }
    return false;
}

这将最多执行 2 亿个等于检查您的“数字”。但由于预计不会对整个模式执行检查,因此如果检查所有列表,则可能(只是猜测)约 500 万次等于操作。那是几百毫秒。

这完全取决于什么对您来说“非常快”。如果这太慢,您将需要一种更复杂的方法......

Without precomputation and with a naive on-the-fly search:

var matchedJourneys = journeys.Where(x => ContainsPattern(x, mypattern));

bool ContainsPattern(List<int> list, List<int> pattern)
{
    for(int i = 0; i < list.Count - (pattern.Count - 1); i++)
    {
        var match = true;
        for(int j = 0; j < pattern.Count; j++)  
            if(list[i + j] != pattern[j])
                     {
                         match = false;
                         break;
                     }
        if(match) return true;
    }
    return false;
}

This will execute at max 200 million equals checks for your 'numbers'. But since checks are not expected to be executed for whole patterns, that could be (just a guess) ~5 million equals operations if checking all the lists. That's a few hundred milliseconds.

It all depends on what is 'very fast' for you. If that's too slow, you will need a much much more complicated approach ...

关于从前 2025-01-12 18:26:26

我不确定你想要什么作为输出。我刚刚尝试了一下。

我建议您创建一个列表列表,而不是声明单独的列表变量。

List<List<int>> journeys = new List<List<int>>();
journeys.Add(new List<int>() { 1, 2, 3, 4, 5 });
journeys.Add(new List<int>() { 2, 3, 4, 6, 7, 3, 4 });
journeys.Add(new List<int>() { 6, 7, 1 });
journeys.Add(new List<int>() { 3, 1, 4 });

我假设数字范围从 0 到 255。通过此查询

var result = Enumerable.Range(0, 256)
    .Select(number => new
    {
        number,
        listIndexes = journeys
            .Select((list, index) => new { index, list })
            .Where(a => a.list.Contains(number))
            .Select(a => a.index)
            .ToList()
    })
    .Where(b => b.listIndexes.Count > 0)
    .ToList();

和此测试循环,

foreach (var item in result) {
    Console.Write("Number {0} occurs in list # ", item.number);
    foreach (var index in item.listIndexes) {
        Console.Write("{0} ", index);
    }
    Console.WriteLine();
}

您将得到此结果

    Number 1 occurs in list # 0 2 3 
    Number 2 occurs in list # 0 1 
    Number 3 occurs in list # 0 1 3 
    Number 4 occurs in list # 0 1 3 
    Number 5 occurs in list # 0 
    Number 6 occurs in list # 1 2 
    Number 7 occurs in list # 1 2 

,其中列表编号从零开始。

I am not sure what you want as output. I just made a Try.

I suggest that you make a list of lists, instead of declaring individual list variables.

List<List<int>> journeys = new List<List<int>>();
journeys.Add(new List<int>() { 1, 2, 3, 4, 5 });
journeys.Add(new List<int>() { 2, 3, 4, 6, 7, 3, 4 });
journeys.Add(new List<int>() { 6, 7, 1 });
journeys.Add(new List<int>() { 3, 1, 4 });

I assumed that the numbers range from 0 to 255. With this query

var result = Enumerable.Range(0, 256)
    .Select(number => new
    {
        number,
        listIndexes = journeys
            .Select((list, index) => new { index, list })
            .Where(a => a.list.Contains(number))
            .Select(a => a.index)
            .ToList()
    })
    .Where(b => b.listIndexes.Count > 0)
    .ToList();

and this test loop

foreach (var item in result) {
    Console.Write("Number {0} occurs in list # ", item.number);
    foreach (var index in item.listIndexes) {
        Console.Write("{0} ", index);
    }
    Console.WriteLine();
}

you will get this result

    Number 1 occurs in list # 0 2 3 
    Number 2 occurs in list # 0 1 
    Number 3 occurs in list # 0 1 3 
    Number 4 occurs in list # 0 1 3 
    Number 5 occurs in list # 0 
    Number 6 occurs in list # 1 2 
    Number 7 occurs in list # 1 2 

Where the lists are numbered starting at zero.

べ繥欢鉨o。 2025-01-12 18:26:26

对于暴力方法,您可以尝试使用多项式哈希函数来加速子部分匹配。仍然需要疯狂的比较次数,但至少匹配可以几乎恒定,而不管子序列长度如何。

For brute force approach you can try to use polynomial hash-functions to speed up sub-section matches. Still insane number of comparisons required, but at least match could be almost constant irrespective of sub-sequence length.

撕心裂肺的伤痛 2025-01-12 18:26:26

在您的情况下,有机会从模式预处理和文本预处理中受益(http://en.wikipedia.org/wiki/String_searching_algorithm)。

例如,为列表中的所有子序列构建一个字典树将允许在与模式长度成比例的时间内查询该列表以查找给定模式。

In your case there are opportunities to benefit from pattern preprocessing as well as text preprocessing (http://en.wikipedia.org/wiki/String_searching_algorithm).

For instance, constructing a trie for all subsequences in a list will allow to query this list for a given pattern in time proportional to the pattern length.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文