Looking for a suffix tree implementation in C#?

Posted on 2024-07-06 21:26:58

I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of Ukkonen's algorithm. I don't want to waste time rolling my own if such an implementation already exists.

Comments (3)

靖瑶 2024-07-13 21:26:58

Hard question. Here's the closest match I could find: http://www.codeproject.com/KB/recipes/ahocorasick.aspx, which is an implementation of the Aho-Corasick string matching algorithm. That algorithm uses a suffix-tree-like structure; see http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
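For context, Aho-Corasick is usually described as a trie of the search patterns plus "failure" links, which lets it find many patterns in one pass over the text. The following is only a minimal sketch of that idea; `AhoCorasickSketch`, `Search`, and the other names are hypothetical and are not taken from the CodeProject article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of Aho-Corasick; not the CodeProject code.
public class AhoCorasickSketch
{
    private class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;                                  // longest proper suffix that is also a trie path
        public readonly List<string> Output = new List<string>(); // patterns ending at this state
    }

    private readonly Node root = new Node();

    public AhoCorasickSketch(IEnumerable<string> patterns)
    {
        // Phase 1: build a plain trie of the patterns.
        foreach (var p in patterns)
        {
            var cur = root;
            foreach (var c in p)
            {
                Node child;
                if (!cur.Next.TryGetValue(c, out child))
                {
                    child = new Node();
                    cur.Next.Add(c, child);
                }
                cur = child;
            }
            cur.Output.Add(p);
        }

        // Phase 2: breadth-first pass that sets the failure links.
        var queue = new Queue<Node>();
        foreach (var first in root.Next.Values)
        {
            first.Fail = root;
            queue.Enqueue(first);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var f = node.Fail;
                while (f != null && !f.Next.ContainsKey(pair.Key)) f = f.Fail;
                pair.Value.Fail = f == null ? root : f.Next[pair.Key];
                // Inherit the matches that end at the failure target.
                pair.Value.Output.AddRange(pair.Value.Fail.Output);
                queue.Enqueue(pair.Value);
            }
        }
    }

    // Yields (start index, pattern) for every occurrence in text.
    public IEnumerable<KeyValuePair<int, string>> Search(string text)
    {
        var cur = root;
        for (int i = 0; i < text.Length; ++i)
        {
            var c = text[i];
            while (cur != root && !cur.Next.ContainsKey(c)) cur = cur.Fail;
            Node next;
            if (cur.Next.TryGetValue(c, out next)) cur = next;
            foreach (var p in cur.Output)
                yield return new KeyValuePair<int, string>(i - p.Length + 1, p);
        }
    }
}
```

Note the difference from a suffix tree: here the patterns are preprocessed and the text is scanned once, whereas a suffix tree preprocesses the text so that arbitrary patterns can be looked up later.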

Now, if you want a prefix tree, this article claims to have an implementation for you: http://www.codeproject.com/KB/recipes/prefixtree.aspx
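To make the distinction concrete, a prefix tree only answers "which stored words start with this prefix?", not "where does this substring occur?". Below is a minimal sketch under that assumption; `PrefixTreeSketch`, `Add`, and `WithPrefix` are hypothetical names and do not come from the linked article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a prefix tree (trie); not the linked article's code.
public class PrefixTreeSketch
{
    private class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a stored word ends at this node
    }

    private readonly Node root = new Node();

    // Walks the word one character at a time, creating nodes as needed.
    public void Add(string word)
    {
        var cur = root;
        foreach (var c in word)
        {
            Node next;
            if (!cur.Children.TryGetValue(c, out next))
            {
                next = new Node();
                cur.Children.Add(c, next);
            }
            cur = next;
        }
        cur.IsWord = true;
    }

    // Returns all stored words that start with the given prefix, sorted ordinally.
    public List<string> WithPrefix(string prefix)
    {
        var results = new List<string>();
        var cur = root;
        foreach (var c in prefix)
        {
            Node next;
            if (!cur.Children.TryGetValue(c, out next)) return results;
            cur = next;
        }
        Collect(cur, prefix, results);
        results.Sort(StringComparer.Ordinal);
        return results;
    }

    private static void Collect(Node node, string path, List<string> results)
    {
        if (node.IsWord) results.Add(path);
        foreach (var pair in node.Children)
            Collect(pair.Value, path + pair.Key, results);
    }
}
```

After adding "hello", "world", and "hell", `WithPrefix("hel")` returns "hell" and "hello", while `WithPrefix("ell")` returns nothing, which is the case a suffix tree is meant to cover.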

<HUMOR> Now that I've done your homework, how about you mow my lawn? (Reference: http://flyingmoose.org/tolksarc/homework.htm) </HUMOR>

Edit: I found a C# suffix tree implementation that was a port of a C++ one posted on a blog: http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree

Edit: There is a new project at Codeplex that is focused on suffix trees: http://suffixtree.codeplex.com/

痴意少年 2024-07-13 21:26:58

Hey, I just finished implementing a .NET (C#) library containing different trie implementations. Among them:

  • Classical trie
  • Patricia trie
  • Suffix trie
  • A trie using Ukkonen's algorithm

I tried to make the source code easy to read. Usage is also very straightforward:

using Gma.DataStructures.StringSearch;

...

var trie = new UkkonenTrie<int>(3);
//var trie = new SuffixTrie<int>(3);

trie.Add("hello", 1);
trie.Add("world", 2);
trie.Add("hell", 3);

var result = trie.Retrieve("hel");

The library is well tested and is also published as the TrieNet NuGet package.

See github.com/gmamaladze/trienet

冰葑 2024-07-13 21:26:58

Here is an implementation of a suffix tree that is reasonably efficient. I haven't studied Ukkonen's implementation, but I believe the running time of this algorithm is quite reasonable, approximately O(N log N) on typical text; highly repetitive input (for example, a long run of a single letter) pushes construction toward O(N²). Note that the number of nodes in the tree created, not counting the root, is equal to the number of letters in the parent string.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using NUnit.Framework;

namespace FunStuff
{
    public class SuffixTree
    {
        public class Node
        {
            public int Index = -1;
            public Dictionary<char, Node> Children = new Dictionary<char, Node>();
        }

        public Node Root = new Node();
        public String Text;

        public void InsertSuffix(string s, int from)
        {             
            var cur = Root;
            for (int i = from; i < s.Length; ++i)
            {
                var c = s[i];
                if (!cur.Children.ContainsKey(c))
                {
                    var n = new Node() {Index = from};
                    cur.Children.Add(c, n);

                    // Very slow assertion. 
                    Debug.Assert(Find(s.Substring(from)).Any());

                    return;
                }
                cur = cur.Children[c];
            }
            Debug.Assert(false, "It should never be possible to arrive at this case");
            throw new Exception("Suffix tree corruption");
        }

        private static IEnumerable<Node> VisitTree(Node n)
        {
            foreach (var n1 in n.Children.Values)
                foreach (var n2 in VisitTree(n1))
                    yield return n2;
            yield return n;
        }

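        public IEnumerable<int> Find(string s)
        {
            if (string.IsNullOrEmpty(s)) yield break;
            var cur = Root;
            for (int depth = 0; depth < s.Length; ++depth)
            {
                Node next;
                // Falling off the tree: no deeper node can match.
                if (!cur.Children.TryGetValue(s[depth], out next)) yield break;
                cur = next;
                // A node above the end of the pattern stands for one suffix only
                // (cur.Index); it matches if the rest of the pattern is there too.
                if (depth + 1 < s.Length && MatchesAt(cur.Index, s, depth + 1))
                    yield return cur.Index;
            }
            // Every suffix stored at or below this node starts with s.
            foreach (var n2 in VisitTree(cur))
                yield return n2.Index;
        }

        // True if s[from..] equals the text at the same offsets of the suffix at start.
        private bool MatchesAt(int start, string s, int from)
        {
            for (var j = from; j < s.Length; ++j)
                if (start + j >= Text.Length || Text[start + j] != s[j])
                    return false;
            return true;
        }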

        public SuffixTree(string s)
        {
            Text = s;
            for (var i = s.Length - 1; i >= 0; --i)
                InsertSuffix(s, i);
            Debug.Assert(VisitTree(Root).Count() - 1 == s.Length);
        }
    }

    [TestFixture]
    public class TestSuffixTree
    {
        [Test]
        public void TestBasics()
        {
            var s = "banana";
            var t = new SuffixTree(s);
            var results = t.Find("an").ToArray();
            Assert.AreEqual(2, results.Length);
            Assert.AreEqual(1, results[0]);
            Assert.AreEqual(3, results[1]);
        }
    } 
}
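
When experimenting with a structure like the one above, it can help to cross-check its answers against a brute-force scan. This is a small sketch for that purpose; `NaiveSearch.FindAll` is a hypothetical helper, not part of the answer's code, and it relies only on `String.IndexOf` with ordinal comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference implementation for checking suffix tree results.
public static class NaiveSearch
{
    // Returns every start index at which pattern occurs in text, in ascending
    // order, including overlapping occurrences.
    public static List<int> FindAll(string text, string pattern)
    {
        var hits = new List<int>();
        if (string.IsNullOrEmpty(pattern)) return hits;

        int i = text.IndexOf(pattern, 0, StringComparison.Ordinal);
        while (i >= 0)
        {
            hits.Add(i);
            // Restart one character later so overlapping matches are found too.
            i = text.IndexOf(pattern, i + 1, StringComparison.Ordinal);
        }
        return hits;
    }
}
```

For example, `NaiveSearch.FindAll("banana", "ana")` gives 1 and 3. The tree's `Find` does not promise any particular order, so compare the two results as sets (for instance after sorting) rather than element by element.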