Simple language identification with LINQ
I'm experimenting with LINQ for the first time and decided to try basic human language identification. The input text is tested against HashSets of the 10,000 most common words in each language and receives a score.
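The post doesn't show how a dictionary produces its score. Here is a minimal sketch that assumes one point for every word found in a language's word list. `WordListDictionary` and `GetScore` are hypothetical names standing in for the question's `AbstractDictionary` and `getScore`, and loading the word list from disk is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's AbstractDictionary; the real class,
// its file format and LoadDictionaryAsync() are not shown in the post.
public class WordListDictionary
{
    // Case-insensitive set of the most common words for one language.
    private readonly HashSet<string> _words;

    public WordListDictionary(IEnumerable<string> words)
    {
        _words = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    // Assumed scoring rule: one point if the word is in the list, zero otherwise.
    // HashSet.Contains is an O(1) lookup on average.
    public int GetScore(string word) => _words.Contains(word) ? 1 : 0;
}
```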
My question is: is there a better approach to the LINQ query? Maybe the other form (query syntax) that I'm not familiar with? It works, but I'm sure the experts here will be able to provide a much cleaner solution!
public PolyAnalyzer() {
    Dictionaries = new Dictionary<string, AbstractDictionary>();
    Dictionaries.Add("Bulgarian", new BulgarianDictionary());
    Dictionaries.Add("English", new EnglishDictionary());
    Dictionaries.Add("German", new GermanDictionary());
    Dictionaries.Values.Select(n => new Thread(() => n.LoadDictionaryAsync())).ToList().ForEach(n => n.Start());
}

public string getResults(string text) {
    int total = 0;
    return string.Join(" ",
        Dictionaries.Select(n => new {
            Language = n.Key,
            Score = new Regex(@"\W+").Split(text).AsQueryable().Select(m => n.Value.getScore(m)).Sum()
        }).
        Select(n => { total += n.Score; return n; }).
        ToList().AsQueryable(). // Force immediate evaluation
        Select(n =>
            "[" + n.Score * 100 / total + "% " + n.Language + "]").
        ToArray());
}
P.S. I'm aware that this is an extremely simplistic approach to language identification; I'm only interested in the LINQ side of things.
Answers (2)
I would refactor it like this:
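Here is a minimal sketch of a refactoring along the lines of the points below. It is an illustration, not this answer's original code. It assumes a hypothetical `IWordScorer` interface in place of `AbstractDictionary`. It also guards against a zero total, which would make the question's `n.Score * 100 / total` throw a `DivideByZeroException`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical minimal contract; the question's AbstractDictionary.getScore plays this role.
public interface IWordScorer
{
    int GetScore(string word);
}

// Hypothetical HashSet-backed scorer: one point per known word.
public class SetScorer : IWordScorer
{
    private readonly HashSet<string> _words;

    public SetScorer(params string[] words) =>
        _words = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);

    public int GetScore(string word) => _words.Contains(word) ? 1 : 0;
}

public static class LanguageScorer
{
    // Created once and reused, instead of once per language inside the query.
    private static readonly Regex WordSplitter = new Regex(@"\W+");

    public static string GetResults(IDictionary<string, IWordScorer> dictionaries, string text)
    {
        // Part 1: tokenize once; every language scores the same words.
        string[] words = WordSplitter.Split(text);

        // Part 2: score each language. ToList() is deliberate here: the results
        // are enumerated twice below, so the scoring should run only once.
        var scores = dictionaries
            .Select(d => new { Language = d.Key, Score = words.Sum(w => d.Value.GetScore(w)) })
            .ToList();

        // Part 3: the total is its own side-effect-free query.
        int total = scores.Sum(s => s.Score);

        return string.Join(" ", scores.Select(s =>
            "[" + (total == 0 ? 0 : s.Score * 100 / total) + "% " + s.Language + "]"));
    }
}
```

The three parts (tokenize, score, format) avoid `AsQueryable()` and the side effect on `total`, and they use a single `Regex` instance.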
A few points:
The AsQueryable() calls are unnecessary: this is all LINQ to Objects, which works on IEnumerable<T>, and that is good enough. I also removed a few ToList() calls. They are unnecessary as well, and dropping them avoids eagerly loading results when that isn't needed.
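Deferred execution explains why an intermediate `ToList()` only pays off when the results are reused. In the small sketch below, a hypothetical counter records how often the projection runs. The array is queried directly as `IEnumerable<T>`. For in-memory data, `AsQueryable()` essentially just wraps the sequence in an `EnumerableQuery` and adds expression-tree overhead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Instrumentation only: counts how often the projection actually runs.
    public static int ProjectionCalls;

    // Plain LINQ to Objects over an array; no AsQueryable() needed.
    public static IEnumerable<int> WordLengths(string[] words) =>
        words.Select(w => { ProjectionCalls++; return w.Length; });
}
```

Calling `WordLengths(...)` runs nothing. The projection executes only when something like `Sum()` enumerates the sequence, and it would run again on a second enumeration.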
While it's nice to have just one LINQ query, it's not a competition. Aim for overall readability and think about how you (and others) will have to maintain the code. I split your query into three more readable (IMO) parts.
Avoid side effects by all means possible. I removed the one you had on the variable total because it's confusing: LINQ queries shouldn't have side effects, since running the same query twice might yield different results. In your case you can simply calculate the total in a separate LINQ query.
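The following sketch shows the problem with the question's `total += n.Score` pattern, reduced to hypothetical integer scores. Enumerating the same deferred query twice applies the side effect twice. A separate `Sum()` gives the same answer every time.

```csharp
using System;
using System.Linq;

public static class SideEffectDemo
{
    // Mirrors the question's pattern: a deferred query that mutates an outer variable.
    public static int RunTwice()
    {
        int[] scores = { 3, 1, 2 };
        int total = 0;
        var query = scores.Select(s => { total += s; return s; });

        query.ToList(); // first enumeration: total is now 6
        query.ToList(); // second enumeration: total is now 12
        return total;
    }

    // Side-effect-free alternative: the total is its own query.
    public static int Total(int[] scores) => scores.Sum();
}
```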
Don't re-create or recalculate variables inside a LINQ projection if it isn't necessary. I removed the regex from the LINQ query and initialized the variable once, outside it. Otherwise you create a new Regex instance N times (once per dictionary) instead of just once. Depending on the query, this can have a large performance impact.
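The sketch below hoists the `Regex` into a `static readonly` field inside a hypothetical `Tokenizer` helper. As a small extra, it drops the empty strings that `Split` returns when the text begins or ends with a non-word character. Another option is the static `Regex.Split(input, pattern)` overload, which reuses instances from the `Regex` class's internal cache of recently used patterns.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Constructed once per process instead of once per dictionary inside the query.
    private static readonly Regex WordSplitter = new Regex(@"\W+");

    // Splits on runs of non-word characters and removes the empty strings
    // produced by leading or trailing punctuation.
    public static string[] Words(string text) =>
        WordSplitter.Split(text).Where(w => w.Length > 0).ToArray();
}
```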
I think the code you posted is very confusing. I've rewritten it, and I think it gives the same result (of course I couldn't test it, and I actually think your code has some incorrect parts), but it should be much more concise now. Let me know if this is incorrect.
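The question also asks about "the other form" of LINQ. For illustration only (this is not this answer's code), here is the same scoring written in query syntax. It assumes a hypothetical `Dictionary<string, HashSet<string>>` of word lists and one point per known word. The compiler translates `from`/`let`/`select` into ordinary method calls such as `Select`, so this is a style choice, not a different engine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuerySyntaxScorer
{
    private static readonly Regex WordSplitter = new Regex(@"\W+");

    // Hypothetical input: language name -> set of its most common words.
    public static string GetResults(IDictionary<string, HashSet<string>> wordLists, string text)
    {
        string[] words = WordSplitter.Split(text);

        // Query syntax; materialized once because it is enumerated twice below.
        var scores = (from entry in wordLists
                      let score = words.Count(w => entry.Value.Contains(w))
                      select new { Language = entry.Key, Score = score }).ToList();

        int total = scores.Sum(s => s.Score);

        var labels = from s in scores
                     select "[" + (total == 0 ? 0 : s.Score * 100 / total) + "% " + s.Language + "]";

        return string.Join(" ", labels);
    }
}
```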