Point me in the right direction on NLP data structures and search algorithms
I've got a school assignment to make a language analyzer that's able to guess the language of an input. The assignment states this has to be done by pre-parsing texts whose language is known, collecting statistics about the letters used, combinations of letters, etc., and then making a guess based on this data.
The data structure we're supposed to use is simple multi-dimensional hash tables, but I'd like to take this opportunity to learn a bit more about implementing data structures. What I'd like to know is what to read up on. My knowledge of algorithms is very limited, but I'm keen on learning if someone could point me in the right direction.
Without any real knowledge, and just from reading different posts, I'm currently planning on studying undirected graphs as a data structure for letter combinations (and somehow storing the statistics within the graph as well) and Boyer-Moore for the per-word search algorithm.
Am I totally on the wrong track, and would these be impossible to implement in this situation, or is there something else superior for this problem?
Comments (2)
See if you can get your hands on a copy of Cormen et al., "Introduction to Algorithms":
http://www.amazon.com/Introduction-Algorithms-Second-Thomas-Cormen/dp/0262032937
It's a very, very good book for reading up on data structures and algorithms.
Language detection using character trigrams
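
To make the trigram idea concrete, here is a minimal sketch in Python of how it could fit the assignment. It is an illustration under stated assumptions, not necessarily the method the linked material describes: the function names (`trigram_profile`, `cosine_similarity`, `guess_language`), the tiny training sentences, and the choice of cosine similarity as the comparison measure are all made up for this example. The per-language statistics live in plain hash tables (a `Counter` and a `dict`), which matches the structure the assignment asks for.

```python
from collections import Counter
import math


def trigram_profile(text):
    """Return the relative frequency of each character trigram in text."""
    # Lowercase, collapse whitespace, and pad with spaces so that
    # word boundaries show up inside the trigrams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency tables."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def guess_language(text, profiles):
    """Return the language whose profile is most similar to the text."""
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine_similarity(sample, profiles[lang]))


# Hypothetical toy training data; a real analyzer would be trained on
# much larger texts for each language.
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then the dog sleeps in the sun",
    "german": "der schnelle braune fuchs springt über den faulen hund "
              "und dann schläft der hund in der sonne",
}
PROFILES = {lang: trigram_profile(text) for lang, text in TRAINING.items()}

if __name__ == "__main__":
    print(guess_language("the dog and the fox", PROFILES))
    print(guess_language("der hund und der fuchs", PROFILES))
```

With this approach there is no need for a graph or for Boyer-Moore: building a profile is a single pass over the text, and each trigram lookup is a hash table access. Short inputs and closely related languages are where a scheme like this is most likely to guess wrong, so larger training texts help.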