Word-level language model with LingPipe
I have been trying to get a word-level language model to work in LingPipe. All the examples and tutorials I have come across show the character n-gram model. How do I go about using LingPipe to train a word-level model and then test that model on other documents?
Additionally, I noticed that TokenizedLM is not serializable. Is there a way to save it and load it later without having to retrain every time?
Lastly, are there any other frameworks/tools that will allow me to do this without any coding on my part?
Comments (1)
I don't know about Java, but if you are not tied to that programming language, there is the Python NLTK, which has tokenizers, n-gram models, and lots of other stuff (http://nltk.googlecode.com/svn/trunk/doc/api/nltk-module.html). There is also a book that can serve as an introduction and give you an overview.
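To make the idea concrete, here is a minimal sketch of a word-level bigram model in plain Python that can be trained, saved, reloaded, and then used to score other text. It deliberately uses only the standard library, so it is not the NLTK or LingPipe API; the class name `BigramModel`, the whitespace tokenizer, the add-one smoothing, and the JSON format are all assumptions made for illustration.

```python
import json
import math
from collections import Counter


class BigramModel:
    """Hypothetical word-level bigram model with add-one (Laplace) smoothing."""

    def __init__(self):
        self.unigrams = Counter()  # how often each word appears as a context
        self.bigrams = Counter()   # how often each (previous, current) pair appears
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Naive lowercase whitespace tokenizer with sentence boundary markers.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def train(self, text):
        tokens = self.tokenize(text)
        self.vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            self.unigrams[prev] += 1
            self.bigrams[(prev, cur)] += 1

    def log2_prob(self, text):
        # Sum of smoothed log2 bigram probabilities; higher means a better fit.
        tokens = self.tokenize(text)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            count = self.bigrams[(prev, cur)] + 1
            total += math.log2(count / (self.unigrams[prev] + v))
        return total

    def to_json(self):
        # Tokens never contain spaces, so "prev cur" is a safe string key.
        return json.dumps({
            "vocab": sorted(self.vocab),
            "unigrams": self.unigrams,
            "bigrams": {f"{p} {c}": n for (p, c), n in self.bigrams.items()},
        })

    @classmethod
    def from_json(cls, data):
        raw = json.loads(data)
        model = cls()
        model.vocab = set(raw["vocab"])
        model.unigrams = Counter(raw["unigrams"])
        for key, n in raw["bigrams"].items():
            prev, cur = key.split(" ")
            model.bigrams[(prev, cur)] = n
        return model


if __name__ == "__main__":
    model = BigramModel()
    model.train("the cat sat on the mat")
    model.train("the dog sat on the rug")

    saved = model.to_json()                 # write this string to a file to persist it
    restored = BigramModel.from_json(saved)  # later: reload without retraining

    print(restored.log2_prob("the cat sat on the rug"))
    print(restored.log2_prob("rug the on sat cat the"))
```

Saving only the counts keeps the stored model small and independent of the class definition, which is one way around a model object that cannot be serialized directly. For real data you would want a proper tokenizer and a better smoothing method than add-one.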