Handling integer-valued features for CRF in mallet
I am just starting to use the SimpleTagger class in mallet. My impression is that it expects binary features. The model that I want to implement has positive integer-valued features and I wonder how to implement this in mallet. Also, I heard that non-binary features need to be normalized if the model is to make sense. I would appreciate any suggestions on how to do this.
P.S. Yes, I know there is a dedicated mallet mailing list, but I have been waiting nearly a day for my subscription to be approved so I can post there. I'm simply in a hurry.
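For the normalization part of the question, here is a minimal sketch. It assumes non-negative integer features and rescales them into [0, 1] by dividing each value by the largest value seen in the training data. This is one common choice, not something MALLET itself prescribes; the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: rescale non-negative integer feature values into [0, 1]
 * by dividing by the largest value. Dividing by the max is an assumed choice;
 * other schemes (e.g. z-scoring) are also used in practice.
 */
class FeatureScaling {

    /** Returns values[i] / max(values); an all-zero input stays all zero. */
    static double[] scaleByMax(int[] values) {
        int max = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("expected non-negative values, got " + v);
            }
            max = Math.max(max, v);
        }
        double[] scaled = new double[values.length];
        if (max == 0) {
            return scaled; // avoid dividing by zero
        }
        for (int i = 0; i < values.length; i++) {
            scaled[i] = (double) values[i] / max;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Example: word lengths of the tokens in one sentence
        int[] wordLengths = {3, 5, 1, 10};
        System.out.println(Arrays.toString(scaleByMax(wordLengths)));
    }
}
```

Running main prints [0.3, 0.5, 0.1, 1.0]. In a real setup the maximum would be computed once on the training data and reused unchanged when tagging new data, so that train and test values stay on the same scale.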
Well, it's six years later now. If you're no longer in a hurry, you could check out the Java API to create your instances. A minimal example:
Or, if you want to keep using SimpleTagger, just define binary features like HAS_1_LETTER, HAS_2_LETTER, etc., though this seems tedious.
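The binary-feature workaround can be automated instead of written by hand. The sketch below assumes SimpleTagger's plain-text input of one token per line in the form "feature1 feature2 ... label" (blank lines between sequences). It turns one integer-valued feature (here, word length) into a single indicator name such as HAS_6_LETTER. The cap MAX_BUCKET, the HAS_10_OR_MORE_LETTER bucket, the lowercased word feature and the label B-TOOL are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: encode an integer-valued feature as a binary
 * indicator token and emit one SimpleTagger-style input line per token.
 */
class BinaryFeatureLine {

    // Assumed cap: all values >= MAX_BUCKET share one indicator feature
    static final int MAX_BUCKET = 10;

    /** One-hot indicator name for an integer value, capped at MAX_BUCKET. */
    static String indicator(int value) {
        if (value >= MAX_BUCKET) {
            return "HAS_" + MAX_BUCKET + "_OR_MORE_LETTER";
        }
        return "HAS_" + value + "_LETTER";
    }

    /** Builds "feature1 feature2 ... label" for a single token. */
    static String toLine(String word, String label) {
        List<String> parts = new ArrayList<>();
        parts.add(word.toLowerCase());         // the word itself as a feature
        parts.add(indicator(word.length()));   // integer feature as a binary token
        parts.add(label);                      // label goes last
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(toLine("Mallet", "B-TOOL"));
    }
}
```

Running main prints "mallet HAS_6_LETTER B-TOOL". This uses a one-hot encoding, so each value gets exactly one indicator. An alternative is cumulative thresholds (e.g. one indicator for each k up to the value), which lets the model share weight between neighbouring values.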