Mallet CRF SimpleTagger performance tuning
A question for anyone who has used the Java library Mallet's SimpleTagger class for Conditional Random Fields (CRFs). Assume that I'm already using the multi-thread option for the maximum number of CPUs I have available (this is the case): where would I start, and what kind of things should I try, if I need it to run faster?
A related question: is there a way to do something similar to stochastic gradient descent, which would speed up the training process?
The type of training I want to do is simple:
Input:
Feature1 ... FeatureN SequenceLabel
...
Test Data:
Feature1 ... FeatureN
...
Output:
Feature1 ... FeatureN SequenceLabel
...
(Where features are the output of processing I have done on the data in my own code.)
I've had problems getting any CRF classifier other than Mallet to work even approximately, but I may have to backtrack again and revisit one of the other implementations, or try a new one.
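(For reference, this is roughly the programmatic equivalent of what I'm running through SimpleTagger, as a minimal sketch: the class names come from Mallet's cc.mallet.fst and cc.mallet.pipe packages, while the file names, thread count, and default label are placeholders for my actual setup.)

import java.io.File;
import java.io.FileReader;
import java.util.regex.Pattern;

import cc.mallet.fst.CRF;
import cc.mallet.fst.CRFTrainerByThreadedLabelLikelihood;
import cc.mallet.fst.SimpleTagger;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.iterator.LineGroupIterator;
import cc.mallet.types.InstanceList;

public class ThreadedCrfTraining {
    public static void main(String[] args) throws Exception {
        // Pipe for SimpleTagger-format data: one token per line
        // ("Feature1 ... FeatureN SequenceLabel"), blank line between sequences.
        Pipe pipe = new SimpleTagger.SimpleTaggerSentence2FeatureVectorSequence();
        pipe.getTargetAlphabet().lookupIndex("O");  // placeholder default label
        pipe.setTargetProcessing(true);

        InstanceList training = new InstanceList(pipe);
        training.addThruPipe(new LineGroupIterator(
                new FileReader("train.txt"),            // placeholder file name
                Pattern.compile("^\\s*$"), true));

        // Fully connected CRF over the labels observed in training.
        CRF crf = new CRF(pipe, null);
        crf.addStatesForLabelsConnectedAsIn(training);

        // Multi-threaded label-likelihood training; 8 stands in for however
        // many CPUs are available.
        CRFTrainerByThreadedLabelLikelihood trainer =
                new CRFTrainerByThreadedLabelLikelihood(crf, 8);
        trainer.train(training);
        trainer.shutdown();  // stop the worker thread pool

        crf.write(new File("crf.model"));  // placeholder model file
    }
}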
Answers (2)
Yes, stochastic gradient descent is usually way faster than the L-BFGS optimizer used in Mallet. I would suggest you try CRFSuite, which you can train either by SGD or by L-BFGS. You could also give Léon Bottou's SGD-based implementation a try, but that is more difficult to set up.
Otherwise, I believe that CRF++ is the most used CRF software around. It is based on L-BFGS though, so it might not be fast enough for you.
Both CRFSuite and CRF++ should be easy to get started with.
Note that all of these will be slow if you have a large number of labels. At least CRFSuite can be configured to only take into account observed label-n-grams - in an (n-1)th order model - which will typically make training and prediction much faster.
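If you want to try SGD before leaving Mallet entirely: as far as I know, its cc.mallet.fst package also includes a CRFTrainerByStochasticGradient class, which is not exposed through SimpleTagger's command line, so you would have to call the API directly. A minimal sketch, assuming a CRF and InstanceList built as in the earlier sketch; the learning rate and number of passes are guesses you would need to tune:

import cc.mallet.fst.CRF;
import cc.mallet.fst.CRFTrainerByStochasticGradient;
import cc.mallet.types.InstanceList;

public class SgdCrfTraining {
    // Trains an already-built CRF (constructed as in the earlier sketch)
    // by stochastic gradient instead of threaded label likelihood.
    public static void trainBySgd(CRF crf, InstanceList training) {
        CRFTrainerByStochasticGradient trainer =
                new CRFTrainerByStochasticGradient(crf, 0.0001);  // learning rate: a guess
        trainer.train(training, 10);  // passes over the training data: also a guess
    }
}

Whether this ends up as fast as CRFSuite's SGD is another question, but it avoids switching toolkits and re-exporting your features.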
Please have a look at this paper:
http://www.stanford.edu/~acoates/papers/LeNgiCoaLahProNg11.pdf
It seems stochastic gradient descent methods are difficult to tune and parallelize.