Mallet CRF SimpleTagger performance tuning
A question for anyone who has used the Java library Mallet's SimpleTagger class for Conditional Random Fields (CRFs). Assume that I'm already using the multi-thread option for the maximum number of CPUs I have available (this is the case): where would I start, and what kind of things should I try, if I need it to run faster?
A related question: is there a way to do something similar to stochastic gradient descent, which would speed up the training process?
The type of training I want to do is simple:
Input:
Feature1 ... FeatureN SequenceLabel
...
Test Data:
Feature1 ... FeatureN
...
Output:
Feature1 ... FeatureN SequenceLabel
...
(Where features are the output of processing I have done on the data in my own code.)
I've had problems getting any CRF classifier other than Mallet to work even approximately, but I may have to backtrack again and revisit one of the other implementations, or try a new one.
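(For reference, this is roughly the programmatic equivalent of what I'm running through SimpleTagger, as a minimal sketch: the class names come from Mallet's cc.mallet.fst and cc.mallet.pipe packages, while the file names, thread count, and default label are placeholders for my actual setup.)

import java.io.File;
import java.io.FileReader;
import java.util.regex.Pattern;

import cc.mallet.fst.CRF;
import cc.mallet.fst.CRFTrainerByThreadedLabelLikelihood;
import cc.mallet.fst.SimpleTagger;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.iterator.LineGroupIterator;
import cc.mallet.types.InstanceList;

public class ThreadedCrfTraining {
    public static void main(String[] args) throws Exception {
        // Pipe for SimpleTagger-format data: one token per line
        // ("Feature1 ... FeatureN SequenceLabel"), blank line between sequences.
        Pipe pipe = new SimpleTagger.SimpleTaggerSentence2FeatureVectorSequence();
        pipe.getTargetAlphabet().lookupIndex("O");  // placeholder default label
        pipe.setTargetProcessing(true);

        InstanceList training = new InstanceList(pipe);
        training.addThruPipe(new LineGroupIterator(
                new FileReader("train.txt"),            // placeholder file name
                Pattern.compile("^\\s*$"), true));

        // Fully connected CRF over the labels observed in training.
        CRF crf = new CRF(pipe, null);
        crf.addStatesForLabelsConnectedAsIn(training);

        // Multi-threaded label-likelihood training; 8 stands in for however
        // many CPUs are available.
        CRFTrainerByThreadedLabelLikelihood trainer =
                new CRFTrainerByThreadedLabelLikelihood(crf, 8);
        trainer.train(training);
        trainer.shutdown();  // stop the worker thread pool

        crf.write(new File("crf.model"));  // placeholder model file
    }
}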
Answers (2)
Yes, stochastic gradient descent is usually way faster than the L-BFGS optimizer used in Mallet. I would suggest you try CRFSuite, which you can train either by SGD or by L-BFGS. You could also give Léon Bottou's SGD-based implementation a try, but that is more difficult to set up.
Otherwise, I believe that CRF++ is the most used CRF software around. It is based on L-BFGS though, so it might not be fast enough for you.
Both CRFSuite and CRF++ should be easy to get started with.
Note that all of these will be slow if you have a large number of labels. At least CRFSuite can be configured to only take into account observed label-n-grams - in an (n-1)th order model - which will typically make training and prediction much faster.
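If you want to try SGD before leaving Mallet entirely: as far as I know, its cc.mallet.fst package also includes a CRFTrainerByStochasticGradient class, which is not exposed through SimpleTagger's command line, so you would have to call the API directly. A minimal sketch, assuming a CRF and InstanceList built as in the earlier sketch; the learning rate and number of passes are guesses you would need to tune:

import cc.mallet.fst.CRF;
import cc.mallet.fst.CRFTrainerByStochasticGradient;
import cc.mallet.types.InstanceList;

public class SgdCrfTraining {
    // Trains an already-built CRF (constructed as in the earlier sketch)
    // by stochastic gradient instead of threaded label likelihood.
    public static void trainBySgd(CRF crf, InstanceList training) {
        CRFTrainerByStochasticGradient trainer =
                new CRFTrainerByStochasticGradient(crf, 0.0001);  // learning rate: a guess
        trainer.train(training, 10);  // passes over the training data: also a guess
    }
}

Whether this ends up as fast as CRFSuite's SGD is another question, but it avoids switching toolkits and re-exporting your features.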
Please have a look at this paper:
http://www.stanford.edu/~acoates/papers/LeNgiCoaLahProNg11.pdf
It seems stochastic gradient descent methods are difficult to tune and parallelize.