How to apply an ML model (SVM or Perceptron) for entity extraction
I am trying to extract soft skills from vacancies, and I am currently struggling to apply an ML model (SVM or Perceptron) to my dataset. First, I built a custom spaCy NER model, which eventually produced a dataset for the model to train on. This dataset looks like this:
words POS labels
37 We PRON O
38 are AUX O
39 always ADV O
40 on ADP O
41 the DET O
42 lookout NOUN O
43 for ADP O
44 motivated ADJ I-SOFT_SKILL
45 life NOUN O
46 science NOUN O
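For reference, a table like this can be produced from a spaCy doc roughly as in the sketch below (the pipeline name and the way the BIO label is assembled are illustrative only, not my exact code):

import pandas as pd
import spacy

# Illustrative only: the custom soft-skill NER pipeline name is a placeholder
nlp = spacy.load("my_custom_soft_skill_ner")

def doc_to_rows(text):
    """Turn one vacancy text into (word, POS, BIO-label) rows."""
    doc = nlp(text)
    rows = []
    for token in doc:
        # token.ent_type_ is empty for tokens outside any entity
        label = f"{token.ent_iob_}-{token.ent_type_}" if token.ent_type_ else "O"
        rows.append({"words": token.text, "POS": token.pos_, "labels": label})
    return pd.DataFrame(rows)

df = doc_to_rows("We are always on the lookout for motivated life science")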
I want to extract the skills from different vacancies, and I created a Perceptron that seems to be working:
from sklearn.linear_model import Perceptron

print('the shape is: ', X_train.shape, y_train.shape)
per = Perceptron(verbose=10, n_jobs=-1, max_iter=10)
per.partial_fit(X_train, y_train, classes)  # classes: the full list of possible labels
Output
the shape is: (51759, 6155) (51759,)
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 2 concurrent workers.
-- Epoch 1-- Epoch 1
Norm: 17.15, NNZs: 95, Bias: -3.000000, T: 51759, Avg. loss: 0.009795
Total training time: 0.55 seconds.
-- Epoch 1
Norm: 37.97, NNZs: 378, Bias: -3.000000, T: 51759, Avg. loss: 0.018296
Total training time: 0.57 seconds.
Norm: 42.68, NNZs: 457, Bias: 3.000000, T: 51759, Avg. loss: 0.025696
Total training time: 0.50 seconds.
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 1.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 1.1s finished
Perceptron(max_iter=10, n_jobs=-1, verbose=10)
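For completeness: the 6155 columns presumably come from vectorizing per-token features before training. Something along the following lines (DictVectorizer, the feature names, and train_df are illustrative assumptions; my real feature set may differ):

from sklearn.feature_extraction import DictVectorizer

# Illustrative per-token features; the real feature set is an assumption here
def token_features(df):
    return [{"word": w, "pos": p} for w, p in zip(df["words"], df["POS"])]

vectorizer = DictVectorizer(sparse=True)
# train_df: hypothetical DataFrame with the words/POS/labels columns shown above
X_train = vectorizer.fit_transform(token_features(train_df))  # fit_transform defines the 6155 columns
y_train = train_df["labels"].values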
Now, when I try to apply the model to completely unseen data, I get a ValueError:
ValueError: X has 6747 features, but Perceptron is expecting 6155 features as input.
So does the number of features in the unseen vacancies need to be exactly the same as in the training data? Or is there another, better way to do this?
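If I understand the error correctly, the mismatch arises because the unseen vacancies are vectorized independently, which produces their own 6747-column feature space. As far as I know, the usual approach is to reuse the vectorizer fitted on the training data and only call transform on the unseen data, so it is projected into the same 6155 columns. Continuing the illustrative sketch above (unseen_df is a hypothetical DataFrame of one unseen vacancy):

# Reuse the vectorizer fitted on the training data: transform() maps the unseen
# tokens into the same 6155-dimensional space; feature values never seen during
# fitting are simply dropped (their columns stay zero).
X_unseen = vectorizer.transform(token_features(unseen_df))
y_pred = per.predict(X_unseen)

Is that the right direction for this kind of mismatch?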
If additional information is needed, I can provide it! I would be thankful if somebody knew what the problem is!