How to apply an ML model (SVM or Perceptron) for entity extraction
I am trying to extract soft skills from vacancies, and I am currently struggling to apply an ML model (SVM or Perceptron) to my dataset. First, I built a custom spaCy NER model, which eventually produced a dataset for the model to train on. This dataset looks like this:
words POS labels
37 We PRON O
38 are AUX O
39 always ADV O
40 on ADP O
41 the DET O
42 lookout NOUN O
43 for ADP O
44 motivated ADJ I-SOFT_SKILL
45 life NOUN O
46 science NOUN O
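For reference, a table like this can be produced from a spaCy doc roughly as in the sketch below (the pipeline name and the way the BIO label is assembled are illustrative only, not my exact code):

import pandas as pd
import spacy

# Illustrative only: the custom soft-skill NER pipeline name is a placeholder
nlp = spacy.load("my_custom_soft_skill_ner")

def doc_to_rows(text):
    """Turn one vacancy text into (word, POS, BIO-label) rows."""
    doc = nlp(text)
    rows = []
    for token in doc:
        # token.ent_type_ is empty for tokens outside any entity
        label = f"{token.ent_iob_}-{token.ent_type_}" if token.ent_type_ else "O"
        rows.append({"words": token.text, "POS": token.pos_, "labels": label})
    return pd.DataFrame(rows)

df = doc_to_rows("We are always on the lookout for motivated life science")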
I want to extract the skills from different vacancies, and I created a Perceptron that seems to be working:
from sklearn.linear_model import Perceptron

print('the shape is: ', X_train.shape, y_train.shape)
per = Perceptron(verbose=10, n_jobs=-1, max_iter=10)
per.partial_fit(X_train, y_train, classes)  # classes: the full list of possible labels
Output
the shape is: (51759, 6155) (51759,)
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 2 concurrent workers.
-- Epoch 1-- Epoch 1
Norm: 17.15, NNZs: 95, Bias: -3.000000, T: 51759, Avg. loss: 0.009795
Total training time: 0.55 seconds.
-- Epoch 1
Norm: 37.97, NNZs: 378, Bias: -3.000000, T: 51759, Avg. loss: 0.018296
Total training time: 0.57 seconds.
Norm: 42.68, NNZs: 457, Bias: 3.000000, T: 51759, Avg. loss: 0.025696
Total training time: 0.50 seconds.
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 1.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 1.1s finished
Perceptron(max_iter=10, n_jobs=-1, verbose=10)
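For completeness: the 6155 columns presumably come from vectorizing per-token features before training. Something along the following lines (DictVectorizer, the feature names, and train_df are illustrative assumptions; my real feature set may differ):

from sklearn.feature_extraction import DictVectorizer

# Illustrative per-token features; the real feature set is an assumption here
def token_features(df):
    return [{"word": w, "pos": p} for w, p in zip(df["words"], df["POS"])]

vectorizer = DictVectorizer(sparse=True)
# train_df: hypothetical DataFrame with the words/POS/labels columns shown above
X_train = vectorizer.fit_transform(token_features(train_df))  # fit_transform defines the 6155 columns
y_train = train_df["labels"].values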
Now, when I try to apply the model to completely unseen data, I get a ValueError:
ValueError: X has 6747 features, but Perceptron is expecting 6155 features as input.
So does the number of features in the unseen vacancies need to be exactly the same as in the training data? Or is there another, better way to do this?
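If I understand the error correctly, the mismatch arises because the unseen vacancies are vectorized independently, which produces their own 6747-column feature space. As far as I know, the usual approach is to reuse the vectorizer fitted on the training data and only call transform on the unseen data, so it is projected into the same 6155 columns. Continuing the illustrative sketch above (unseen_df is a hypothetical DataFrame of one unseen vacancy):

# Reuse the vectorizer fitted on the training data: transform() maps the unseen
# tokens into the same 6155-dimensional space; feature values never seen during
# fitting are simply dropped (their columns stay zero).
X_unseen = vectorizer.transform(token_features(unseen_df))
y_pred = per.predict(X_unseen)

Is that the right direction for this kind of mismatch?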
If additional information is needed, I can provide it! I would be thankful if somebody knew what the problem is!