How do I get the ticket classification for an input sentence? (Python/NLP)
I trained a model to classify tickets into 2 categories, using GradientBoostClassifier. Now I want to write a function that takes any sentence and has the trained model calculate the probability that it belongs to category 1 or category 2. How do I write the code for this?
Suppose the sentence I want to use is the ticket description: "Lab Research Assistant is trying to create a Clinical Activity Report"
def function(sentence):
    # remove stop words from the sentence
    # (remove_stopwords is a helper that is not shown here; it is assumed
    # to take a string and return a string)
    cleaned = remove_stopwords(sentence)
    # split the cleaned sentence into individual words
    return cleaned.split()

ticket = function('Lab Research Assistant is trying to create a Clinical Activity Report')
ticket
model.predict(ticket)
model.predict_proba(ticket)
Thank you!
Comments (1)
Note: This answer assumes you already have a model that classifies sentences and gives you an output, since you said "I trained a model to classify tickets into 2 categories".
If you already have a model that classifies the sentence, there is no need to write another function to determine the probability,
because the classification is based on the model's final output, which is itself a matrix of probabilities.
For example, take a case with two classes (just like yours). Then there are two output nodes:
node 0 --> class 1
node 1 --> class 2
If node 0 outputs 0.943, then node 1 outputs 1 - 0.943 = 0.057, because the probabilities always add up to 1. The output matrix is [0.943, 0.057], so the sentence belongs to class 1. The probabilities are computed before the class is chosen: the predicted class is simply the one with the highest probability. You just have to retrieve the scores. If you are using a third-party library, it should already provide a function for this. If you are building a model from scratch, just add a line to print or return the probability scores.
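As a small illustration of that last step, here is a sketch of how a class label falls out of a row of probabilities. The arrays are hypothetical values taken from the example above, not output from a real model. In scikit-learn, the columns returned by predict_proba follow the order of model.classes_.

```python
import numpy as np

# Hypothetical values from the example above: one probability per class,
# listed in the same order as `classes`.
classes = np.array([1, 2])
probabilities = np.array([0.943, 0.057])

# The predicted class is the one with the highest probability.
predicted_class = classes[np.argmax(probabilities)]
print(predicted_class)  # 1
```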
Edit:
When the model was trained, a CountVectorizer was used to turn the training text into vectors. You must use the same fitted vectorizer (with the same number of features) to convert the sentences you want to predict before passing them to the predictor.
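A minimal end-to-end sketch of this, assuming scikit-learn's GradientBoostingClassifier and CountVectorizer. The training texts, labels, hyperparameters, and the classify_ticket function name are all made up for illustration; in practice you would reuse the vectorizer and model objects from your own training run rather than refitting them.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy training data, invented purely for illustration.
train_texts = [
    "cannot create clinical activity report",
    "lab assistant needs report access",
    "clinical report export fails",
    "research report shows wrong totals",
    "password reset for email account",
    "laptop will not connect to wifi",
    "printer is offline again",
    "email account locked after login",
]
train_labels = [1, 1, 1, 1, 2, 2, 2, 2]

# Fit the vectorizer once, on the training text only.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

model = GradientBoostingClassifier(n_estimators=20, random_state=0)
model.fit(X_train, train_labels)


def classify_ticket(sentence):
    """Return (predicted_label, {label: probability}) for one sentence."""
    # transform (not fit_transform) reuses the training vocabulary,
    # so the feature count matches what the model was trained on.
    features = vectorizer.transform([sentence])
    probabilities = model.predict_proba(features)[0]
    label = model.predict(features)[0]
    return label, dict(zip(model.classes_, probabilities))


label, probs = classify_ticket(
    "Lab Research Assistant is trying to create a Clinical Activity Report"
)
print(label, probs)
```

The key design point is that the sentence goes through vectorizer.transform as a one-element list, so the model receives a single row with the same columns it saw during training. Passing a raw string or a list of words directly to model.predict will not work.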