How can I classify sentences by tense (present, past, future, etc.)?
I want to parse a text and categorize the sentences according to their grammatical structure, but I know very little about NLP, so I don't even know where to start.
From what I have read, I need to parse the text and find out (or tag?) the part of speech of every word. Then I search for the verb clause, or whatever other defining characteristic I want to use to categorize the sentences.
What I don't know is whether there is already a method that does this more easily, or whether I need to define the grammar rules myself.
Any resources on NLP that discuss this would be great. Program examples are welcome as well. I have used NLTK before, but not extensively. Other parsers or languages are OK too!
Comments (2)
The Python Natural Language Toolkit (NLTK) is a library well suited to this kind of work. As with most NLP libraries, you have to download the data it uses separately; corpora (data) and scripts for training are available too.
There are also example tutorials that will help you identify the parts of speech of words. I think nltk.org is the place to go for what you are looking for.
Specific questions can be posted here again.
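To make the tagging idea concrete, here is a minimal sketch of a tense classifier that works on (word, tag) pairs with Penn Treebank tags, which is the form `nltk.pos_tag` returns. The helper names `tag_sentence` and `classify_tense` are made up for this example and are not part of NLTK. The rule itself (the first finite verb or "will"/"shall" decides the tense) is a rough assumption that will misjudge many real sentences, for example perfect or passive constructions.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]

# Modal word forms treated as future markers (an assumption of this sketch).
# "wo" is what the tokenizer leaves of "won't"; "'ll" comes from "she'll".
FUTURE_MODALS = {"will", "shall", "'ll", "wo"}


def tag_sentence(sentence: str) -> TaggedSentence:
    """Tokenize and POS-tag a sentence with NLTK.

    Requires NLTK and its tokenizer and tagger data, fetched once with
    nltk.download(); the exact resource names depend on the NLTK version.
    """
    import nltk  # imported here so classify_tense works without NLTK installed

    return nltk.pos_tag(nltk.word_tokenize(sentence))


def classify_tense(tagged: TaggedSentence) -> str:
    """Guess the tense from the first tense-bearing token (rough heuristic)."""
    for word, tag in tagged:
        if tag == "MD" and word.lower() in FUTURE_MODALS:
            return "future"
        if tag == "VBD":  # past-tense verb
            return "past"
        if tag in {"VBP", "VBZ"}:  # present-tense verb forms
            return "present"
    return "unknown"
```

With NLTK and its data installed, `classify_tense(tag_sentence("She walked home."))` would run the whole pipeline; the classifier can also be tried on hand-tagged input such as `[("She", "PRP"), ("will", "MD"), ("walk", "VB")]`.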
Maybe you simply need to define patterns like "noun verb noun" for each type of grammatical structure, and then search for matches in the part-of-speech tagger's output sequence.
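One way to sketch this pattern idea: reduce each Penn Treebank tag to a coarse letter, join the letters into a "skeleton" string, and match regular expressions against it. The pattern table, the coarse mapping, and the function names below are all assumptions made for illustration; a real set of structures would need many more patterns.

```python
import re
from typing import Dict, List, Pattern, Tuple

TaggedSentence = List[Tuple[str, str]]

# Hypothetical pattern table: structure name -> regex over the coarse skeleton.
PATTERNS: Dict[str, Pattern[str]] = {
    "noun verb noun": re.compile(r"N+V+N+"),
    "noun verb": re.compile(r"N+V+"),
}


def coarse(tag: str) -> str:
    """Map a Penn Treebank tag to one coarse letter, or '' to ignore it."""
    if tag.startswith("NN") or tag == "PRP":  # nouns and personal pronouns
        return "N"
    if tag.startswith("VB") or tag == "MD":  # verbs and modals
        return "V"
    if tag.startswith(("JJ", "RB")) or tag in {"DT", "PRP$", ".", ","}:
        return ""  # modifiers, determiners, punctuation: skipped in this sketch
    return "O"  # anything else


def match_structures(tagged: TaggedSentence) -> List[str]:
    """Return the names of all patterns that match the whole sentence."""
    skeleton = "".join(coarse(tag) for _, tag in tagged)
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(skeleton)]
```

For the hand-tagged sentence `[("The", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"), ("cat", "NN"), (".", ".")]` the skeleton is `NVN`, so only the "noun verb noun" pattern matches. Matching on a string of letters keeps the patterns short, at the cost of throwing away the finer tag distinctions.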