Break/decompose complex and compound sentences in nltk
Is there a way to decompose complex sentences into simple sentences in nltk or other natural language processing libraries?
For example:
The park is so wonderful when the sun is setting and a cool breeze is blowing ==> The sun is setting. A cool breeze is blowing. The park is so wonderful.
This is much more complicated than it seems, so you're unlikely to find a perfectly clean method.
However, using the English parser in OpenNLP, I can take your example sentence and get the following grammar tree:
From there, you can pick the tree apart however you like. You can get the main clause by extracting the top-level (NP *)(VP *) minus the (SBAR *) section. Then you can split the coordinated clause inside the (SBAR *) at its conjunction to get the other two statements.
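To make the extraction step concrete, here is a minimal sketch using nltk's `Tree` class. The bracketed parse in `EXAMPLE_PARSE` is hand-written in Penn Treebank style for illustration only; it is an assumption, not the OpenNLP output, and a real parser may bracket the sentence differently. The helper names (`first_child`, `coordinated_clauses`, `simple_clauses`, `to_sentence`) are hypothetical, and the logic only handles the simple NP + VP + SBAR shape of this example.

```python
from nltk.tree import Tree

# Hand-written, illustrative Penn Treebank-style parse of the example sentence
# (an assumption: a real parser may produce a different bracketing).
EXAMPLE_PARSE = """
(S
  (NP (DT The) (NN park))
  (VP (VBZ is)
    (ADJP (RB so) (JJ wonderful))
    (SBAR (WHADVP (WRB when))
      (S
        (S (NP (DT the) (NN sun)) (VP (VBZ is) (VP (VBG setting))))
        (CC and)
        (S (NP (DT a) (JJ cool) (NN breeze)) (VP (VBZ is) (VP (VBG blowing))))))))
"""


def first_child(tree, label):
    """Return the first direct child of `tree` with the given label."""
    return next(c for c in tree if isinstance(c, Tree) and c.label() == label)


def coordinated_clauses(sbar):
    """Split the S inside an SBAR at its conjunction into separate clauses."""
    inner = first_child(sbar, "S")
    parts = [c for c in inner if isinstance(c, Tree) and c.label() == "S"]
    # No coordination: the inner S is itself a single clause.
    return [p.leaves() for p in (parts or [inner])]


def simple_clauses(sentence):
    """Main clause = top-level NP + VP minus any SBAR; SBARs become extra clauses."""
    main = first_child(sentence, "NP").leaves()
    subordinate = []
    for child in first_child(sentence, "VP"):
        if isinstance(child, Tree) and child.label() == "SBAR":
            subordinate.extend(coordinated_clauses(child))
        elif isinstance(child, Tree):
            main.extend(child.leaves())
        else:
            main.append(child)
    # Subordinate clauses first, to match the ordering in the question.
    return subordinate + [main]


def to_sentence(words):
    """Join tokens, capitalize the first letter and add a final period."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


for clause in simple_clauses(Tree.fromstring(EXAMPLE_PARSE)):
    print(to_sentence(clause))
```

Run as-is, it prints "The sun is setting.", "A cool breeze is blowing." and "The park is so wonderful." on separate lines. Other sentence shapes (relative clauses, coordinated verb phrases, fronted subordinate clauses) would need additional rules.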
Note that the OpenNLP parser is trained on the Penn Treebank corpus. I got a fairly accurate parse of your example sentence, but the parser isn't perfect and can be wildly wrong on other sentences. Look here for an explanation of its tags; it assumes you already have some basic understanding of linguistics and English grammar.
Edit: By the way, this is how I access OpenNLP from Python. This assumes you have the OpenNLP jar and model files in an opennlp-tools-1.4.3 folder.
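As a rough sketch (not the answer's own code), one way to call the parser from Python is to pipe sentences to it with the `subprocess` module and read the bracketed output back with nltk's `Tree.fromstring`. It assumes a newer OpenNLP release (1.5 or later) whose `opennlp` command-line launcher is on the PATH, with the `en-parser-chunking.bin` English model in the working directory; that command line may differ from the 1.4.3 setup mentioned above and should be adjusted to your installation. The input must already be tokenized, one sentence per line. `parse_sentences` and `DEFAULT_CMD` are hypothetical names.

```python
import subprocess

from nltk.tree import Tree

# Assumption: OpenNLP 1.5+ with the `opennlp` launcher on PATH and the English
# chunking parser model in the working directory. Adjust for your setup.
DEFAULT_CMD = ["opennlp", "Parser", "en-parser-chunking.bin"]


def parse_sentences(sentences, cmd=DEFAULT_CMD):
    """Send tokenized sentences (one per line) to an external parser process
    and read back one bracketed parse tree per output line."""
    result = subprocess.run(
        cmd,
        input="\n".join(sentences) + "\n",
        capture_output=True,  # capture stdout and stderr separately
        text=True,
        check=True,  # raise CalledProcessError if the parser exits with an error
    )
    return [
        Tree.fromstring(line)
        for line in result.stdout.splitlines()
        if line.lstrip().startswith("(")  # skip any line that is not a tree
    ]
```

For example, `parse_sentences(["The park is so wonderful when the sun is setting and a cool breeze is blowing ."])` should return a list with one `Tree`. OpenNLP typically wraps each parse in a `TOP` node, so the `S` node underneath can then be picked apart as described above.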