Chunking/text parsing with NLTK
I am trying to parse some text and diagram it, like you would a sentence. I am new to NLTK and am trying to find something in NLTK that will help me accomplish this. So far, I have seen nltk.ne_chunk and nltk.pos_tag. I find them to be not very helpful and I am not able to find any good online documentation.
I have also tried to use the LancasterStemmer, but I don't fully understand what it does or how it should be used or why it even exists.
Can somebody please help me out with this? I'm really at a loss and getting quite frustrated without any guiding lights.
Thanks in advance
What you are describing is actually a really hard task, because in the end, whether your program has succeeded or failed is an entirely subjective measure. When this is the case, it usually means that constructing a program to solve the problem is hard. There are people who get paid to work on these kinds of problems in universities.
If you want to have a stab at it, I'd suggest using some kind of automated lexical analysis tool rather than trying to parse and annotate manually, and then leveraging the resulting parse tree. Parse trees usually represent syntactic analyses, i.e. the structure of the sentence. You, on the other hand, are concerned with semantic analysis, i.e. what the text means, or at least whether two sentences are similar or different (which is actually a bit easier than working out what something means).
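To make the "structure of the sentence" idea concrete, here is a rough sketch of building a shallow parse tree with NLTK's RegexpParser. The part-of-speech tags are supplied by hand so that no tagger model needs to be downloaded (in practice they would come from something like nltk.pos_tag), and the one-rule grammar is only an illustrative assumption, not a recommended grammar.

```python
import nltk

# Hand-tagged tokens (assumption: normally produced by a tagger such as nltk.pos_tag).
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Illustrative chunk grammar: an optional determiner, any number of
# adjectives, then a noun, grouped together as a noun phrase (NP).
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)

# The result is an nltk.Tree whose root is labelled "S".
tree = chunker.parse(tagged)

# Collect the words of every NP subtree found by the grammar.
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees()
                if subtree.label() == "NP"]

print(tree)
print(noun_phrases)
```

If a diagram is what you are after, the resulting tree object also has a draw() method that opens a window with the tree (it needs Tk to be available). Note that this only gives you syntax; it says nothing about what the sentence means.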
You could look into some off-the-shelf automatic summarization tools. These try to score sentences by how important they are to a piece of text and filter out sentences that are less important than a specified threshold. Not that this really helps you that much, as you still have the problem of needing to merge the summaries.
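As a very naive sketch of that score-and-filter idea (not how any particular summarization tool actually works), the snippet below scores each sentence by the average frequency of its words in the whole text and keeps the sentences at or above a threshold. The function names, the regex-based sentence splitting and the scoring formula are all assumptions made for illustration; real tools typically also remove stop words and use far better importance measures.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")

def score_sentences(text):
    """Return (sentence, score) pairs; score = mean word frequency in the text."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(WORD_RE.findall(text.lower()))
    scored = []
    for sentence in sentences:
        tokens = WORD_RE.findall(sentence.lower())
        score = sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
        scored.append((sentence, score))
    return scored

def summarize(text, threshold):
    """Keep only the sentences whose score is at least the threshold."""
    return [s for s, score in score_sentences(text) if score >= threshold]

text = "Cats chase mice. Cats sleep. Dogs bark loudly."
print(score_sentences(text))
print(summarize(text, threshold=1.2))
```

In this toy text the two sentences mentioning "cats" score highest simply because that word is repeated, which also shows why raw frequency is a crude stand-in for importance.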