Is there an open-source Perl or Python tool that generates a summary or mind map of a document?

Posted 2024-10-12 10:48:15, length 185, 6 views, 0 comments

I am looking for a toolkit or ready-made tool that will parse a given document and then generate a brief summary or, better still, a mind map of the document. I know Python has NLTK, and Perl has quite a few modules that help with natural language parsing and the like.
It would even be feasible to write such a tool with a toolkit like NLTK, but I lack the time. If you know of such a tool or have a pointer to one, I would appreciate it if you could post it here. Thanks in advance.

Comments (2)

┾廆蒐ゝ 2024-10-19 10:48:15

Someone (here on SO) has already written it for you (discussion). Another option would be TexLexAn (Text Analyzer Classifier Summarizer).

往日 2024-10-19 10:48:15

Google people may already be working on such a thing. ;-)

If I understand you correctly, you want a tool that will read a book for you and then briefly summarize what it was all about, so you can save the time of reading it yourself. Or maybe you're not interested in the contents but rather want to categorize the material, as a librarian would, for example.

That may be technically possible for very structured text in a very specialized area with many very similar documents, say mathematical proofs of theses, experimental results, or medical reports. Surely it would be possible to have a tool that can distinguish between a novel and a phone book to roughly sort through literature. Obviously it's very easy to provide page or word counts, identify the written language, etc., because these parameters can be clearly defined.

Almost certainly, though, computers will fail at grasping actual stories or anything more conversational or casual. To decide who's the good guy and who's the bad one, or whether the piece at hand is a romance novel featuring detectives or a crime thriller where a detective is in love with somebody, a machine would have no chance of working out what's what with any feasible amount of memory, CPU power, and knowledge database.

Maybe it would help if you could be more specific regarding the actual purpose for which you want to use this tool.
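
To illustrate the "clearly defined parameters" end of this spectrum, here is a minimal sketch in plain Python (standard library only, not NLTK). It counts words and picks the sentences whose words occur most often in the document, which is a crude extractive heuristic and not real understanding of the text. The function names, the tiny stopword list, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_count(text):
    """Number of word tokens in the text."""
    return len(tokenize(text))


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def naive_summary(text, max_sentences=2):
    """Return up to max_sentences sentences, ranked by average word frequency.

    The chosen sentences are returned in their original document order.
    """
    freq = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    sentences = split_sentences(text)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (stable sort keeps ties in document order).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep often. Dogs bark."
    print(word_count(sample))
    print(naive_summary(sample, max_sentences=2))
```

A heuristic like this only surfaces sentences that repeat the document's common words; it cannot tell a romance novel from a crime thriller, which is the limitation described above.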
