Searching and indexing Arabic text files

Posted on 2024-11-23 19:24:49

I am working on an electronic library project (for Arabic books): a program that lets the user import his books into the system's library and run searches against that library. The system is delivered to the user with a basic library (a set of books) that the user can update later.

To handle searching, I thought the system could have an initial table in the DB holding the basic search keywords. Every search keyword points to its locations in the books in the library.
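The keyword table described above is essentially an inverted index. A minimal in-memory sketch of that idea follows, assuming a location is a `(book_id, token_position)` pair; the names `keyword_table`, `add_occurrence`, and `lookup` are hypothetical and stand in for whatever the real DB schema would be.

```python
from collections import defaultdict

# Hypothetical stand-in for the DB table:
# keyword -> list of (book_id, token_position) locations.
keyword_table = defaultdict(list)

def add_occurrence(keyword, book_id, position):
    """Record that `keyword` occurs in `book_id` at token index `position`."""
    keyword_table[keyword].append((book_id, position))

def lookup(keyword):
    """Return every recorded location of `keyword` (empty list if unknown)."""
    # .get() avoids creating an empty entry for keywords never indexed.
    return list(keyword_table.get(keyword, []))

# Illustrative data only: two books from the basic library.
add_occurrence("كتاب", "book-1", 0)
add_occurrence("كتاب", "book-2", 17)
add_occurrence("علم", "book-1", 5)
```

In a real database the same shape would likely be a table with one row per (keyword, book, position) and an index on the keyword column.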

The problem appears when the user imports a new book into the library. There are two steps.
The first is to search the new book for the keywords that are already in the system, to find out whether any of them appear in the book, and add their locations to the system.
The second, which is the main stumbling block, is to identify NEW search keywords in the new book.

The idea that I have, which I think is pretty bad and naive, is to break the new book into tokens and then search for each token in all the books already in the library.
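For illustration, here is one possible sketch of the tokenizing step, assuming the index is an in-memory dict keyed by token. In this sketch a single pass over the new book records locations for known keywords and also reports which tokens were not in the index before, so the previously imported books are not scanned again. The function names are hypothetical, and the normalization (stripping diacritics and tatweel only) is a simplifying assumption; real Arabic search usually needs more than this.

```python
import re
from collections import defaultdict

# Assumption: normalization only removes tashkeel (U+064B-U+0652)
# and tatweel (U+0640); no stemming or letter-variant folding.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

def tokenize(text):
    """Split text into runs of Arabic letters after stripping diacritics."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))

def import_book(index, book_id, text):
    """Index every token of one book; return the keywords that were new."""
    new_keywords = set()
    for position, token in enumerate(tokenize(text)):
        if token not in index:  # membership test does not create an entry
            new_keywords.add(token)
        index[token].append((book_id, position))
    return new_keywords

index = defaultdict(list)
import_book(index, "base-1", "العلم نور")

# The first word below is "العلم" written with a kasra and a sukun,
# spelled with escapes so the diacritics are unambiguous.
vocalized = "\u0627\u0644\u0639\u0650\u0644\u0652\u0645"
new = import_book(index, "user-1", vocalized + " نور والجهل ظلام")
```

After the second import, `new` holds only the two words that did not occur in the first book, while the vocalized word is matched to the existing keyword because its diacritics are stripped before lookup.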

So, to sum up: I would appreciate any help (tools, libraries, or DB options), any idea for solving the second problem, or a different design for the whole system. I have really tried reading and searching a lot for a solution, but in vain.

Thanks a lot,
