How should I load the contents of .txt files to serve on a website?
I am trying to build excerpts for each document returned as a search result on my website. I am using the Sphinx search engine and the Apache web server on CentOS Linux. The Sphinx API function I'd like to use is called BuildExcerpts. It requires you to pass an array of strings, where each string contains a document's contents.
I'm wondering what the best practice is for retrieving the document contents in real time as I serve the results on the web. Currently, these documents are text files on my system, spread across multiple drives. There are roughly 100 million of them, and they take up a few terabytes of space.
It's easy for me to call something like file_get_contents(), but that feels like the wrong way to do this. My databases are already gigantic (100GB+) and I don't particularly want to throw the document contents in there along with the document attributes that already exist. Perhaps this is the best way to do this, however.
Suggestions?
Comments (1)
Well, the source needs to be fetched from somewhere. If you don't want to duplicate it in your database, then you will need to fetch it from the filesystem (using file_get_contents or similar).
However, the BuildExcerpts function does give you one extra option, "load_files". With it set, Sphinx treats the strings you pass as file names and reads the data from those files for you.
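As a rough illustration of the load_files route, here is a minimal sketch that passes file paths instead of file contents. It is written in Python against the BuildExcerpts(docs, index, words, opts) method of the official Sphinx client (the PHP client has the same call shape). The helper name excerpts_from_files, the index name and the paths are hypothetical, and client is assumed to be an already-connected SphinxClient. With load_files the files are opened on the server side, so the paths must be readable on the machine where searchd runs.

```python
def excerpts_from_files(client, paths, index, query):
    """Ask Sphinx to build one excerpt per file, letting it read the files.

    `client` is assumed to be a connected sphinxapi.SphinxClient, or any
    object exposing the same BuildExcerpts(docs, index, words, opts) method.
    """
    paths = list(paths)
    opts = {
        # Treat every entry of `paths` as a file name, not as document text.
        "load_files": True,
    }
    excerpts = client.BuildExcerpts(paths, index, query, opts)
    if not excerpts:
        # The client returns an empty/false value when the request fails.
        raise RuntimeError("BuildExcerpts returned no excerpts")
    # Pair each path with its excerpt (results come back in input order).
    return dict(zip(paths, excerpts))


# Hypothetical usage, assuming sphinxapi is installed and searchd is running:
#   from sphinxapi import SphinxClient
#   client = SphinxClient()
#   client.SetServer("localhost", 9312)
#   excerpts_from_files(client, ["/data1/docs/42.txt"], "documents", "some query")
```

This keeps the web tier from reading the files at all, at the cost of requiring searchd to have access to every drive the documents live on.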
What problem are you experiencing with reading it from files? Is it too slow? If so, maybe put some caching in front, perhaps using memcache.
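To make the caching idea concrete, here is a small read-through cache sketch in Python. It is only an in-process stand-in: the DocumentCache class and its names are made up for illustration, and a shared cache such as memcache would replace the OrderedDict if several web workers need to share entries. It also assumes the files rarely change, since nothing here invalidates stale entries.

```python
from collections import OrderedDict


class DocumentCache:
    """Tiny in-process LRU cache placed in front of file reads."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # path -> document text, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, path):
        """Return the text of `path`, reading from disk only on a miss."""
        if path in self._entries:
            self._entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._entries[path]
        self.misses += 1
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        self._entries[path] = text
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return text

    def get_many(self, paths):
        """Return document texts in the same order as `paths`."""
        return [self.get(p) for p in paths]
```

The list returned by get_many has the array-of-strings shape that BuildExcerpts expects for its documents argument, and the hits and misses counters give a quick way to check whether the cache is actually helping before investing in a shared one.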