Zend_Search_Lucene tries to allocate 3503812093817007931 bytes

Posted on 2025-01-08 12:09:45 · 759 characters · 3 views · 0 comments

I have around 250 KB of static HTML that I have to search through. I figured I would use Zend Lucene for that. Creating the index takes a few seconds and all is well, except that a search for "about" ends with this:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 
3503812093817007931 bytes) in /var/www/u1938159/data/www/-----
/protected/vendors/Zend/Search/Lucene/Storage/File/Filesystem.php on line 163

Other words seem to work fine. Moreover, the files contain some foreign-language text, so I have to use the case-insensitive UTF-8 analyzer:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
);
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');

In that case it takes an eternity to load and doesn't work at all, crashing with this:

Error occured while file reading.

Does Lucene have serious issues, or did I mess something up myself?


Comments (2)

最单纯的乌龟 2025-01-15 12:09:45

Lucene doesn't have these issues, but Zend_Search_Lucene does. I'm not sure how much data you have to search or whether this is a one-time thing, but I'd look into Apache Solr or ElasticSearch.

Can you extend your question with some data?

There are also a couple of hosted services; let me know if you need more pointers.

薄荷梦 2025-01-15 12:09:45

I don't know what the specific problem with Zend Lucene is here, but if you're trying to search through a relatively small HTML file, you might want to try just using grep. For example, on the command line:

cat file.html | grep -i about

to find lines containing the word "about", or

cat file.html | grep -i -o -P '.{30}About.{30}'

if you want just 30 characters on either side of the word "about".
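The grep approach above can be tried end to end with the sketch below. It assumes a grep that supports -o (GNU and BSD grep both do) and uses a throwaway sample file whose contents are made up purely for illustration. Two deliberate differences from the commands above: it uses -E instead of -P, since an extended regex is enough here and -P is not available in every grep build, and it uses .{0,30} instead of .{30}, because .{30} only matches when at least 30 characters exist on both sides of the word, so a hit near the start or end of a line would be skipped.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this illustration.
file=$(mktemp)
cat > "$file" <<'EOF'
<html><body>
<h1>About us</h1>
<p>This page says a little more about the project and its history.</p>
<p>Contact</p>
</body></html>
EOF

# Count the lines that mention "about", ignoring case.
count=$(grep -i -c 'about' "$file")

# Print each match with up to 30 characters of context on either side.
snippets=$(grep -i -o -E '.{0,30}about.{0,30}' "$file")

echo "matching lines: $count"
echo "$snippets"

rm -f "$file"
```

Passing the file name straight to grep also avoids the extra cat process, which matters little for 250 KB but keeps the command shorter.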
