Zend_Search_Lucene tries to allocate 3503812093817007931 bytes
I have around 250 KB of static HTML that I have to search through, and I figured I would use Zend Lucene for that. Creating the index takes a few seconds and everything works fine, except that a search for "about" ends with this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
3503812093817007931 bytes) in /var/www/u1938159/data/www/-----
/protected/vendors/Zend/Search/Lucene/Storage/File/Filesystem.php on line 163
Other words seem to work fine. Moreover, the files contain some foreign-language text, so I have to use the case-insensitive UTF-8 analyzer:
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
);
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
With that analyzer set, the search takes an eternity to load and doesn't work at all; it crashes with this:
Error occured while file reading.
Does Lucene have serious issues, or did I mess something up myself?
Comments (2)
Lucene doesn't have these issues, but Zend_Search_Lucene does. I'm not sure how much you have to search or whether this is a one-time thing, but I'd look into Apache Solr or ElasticSearch. Can you extend your question with some data?
There are also a couple of hosted services; let me know if you need more pointers.
I don't know what the specific problem with Zend Lucene is here, but if you're trying to search through a relatively small HTML file, you might want to try just using grep. For example, on the command line:
grep -i about file.html
to find lines containing the word "about", or
grep -i -o -P '.{0,30}about.{0,30}' file.html
if you want just up to 30 characters on either side of the word "about" (-P needs GNU grep; using {0,30} instead of {30} also catches matches near the start or end of a line).
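If there are several HTML files rather than one, the same idea can be wrapped in a small shell function. This is only a sketch under a few assumptions: search_snippets is a made-up name, the files are assumed to end in .html and live under one directory, and -r, -o and --include are GNU/BSD grep extensions rather than POSIX. It uses -E instead of -P so it doesn't depend on Perl regex support. Note that it searches the raw markup, so tags can show up in the snippets.

```shell
#!/bin/sh
# Hypothetical helper (not part of any library): print each case-insensitive
# match of a word with up to N characters of context on either side, for every
# .html file under a directory.
search_snippets() {
    word=$1        # search term; it is used as an extended regex, so keep it plain
    dir=$2         # directory holding the static HTML files
    ctx=${3:-30}   # context characters per side, defaults to 30
    grep -r -i -o -E --include='*.html' ".{0,$ctx}$word.{0,$ctx}" "$dir"
}

# Demo on a throwaway file: prints the file path, a colon, and the snippet.
demo_dir=$(mktemp -d)
printf '%s\n' '<p>Read more About us on the contact page.</p>' > "$demo_dir/index.html"
search_snippets about "$demo_dir" 10
rm -rf "$demo_dir"
```

Call it as search_snippets about /path/to/html to get 30 characters of context, or pass a third argument to change the width. For 250 KB of HTML this should return more or less instantly, with no index to build or corrupt.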