Sphinx delta index ignores the main index
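I have a very strange problem: for some reason my indexes are not working properly at all.
I have built a fully working delta index in Sphinx, with a full set of cron jobs to keep everything up to date, and that part all works fine.
I then run the query from PHP: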
class sphinx_searcher {
    // Declared explicitly; dynamic properties are deprecated in recent PHP versions.
    private $sphinx;

    function __construct() {
        $config = array('host' => 'localhost', 'port' => 9312);
        $this->sphinx = new SphinxClient();
        $this->sphinx->SetServer($config['host'], $config['port']);
        $this->sphinx->SetConnectTimeout(1);
    }

    function query() {
        $this->sphinx->SetSortMode(SPH_SORT_RELEVANCE);
        $this->sphinx->SetLimits(0, 20); // Testing first page
        $this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
        $this->sphinx->SetArrayResult(true);
        // Search the main index first, then the delta.
        $res = $this->sphinx->Query("040*", "media media_delta");
        if ($res)
            return $res;
        else
            return $this->sphinx->GetLastError();
    }
}
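For some reason it uses one index or the other, never both (so far only the latter).
When I query media on its own I get document IDs 1 and 2, but when I query both indexes I only get document ID 3, which is in the delta index.
Here is my data source configuration: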
source media
{
    type = mysql
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = \
        SELECT id, deleted, _id, uid, listing, title, description, tags, author_name, playlist, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
    sql_field_string = tags
    sql_field_string = description
    sql_field_string = author_name
    sql_field_string = title
    sql_attr_uint = deleted
    sql_attr_string = _id
    sql_attr_string = uid
    sql_attr_string = listing
    sql_attr_uint = playlist
    sql_attr_timestamp = date_uploaded
    sql_ranged_throttle = 0
    sql_query_info = SELECT * FROM media WHERE id=$id
    sql_query_killlist = SELECT id FROM documents WHERE deleted = 0
}

source media_delta : media
{
    sql_query_pre = SET NAMES utf8
    sql_query = \
        SELECT id, deleted, _id, uid, listing, title, description, tags, author_name, playlist, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
Here is my index configuration:
index media
{
    source = media
    path = /home/sam/sphinx/var/data/media
    docinfo = extern
    mlock = 0
    morphology = stem_en, stem_ru, soundex
    min_word_len = 1
    charset_type = sbcs
    min_infix_len = 2
    infix_fields = title, tags
    enable_star = 1
    expand_keywords = 1
    html_strip = 0
    index_exact_words = 1
}

index media_delta : media
{
    source = media_delta
    path = /home/sam/sphinx/var/data/media_delta
}
I am really confused about what I have done wrong. Can anyone here help me work out what the problem is?
Edit:
Without using all the indexes:
array(9) {
  ["error"]=> string(0) ""
  ["warning"]=> string(0) ""
  ["status"]=> int(0)
  ["fields"]=> array(4) { [0]=> string(5) "title" [1]=> string(11) "description" [2]=> string(4) "tags" [3]=> string(11) "author_name" }
  ["attrs"]=> array(10) { ["deleted"]=> int(1) ["_id"]=> int(7) ["uid"]=> int(7) ["listing"]=> int(7) ["title"]=> int(7) ["description"]=> int(7) ["tags"]=> int(7) ["author_name"]=> int(7) ["playlist"]=> int(1) ["date_uploaded"]=> int(2) }
  ["total"]=> string(1) "0"
  ["total_found"]=> string(1) "0"
  ["time"]=> string(5) "0.000"
  ["words"]=> array(1) { ["040*"]=> array(2) { ["docs"]=> string(1) "2" ["hits"]=> string(1) "2" } }
}
Thanks,
Comments (1)
After working through a few possibilities, I spotted the issue: the sql_query_killlist line in the media source, which media_delta inherits.
That query says any document with "deleted=0" will disappear, i.e. will be "killed".
I suppose what is confusing in this context is that the hit is still counted in the "words" array despite being killed later. (The words array holds the raw numbers before any filtering, straight from the index, so any setFilter, or in this case the kill-list, makes it an overestimate.)
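The effect described above can be mimicked with a toy model in plain PHP. This is only a sketch of the idea, not Sphinx's actual implementation: the function searchIndexes and the array layout (docs, killlist) are hypothetical names made up for illustration. It assumes that a kill-list suppresses matching IDs only in indexes listed earlier in the query, and that the raw hit count is taken before suppression.

```php
<?php
// Toy model only (hypothetical, not the Sphinx API).
// Each index is an array with its matching 'docs' and its 'killlist'.
function searchIndexes(array $indexes): array
{
    $rawHits = 0;   // counted before suppression, like the "words" statistics
    $visible = [];  // IDs that survive every later kill-list

    foreach ($indexes as $position => $index) {
        foreach ($index['docs'] as $docId) {
            $rawHits++;
            $killed = false;
            // Only kill-lists of indexes later in the query order apply.
            foreach (array_slice($indexes, $position + 1) as $later) {
                if (in_array($docId, $later['killlist'], true)) {
                    $killed = true;
                    break;
                }
            }
            if (!$killed) {
                $visible[] = $docId;
            }
        }
    }

    return ['raw_hits' => $rawHits, 'ids' => $visible];
}

$main = ['docs' => [1, 2], 'killlist' => []];
// Kill-list built from "deleted = 0": it lists the live documents.
$brokenDelta = ['docs' => [3], 'killlist' => [1, 2, 3]];
// Kill-list built from deleted rows only; nothing is deleted in this toy data.
$fixedDelta = ['docs' => [3], 'killlist' => []];

$broken = searchIndexes([$main, $brokenDelta]);
$fixed = searchIndexes([$main, $fixedDelta]);

echo implode(',', $broken['ids']), "\n"; // prints "3"
echo implode(',', $fixed['ids']), "\n";  // prints "1,2,3"
```

With the inverted kill-list the raw hit count is still 3 while only document 3 is returned, which matches the symptom in the question: statistics that look healthy next to a result set that is missing the main index's documents.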
So change the kill-list query so that it selects the deleted documents instead of the live ones. :)
Always the most unexpected of things!
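The replacement line itself did not survive in this copy of the answer. A minimal sketch of what the fix probably looks like, assuming that deleted = 1 marks rows that should be suppressed (the thread never shows the flag's values, so treat that as an assumption):

```
# Assumption: deleted = 1 marks removed rows; adjust to the real flag value.
source media
{
    # ... other settings unchanged ...
    sql_query_killlist = SELECT id FROM documents WHERE deleted = 1
}
```

Because media_delta inherits from media and does not override sql_query_killlist, changing the line in the parent source should be enough. The delta index would need to be rebuilt for the new kill-list to take effect.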