减少 Sphinx 搜索时间的想法
我正在使用思考型 sphinx gem,我的查询大约需要 45 秒才能完成(1300 万条记录,包含索引的文件夹为 1.1GB)。我假设我的配置不正确(第一次使用 Sphinx 用户)。不管怎样,如果您发现任何不对劲的地方,请告诉我。这是我的配置:
define_index do
indexes :name
indexes :summary
indexes :tag_list
indexes categories.name, :as => :category_name
has "RADIANS(lat)", :as => :latitude, :type => :float
has "RADIANS(lng)", :as => :longitude, :type => :float
set_property :field_weights => {
:name => 8,
:summary => 6,
:category_name => 5,
:tag_list => 3
}
set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
set_property :ignore_chars => %w(' -)
end
这是一个示例查询:
Location.search('Restaurant',
:geo => [0.5837843098436726,-1.9560609568879357],
:latitude_attr => "latitude",
:longitude_attr => "longitude",
:with => {"@geodist" => 0.0..4000.0},
:include => :categories,
:page => 1,
:per_page => 100)
我的日志显示:
Sphinx Query (43066.3ms) restaurant
Sphinx Found 467 results
我将继续挖掘文档并尝试一些东西!
更新:我的development.sphinx.conf
indexer
{
}
searchd
{
listen = 127.0.0.1:9312
log = /project_path/log/searchd.log
query_log = /project_path/log/searchd.query.log
pid_file = /project_path/log/searchd.development.pid
}
source location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_core
{
source = location_core_0
path = /project_path/db/sphinx/development/location_core
morphology = stem_en
charset_type = utf-8
ignore_chars = ', -
enable_star = 1
}
source location_delta_0 : location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre =
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_delta : location_core
{
source = location_delta_0
path = /project_path/db/sphinx/development/location_delta
}
index location
{
type = distributed
local = location_delta
local = location_core
}
I'm using the thinking sphinx gem my queries are taking about 45 seconds to complete (13 Million records, the folder containing the indexes is 1.1GB). I'm assuming I have something configured incorrectly (first time Sphinx user). Anyway let me know if you see anything that looks amiss. Here is my configuration:
define_index do
indexes :name
indexes :summary
indexes :tag_list
indexes categories.name, :as => :category_name
has "RADIANS(lat)", :as => :latitude, :type => :float
has "RADIANS(lng)", :as => :longitude, :type => :float
set_property :field_weights => {
:name => 8,
:summary => 6,
:category_name => 5,
:tag_list => 3
}
set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
set_property :ignore_chars => %w(' -)
end
Here is an example query:
Location.search('Restaurant',
:geo => [0.5837843098436726,-1.9560609568879357],
:latitude_attr => "latitude",
:longitude_attr => "longitude",
:with => {"@geodist" => 0.0..4000.0},
:include => :categories,
:page => 1,
:per_page => 100)
My Log shows:
Sphinx Query (43066.3ms) restaurant
Sphinx Found 467 results
I'll keep digging through the docs and trying stuff!
UPDATE: my development.sphinx.conf
indexer
{
}
searchd
{
listen = 127.0.0.1:9312
log = /project_path/log/searchd.log
query_log = /project_path/log/searchd.query.log
pid_file = /project_path/log/searchd.development.pid
}
source location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_core
{
source = location_core_0
path = /project_path/db/sphinx/development/location_core
morphology = stem_en
charset_type = utf-8
ignore_chars = ', -
enable_star = 1
}
source location_delta_0 : location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre =
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_delta : location_core
{
source = location_delta_0
path = /project_path/db/sphinx/development/location_delta
}
index location
{
type = distributed
local = location_delta
local = location_core
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我不知道为什么它的搜索运行速度如此之慢,但我首先会简化查询中的内容,然后一点一点地增加复杂性,看看是否有任何特定的原因。所以,首先:
然后也许:
等等。不要忘记索引定义中的
:field_weights
也会产生影响。话虽如此,我并没有发现你所做的事情有什么特别奇怪的地方,而且 43 秒的搜索时间(或任何接近的时间)是我以前从未遇到过的。
I don't know precisely why it's running so slow for a search, but I'd start by simplifying things in your query, and adding back complexity bit by bit, to see if anything in particular is the cause. So, first:
Then perhaps:
And so on. Don't forget that
:field_weights
in your index definition will have an effect as well.All that said, I'm not spotting anything particularly odd with what you're doing, and 43 seconds for a search (or anything close to that) is something I've not come across before.
我发现了我的问题 - 记录恰好位于 STI 表中,但我只想对位置类型的记录进行索引(位置没有任何后代)。在该表中的 1300 万条记录中,99.99984%(认真地)属于位置类型。 SELECT DISTINCT type FROM Business_entities 查询花费的时间太长(即使使用索引)。棘手的部分是注意到这一点,因为日志报告了持续 84 秒的 Sphinx 查询,但问题实际上是掠夺性 SQL 查询:
所以我在初始化程序中修补了 Thinking Sphinx 以返回我唯一关心的类型:
https://gist.github.com/1603565
I found my problem - The records happen to be in an STI table but I only want to index those of type Location (Location doesn't have any descendants). Of the 13 million records in this table 99.99984% (seriously) of them are of Location type. The SELECT DISTINCT type FROM business_entities query was taking way too long (even with an index). The tricky part was noticing this since the log was reporting the Sphinx Query lasting 84 seconds but it was really the predatory SQL queries that were the problem:
So I monkey patched Thinking Sphinx in an initializer to return the only type I care about:
https://gist.github.com/1603565