如何使用 Sphinx 搜索大型 JOINed 表?

发布于 2024-12-20 08:46:42 字数 2387 浏览 3 评论 0原文

我的数据库中有几个不同的表,我正在尝试使用 Sphinx 进行快速全文搜索。为了便于讨论,我们假设感兴趣的主要记录是装箱单,订单发货时会包含其中之一。如何使用 Sphinx 在所有这些表中执行复杂查询而不完全对数据库进行非规范化?

每张装箱单都列出了订单号、发件人、收件人以及货件中每个箱子的跟踪号码。单独的表包含有关订单项目的信息。附加表包含客户地址信息。因此,订单包含盒子,盒子包含物品。 (本问题底部列出的示例架构)。

我希望能够向 Sphinx 查询以下问题的答案:

  • 居住在名为“Maple”的街道上的有多少人订购了以下商品:描述中的“大”?
  • 哪些订单的包装盒描述或订单商品描述中包含“蓝色”一词?

为了回答这类问题,我需要参考几个表格。由于 Sphinx 没有 JOIN,因此一种选择是对数据库进行非规范化。使用视图进行非规范化,以便每一行代表一个订单项,加上其父框和订单的所有数据,将导致数十亿个非常宽的行。所以我一直为每个表创建一个单独的索引。但这不允许我像 SQL JOIN 那样跨表查询。还有其他解决方案吗?

示例数据库

CREATE TABLE orders (
    id               integer PRIMARY KEY,
    date_ordered     date,
    customer_po      varchar
);
INSERT INTO orders VALUES (1, '2012-12-13', NULL);
INSERT INTO orders VALUES (2, '2012-12-14', 'DF312442');

CREATE TABLE parties (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    party_type       varchar,
    company          varchar,
    city             varchar,
    state            char(2)
);
INSERT INTO parties VALUES (1, 1, 'shipper', 'ACME, Inc.', 'New York', 'NY');
INSERT INTO parties VALUES (2, 1, 'recipient', 'Wylie Coyote Corp.', 'Flagstaff', 'AZ');
INSERT INTO parties VALUES (3, 2, 'shipper', 'Cyberdyne', 'Las Vegas', 'NV');
-- Please disregard the fact that this design permits multiple shippers and multiple recipients
-- per order.  This is a vastly simplified version of the system I'm working on.

CREATE TABLE boxes (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    tracking_num     varchar NOT NULL,
    description      varchar NOT NULL,
);
INSERT INTO boxes VALUES (1, 1, '1234567890', 'household goods');
INSERT INTO boxes VALUES (2, 1, '0987654321', 'kitchen appliances');
INSERT INTO boxes VALUES (3, 2, 'ABCDE12345', 'audio equipment');

CREATE TABLE box_contents (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    box              integer NOT NULL REFERENCES boxes(id),
    qty_units        integer,
    description      varchar
);
INSERT INTO box_contents VALUES (1, 1, 1, 4, 'cookbook');
INSERT INTO box_contents VALUES (2, 1, 1, 2, 'baby bottle');
INSERT INTO box_contents VALUES (3, 1, 2, 1, 'television');
INSERT INTO box_contents VALUES (4, 2, 3, 2, 'lamp');

I have several different tables in my database and I'm trying to use Sphinx to do fast full-text searches. For ease of discussion, let's say the main records of interest are packing slips, one of which is included when an order ships. How do I use Sphinx to execute complex queries across all of these tables without completely denormalizing the database?

Each packing slip lists the order number, shipper, recipient, and the tracking number of each box included with the shipment. A separate table contains information about the order items. An additional table contains the customer address information. So, orders contain boxes and boxes contain items. (Example schema listed at the bottom of this question).

I would like to be able to query Sphinx to answers to questions like:

  • How many people who live on a street named "Maple" ordered an item with "large" in the description?
  • Which orders contain include the word "blue" in either the box description or order items' description?

To answer these types of questions, I need to refer to several tables. Since Sphinx doesn't have JOINs, one option is to denormalize the database. Denormalizing using a view, so that each row represents an order item--plus all of the data of it's parent box and order, would result in billions of very wide rows. So I've been creating a separate index for each table instead. But that doesn't allow me to query across tables as a SQL JOIN would. Is there another solution?

Example database

CREATE TABLE orders (
    id               integer PRIMARY KEY,
    date_ordered     date,
    customer_po      varchar
);
INSERT INTO orders VALUES (1, '2012-12-13', NULL);
INSERT INTO orders VALUES (2, '2012-12-14', 'DF312442');

CREATE TABLE parties (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    party_type       varchar,
    company          varchar,
    city             varchar,
    state            char(2)
);
INSERT INTO parties VALUES (1, 1, 'shipper', 'ACME, Inc.', 'New York', 'NY');
INSERT INTO parties VALUES (2, 1, 'recipient', 'Wylie Coyote Corp.', 'Flagstaff', 'AZ');
INSERT INTO parties VALUES (3, 2, 'shipper', 'Cyberdyne', 'Las Vegas', 'NV');
-- Please disregard the fact that this design permits multiple shippers and multiple recipients
-- per order.  This is a vastly simplified version of the system I'm working on.

CREATE TABLE boxes (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    tracking_num     varchar NOT NULL,
    description      varchar NOT NULL,
);
INSERT INTO boxes VALUES (1, 1, '1234567890', 'household goods');
INSERT INTO boxes VALUES (2, 1, '0987654321', 'kitchen appliances');
INSERT INTO boxes VALUES (3, 2, 'ABCDE12345', 'audio equipment');

CREATE TABLE box_contents (
    id               integer PRIMARY KEY,
    order_id         integer NOT NULL REFERENCES orders(id),
    box              integer NOT NULL REFERENCES boxes(id),
    qty_units        integer,
    description      varchar
);
INSERT INTO box_contents VALUES (1, 1, 1, 4, 'cookbook');
INSERT INTO box_contents VALUES (2, 1, 1, 2, 'baby bottle');
INSERT INTO box_contents VALUES (3, 1, 2, 1, 'television');
INSERT INTO box_contents VALUES (4, 2, 3, 2, 'lamp');

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

月下凄凉 2024-12-27 08:46:43

您将 JOIN 放入构建索引的 sql_query 中。表保持规范化,但在构建索引时进行非规范化。

它只是一个基本示例,但您的查询将类似于......

sql_query = SELECT o.id,customer_po,UNIX_TIMESTAMP(date_ordered) AS date_ordered,   \
    GROUP_CONCAT(DISTINCT party_type) AS party_type, \
    GROUP_CONCAT(DISTINCT company) AS company, \
    GROUP_CONCAT(DISTINCT city) AS city, \
    GROUP_CONCAT(DISTINCT description) AS description \
  FROM orders o \
    INNER JOIN parties p ON (o.id = p.order_id) \
    INNER JOIN box_contents b ON (o.id = b.order_id) \
  GROUP BY o.id \
  ORDER BY NULL

更新:或者可以使用 sql_joined_field 执行相同的操作,但避免实际的 sql_query 连接。然后 Sphinx 会为您执行加入过程

You put the JOIN in the sql_query that builds the index. The tables remain normalized, but you denormalize when building the index.

Its only a basic example, but your query would be something like.. .

sql_query = SELECT o.id,customer_po,UNIX_TIMESTAMP(date_ordered) AS date_ordered,   \
    GROUP_CONCAT(DISTINCT party_type) AS party_type, \
    GROUP_CONCAT(DISTINCT company) AS company, \
    GROUP_CONCAT(DISTINCT city) AS city, \
    GROUP_CONCAT(DISTINCT description) AS description \
  FROM orders o \
    INNER JOIN parties p ON (o.id = p.order_id) \
    INNER JOIN box_contents b ON (o.id = b.order_id) \
  GROUP BY o.id \
  ORDER BY NULL

Update: alternatively can use sql_joined_field to do the same but avoid actual sql_query joins. Sphinx then does the join process for you

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文