使用大型 IN() 子句或派生表联接优化 MySQL 查询
假设我需要查询一家公司的员工。我有一个表“交易”,其中包含每笔交易的数据。
CREATE TABLE `transactions` (
`transactionID` int(11) unsigned NOT NULL,
`orderID` int(11) unsigned NOT NULL,
`customerID` int(11) unsigned NOT NULL,
`employeeID` int(11) unsigned NOT NULL,
`corporationID` int(11) unsigned NOT NULL,
PRIMARY KEY (`transactionID`),
KEY `orderID` (`orderID`),
KEY `customerID` (`customerID`),
KEY `employeeID` (`employeeID`),
KEY `corporationID` (`corporationID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
查询此表中的员工相当简单,但有一个问题:每个员工只注册一次交易记录,因此一个公司的每个订单可能有多个记录。
例如,如果公司 1 的员工 A 和 B 都参与向公司 2 销售吸尘器,则“交易”表中将有两条记录:每位员工一个,公司 1 两个。不过,这不能影响结果。公司 1 的一项交易,无论涉及多少名员工,都必须被视为一项交易。
很容易,我想。我将只在派生表上进行联接,如下所示:
SELECT corporationID FROM transactions JOIN (SELECT DISTINCT orderID FROM transactions WHERE corporationID = 1) AS foo USING (orderID)
查询返回与公司 1 进行交易的公司列表。这正是我所需要的,但速度非常慢,因为 MySQL 无法使用corporationID索引来确定派生表。据我所知,MySQL 中的所有子查询/派生表都是这种情况。
我还尝试单独查询 orderID 的集合,并使用一个大得离谱的 IN() 子句(通常超过 100 000 个 ID),但事实证明 MySQL 在使用大得离谱的 IN() 子句上的索引时也存在问题结果查询时间并没有改善。
还有其他可用的选择吗,还是我已经用尽了它们?
Let's say I need to query the associates of a corporation. I have a table, "transactions", which contains data on every transaction made.
CREATE TABLE `transactions` (
`transactionID` int(11) unsigned NOT NULL,
`orderID` int(11) unsigned NOT NULL,
`customerID` int(11) unsigned NOT NULL,
`employeeID` int(11) unsigned NOT NULL,
`corporationID` int(11) unsigned NOT NULL,
PRIMARY KEY (`transactionID`),
KEY `orderID` (`orderID`),
KEY `customerID` (`customerID`),
KEY `employeeID` (`employeeID`),
KEY `corporationID` (`corporationID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
It's fairly straightforward to query this table for associates, but there's a twist: A transaction record is registered once per employee, and so there may be multiple records for one corporation per order.
For example, if employees A and B from corporation 1 were both involved in selling a vacuum cleaner to corporation 2, there would be two records in the "transactions" table; one for each employee, and both for corporation 1. This must not affect the results, though. A trade from corporation 1, regardless of how many of its employees were involved, must be treated as one.
Easy, I thought. I'll just make a join on a derived table, like so:
SELECT corporationID FROM transactions JOIN (SELECT DISTINCT orderID FROM transactions WHERE corporationID = 1) AS foo USING (orderID)
The query returns a list of corporations who have been involved in trades with corporation 1. That's exactly what I need, but it's very slow because MySQL can't use the corporationID index to determine the derived table. I understand that this is the case for all subqueries/derived tables in MySQL.
I've also tried to query a collection of orderIDs separately and use a ridiculously large IN() clause (typhically 100 000+ IDs), but as it turns out MySQL has issues using indices on ridiculously large IN() clauses as well and as a result the query time does not improve.
Are there any other options available, or have I exhausted them both?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果我理解你的要求,你可以尝试这个。
或者这个:
If I understand your requirement, you could try this.
or this:
你的数据对我来说没有意义,我认为你正在使用corporationID,你的意思是客户ID在其中的某个点,因为你的查询将交易表连接到基于orderID的corporateID = 1的交易表以获取corporateID...那么就是 1,对吗?
您能具体说明一下 customerID、employeeID 和 CorporateID 的含义吗?我如何知道员工 A 和 B 来自公司 1 - 在这种情况下,公司 1 是公司 ID,公司 2 是客户,因此存储在客户 ID 中?
如果是这种情况,您只需要执行分组依据:(
或者如果您想要每个订单一行而不是每个客户一行,则选择并按 orderID 分组。)
通过使用分组依据,您忽略了以下事实:除员工 ID 外,多条重复记录。
相反,返回所有已出售给公司 2 的公司。
Your data makes no sense to me, I think you are using corporationID where you mean customer ID at some point in there, as your query joins the transaction table to the transaction table for corporationID=1 based on orderID to get the corporationIDs...which would then be 1, right?
Can you please specify what the customerID, employeeID, and corporationIDs mean? How do I know employees A and B are from corporation 1 - in that case, is corporation 1 the corporationID, and corporation 2 is the customer, and so stored in the customerID?
If that is the case, you just need to do a group by:
(Or select and group by orderID if you want one row per order instead of one row per customer.)
By using the group by, you ignore the fact that there are multiple records that are duplicate except for the employeeID.
Conversely, to returns all corporations that have sold to corporation 2.