MongoDB equivalent of NOT IN (SELECT * FROM X)
I am building an application which will do offline matching periodically on a MongoDB collection. What I need to do, however, is get a list of all potential matches that have not previously been matched.
So for example, imagine the following objects (simplified to make this more readable):
person { _id: 1, name: 'Matt', previouslyMatched: [2] }
person { _id: 2, name: 'John', previouslyMatched: [1] }
person { _id: 3, name: 'Tony', previouslyMatched: [] }
I want to run a query on Matt (id:1) to find out if there are any other person records that are not in the previouslyMatched array.
Now in SQL I would do something like:
SELECT ID FROM person WHERE ID <> 1 AND ID NOT IN (SELECT match_id FROM person_match WHERE person_id = 1)
This assumes, of course, that I have a lookup table called person_match with the columns person_id and match_id, which stores all previous matches. I realise there are potential performance pitfalls in this query, but please bear with me as this is just an example and other optimisations will be done.
In MongoDB, though, it's not clear how I would go about doing this. I could of course retrieve the person document (id:1), then retrieve all other person documents and check client-side whether each ID is in the previouslyMatched array, but I am concerned about the performance hit of unnecessarily large data transfers from the DB server to the client.
I know there is a $nin operator, but I have read that its performance is not great, and once again, I'm not sure how wise it is to pass a potentially extremely long array of IDs to it as my app scales.
My gut feeling tells me that the answer might lie in server-side code execution, but I'm not clear how this can be achieved.
Finally, I am also worried about how many items can be stored in a single array field practically. Is there a practical limit?
Thanks,
Matt
Comments (2)
It sounds like you're trying to do a join here. It's not exactly a join, but the SQL IN syntax effectively allows you to apply the output from one set of data to the other.
In either case, MongoDB does not support joins and it does not support the WHERE IN (subtable) syntax. The $in syntax you mention is equivalent to WHERE IN (a,b,c), with a fixed list.
The only methods I can see here involve multiple queries (2 or more) or some form of client-side processing.
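The two-query approach could look roughly like the sketch below. It is only an illustration, not a tuned solution: the function name find_unmatched_ids is made up, and `people` is assumed to be a pymongo Collection holding the person documents from the question (any object with compatible find_one/find methods would do).

```python
def find_unmatched_ids(people, person_id):
    """Return the _id of every person not yet matched with person_id.

    `people` is assumed to be a pymongo Collection (or any object
    exposing the same find_one/find methods).
    """
    # Query 1: fetch only the array of previously matched ids.
    person = people.find_one(
        {"_id": person_id}, projection={"previouslyMatched": 1}
    )
    if person is None:
        return []

    # Exclude the person themselves as well as everyone already matched.
    excluded = [person_id] + list(person.get("previouslyMatched", []))

    # Query 2: $nin plays the role of NOT IN with a fixed list;
    # projecting only _id keeps the transfer to the client small.
    cursor = people.find({"_id": {"$nin": excluded}}, projection={"_id": 1})
    return [doc["_id"] for doc in cursor]
```

With the three sample documents from the question, calling this for Matt (id 1) should leave only Tony (id 3). The list passed to $nin grows with the previouslyMatched array, so the scaling concern raised in the question still applies.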
Server-side code execution has some limitations around locking. If you want to do this query a lot I'm not confident that server-side code will effectively solve this.
A MongoDB Document can only contain 16MB of data. That's a hard physical limit.
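A 64-bit integer takes 8 bytes, so the raw ceiling is about two million ints; BSON stores an index key and a type byte with every array element, which brings the real figure closer to one million. It's obviously much less for strings.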
My personal rule of thumb is thousands or tens of thousands (depending on the data). If you need to store "hundreds of thousands", you're probably going to bump into that 16MB limit.
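To put a rough number on that limit, here is a back-of-the-envelope sketch. It relies on the BSON layout, where an array is stored as a document whose keys are the decimal indexes "0", "1", and so on. The function name and the 64-byte allowance for the rest of the document are assumptions made for illustration.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # the 16MB per-document limit


def max_int64_array_len(budget=MAX_BSON_BYTES, fixed_overhead=64):
    """Estimate how many int64 values fit in one array field.

    Each array element costs 1 type byte, the index key as a decimal
    string plus a NUL terminator, and 8 bytes of payload.
    fixed_overhead is an assumed allowance for _id, field names and
    length prefixes elsewhere in the document.
    """
    remaining = budget - fixed_overhead
    count = 0
    digits = 1
    while True:
        per_element = 1 + digits + 1 + 8
        # How many indexes have this many digits: 0-9, 10-99, 100-999, ...
        in_band = 10 if digits == 1 else 9 * 10 ** (digits - 1)
        fit = min(in_band, remaining // per_element)
        count += fit
        remaining -= fit * per_element
        if fit < in_band:
            return count
        digits += 1


print(max_int64_array_len())
```

Under these assumptions the estimate lands a little above one million elements, so the "thousands or tens of thousands" rule of thumb leaves plenty of headroom for integer ids, and far less for strings.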
You have to perform multiple queries or do some explicit filtering on the application side; there is nothing like a sub-select in MongoDB.