Using a Django database for large data comparison
I have a large data set in a Django database (number-based data, for example 200,000 rows of numbers). The client passes in another set of data, for example 100-500 numbers, and the server needs to find out which of those numbers are already in the database. Let's say the numbers are phone numbers. If I just do a regular number-by-number comparison, the server can't even handle 2-3 requests from clients.
Please suggest a solution for my problem.
Comments (2)
Are the numbers unique? Are they keyed?
A query of the form SELECT num FROM table WHERE num IN (111, 222, 333) should give you a list of the numbers that are in the db. You take that list, compare it against your set, and take the difference.
Most SQL DBs will accept a SQL statement of that size, and it actually performs quite well. If you're only interested in whether the numbers exist, the DB will likely just scan the index and never hit the actual rows (depending on the DB, of course).
So, try that and see how it works. If your numbers aren't indexed, then you're doomed at the gate -- fix that too.
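As a rough illustration of the IN lookup described above, here is a minimal sketch using Python's built-in sqlite3 module instead of a real Django project. The table name `numbers`, the column `num`, and the helper `find_existing` are assumptions made for this example only. In Django the same idea would normally be written with a `filter(num__in=...)` lookup on your own model.

```python
import sqlite3


def find_existing(conn, client_numbers):
    """Return the subset of client_numbers that is already stored in numbers.num."""
    client_numbers = list(client_numbers)
    if not client_numbers:
        return set()
    # One "?" placeholder per value keeps the query parameterized.
    placeholders = ", ".join("?" for _ in client_numbers)
    sql = f"SELECT num FROM numbers WHERE num IN ({placeholders})"
    rows = conn.execute(sql, client_numbers).fetchall()
    return {row[0] for row in rows}


# Hypothetical in-memory table holding the numbers 1..6.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (num INTEGER)")
conn.executemany("INSERT INTO numbers (num) VALUES (?)", [(n,) for n in range(1, 7)])

existing = find_existing(conn, [1, 5, 7])
print(sorted(existing))  # [1, 5]
```

The point is that the whole client list goes to the database in one statement, so each request costs one query rather than one comparison per stored row.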
Addenda:
Simply, if your number is unique, you need to ensure that you have an index on that number's column in your database. If you want to enforce that it remains unique, you can make it a unique index, but that's not required.
If you don't have the index, the db will continually scan all of the rows of the table, which is not what you want.
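A small sketch of the indexing advice, again with sqlite3 and the assumed table `numbers` and column `num`. The index name `idx_numbers_num` is made up for the example. In a Django model, setting `db_index=True` or `unique=True` on the field should have a comparable effect, though the exact SQL depends on your database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (num INTEGER)")
# A unique index speeds up lookups on num and also rejects duplicate numbers.
conn.execute("CREATE UNIQUE INDEX idx_numbers_num ON numbers (num)")
conn.executemany("INSERT INTO numbers (num) VALUES (?)", [(n,) for n in range(1, 7)])

# List the indexes defined on the table (column 1 of each row is the index name).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('numbers')")]
print(index_names)

# Ask SQLite how it plans to run the lookup; the wording varies by version.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT num FROM numbers WHERE num IN (1, 5, 7)"
).fetchall()
for row in plan:
    print(row)

# Because the index is unique, inserting an existing number fails.
try:
    conn.execute("INSERT INTO numbers (num) VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```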
And, yes, the 111,222,333 are the numbers passed from the clients that you're checking for.
Let's say that you had the numbers 1,2,3,4,5,6 in your database, and the client's list is 1,5,7. When you execute SELECT num FROM table WHERE num IN (1,5,7) you will get back 2 rows: 1 and 5.
So, you'll need to compare the resulting numbers, 1,5, to your list, 1,5,7. I don't know enough Python, much less Django, to give you a good example, but a quick glance shows that they have 'set' objects. With these you could do something like newSet = clientSet.difference(dbSet), where clientSet is the set of numbers from the client, dbSet is the set of numbers returned by the query, and newSet is the set of numbers that the client has that are not in the db.
I may be misusing the set operator 'difference', but that's the gist of it.
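For what it's worth, the set comparison from the 1,5,7 example can be sketched in plain Python like this. The variable names are illustrative only.

```python
# Numbers sent by the client and numbers the IN query returned.
client_set = {1, 5, 7}
db_set = {1, 5}

# Numbers the client sent that are already in the database.
already_in_db = client_set & db_set

# Numbers the client sent that are not in the database yet;
# client_set - db_set is equivalent to client_set.difference(db_set).
new_set = client_set - db_set

print(sorted(already_in_db))  # [1, 5]
print(sorted(new_set))        # [7]
```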
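If you want to check whether a query matched any rows, use count() on your queryset; this avoids loading the matching rows into Python (the database only has to run a COUNT query) and could lead to performance gains. Don't evaluate the whole queryset just to test for a match, for example by calling len() on it. Instead, call count() on the queryset and compare the result with zero.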
As suggested by Will - you should also make sure you have correct indexes in your tables against the columns that you will be searching on.
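The original snippets for this answer are not available, so the following is only a sketch of the contrast it describes, written against sqlite3 with an assumed `numbers` table. The function names are hypothetical. In Django terms, the first function corresponds roughly to `len(queryset) > 0` and the second to `queryset.count() > 0`. Django querysets also provide `exists()`, which may be a better fit when only a yes/no answer is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (num INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers (num) VALUES (?)", [(n,) for n in range(1, 7)])


def has_match_fetch_all(conn, number):
    # Pulls every matching row into Python and counts them there.
    rows = conn.execute("SELECT * FROM numbers WHERE num = ?", (number,)).fetchall()
    return len(rows) > 0


def has_match_count(conn, number):
    # Lets the database do the counting; only one integer is returned.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM numbers WHERE num = ?", (number,)
    ).fetchone()
    return count > 0


print(has_match_fetch_all(conn, 5), has_match_count(conn, 5))  # True True
print(has_match_fetch_all(conn, 7), has_match_count(conn, 7))  # False False
```

Both functions give the same answer; the difference is how much data has to leave the database to get it.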