150M记录按名称排序
我有一个每天生成约 1.5 亿条记录的数据集,其中包含: member_id
、member_name
、member_name_first_letter
我需要获取按 member_name asc
排序的列表,
如果我尝试使用 MySQL按member_name的第一个字母进行细分并为每个人运行一次查询(大约40个不同的:AZ,数字,特殊字符)我能够在大约40分钟内填写一个表格,而无需先细分这几个小时。对于测试,我仅使用 1300 万条记录。 无论如何,我需要更快地获得它们(目标:1-2 分钟)。
我怎样才能做到这一点? MongoDB 能提供帮助吗?
I have a dataset of around 150 million records that's generated daily it contains:member_id
, member_name
, member_name_first_letter
I need to get the list ordered by member_name asc
I have try with MySQL if i break down by first letter of member_name and run the query for everyone once (around 40 different: A-Z, digits, special chars) I'm able to fill a table in around total 40 min, without breakdown to frist later it tkaes hours. for the test i'm using only 13 million records.
Anyway I need to get them much faster (target: 1-2 min).
How I can do that? Can MongoDB help?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您使用 InnoDB,并将 member_name 设置为表的主键,则它将始终按该列隐式排序,因此您可以不使用 order-by 子句来
SELECT ... FROM Members
。 InnoDB 表实际上是一个聚集索引。回复您的评论:您还可以定义一个复合主键,如果member_name是主键中的第一列,则表将按member_name顺序排列。
If you use InnoDB, and make member_name the primary key of the table, it will always be ordered by that column implicitly, so you can
SELECT ... FROM members
with no order-by clause. InnoDB tables are actually a clustered index.Re your comment: You could also define a compound primary key and the table would be in member_name order if member_name is the first column in the primary key.
Mongo 不是一个选项,请尝试在 member_name 上使用索引,
其他解决方案是对数据进行排序并拆分到多个表或使用外部工具为 member_name 字段创建索引
Mongo is not an option, try to use index on member_name,
other solutions is to order and data and split to several tables or create index for member_name field using external tools
扩展 Bill Karwin 的想法,如果您可以在获取数据时添加一个新列,您可以通过在名称后附加时间戳来生成唯一的主键,并用空格分隔以避免任何奇怪的串联问题(当然,任何人的名字都不可能是数字,但仍然)。
Extending Bill Karwin's idea, if you can add a new column while acquiring the data, you can generate a unique primary key by appending a timestamp to the name, separated by a space to avoid any weird concatenation problems (well its unlikely anyone's name would be numeric, but still).