Currently I am building a quite big web system and I need a strong SQL database solution. I chose MySQL over Postgres because some of the tasks need to be read-only (MyISAM engine) and others are massive-write (InnoDB).
I have a question about this read-only part. It has to be extremely fast: the user must get an answer in a lot less than one second.
Let's say we have one well-indexed table named "object" with no more than 10 million rows, and another one named "element" with around 150 million rows. We also have a table named "element_object" (hundreds of millions of rows) connecting elements from table "element" with objects from table "object".
So we're going to partition the tables "element" and "element_object" and have 8192 tables "element_hash_n{0..8191}" and 24576 tables "element_object_hash_n{0..8191}_m{0..2}".
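The naming scheme above implies a routing step from an element key to its shard. A minimal sketch, assuming CRC32 as the (unspecified) hash function and a string key; any stable, evenly spreading hash would do:

```python
import zlib

N_ELEMENT_SHARDS = 8192  # element_hash_n{0..8191}
M_SUBSHARDS = 3          # ..._m{0..2}; 8192 * 3 = 24576 link tables

def element_table(key: str) -> str:
    """Pick the element shard for a given element key."""
    n = zlib.crc32(key.encode("utf-8")) % N_ELEMENT_SHARDS
    return f"element_hash_n{n}"

def element_object_table(key: str, m: int) -> str:
    """Pick one of the three link sub-shards for the same key."""
    n = zlib.crc32(key.encode("utf-8")) % N_ELEMENT_SHARDS
    return f"element_object_hash_n{n}_m{m % M_SUBSHARDS}"
```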
The answer to a user's question would be a 2-step search:
- Find the id of the element in the matching table "element_hash_n"
- Do the main SQL select on table "object" and join with the matching table "element_object..hash_n_m" to filter the result with the id found in the first step
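Those two steps can be sketched in miniature with SQLite; the schema, the shard chosen (n=42, m=0), and the sample rows are all made up for illustration:

```python
import sqlite3

# Hypothetical single shard of the scheme above, with toy data.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE element_hash_n42 (id INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX idx_el_name ON element_hash_n42(name);
CREATE TABLE object (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE element_object_hash_n42_m0 (
    element_id INTEGER, object_id INTEGER,
    PRIMARY KEY (element_id, object_id)
);
INSERT INTO element_hash_n42 VALUES (7, 'water');
INSERT INTO object VALUES (100, 'ocean'), (101, 'river');
INSERT INTO element_object_hash_n42_m0 VALUES (7, 100), (7, 101);
""")

# Step 1: find the element's id in the hashed shard.
cur.execute("SELECT id FROM element_hash_n42 WHERE name = ?", ("water",))
element_id = cur.fetchone()[0]

# Step 2: main select on "object", joined through the link shard.
cur.execute("""
    SELECT o.id, o.payload
    FROM object AS o
    JOIN element_object_hash_n42_m0 AS eo ON eo.object_id = o.id
    WHERE eo.element_id = ?
    ORDER BY o.id
""", (element_id,))
rows = cur.fetchall()
print(rows)  # [(100, 'ocean'), (101, 'river')]
```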
I wonder about the first step. What would be better:
- storing (all) over 32k tables in MySQL
- creating one SQLite database and storing the 8192 first-step tables there
- creating 8192 different SQLite files (databases)
- creating 8192 files in the file system and writing my own binary solution to find the id
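For comparing the options, the third one (one SQLite file per shard) can be prototyped in a few lines; the shard count, schema, and helper names below are assumptions, and step 1 then only ever opens one small file:

```python
import os
import sqlite3
import tempfile
import zlib

N_SHARDS = 8  # the question uses 8192; 8 keeps this sketch small

def shard_path(base_dir: str, key: str) -> str:
    """Route a key to its own database file."""
    n = zlib.crc32(key.encode("utf-8")) % N_SHARDS
    return os.path.join(base_dir, f"element_hash_n{n}.sqlite")

def put(base_dir: str, key: str, element_id: int) -> None:
    with sqlite3.connect(shard_path(base_dir, key)) as con:
        con.execute("CREATE TABLE IF NOT EXISTS element "
                    "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
        con.execute("INSERT OR REPLACE INTO element VALUES (?, ?)",
                    (element_id, key))

def find_id(base_dir: str, key: str):
    """Step 1: open only the matching shard file and look up the id."""
    with sqlite3.connect(shard_path(base_dir, key)) as con:
        row = con.execute("SELECT id FROM element WHERE name = ?",
                          (key,)).fetchone()
    return row[0] if row else None

base = tempfile.mkdtemp()
put(base, "water", 7)
print(find_id(base, "water"))  # prints 7
```

The same harness, timed with realistic data volumes, would also answer the single-file-vs-many-files question empirically.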
I'm sorry for my English. It's not my native language.
I think you make way too many partitions. If you have more than 32,000 partitions, you have a tremendous management overhead. Given the name element_hash_*, it seems as if you want to hash your elements and partition them that way. But a hash will (most likely) give you an even distribution of the data over all partitions. I can't see how this should improve performance. If your data is accessed across all those partitions, you don't gain anything by having partitions the size of your memory - every query will need to load data from another partition.
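The even-distribution point is easy to check empirically; this sketch (shard count and key format are arbitrary assumptions) counts how a CRC32 hash spreads sequential keys over shards:

```python
import zlib
from collections import Counter

# Hash 16,000 synthetic keys into 16 shards and count per-shard hits;
# an even spread means consecutive lookups hop between shards.
N_SHARDS = 16
counts = Counter(zlib.crc32(f"element-{i}".encode("utf-8")) % N_SHARDS
                 for i in range(16000))
print(min(counts.values()), max(counts.values()))  # both close to 1000
```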
We used partitions on a transaction system where more than 90% of the queries used the current day as a criterion. In such a case, partitioning by day worked very well. But we only had 8 partitions, and we then moved the data off to another database for long-term storage.
My advice: try to find out which data will be needed that fast and try to group it together. And you will need to run your own performance tests. If it is so important to deliver data that fast, there should be enough management support to build a decent test environment.
Maybe your test results will show that you simply can't deliver the data fast enough with a relational database system. If so, you should look at NoSQL (as in "Not only SQL") solutions.
What technology are you building your web system with? You should test that part as well. A super-fast database will not help you much if you lose the time in a poorly performing web application.