Mysql 一个大数据库或者小很多表
我们的应用程序将为每个用户创建动态表单。
我们正在考虑两种方法
a) 我们将所有用户表单数据作为键值对存储在单个表中,然后使用 数据透视表技术 获取数据并进行过滤,这很有效,但我认为这可能是非常巨大的资源消耗者
b) 为用户创建的每个表单创建/设计适合其设计的表格,在这里我们不必进行数据透视,我们得到了所有mysql 的好处,而且我们不会有超过几个表的连接
我们主要关心的是可扩展性,在场景“a”中,如果表变得非常巨大(如果我考虑甚至有 1000 个用户注册,这显然会发生),而计划“b”如果明天我们的表随着我们的流量增长,我们可能必须使用 mysql 集群,但集群有表限制,正如我在这里读到的 http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-limitations-database-objects.html
< )
一个新的想法让我们震惊,为什么不创建一个适合单个表单元素(文本字段、文本区域等)的表(作为键、值),使用连接(连接数 = 字段数 )根据用户创建的表单定义,我们可以创建水平数据,而无需改变大家对此想法的看法。
2011 年 4 月 9 日更新
我们测试了针对连接的数据透视
场景 - 我们有一个包含 7 个字段的表单,数据存储为(单个表单插入的示例数据)
------------------------------ | Key | Value | ------------------------------ | Key1 | Value1 | | Key2 | Value2 | | Key3 | Value3 | | Key4 | Value4 | | Key5 | Value5 | | Key6 | Value6 | | Key7 | Value7 | ------------------------------
对于数据透视查询,数据透视需要 0.92 秒70000 行插入 10000 个表单插入 对于连接,需要 17.63 秒(woof...)来显示这 10000 个表单插入
我的表
CREATE TABLE IF NOT EXISTS `vet` ( `id` int(11) NOT NULL, `form_id` int(11) NOT NULL, `key` varchar(255) NOT NULL, `value` varchar(255) NOT NULL, KEY `id` (`id`), KEY `form_id` (`form_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Pivot 查询
SELECT id, GROUP_CONCAT(if(`key` = 'k1', value, NULL)) as 'key1', GROUP_CONCAT(if(`key` = 'k2', value, NULL)) as 'key2', GROUP_CONCAT(if(`key` = 'k3', value, NULL)) as 'key3', GROUP_CONCAT(if(`key` = 'k4', value, NULL)) as 'key4', GROUP_CONCAT(if(`key` = 'k5', value, NULL)) as 'key5', GROUP_CONCAT(if(`key` = 'k6', value, NULL)) as 'key6', GROUP_CONCAT(if(`key` = 'k7', value, NULL)) as 'key7' FROM vet WHERE form_id = 2 GROUP BY id
JOIN 查询
SELECT v.id, v1.value as key1, v2.value as key2, v3.value as key3, v4.value as key4, v5.value as key5, v6.value as key6, v7.value as key7 FROM vet v LEFT JOIN vet v1 ON v1.id = v.id AND v1.`key` = "k1" LEFT JOIN vet v2 ON v2.id = v.id AND v2.`key` = "k2" LEFT JOIN vet v3 ON v3.id = v.id AND v3.`key` = "k3" LEFT JOIN vet v4 ON v4.id = v.id AND v4.`key` = "k4" LEFT JOIN vet v5 ON v5.id = v.id AND v5.`key` = "k5" LEFT JOIN vet v6 ON v6.id = v.id AND v6.`key` = "k6" LEFT JOIN vet v7 ON v7.id = v.id AND v7.`key` = "k7" WHERE v.form_id = 2 GROUP BY v.id
我认为我们现在将坚持旋转,如果巨大的差异是由于它们造成的,也请参阅我的查询。
请建议哪种解决方案更好,或者是否还有其他更好的解决方案
Sudesh
Our app would be creating dynamics forms for each users.
We are considering two approaches
a) We store all users forms data in single table as key value pair, then using pivot table technique fetch data and do filtering which works but i think might be really huge resource guzzler
b) Create / Design table for each form user creates that fits to its design, here we dont have to pivot and we get all benefits of mysql and also we wont have joins for more than couple of tables
Our major concern is scalability, in scenario 'a' if table grows really huge (that will obviously happen if i consider even 1000 users register), with plan 'b' if tomorrow our tables grow with it our traffic we might have to use mysql cluster but cluster have table limitations as i read here http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-limitations-database-objects.html
Update
A new idea struck us, why not create a single table (as key, value) that fits single form element (Text Field, Text area, etc), using join (No of joins = No of fields) based on form definition created by user we can create a horizontal data, without pivoting what you all think about this idea.
Update 9-April-2011
We tested pivoting against joins
Scenario - We have a form with 7 fields, the data is stored as (sample data for single form insert)
------------------------------ | Key | Value | ------------------------------ | Key1 | Value1 | | Key2 | Value2 | | Key3 | Value3 | | Key4 | Value4 | | Key5 | Value5 | | Key6 | Value6 | | Key7 | Value7 | ------------------------------
For pivot query it took 0.92 seconds to pivot 70000 rows into 10000 form inserts
For join it took 17.63 seconds (woof.....) to show these 10000 form inserts
My table
CREATE TABLE IF NOT EXISTS `vet` ( `id` int(11) NOT NULL, `form_id` int(11) NOT NULL, `key` varchar(255) NOT NULL, `value` varchar(255) NOT NULL, KEY `id` (`id`), KEY `form_id` (`form_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Pivot Query
SELECT id, GROUP_CONCAT(if(`key` = 'k1', value, NULL)) as 'key1', GROUP_CONCAT(if(`key` = 'k2', value, NULL)) as 'key2', GROUP_CONCAT(if(`key` = 'k3', value, NULL)) as 'key3', GROUP_CONCAT(if(`key` = 'k4', value, NULL)) as 'key4', GROUP_CONCAT(if(`key` = 'k5', value, NULL)) as 'key5', GROUP_CONCAT(if(`key` = 'k6', value, NULL)) as 'key6', GROUP_CONCAT(if(`key` = 'k7', value, NULL)) as 'key7' FROM vet WHERE form_id = 2 GROUP BY id
JOIN Query
SELECT v.id, v1.value as key1, v2.value as key2, v3.value as key3, v4.value as key4, v5.value as key5, v6.value as key6, v7.value as key7 FROM vet v LEFT JOIN vet v1 ON v1.id = v.id AND v1.`key` = "k1" LEFT JOIN vet v2 ON v2.id = v.id AND v2.`key` = "k2" LEFT JOIN vet v3 ON v3.id = v.id AND v3.`key` = "k3" LEFT JOIN vet v4 ON v4.id = v.id AND v4.`key` = "k4" LEFT JOIN vet v5 ON v5.id = v.id AND v5.`key` = "k5" LEFT JOIN vet v6 ON v6.id = v.id AND v6.`key` = "k6" LEFT JOIN vet v7 ON v7.id = v.id AND v7.`key` = "k7" WHERE v.form_id = 2 GROUP BY v.id
I think we will stick to pivoting for now, please also see my queries if the huge difference is due to them.
Please suggest which solution is better or if there is another better solution
Sudesh
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我建议使用一个数据库来设计它。然后,您始终可以在需要时使用分片进行水平扩展。
I suggest design it using one single database. Then you always can use sharding for horizontal scaling when needed.
您不想为用户创建的每个表单创建一个新表。
然而,您也不希望将所有数据存储在一张大表中。
最好将东西存储在几张表中,这样就不会重复东西,就像这样。
首先是一个包含用户数据的表:
Table user
id:整数自增主键
用户名:varchar(255)
其他用户数据
然后是一个将用户链接到表单数据的表(但实际上并不保存任何表单数据
表用户表单
id:整数自增主键
user_id: 整数索引
...固定数据的其他字段在每个表单中始终仅出现一次。
然后是包含表单数据的表
<代码>
表格属性
id:整数自增主键
UserForm_id:整数索引
属性名称:varchar(255)
PropertyValue: varchar(255)
现在,当您想要从表单访问数据时,您可以使用类似
这样的查询,这样您就不会存储冗余数据。
请注意,如果您不需要唯一访问单个属性,则可以将唯一 ID 删除到属性表中。
不用担心 MySQL 集群,只需从带有 MySAM 表的普通 MySQL (5.x) 开始,如果不够快,那么您可以开始考虑技巧,但低于 100 万条记录我不会打扰。
You don't want to create a new table for each form that a user creates.
However you also do not want to store all data in one big table.
Best to store stuff in a few tables so that you do not repeat stuff, something like this.
First a table with user data:
Table user
id: integer autoincrement primary key
username: varchar(255)
other user data
Then a table that links the user to the form data (but does not actually hold any form data
Table UserForm
id: integer autoincrement primary key
user_id: integer index
... other fields for fixed data that always occurs only once in each form.
Then the table with the Form data
Table FormProperties
id: integer autoincrement primary key
UserForm_id: integer index
PropertyName: varchar(255)
PropertyValue: varchar(255)
Now when you want to access data from a form you use a query like
That way you will not store redundant data.
Note that if you never need to uniquely access a individual property you can drop the unique id on the properties table.
And don't worry about MySQL cluster, just start with vanilla MySQL (5.x) with MySAM tables and if that is not fast enough then you can start thinking about tricks, but below one million record I would not bother.
拥有许多小型数据库的问题是众所周知的:
- 当索引和表很小时,磁盘和内存的使用效率低下
- 当 #databases > 时 mysql 服务器扩展性很差100 和#tables>10000
- 行政/管理噩梦
但有一些优点:
- 客户隔离提供更好的安全性
- 转储/加载/锁定/更改单个客户的表而不影响其他客户
- 更容易管理多个版本和自定义模式添加
拥有一个带有分片的大型数据库,听起来不错,但也有缺点:
- 所有客户都被锁定在相同的架构中,必须同时进行升级
- 个人客户的安全性较差
- 需要修改代码来管理分片
- 分片一旦建立就很难改变
- “坏邻居”效应:一个过度活跃的客户会影响分片中邻居的 SLA
主要优点是它具有良好的可扩展性。
全面披露:我在 ParElastic 工作。但我真诚地相信,我们的虚拟多租户数据库是解决此问题的唯一真正完整的解决方案:
- 每个客户都会看到自己数据库的完整、独立的实例
- 数据库的独立管理命令
- 可扩展的分片在用户之间和用户内部分配数据
- 每个客户模式管理支持滚动升级和自定义扩展
- 使用普通 SQL 进行安全的跨客户查询
如果您有兴趣,可以在 https://aws.amazon.com/marketplace(搜索“parelastic”)。或者查看 http://parelastic.com。
The problems with having many small databases is well-understood:
- inefficient use of disk and memory when indexes and tables are small
- mysql server scales poorly when #databases > 100 and #tables>10000
- admin/management nightmare
But some advantages:
- isolation of customer provides better security
- dump/load/lock/alter an individual customer's tables without impacting others
- easier to manage multiple versions and custom schema additions
Having one large database, with sharding, ssounds good but there are disadvantages:
- all customers are locked into the same schema, upgrades must happen for all at once
- poor security for the individual customer
- requires modification of code to manage shards
- very difficult to change shards once they're established
- "bad neighbor" effect: one over-active customer impacts the SLA of neighbors in the shard
The main advantage is that it scales well.
Full disclosure: I work at ParElastic. But I sincerely believe we have the only truly complete solution to this problem with our virtual multi-tenant database:
- each customer sees a complete, isolated instance of their own database
- independent admin commands for their database
- scalable sharding under-the-cover to distribute data across and within a user
- per-customer schema management supports rolling upgrades and custom extensions
- secure, cross-customer queries using normal SQL
If you're interested, you can download a pre-packaged ParElastic environment for free in the https://aws.amazon.com/marketplace (search "parelastic"). Or check out http://parelastic.com.