Design pattern for large tables in Oracle
I have a user table, a transaction table, and a user_transaction table. There are around 75,000 users, and the number of unique transactions possible in the application (rows in the transaction table) is between 1 and 3 million.
user_transaction is the join of the two tables above; it stores which transactions each user performed and at what date and time. So this table is going to be huge for 1 year of data (we are going to purge the active data from the table and archive it after 1 year). We expect the count to be around 50-60 million rows. That is the final data size at the end of the year.
I would say the average size is about 30 million records.
Also, a nightly import job updates all these tables, and that is the only time inserts are done on them; our app only reads the data (select queries).
What would be the best way to design the join table to make retrieval from the huge transaction table faster? We have added many fields to the table to denormalize it and reduce joins, so almost all of the data is available in just the transaction and user_transaction tables.
If we want to partition the table, how do we go about partitioning? The application queries the more recent data most frequently.
We are thinking of partitioning the transaction table by month, so we would have 1 table for each month.
The other option we were considering is to have 7 tables, one for each day of the week, but this greatly increases the complexity of the queries, considering we are using Hibernate.
How do we design this huge table of around 60 million rows?
More details, as requested:
I will have to make a diagram of the schema; here is some more info in the meantime. The relationships are not complex. There are about 4 tables: users, transactions, users_transaction, and a resource table. user_transaction is the join table containing the ids of the other three tables, and it is the one that is going to be huge, since it will have a separate entry for each combination of these ids and also a separate entry for each timestamp.
The application currently has very few users, fewer than 20 (but this may grow in the future).
The main consumers of the tables are:
1) Weekly self-audit reports sent out as emails, containing each user's activity details for the past week from these tables. These will (eventually) be sent to about 75,000 users, and generating the report and sending the email for 1 user currently takes around 1 minute (testing in the pilot phase). We need to seriously improve performance on this, to less than 5 seconds per email. This is a back-end job that runs at night (it should take at most 3-4 hours).
2) Dashboards containing charts that show a summarized view of the transactions from these tables. These queries run over a date range and summarize the data by various fields.
Hence we are planning to summarize the user_transactions table by storing a count for each day (not including the time) wherever all the other fields are the same (user id, resource id, resource_eventid, location).
We would then partition these summary tables by month (one for each month).
One thing to note: the solution should work well for all databases (MySQL, DB2, etc.) and not just Oracle.
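The daily summary described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for the real database, the table name user_transaction_daily and the column names txn_time, txn_date and txn_count are hypothetical, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for Oracle/MySQL/DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_transaction (
    user_id           INTEGER NOT NULL,
    resource_id       INTEGER NOT NULL,
    resource_event_id INTEGER NOT NULL,
    location          TEXT    NOT NULL,
    txn_time          TEXT    NOT NULL   -- 'YYYY-MM-DD HH:MM:SS'
);
-- Hypothetical summary table: one row per day per combination of fields.
CREATE TABLE user_transaction_daily (
    user_id           INTEGER NOT NULL,
    resource_id       INTEGER NOT NULL,
    resource_event_id INTEGER NOT NULL,
    location          TEXT    NOT NULL,
    txn_date          TEXT    NOT NULL,  -- date only, time dropped
    txn_count         INTEGER NOT NULL,
    PRIMARY KEY (user_id, resource_id, resource_event_id, location, txn_date)
);
""")

# Made-up sample rows: user 1 has two transactions on one day, one on the next.
rows = [
    (1, 10, 100, "NYC", "2010-03-01 09:15:00"),
    (1, 10, 100, "NYC", "2010-03-01 17:40:00"),
    (1, 10, 100, "NYC", "2010-03-02 08:05:00"),
    (2, 11, 101, "LON", "2010-03-01 12:00:00"),
]
conn.executemany("INSERT INTO user_transaction VALUES (?, ?, ?, ?, ?)", rows)

# Roll-up step (would run after the nightly import): collapse rows that
# differ only in time of day into one row per day with a count.
conn.execute("""
INSERT INTO user_transaction_daily
SELECT user_id, resource_id, resource_event_id, location,
       substr(txn_time, 1, 10) AS txn_date,
       COUNT(*)                AS txn_count
FROM user_transaction
GROUP BY user_id, resource_id, resource_event_id, location,
         substr(txn_time, 1, 10)
""")

daily = conn.execute(
    "SELECT user_id, txn_date, txn_count FROM user_transaction_daily "
    "ORDER BY user_id, txn_date"
).fetchall()
print(daily)
```

The dashboards would then read from the much smaller summary table instead of scanning the raw rows. The date truncation is the part most likely to differ between databases (for example TRUNC on a DATE column in Oracle), so that expression is the piece to adapt per engine.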
Regards,
Priyank Devurkar
Comments (1)
Ok, so first things first.
Auditing...
Oracle has very powerful auditing features, so you should look into those before you try to roll your own.
If you really want to roll your own, then there is a one-to-many relationship between users and transactions. I use the term "transactions" very loosely here, since what you seem to want to do is record when user X does something to modify a table, or a row in a table.
The simplest thing to do is to have whatever front-end code you are using do the inserts into that table, e.g.:
insert into auditing_table (userID, Operation) values ('fred', 'updated table payments and changed some column old value to new value');
I would make a compound index on userID and a timestamp, so that the table can be queried on those two columns as a single entity. The table would look something like:
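The effect of the compound index is that the two keys are indexed together as a single combined key (almost like a hash of the pair, although in Oracle the default index is a B-tree), so queries that filter on those two columns are very fast.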
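The table and compound index described above can be sketched as follows. This is a minimal illustration, not the exact definition from the answer: Python's built-in sqlite3 stands in for Oracle, and the names audit_log, user_id, ts, operation and audit_log_user_ts, as well as the sample rows, are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical audit table: no foreign key back to the users table.
CREATE TABLE audit_log (
    user_id   TEXT NOT NULL,
    ts        TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS'
    operation TEXT NOT NULL
);
-- Compound index: user first, then timestamp.
CREATE INDEX audit_log_user_ts ON audit_log (user_id, ts);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?)",
    [
        ("fred", "2010-03-01 09:00:00", "updated table payments"),
        ("fred", "2010-03-09 10:30:00", "updated table payments"),
        ("wilma", "2010-03-02 11:00:00", "updated table accounts"),
    ],
)

# One user's activity for one week: equality on user_id plus a range on ts,
# which is the access pattern the compound index is built for.
query = """
SELECT ts, operation FROM audit_log
WHERE user_id = ? AND ts >= ? AND ts < ?
ORDER BY ts
"""
params = ("fred", "2010-03-01 00:00:00", "2010-03-08 00:00:00")
week_rows = conn.execute(query, params).fetchall()

# Ask SQLite how it would run the query; the last column holds the plan text.
plan = " ".join(
    str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)
)
uses_index = "audit_log_user_ts" in plan
print(week_rows, uses_index)
```

Putting the user column first and the timestamp second matches a "one user, one date range" lookup such as the weekly report. A query that filters only on the timestamp would generally not benefit from this column order.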
This single table will be very fast for deletes and inserts. You can make it even faster by:
Not having an FK back to the users table (it really is pointless).
If you have enough RAM on the database machine, setting the table to be held in the buffer cache, but ONLY if you have enough RAM; otherwise you will tank the server.
If you choose to partition, choosing your method only after carefully reading and understanding partitioning on Oracle.
Making sure your tablespace is a BIGFILE tablespace when you define it, since that ensures you don't run into the size limit of a single data file (on Linux at least).
As for the other databases you deal with, each will have its own tuning issues, so each of these is a set of one-off conditions that will fit one DB engine but not another.
Remember the Unix motto at all times: do one thing and do it well.