Confused about the maximum number of records in a MySQL table

Posted on 2024-09-30 05:54:27

I am working on a website analyser that will be used to analyse our own site based on the Tomcat logs.

Right now we push the Tomcat logs into a database (MySQL) every day, and that works well. However, I have found a potentially fatal problem.

So far we push the logs into a single table, but the number of log entries will grow quickly, especially as we take on more users. Obviously a single table cannot hold that many entries, and queries against such a large table will be slow.

We use Hibernate as the persistence layer; each row in the log table is mapped to a Java object of type LogEntry in the application.

I have thought about creating a new table each month, but how do I map LogEntry to more than one table and query across tables?

Also, the number of log entries may differ from month to month. As an extreme example, what if the number of entries (records in the table) exceeds the maximum capacity of a table in the database?

Then I thought about setting a property that limits the maximum number of entries pushed when Hibernate writes logs to the database. But in that case I have no idea how to tell Hibernate to create a new table and query across tables automatically.

Any ideas?

Update to Sandy:

I see what you mean: the maximum capacity of a table is determined by the OS, and if I use partitioning the maximum capacity can grow up to the capacity of my disk. However, even if partitioning means I no longer need to worry about the maximum capacity of the table, a table that holds too many records will still perform poorly. (BTW, we have not decided to delete the old logs yet.) Another approach I thought of is to create several tables with the same structure, but I am using Hibernate, and all log inserts and queries go through Hibernate. Can an entity (POJO) be mapped to more than one table?

Comments (2)

把人绕傻吧 2024-10-07 05:54:27

"I have thought about creating a new table each month, but how do I map LogEntry to more than one table and query across tables?"

Have a look at Hibernate Shards (database sharding is a method of horizontal partitioning). Although this subproject is not very active and has some limitations (refer to the documentation), it is stable and usable (Hibernate Shards was contributed by Max Ross from Google, who is using it internally).

"Also, the number of log entries may differ from month to month. As an extreme example, what if the number of entries (records in the table) exceeds the maximum capacity of a table in the database?"

Monitor your database and tables, and anticipate the required maintenance.

"But in that case I have no idea how to tell Hibernate to create a new table and query across tables automatically."

Hibernate won't do that automatically; it will be part of the maintenance of the database and of the sharding configuration (see also the section about Virtual Shards).
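
Because Hibernate will not route rows for you, the application has to decide which table (or shard) an entry belongs to. As a rough sketch of that decision for the one-table-per-month idea in the question, here is a plain-Java helper. The class name, the method names and the `log_entry_` prefix are all hypothetical, and this is not the Hibernate Shards API; it only illustrates the routing rule that a shard selection strategy or a hand-written DAO would need.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hibernate or Hibernate Shards.
class MonthlyLogTables {
    private static final String PREFIX = "log_entry_"; // assumed naming scheme
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy_MM");

    // Table that holds entries logged on the given day.
    static String tableFor(LocalDate day) {
        return PREFIX + YearMonth.from(day).format(SUFFIX);
    }

    // Every monthly table a query must visit to cover [from, to], inclusive.
    static List<String> tablesBetween(LocalDate from, LocalDate to) {
        List<String> tables = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            tables.add(PREFIX + m.format(SUFFIX));
        }
        return tables;
    }

    public static void main(String[] args) {
        System.out.println(tableFor(LocalDate.of(2024, 9, 30)));
        System.out.println(tablesBetween(LocalDate.of(2024, 11, 15), LocalDate.of(2025, 1, 2)));
    }
}
```

A cross-table query would then run once per name returned by `tablesBetween` and merge the results in the application, which is roughly the bookkeeping a sharding layer takes over.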

谁许谁一生繁华 2024-10-07 05:54:27

I think you should consider horizontal partitioning.

Horizontal Partitioning

This form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year's worth of data.

Increased performance: during scan operations, the MySQL optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. For example, a million row table may be broken up into ten different partitions in range style so that each partition contains 100,000 rows. If a query is issued that only needs data from one of the partitions, and a table scan operation is necessary, only 100,000 rows will be accessed instead of a million. Obviously, it is much quicker for MySQL to sample 100,000 rows than one million so the query will complete much sooner. The same benefit is derived should index access be possible as local partitioned indexes are created for partitioned tables. Finally, it is possible to stripe a partitioned table across different physical drives by specifying different file system/directory paths for specific partitions. This allows physical I/O contention to be reduced when multiple partitions are accessed at the same time.
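
To make the range-style partitioning described in the quote concrete for a log table, here is a small sketch that generates the MySQL DDL for one partition per month. The table name `log_entry`, the column `logged_at` (assumed to be a DATE or DATETIME column) and the class itself are assumptions for illustration, not something from the question. Keep in mind that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical generator for a monthly RANGE partitioning statement.
class LogPartitionDdl {
    private static final DateTimeFormatter NAME = DateTimeFormatter.ofPattern("yyyyMM");

    static String partitionByMonth(String table, String column, YearMonth first, int months) {
        StringBuilder sql = new StringBuilder();
        sql.append("ALTER TABLE ").append(table)
           .append(" PARTITION BY RANGE (TO_DAYS(").append(column).append(")) (\n");
        for (int i = 0; i < months; i++) {
            YearMonth m = first.plusMonths(i);
            // Each partition holds rows dated before the first day of the following month.
            sql.append("  PARTITION p").append(m.format(NAME))
               .append(" VALUES LESS THAN (TO_DAYS('")
               .append(m.plusMonths(1).atDay(1)).append("')),\n");
        }
        // Catch-all partition so rows newer than the last month are still accepted.
        sql.append("  PARTITION pmax VALUES LESS THAN MAXVALUE\n)");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(partitionByMonth("log_entry", "logged_at", YearMonth.of(2024, 12), 2));
    }
}
```

With a layout like this, a query that filters on `logged_at` for a single month only has to touch that month's partition, and Hibernate keeps mapping LogEntry to the one logical table.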

Check out the article "Improving Database Performance with Partitioning".

Update

"It seems that horizontal partitioning can handle a large table, but what if the number of records is greater than the maximum size of the table?"

Actually, the maximum size of a MySQL table is determined by operating system constraints. Have a look at this, and determine it for yourself.
An alternative is to purge old log records periodically, but only if they are not required for analysis.
Create a cron job or any other scheduled task to do the deleting.
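
As a sketch of what such a scheduled purge could compute, the helper below works out a cutoff date and builds a parameterised, batched DELETE statement. The retention period, the table and column names and the class are all assumptions made for illustration; the JDBC or Hibernate call that actually runs the statement is left out.

```java
import java.time.LocalDate;

// Hypothetical purge helper for a scheduled job.
class LogPurge {
    // First day of the oldest month to keep; rows dated before it may be deleted.
    static LocalDate cutoff(LocalDate today, int keepMonths) {
        return today.minusMonths(keepMonths).withDayOfMonth(1);
    }

    // MySQL single-table DELETE with LIMIT, so each run removes a bounded batch.
    static String deleteBatchSql(String table, String column, int batchSize) {
        return "DELETE FROM " + table + " WHERE " + column + " < ? LIMIT " + batchSize;
    }

    public static void main(String[] args) {
        System.out.println(cutoff(LocalDate.of(2024, 10, 7), 6));
        System.out.println(deleteBatchSql("log_entry", "logged_at", 10000));
    }
}
```

The job would bind the cutoff to the `?` parameter and repeat the statement until it affects zero rows, which keeps each transaction short on a large table.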
