Handling/compressing large datasets across multiple tables

Published 2024-12-04 18:35:38

In an application at our company we collect statistical data from our servers (load, disk usage and so on). Since there is a huge amount of data, and we don't need all of it at all times, we have a "compression" routine that takes the raw data, calculates min, max, and average over a number of data points, stores these new values in the same table, and removes the old ones after some weeks.
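The roll-up described above can be sketched in MySQL roughly as follows. The schema (`server_id`, `metric`, `ts`, `value`, an `is_compressed` flag to tell raw rows from aggregates in the same table) and the one-hour bucket size are assumptions for illustration, not details from the original post:

```sql
-- Aggregate raw points older than a few weeks into hourly
-- min/max/avg rows stored in the same table (schema hypothetical).
-- In practice, capture the cutoff timestamp once, or wrap both
-- statements in a transaction, so rows arriving in between the
-- INSERT and the DELETE are not silently dropped.
INSERT INTO ovak_result
    (server_id, metric, ts, val_min, val_max, val_avg, is_compressed)
SELECT server_id,
       metric,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(ts) / 3600) * 3600) AS bucket,
       MIN(value), MAX(value), AVG(value),
       1
FROM ovak_result
WHERE is_compressed = 0
  AND ts < NOW() - INTERVAL 3 WEEK
GROUP BY server_id, metric, bucket;

-- The raw rows that were just aggregated can then be removed.
DELETE FROM ovak_result
WHERE is_compressed = 0
  AND ts < NOW() - INTERVAL 3 WEEK;
```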

Now I'm tasked with rewriting this compression routine. The new routine must keep all uncompressed data we have for one year in one table, and the "compressed" data in another table. My main concerns now are how to handle the data that is continuously written to the database, and whether or not to use a "transaction table" (my own term, since I can't come up with a better one; I'm not talking about commit/rollback transaction functionality).

As of now our data collectors insert all information into a table named ovak_result, and the compressed data will end up in ovak_resultcompressed. But are there any specific benefits or drawbacks to creating a table called ovak_resultuncompressed and using ovak_result only as "temporary storage"? ovak_result would stay minimal, which would be good for the compression routine, but I would need to shuffle all data from one table into another continually, and there would be constant reading, writing and deleting in ovak_result.
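A hedged sketch of that "temporary storage" scheme: the collectors keep inserting into ovak_result while a periodic job moves settled rows into the long-term table. The `ts` column and the five-minute settling window are assumptions:

```sql
-- Fix the cutoff in a variable so the DELETE removes exactly the
-- rows the INSERT copied, even if new rows arrive during the copy.
SET @cutoff = NOW() - INTERVAL 5 MINUTE;

INSERT INTO ovak_resultuncompressed
SELECT * FROM ovak_result WHERE ts < @cutoff;

DELETE FROM ovak_result WHERE ts < @cutoff;
```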

Are there any mechanisms in MySQL to handle this kind of thing?

(Please note: We are talking about quite large datasets here (about 100 M rows in the uncompressed table and about 1-10 M rows in the compressed table). Also, I can do pretty much what I want with both software and hardware configurations so if you have any hints or ideas involving MySQL configurations or hardware set-up, just bring them on.)


Comments (1)

我的鱼塘能养鲲 2024-12-11 18:35:38

Try reading about the ARCHIVE storage engine.
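For a table that only ever receives inserts, ARCHIVE can shrink storage considerably. A minimal sketch, with a hypothetical column layout:

```sql
-- ARCHIVE stores rows zlib-compressed on disk and supports only
-- INSERT and SELECT (no UPDATE or DELETE), which suits an
-- append-only history table.
CREATE TABLE ovak_resultuncompressed (
    server_id INT          NOT NULL,
    metric    VARCHAR(64)  NOT NULL,
    ts        DATETIME     NOT NULL,
    value     DOUBLE       NOT NULL
) ENGINE=ARCHIVE;
```

Since DELETE is unsupported, the one-year retention would have to be handled differently, for example by rotating to a fresh table periodically and dropping the old one.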


Re your clarification. Okay, I didn't get what you meant from your description. Reading more carefully, I see you did mention min, max, and average.

So what you want is a materialized view that updates aggregate calculations for a large dataset. Some RDBMS brands such as Oracle have this feature, but MySQL doesn't.
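Lacking true materialized views, a common stand-in in MySQL is a summary table refreshed on a schedule by the event scheduler. All names below are hypothetical; this assumes `event_scheduler=ON` and a unique key on `(server_id, metric, bucket)` so that REPLACE acts as an upsert:

```sql
-- Recompute the recent hourly buckets every hour. Unlike a real
-- materialized view, this rebuilds recent buckets wholesale rather
-- than updating incrementally per change.
CREATE EVENT refresh_ovak_resultcompressed
ON SCHEDULE EVERY 1 HOUR
DO
    REPLACE INTO ovak_resultcompressed
        (server_id, metric, bucket, val_min, val_max, val_avg)
    SELECT server_id,
           metric,
           FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(ts) / 3600) * 3600) AS bucket,
           MIN(value), MAX(value), AVG(value)
    FROM ovak_resultuncompressed
    WHERE ts >= NOW() - INTERVAL 2 HOUR
    GROUP BY server_id, metric, bucket;
```

The incremental, change-driven update that this approach lacks is exactly the gap FlexViews tries to fill.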

One experimental product that tries to solve this is called FlexViews (http://code.google.com/p/flexviews/). This is an open-source companion tool for MySQL. You define a query as a view against your raw dataset, and FlexViews continually monitors the MySQL binary logs, and when it sees relevant changes, it updates just the rows in the view that need to be updated.

It's pretty effective, but it has a few limitations in the types of queries you can use as your view, and it's also implemented in PHP code, so it's not fast enough to keep up if you have really high traffic updating your base table.
