I can store a large amount of data (≤ 4 GB) in a single table column. But is it a good idea?

Posted 2024-08-18 07:33:16 · 596 characters · 2 views · 0 comments

To make a long story short, one part of the application I'm working on needs to store a somewhat large volume of data in a database, for another part of the application to pick up later on. Normally this would be < 2000 rows, but can occasionally exceed 300,000 rows. The data needs to be temporarily stored and can be deleted afterwards.

I've been playing around with various ideas and one thing came to mind today. The LONGTEXT datatype can store a maximum of 2^32 bytes, which equates to 4 GB. Now, that's a lot of stuff to cram into one table row. Mind you, the data would probably not exceed 60-80 MB at the very most. But my question is, is it a good idea to actually do that?

The two solutions I'm currently toying with are something like this:

  • Inserting all data as individual rows into a "temporary" table that would be truncated after finish.
  • Inserting all data as a serialized string into a LONGTEXT column in a row that would be deleted after finish.
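For concreteness, the two options can be sketched like this. This is a minimal illustration only, with SQLite's in-memory engine standing in for MySQL; the table and column names (`staging_rows`, `staging_blob`, `payload`) are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_rows (batch_id INTEGER, payload TEXT)")
conn.execute("CREATE TABLE staging_blob (batch_id INTEGER, payload TEXT)")

data = [{"id": i, "value": i * 2} for i in range(1000)]

# Option 1: one row per record; the table is truncated when done.
conn.executemany(
    "INSERT INTO staging_rows (batch_id, payload) VALUES (1, ?)",
    [(json.dumps(rec),) for rec in data],
)

# Option 2: the whole dataset serialized into a single (LONG)TEXT column.
conn.execute(
    "INSERT INTO staging_blob (batch_id, payload) VALUES (1, ?)",
    (json.dumps(data),),
)

row_count = conn.execute("SELECT COUNT(*) FROM staging_rows").fetchone()[0]
blob = conn.execute("SELECT payload FROM staging_blob").fetchone()[0]
print(row_count)               # 1000 rows in option 1
print(len(json.loads(blob)))   # 1000 records inside one row in option 2
```

Either way the same records round-trip; the difference is whether the database or the application does the splitting and parsing.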

Purely from a performance perspective, would it be better to store the data as potentially >300,000 individual rows, or as a 60 MB LONGTEXT entry?

If it's a wash, I will probably go with the LONGTEXT option, as it would make the part of the application that picks up the data easier to write. It would also tie in better with yet another part, which would increase the overall performance of the application.

I would appreciate any thoughts on this.


Comments (5)

川水往事 2024-08-25 07:33:16

Serializing all that data into a LONGTEXT... blasphemy!! :)

Seriously though, it occurs to me that if you do that, you would have no choice than to extract it all in one, giant, piece. If you spread it into individual rows, on the other hand, you can have your front-end fetch it in smaller batches.

At least giving yourself that option seems the smart thing to do. (Keep in mind that underestimating the future size requirements of one's data can be a fatal error!)

And if you design your tables right, I doubt very much that 60 MiB of data spread over 300,000 rows would be any less efficient than fetching 60 MiB of text and parsing it on the front-end.

Ultimately the question is: do you think your front-end can parse the text more efficiently than MySQL can fetch it?
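The batched-fetch idea this answer describes can be sketched with keyset pagination. Again SQLite stands in for MySQL, and the table name and batch size are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO staging (payload) VALUES (?)",
    [(f"record-{i}",) for i in range(2500)],
)

def fetch_in_batches(conn, batch_size=1000):
    """Yield rows in fixed-size batches, paging on the primary key."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM staging WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen

batches = list(fetch_in_batches(conn))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

With the single-LONGTEXT layout, nothing like this is possible: the front-end must pull and parse the entire value in one piece.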

素衣风尘叹 2024-08-25 07:33:16

This should be fine as long as you use a memory storage engine. In MySQL, this means using the MEMORY storage engine instead of InnoDB or MyISAM. Otherwise disk usage will bring your app to its knees.

知你几分 2024-08-25 07:33:16

What kind of data is it, and how will it be used? It would probably be much better to store and process it in your application's memory. At the very least, that would be much faster and would put no load on the DB engine.

别念他 2024-08-25 07:33:16

You could always store it in the database in the 300,000-row format and use memcached to cache the data so you don't have to fetch it again. Note that memcached stores it in the machine's memory, so if you're using a lot of this data you may want to set a low expiry on it. But memcached significantly speeds up data fetches because you don't have to run queries on every page load.

温馨耳语 2024-08-25 07:33:16

If you're just going to be writing a large, temporary BLOB, you might consider writing to a temporary file on a shared file system instead.
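A minimal sketch of that file-based handoff, assuming both parts of the application can reach the same directory (the serialization format and cleanup policy are choices for the example, not anything prescribed by the answer):

```python
import json
import tempfile
from pathlib import Path

data = [{"id": i} for i in range(5)]

# Producer: serialize the batch to a temp file instead of a table.
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".json", delete=False
) as fh:
    json.dump(data, fh)
    staging_path = Path(fh.name)

# ... later, the consuming part of the application:
loaded = json.loads(staging_path.read_text())
staging_path.unlink()  # delete the file once the data is consumed

print(len(loaded))  # 5
```

This keeps the transient bulk data out of the database entirely, at the cost of managing file paths and cleanup yourself.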
