What is the best practice for storing large amounts of text (in a database or as files)? What about compressing it?
I'm building a web-app that handles internal emails and other frequent small-to-medium sized chunks of text between users and clients. What's the best method for storing this data? In a database (MySQL) or as thousands of individual files? What about compressing it (PHP's gzcompress() or MySQL's compression features)?
This will not be a public application, so the user load will be minimal (less than 20 users at a time). However, there will be a lot of communication going back-and-forth every day within the app, so I expect the amount of data to grow quite large as time goes by (which is why I'd like to compress it).
I'd like to keep the data in a database for ease of access and portability, but some of the threads I've seen on here regarding images have suggested using file storage. What do you think?
Thank you,
Seth
Edit for clarification: I do not require any sort of searching of the text, which is why I would lean toward compressing it to save on space.
Comments (4)
For images and documents that are already in a specific format (Excel, Word documents, PDF files, etc.) I prefer file storage. But for raw text I would probably use a database: it is easier to replicate across machines for failover, and you can do substring searches over the text. I don't know of a specific algorithm to use to compress it, but I still think a database is the better way to go, provided what you have is text and only text. For any other document format I would prefer file storage.
And unless I am missing something, I would use a CLOB instead of a BLOB if it is only text.
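One practical wrinkle with the CLOB suggestion: once text has been run through a compressor such as PHP's gzcompress(), the result is arbitrary bytes rather than text, so it would need a binary column (a BLOB in MySQL) instead. Below is a minimal sketch of the round trip, written in Python for brevity, with an in-memory SQLite database standing in for MySQL. The `messages` table, its columns, and the helper names are made up for illustration. Python's `zlib.compress()` emits the same zlib format as gzcompress().

```python
import sqlite3
import zlib

# In-memory SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY,"
    " sender TEXT NOT NULL,"
    " body BLOB NOT NULL)"  # compressed bytes are binary, so not a text column
)

def save_message(sender: str, text: str) -> int:
    """Compress the text and store it; returns the new row id."""
    packed = zlib.compress(text.encode("utf-8"))
    cur = conn.execute(
        "INSERT INTO messages (sender, body) VALUES (?, ?)", (sender, packed)
    )
    conn.commit()
    return cur.lastrowid

def load_message(message_id: int) -> str:
    """Fetch a row and decompress its body back to text."""
    row = conn.execute(
        "SELECT body FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        raise KeyError(message_id)
    return zlib.decompress(row[0]).decode("utf-8")
```

Keeping the sender (and any other attributes) uncompressed in their own columns means they can still be filtered and sorted in SQL; only the body needs to be unpacked in application code.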
One of the main reasons for keeping the files in a database is to keep them consistent with the rest of the data that you are storing. It will be easier to make backups, (re)deploy with predefined datasets, etc. Furthermore, it's easier to guarantee transactional integrity.
One benefit of storing text as files could be that it is easier to serve them using a webserver. If that is the only remaining benefit of using files, you could look into caching the files on the webserver. That keeps much of the easy backup and transaction handling of the database while still allowing some speedup for HTTP requests.
I would choose a DB. You describe a scenario where you are going to store a large quantity of messages. You do not provide much information about the system, but I would guess that you will want to sort and group the messages and attach several other properties to them. It would be much easier, and probably faster, to keep each message with its attributes in a DB instead of using file storage.
When it comes to compression, I do not know which of the methods is most effective. You should probably try both before choosing.
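Trying before choosing could start with a small measurement over a sample of real messages. Here is a rough Python sketch (the function name and the sample texts are invented); it uses zlib, the same format PHP's gzcompress() produces. It only measures sizes for application-side compression, so comparing against MySQL's own compression would still need a run against an actual server.

```python
import zlib

def compression_report(messages, level=6):
    """Return total raw bytes, total compressed bytes, and their ratio."""
    raw = 0
    packed = 0
    for text in messages:
        data = text.encode("utf-8")
        raw += len(data)
        packed += len(zlib.compress(data, level))
    # An empty sample has nothing to shrink, so report a neutral ratio.
    ratio = packed / raw if raw else 1.0
    return {"raw_bytes": raw, "compressed_bytes": packed, "ratio": ratio}

# Invented sample; real measurements should use real message bodies.
sample = [
    "Hi Anna, thanks for the update on the invoice. " * 20,
    "Please find the meeting notes below. " * 40,
]
report = compression_report(sample)
print(report)
```

Repetitive sample text like the above compresses far better than typical prose, so the numbers are only meaningful when the sample comes from the application's actual traffic.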
I wonder how big this "medium chunk" is. If the text is just written messages (so less than 10 KB each), then compressing makes them even smaller, and there wouldn't be a big impact on database growth. Development and maintenance are also much easier when everything is available with a single query and you don't have to fetch the file contents separately.
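One caveat for very short messages: zlib adds a few bytes of header and checksum, so a tiny string can come out larger than it went in. A hedged Python sketch of one way to handle that is to store the compressed form only when it is actually smaller. The function names and the boolean flag are illustrative assumptions, and the flag would need its own column next to the body.

```python
import zlib
from typing import Tuple

def pack_body(text: str) -> Tuple[bytes, bool]:
    """Return (bytes to store, whether they are compressed)."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw)
    # Keep the compressed form only if it saves space.
    if len(packed) < len(raw):
        return packed, True
    return raw, False

def unpack_body(body: bytes, is_compressed: bool) -> str:
    """Reverse pack_body using the stored flag."""
    data = zlib.decompress(body) if is_compressed else body
    return data.decode("utf-8")
```

With this approach a two-character reply is stored as-is, while a multi-paragraph email is stored compressed, and both come back through the same `unpack_body` call.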