缓存 EAV 数据 - XML 还是 NoSQL / MongoDB?
我正在构建一个严重依赖于 EAV 的网络应用程序用于存储数据的模式。这基本上意味着对象的每个属性在大型数据库表中都有自己的行。我使用 MySQL 来存储所有内容。这是我存储的内容的一个非常简单的示例...
OBJECTS ATTRIBUTES
objId | type objId | attribute | value
============= =========================
1 | fruit 1 | color | green
2 | fruit 1 | shape | round
3 | book 2 | color | red
我知道有些人讨厌 EAV,但我需要能够在不修改数据库模式的情况下任意添加新的对象属性,并且到目前为止它对我来说工作得很好。
正如我认为其他人在使用 EAV 数据结构构建系统时发现的那样,这种方法的弱点是检索多个对象以及每个对象的属性。目前,我的应用程序一次仅显示 10 个对象,因此我只需查询 EAV 表 10 次(每个对象一次),而且速度仍然非常快。但是,我想消除此限制并允许一次性获取数百个对象。我还希望能够以比目前更灵活的方式查询对象。
使用 SQL 连接执行此操作会很糟糕,因此我正在考虑缓存数据。平均而言,数据库每 1 次写入获得约 300 次读取,因此我认为这是一个很好的缓存候选者。
到目前为止,这些是我提出的选项...
XML 数据库列:每次执行写入时,更新对象 包含所有对象属性的表。这对于快速读取数据来说是可行的,但是查询隐藏在数据库表中的 XML 数据却很混乱。
XML 文件:每次执行写入时,都会将一个 XML 文件写入磁盘,其中包含每个对象及其属性。这样做的好处是我可以使用 XQuery 来查询对象。
NoSQL(例如 MongoDB):也许我应该在像 MongoDB 这样的无模式数据库上构建系统。重新编写整个应用程序以使用 MongoDB 将非常耗时,但我突然意识到我可以将它用作缓存。例如,每次将数据写入 EAV 存储时,MongoDB 中的等效对象都会更新,然后用于读取和查询。
最初我认为 XML 文件是最好的方法,但我发现该文件变得非常大且难以管理。目前我倾向于使用 MongoDB。我知道为一个应用程序运行两个数据库服务器似乎很疯狂,但我认为它适合我的情况。
我很想听听您对此的想法。
I'm building a web app that relies heavily on the EAV pattern for storing data. This basically means that each attribute of an object has it's own row in a massive database table. I'm using MySQL to store everything. This is a very simplified example of what I'm storing...
OBJECTS ATTRIBUTES
objId | type objId | attribute | value
============= =========================
1 | fruit 1 | color | green
2 | fruit 1 | shape | round
3 | book 2 | color | red
I know some people hate EAV, but I need to be able to add new object attributes arbitrarily without modifying the database schema, and it's working very well for me so far.
As I think anyone else finds when building a system using an EAV data structure, the weakness of this approach is the retrieval of multiple objects together with each object's attributes. At the moment my app only displays 10 objects at a time, so I just query my EAV table 10 times (once for each object) and it's still very fast. However, I'd like to remove this limitation and allow hundreds of objects to be fetched in one go. I also want to be able to query objects in a more flexible way than I'm doing currently.
Doing this with SQL joins would be hideous, so I'm considering caching the data. On average the database gets about 300 reads for every 1 write, so I think this it's a good candidate for caching.
So far these are the options I've come up with...
XML database column: Every time a write is performed, update an XML text column in the objects table containing all the object's attributes. This would work for reading the data quickly, but querying XML data hidden in a database table is messy.
XML file: Every time a write is performed, write an XML file to disk which contains each object and it's attributes. This has the benefit that I can then use XQuery to query the objects.
NoSQL (eg. MongoDB): Perhaps I should have built the system on a schemaless database like MongoDB. Re-writing the entire app to use MongoDB would be quite time consuming, but it struck me that I could use it as a cache. So for example, every time data is written to the EAV store, the equivalent object would be updated in MongoDB which would then be used for reads and queries.
Originally I thought an XML file would be the best approach, but I can see the file getting really big and unmanageable. At the moment I'm leaning towards using MongoDB. I know it seems crazy running two database servers for one app, but I think it could work in my case.
I'd love to hear your thoughts on this.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我只看到两种方式,评论中都提到了这两种方式。
首先,您可以真正迁移到像 Mongo 这样的面向文档的数据库 - 这适合作为 EAV 的替代品。由于它没有 JOIN 和其他逻辑,因此它会非常快并且会稍微扩展。 (因此,也许您将能够避免使用缓存)。
其次,您可以使用特定的缓存工具(例如Redis或Mongo或Memcached)将每个查询结果保存一段时间。
但我想把我们的注意力转向这个系统的未来。什么是计划加载和缩放?
如果你想减少系统负载,我认为最好的方法是迁移到面向文档的db。
或者,如果您想立即获得结果(缓存数据以供读取) - 可以通过使用缓存工具来实现,甚至[如果可能]在网络级别(例如 nginx 支持开箱即用的 memcached)。
因此,像往常一样,您应该在一次性成本和持续成本之间找到平衡。
I see only two ways, both of them were mentioned in comments.
First, you can really migrate to document-oriented db like Mongo - this is suitable as alternative to EAV. Since it'll be no JOINs and other logic, it'll be very fast and slightly scaled. (So, perhaps you'll be able to avoid using cache).
Second, you can use specific tool for caching like Redis or Mongo or Memcached to save every query result for some time.
But I want to turn our mind to the future of this system. What is planned loading and scaling?
If you want to reduce system load, I think the best way is to migrate to document-oriented db.
Or, if you want to have result immediately (cache data for reading) - it can be reached by using caching tool, even [if possible] on network level (for example nginx support memcached out of the box).
So, as usual, you should find balance between one-time and continious costs.