实体属性值数据库与严格关系模型电子商务
可以肯定地说,EAV/CR 数据库模型很糟糕。 也就是说,
问题:应该使用什么数据库模型、技术或模式来处理描述可以在运行时更改的电子商务产品的属性“类”?
在良好的电子商务中数据库中,您将存储选项类别(例如电视分辨率,然后每台电视都有一个分辨率,但下一个产品可能不是电视并且没有“电视分辨率”)。 如何存储它们、高效搜索并允许用户使用描述其产品的变量字段设置产品类型? 如果搜索引擎发现客户通常根据控制台深度搜索电视,您可以将控制台深度添加到字段中,然后在运行时为每种电视产品类型添加单个深度。
优秀的电子商务应用程序有一个很好的共同功能,它们会显示一组产品,然后有“向下钻取”侧菜单,您可以在其中看到“电视分辨率”作为标题,以及该应用程序的前五个最常见的电视分辨率找到设置。 您单击其中一个,它只会显示该分辨率的电视,您可以通过在侧面菜单上选择其他类别来进一步深入了解。 这些选项将是在运行时添加的动态产品属性。
进一步讨论:
长话短说,互联网上是否有任何链接或模型描述可以“学术地”修复以下设置?我感谢诺埃尔·肯尼迪提出的建议类别表,但需求可能不止于此。 下面我以不同的方式描述它,试图强调其重要性。 我可能需要修正视角来解决问题,或者我可能需要更深入地研究 EAV/CR。
喜欢 EAV/CR 模型的积极反应。 我的开发人员同事都说了 Jeffrey Kemp 在下面谈到的内容:“新实体必须由专业人士建模和设计”(断章取义,请阅读下面他的回复)。 问题是:
- 实体每周添加和删除属性
(搜索关键字决定未来属性) - 新实体每周到达
(产品由零件组装而成) - 旧实体每周离开
(已存档,不太受欢迎,季节性)
客户希望为产品添加属性有两个原因:
- 部门/关键字搜索/类似产品之间的比较图表
- 消费者结帐前的产品
配置属性必须有意义,而不仅仅是关键词搜索。 如果他们想比较所有带有“生奶油糖霜”的蛋糕,他们可以单击蛋糕,单击生日主题,单击生奶油糖霜,然后检查所有有趣的蛋糕,因为它们都有生奶油糖霜。 这不是特定于蛋糕,只是一个例子。
It is safe to say that the EAV/CR database model is bad. That said,
Question: What database model, technique, or pattern should be used to deal with "classes" of attributes describing e-commerce products which can be changed at run time?
In a good E-commerce database, you will store classes of options (like TV resolution then have a resolution for each TV, but the next product may not be a TV and not have "TV resolution"). How do you store them, search efficiently, and allow your users to setup product types with variable fields describing their products? If the search engine finds that customers typically search for TVs based on console depth, you could add console depth to your fields, then add a single depth for each tv product type at run time.
There is a nice common feature among good e-commerce apps where they show a set of products, then have "drill down" side menus where you can see "TV Resolution" as a header, and the top five most common TV Resolutions for the found set. You click one and it only shows TVs of that resolution, allowing you to further drill down by selecting other categories on the side menu. These options would be the dynamic product attributes added at run time.
Further discussion:
So long story short, are there any links out on the Internet or model descriptions that could "academically" fix the following setup? I thank Noel Kennedy for suggesting a category table, but the need may be greater than that. I describe it a different way below, trying to highlight the significance. I may need a viewpoint correction to solve the problem, or I may need to go deeper in to the EAV/CR.
Love the positive response to the EAV/CR model. My fellow developers all say what Jeffrey Kemp touched on below: "new entities must be modeled and designed by a professional" (taken out of context, read his response below). The problem is:
- entities add and remove attributes weekly
(search keywords dictate future attributes) - new entities arrive weekly
(products are assembled from parts) - old entities go away weekly
(archived, less popular, seasonal)
The customer wants to add attributes to the products for two reasons:
- department / keyword search / comparison chart between like products
- consumer product configuration before checkout
The attributes must have significance, not just a keyword search. If they want to compare all cakes that have a "whipped cream frosting", they can click cakes, click birthday theme, click whipped cream frosting, then check all cakes that are interesting knowing they all have whipped cream frosting. This is not specific to cakes, just an example.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
http://code .google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
内部模型充其量是古怪的,就像有人将架构放入一个令人难以置信的游戏中,将其密封并将其放入油漆粉碎机中...
现实世界:我正在开发一个中间件履行应用程序,这里是获取地址信息的查询之一。
懒惰地精确获取订单的地址信息
--
摘要: 仅在以下情况下使用 Magento:
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
Exacts address information for an order, lazily
--
Summary: Only use Magento if:
我很惊讶没有人提到 NoSQL 数据库。
我从未在生产环境中实践过 NoSQL(刚刚测试了 MongoDB,印象深刻),但 NoSQL 的全部意义在于能够在同一个“文档”中保存具有不同属性的项目。
I'm surprised nobody mentioned NoSQL databases.
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".
在性能不是主要要求的情况下,例如在 ETL 类型的应用程序中,EAV 具有另一个明显的优势:差异保存。
我已经实现了许多应用程序,其中首要要求是能够查看域对象从第一个“版本”到当前状态的历史记录。 如果该域对象具有大量属性,则意味着每次更改都需要将新行插入到其相应的表中(不是更新,因为历史记录会丢失,而是插入)。 假设这个域对象是一个 Person,我有 50 万个 Person 需要跟踪,在 Person 的生命周期中,各种属性平均有 100 多次更改。 再加上只有 1 个主要域对象的应用程序很少见,您很快就会推测数据库的大小将很快失去控制。
一个简单的解决方案是仅保存对主要域对象的差异更改,而不是重复保存冗余信息。
所有模型都会随着时间的推移而变化,以反映新的业务需求。 时期。 使用 EAV 只是我们工具箱中可供使用的工具之一; 但它不应该被自动归类为“坏”。
Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".
我正在努力解决同样的问题。 您可能会感兴趣查看以下关于两个现有电子商务解决方案的讨论:Magento (EAV) 和 Joomla(常规关系结构):
https://forum.virtuemart.net/index.php?topic=58686.0
看起来,Magento 的 EAV 性能确实令人惊叹。
这就是为什么我倾向于标准化结构。 为了克服灵活性的缺乏,我正在考虑将来添加一些可以编辑的单独的数据字典(XML 或单独的数据库表),并基于此,用于显示和比较产品类别与新属性集的应用程序代码将是与 SQL 脚本一起生成。
在这种情况下,这种架构似乎是最佳选择——同时灵活且高性能。
该问题可能是在实际环境中频繁使用 ALTER TABLE 造成的。 我正在使用 Postgres,因此它的 MVCC 和事务性 DDL 将有望减轻痛苦。
I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure):
https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.
如果它只是关于产品目录属性,因此这些属性的验证要求相当有限,那么 EAV 唯一真正的缺点是查询性能,甚至当您的查询处理多个具有属性的“事物”(产品)时,这只是一个问题,查询“给我 id 234 的产品的所有属性”的性能虽然不是最佳的,但仍然很快。
一种解决方案是仅将 SQL 数据库/EAV 模型用于产品目录的管理/编辑端,并采用一些流程将产品非规范化为可搜索的内容。 由于您已经拥有属性,因此您很可能需要分面,因此可以是 Solr 或 ElasticSearch。 这种方法基本上避免了 EAV 模型的所有缺点,并且增加的复杂性仅限于在更新时将完整产品序列化为 JSON。
If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.
我仍然投票支持 EAV 在最低意义的原子级别进行建模。 让面向特定用户群体的标准、技术和应用来决定内容模型、属性、颗粒的重复需求等。
I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.
EAV 有许多缺点:
一旦应用程序中的数据量超过一定大小,该数据的检索和操作的效率可能会变得越来越低。
您无法为所有需要的字段定义外键。
EAV has many drawbacks:
Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
You can't define foreign keys for all the fields needed.
我有一个稍微不同的问题:我想要存储更像电子表格的东西,而不是许多具有稀疏值的属性(这可能是使用 EAV 的一个很好的理由)。 工作表中的列可以更改,但工作表中的所有单元格都将包含数据(不是稀疏的)。
我做了一个一小组测试来对两种设计进行基准测试:一个使用 EAV,另一个使用用于存储单元格数据的 Postgres ARRAY。
EAV
数组
两种模式都在适当的列上有索引,并且索引由规划器使用。
事实证明,基于数组的模式速度快了一个数量级 用于插入和查询。 从快速测试来看,两者似乎都是线性扩展的。 不过,测试还不是很彻底。 欢迎提出建议和叉子 - 它们获得了麻省理工学院的许可。
I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
I made a small set of tests to benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
EAV
Array
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
It turned out the array-based schema was an order of magnitude faster for both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.
我能想到一些一般的优点和缺点,在某些情况下,一种比另一种更好:
选项 1,EAV 模型:
由用户添加?)
报告
不可能
选项 2,分别对每个实体建模:
需求和设计
由专业人士设计
实体
理解和调试
选项 3,组合(“正确”模型实体,但为某些/所有实体的自定义属性添加“扩展”)< /strong>
* 我不确定选项 3 是否一定会在设计阶段节省任何时间。
我个人倾向于选项 2,并尽可能避免 EAV。 然而,对于某些场景,用户需要 EAV 带来的灵活性; 但这需要付出巨大的代价。
There's a few general pros and cons I can think of, there are situations where one is better than the other:
Option 1, EAV Model:
be added by users?)
reports
impossible
Option 2, Modelling each entity separately:
requirements and design
designed by a professional
entity
understand and debug
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.
不,这不对。 只是它们是关系数据库的低效使用。 纯粹的键/值存储非常适合此模型。
现在,回答您真正的问题:如何存储各种属性并使它们可搜索?
只需使用 EAV 即可。 在你的情况下,这将是一个额外的表。 在属性名称和值上对其进行索引,大多数 RDBM 会使用前缀压缩来对属性名称重复进行压缩,从而使其变得非常快速和紧凑。
当您使用 EAV/CR 替换“真实”字段时,它会变得丑陋。 与所有工具一样,过度使用它是“不好的”,并且会给它带来不好的形象。
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
Now, to your real question: How to store various attributes and keep them searchable?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.