Dedicated faceted search engine for handling dynamic taxonomies: helpful only for performance, or for flexibility too?

Posted 2024-08-17 18:47:29

I've been thinking for a while about how to model a typical e-commerce site with an eBay-like taxonomy, where the attributes of a product depend on its category.

My first attempt was a choice between EAV and Table Per Class database inheritance modeling. I chose the latter for performance, but that meant creating a dedicated table for each specific product category (each leaf in the category tree), with category-specific attributes (like resolution for TVs) modeled as separate columns.
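To make the trade-off concrete, here is a minimal sketch of both layouts using Python's built-in `sqlite3`. The table and column names (`tv`, `product_attribute`, `screen_inches`) are illustrative assumptions, not taken from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL);

-- Table Per Class: one dedicated table per leaf category, typed columns.
CREATE TABLE tv (
    product_id    INTEGER PRIMARY KEY REFERENCES product(id),
    resolution    TEXT NOT NULL,
    screen_inches REAL NOT NULL
);

-- EAV: one generic table for every category, values stored as text.
CREATE TABLE product_attribute (
    product_id INTEGER NOT NULL REFERENCES product(id),
    attribute  TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute)
);
""")

# The same (made-up) product stored both ways.
conn.execute("INSERT INTO product VALUES (1, 'Example TV', 499.0)")
conn.execute("INSERT INTO tv VALUES (1, '1920x1080', 42.0)")
conn.executemany(
    "INSERT INTO product_attribute VALUES (?, ?, ?)",
    [(1, "resolution", "1920x1080"), (1, "screen_inches", "42.0")],
)


def find_tvs_table_per_class(conn, resolution):
    """Filter on a real column; the query is specific to the tv table."""
    rows = conn.execute(
        "SELECT p.name FROM product p JOIN tv t ON t.product_id = p.id "
        "WHERE t.resolution = ?",
        (resolution,),
    )
    return [row[0] for row in rows]


def find_by_attribute_eav(conn, attribute, value):
    """Filter on any attribute; the query is generic but untyped."""
    rows = conn.execute(
        "SELECT p.name FROM product p "
        "JOIN product_attribute a ON a.product_id = p.id "
        "WHERE a.attribute = ? AND a.value = ?",
        (attribute, value),
    )
    return [row[0] for row in rows]
```

The Table Per Class query needs new SQL for every new category, while the EAV query works for any attribute but gives up column types and constraints.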

While performant, this setup is not flexible when you need to add attributes to existing categories or add new categories. Each such change requires the following:

Alter/create table
New form for filtering within such a category by specific attributes
New code for generating db queries for searching and filtering
Some new viewmodels/DTOs and views for presenting products from new categories

To cope with that complexity, I think some kind of meta representation of those attributes is needed (possibly outside the application, in XML or even an Excel file), so that on each change all of the code mentioned above (SQL/ORM queries, application code, templates) could be auto-generated. That would help with development, but testing and an extra deployment would still be needed.

At that point I learned that eBay doesn't really use a relational database for search, and that their taxonomy is flexible enough that they can add new leaf categories quite quickly. Also, their categories are probably not nodes of a hierarchical tree modeled in a relational database, but just search attributes (facets).

After a quick look at the most promising dedicated faceted search setup (a separate Solr instance), I'm not sure it would help me stay flexible in the face of taxonomy changes. Solr usually just mirrors the relational database in some way, so category-specific attributes would still have to be modeled in the database as DBMS metadata. That means that, for example, dynamically generating UI forms for filtering by attribute would be hard unless:

1) I keep the data in the RDBMS in EAV fashion and overcome its performance problems by searching through Solr (but the other EAV problems would remain: messiness, no data integrity enforcement, etc.)

2) I keep just the attribute dictionary (i.e. only the names and types) in the RDBMS and store the specific attribute values in Solr, using it as a kind of non-relational data store in addition to a search facility. I'm not convinced by this solution either (even if it's possible), since the application would be coupled too tightly to Solr (i.e. the product-editing admin CRUD would interact with Solr directly).
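As a rough illustration of option 2, the sketch below maps a product's attribute values to a Solr-style document, using only a dictionary of attribute names and types. It assumes the Solr schema defines typed dynamic fields with the suffixes `_s`, `_i`, `_f` and `_b`, as Solr's example schemas do. `ATTRIBUTE_DICTIONARY` and `to_solr_document` are hypothetical names, and in practice the dictionary would be loaded from the RDBMS rather than hard-coded.

```python
# Hypothetical attribute dictionary: names and types only, no values.
ATTRIBUTE_DICTIONARY = {
    "resolution": "string",
    "screen_inches": "float",
    "smart_tv": "bool",
}

# Assumed dynamic-field suffixes; adjust to match the actual Solr schema.
SUFFIX_BY_TYPE = {"string": "_s", "int": "_i", "float": "_f", "bool": "_b"}
PYTHON_TYPE = {"string": str, "int": int, "float": float, "bool": bool}


def to_solr_document(product_id, category, values):
    """Build a flat document dict, validating values against the dictionary."""
    doc = {"id": str(product_id), "category_s": category}
    for name, value in values.items():
        attr_type = ATTRIBUTE_DICTIONARY.get(name)
        if attr_type is None:
            raise KeyError(f"unknown attribute: {name}")
        if not isinstance(value, PYTHON_TYPE[attr_type]):
            raise TypeError(f"{name} must be of type {attr_type}")
        # The type suffix selects the dynamic field, so no schema change
        # is needed when a new attribute is added to the dictionary.
        doc[name + SUFFIX_BY_TYPE[attr_type]] = value
    return doc
```

The resulting dict could then be posted to Solr by whatever client the application uses. The validation step is the only integrity check in this design, which is part of the coupling concern raised above.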

What are your thoughts? Do you think code generation is inevitable for this kind of (performant) taxonomy flexibility? How would you handle it? Maybe a separate data dictionary in EAV fashion in the database, just for code generation purposes? I guess I could also use something like MongoDB, but generating the UI code (at runtime or not) would still need some kind of metadata.
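For the "runtime" variant of that idea, one possible shape is a single generic renderer that builds the filter form from attribute metadata on each request, so a new attribute needs a metadata change but no new code. This is only a sketch. `AttributeDef`, `CATEGORY_ATTRIBUTES` and the `/search/...` URL are made-up names, and the metadata could equally live in a database table or a document store.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class AttributeDef:
    name: str            # request parameter / field name
    label: str           # text shown to the user
    kind: str            # "text", "number" or "choice"
    choices: tuple = ()  # allowed values when kind == "choice"


# Hypothetical metadata, hard-coded here only to keep the sketch runnable.
CATEGORY_ATTRIBUTES = {
    "tv": [
        AttributeDef("resolution", "Resolution", "choice", ("1280x720", "1920x1080")),
        AttributeDef("screen_inches", "Screen size (in)", "number"),
    ],
}


def render_filter_form(category):
    """Render an HTML filter form for a category from its metadata."""
    parts = [f'<form method="get" action="/search/{escape(category)}">']
    for attr in CATEGORY_ATTRIBUTES[category]:
        parts.append(f"<label>{escape(attr.label)}")
        if attr.kind == "choice":
            options = "".join(f"<option>{escape(c)}</option>" for c in attr.choices)
            parts.append(f'<select name="{escape(attr.name)}">{options}</select>')
        else:
            input_type = "number" if attr.kind == "number" else "text"
            parts.append(f'<input type="{input_type}" name="{escape(attr.name)}">')
        parts.append("</label>")
    parts.append("</form>")
    return "\n".join(parts)
```

Calling `render_filter_form("tv")` returns the form markup as a string. Adding an entry to `CATEGORY_ATTRIBUTES` changes the form without touching the renderer.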

There are a lot of questions here, but I didn't want to break this up into smaller questions, since I'm interested in a general design approach for this broader class of problems.

知你几分 2024-08-24 18:47:29

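I don't claim to have a definitive answer to all of this (it's a rather open-ended question that you should try to break down into smaller parts, and the answer depends on your actual requirements; in fact, I'm tempted to vote to close it), but I'll comment on a few things:

I'd forget about modeling this in an RDBMS. Faceted search just doesn't work well in a relational schema.
IMO this isn't the right place for code generation. You should design your code so that it doesn't change when the data changes (I'm not talking about schema changes).
Storing metadata/attributes in an Excel spreadsheet seems like a very bad idea. I'd build a UI to edit them, and store them in Solr / MongoDB / CouchDB / whatever you choose to manage this.
Solr does not "just mirror a relational DB". In fact, Solr is completely independent of relational databases. One of the most common scenarios is dumping data from an RDBMS into Solr (denormalizing it in the process), but Solr is flexible enough to work without any relational data source.
Hierarchical faceting in Solr is still an open research issue. Two separate approaches are currently being investigated (SOLR-64, SOLR-792).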
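One way to read the point about code that does not change with the data: keep the list of facet fields per category as data, and let a single function build the Solr request parameters from it. The sketch below uses standard Solr query parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`). The field names, the `_s` suffix convention and the `CATEGORY_FACETS` mapping are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical facet metadata; in practice loaded from wherever the
# attribute definitions are managed, not hard-coded.
CATEGORY_FACETS = {
    "tv": ["resolution_s", "brand_s"],
    "book": ["genre_s", "author_s"],
}


def build_facet_query(category, selected):
    """Return a URL-encoded Solr query string for a category page.

    `selected` maps facet field names to the value the user clicked on.
    """
    facets = CATEGORY_FACETS[category]
    params = [
        ("q", "*:*"),
        ("fq", f'category_s:"{category}"'),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # Ask Solr to count values for every facet defined for this category.
    for field in facets:
        params.append(("facet.field", field))
    # Narrow the result set by each selected facet value.
    for field, value in sorted(selected.items()):
        if field not in facets:
            raise ValueError(f"{field} is not a facet of {category}")
        quoted = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{quoted}"'))
    return urlencode(params)
```

The returned string would be appended to the collection's `/select` URL. Adding a category or a facet only changes `CATEGORY_FACETS`, not the function.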

日暮斜阳 2024-08-24 18:47:29

What if you had different types of categories for different types of products?

Taking the eBay example, we would have Products that can be either Books or TV/Displays.

Books have a title and an ISBN, and may be in the sci-fi category, the erotic category, the non-fiction category, or the autobiographical category. Or maybe you have a book that is in the non-fiction, autobiographical, and erotic categories all at once.

Displays have a screen resolution and a power consumption in watts (?), and may be in the flat-screen category, the CRT category, or the HD category.

From a purely relational point of view, you could maybe model this like so:

[Product]-(1)------(1)-[  Book  ]-(n)------(m)-[ book_category ]
| id    |              | title  |              |  name         |
| price |              | ISBN   |
| ...   |
| ...   |-(1)---(1)-[   display  ]-(n)------(m)-[ display_category ]
                    | resolution |              |  name            |
                    |   watts    |

Instead of modeling attributes dependent on a particular product category, you would have different properties and categories dependent on the type/class of product.

See supertypes & subtypes
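The diagram above could be expressed as a schema roughly like the following. This is a sketch in Python's built-in `sqlite3` covering only the book branch. The join table `book_in_category` and the sample row are assumptions added to make the n:m relationship concrete.

```python
import sqlite3

SCHEMA = """
-- Supertype: attributes shared by every product.
CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL NOT NULL);

-- Subtype: shares its primary key with product (1:1).
CREATE TABLE book (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    title      TEXT NOT NULL,
    isbn       TEXT NOT NULL UNIQUE
);

-- Categories that only make sense for books.
CREATE TABLE book_category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- n:m link between books and book categories.
CREATE TABLE book_in_category (
    book_id     INTEGER NOT NULL REFERENCES book(product_id),
    category_id INTEGER NOT NULL REFERENCES book_category(id),
    PRIMARY KEY (book_id, category_id)
);
"""


def books_in_category(conn, category_name):
    """Return (title, price) for every book in the named category."""
    rows = conn.execute(
        "SELECT b.title, p.price FROM book b "
        "JOIN product p ON p.id = b.product_id "
        "JOIN book_in_category bc ON bc.book_id = b.product_id "
        "JOIN book_category c ON c.id = bc.category_id "
        "WHERE c.name = ? ORDER BY b.title",
        (category_name,),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
# Made-up sample data: one book that belongs to two categories.
db.execute("INSERT INTO product VALUES (1, 12.5)")
db.execute("INSERT INTO book VALUES (1, 'Example Memoir', '0000000000')")
db.executemany(
    "INSERT INTO book_category VALUES (?, ?)",
    [(1, "non-fiction"), (2, "autobiographical")],
)
db.executemany("INSERT INTO book_in_category VALUES (?, ?)", [(1, 1), (1, 2)])
```

A `display` subtype with its own `display_category` tables would follow the same pattern, which also means each new product type still needs new tables.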
