Dynamic database / key-value / entity-key-value dilemma

Posted on 2024-12-22 08:28:38

I have been programming relational databases for many years, but I have now come across an unusual and tricky problem:

I am building an application in which users need to be able to define entities quickly and easily. Instances of these entities can then be created, updated, deleted, and so on.

There are two options I can think of.

Option 1 - Dynamically created tables

The first option is to write an engine that dynamically generates the tables and inserts the data into them. However, this would become very tricky, as every query would also need to be dynamic, or would at least have to go through dynamically created stored procedures.
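To make Option 1 more concrete, here is a minimal sketch of such an engine. It uses Python's built-in sqlite3 module purely as a runnable stand-in for whatever database is actually in use; the helper names (`quote_identifier`, `create_entity_table`, `insert_instance`), the type whitelist and the sample `Customer` entity are all hypothetical. The idea is that identifiers are validated and quoted before they go into generated SQL, while values are always bound as parameters.

```python
import re
import sqlite3

# Hypothetical whitelist mapping user-facing type names to SQL column types.
ALLOWED_TYPES = {"text": "TEXT", "integer": "INTEGER", "decimal": "REAL"}
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,49}")


def quote_identifier(name):
    """Reject anything that is not a plain identifier, then quote it."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f'"{name}"'


def create_entity_table(conn, entity_name, fields):
    """Create one table per user-defined entity; fields is {name: type_name}."""
    columns = ['"id" INTEGER PRIMARY KEY']
    for field_name, type_name in fields.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {type_name!r}")
        columns.append(f"{quote_identifier(field_name)} {ALLOWED_TYPES[type_name]}")
    conn.execute(f"CREATE TABLE {quote_identifier(entity_name)} ({', '.join(columns)})")


def insert_instance(conn, entity_name, values):
    """Build the INSERT dynamically; the values themselves are bound parameters."""
    names = ", ".join(quote_identifier(n) for n in values)
    marks = ", ".join("?" for _ in values)
    cur = conn.execute(
        f"INSERT INTO {quote_identifier(entity_name)} ({names}) VALUES ({marks})",
        list(values.values()),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_entity_table(conn, "Customer", {"Name": "text", "Age": "integer"})
new_id = insert_instance(conn, "Customer", {"Name": "Ada", "Age": 36})
row = conn.execute(
    'SELECT "Name", "Age" FROM "Customer" WHERE "id" = ?', (new_id,)
).fetchone()
```

The queries are still dynamic, as the question anticipates, but the dynamic part is confined to a couple of small functions rather than spread through the application.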

Option 2 - Entity - Key - Value Pattern

This is the only realistic option I can think of. It uses a five-table structure:

EntityTypes
    EntityTypeID    int
    EntityTypeName  nvarchar(50)

Entities
    EntityID        int
    EntityTypeID    int

FieldTypes
    FieldTypeID     int
    FieldTypeName   nvarchar(50)
    SQLtype         int

FieldValues
    EntityID        int
    FieldID         int
    Value           nvarchar(MAX)

Fields
    FieldID         int
    FieldName       nvarchar(50)
    FieldTypeID     int

The "FieldValues" table would work a little like a data warehouse fact table, and all my inserts/updates would work by filling a "Key/Value" table-valued parameter and passing it to a stored procedure (to avoid multiple inserts/updates).

All the tables would be heavily indexed, and I would end up doing many self joins to obtain the data.
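As a rough illustration of reading data back from this structure, the sketch below builds the five tables and pivots the values for one entity type using conditional aggregation (one `MAX(CASE ...)` per field) instead of one self join per field. It runs on SQLite through Python's sqlite3 module as a stand-in, so the column types are SQLite's rather than the `nvarchar` types above; the `load_entities` helper and the sample rows are invented for the example. Whether this beats self joins on a real server depends on the indexes and the data, so treat it as something to measure rather than a guaranteed optimization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntityTypes (EntityTypeID INTEGER PRIMARY KEY, EntityTypeName TEXT NOT NULL);
CREATE TABLE Entities (EntityID INTEGER PRIMARY KEY,
                       EntityTypeID INTEGER NOT NULL REFERENCES EntityTypes(EntityTypeID));
CREATE TABLE FieldTypes (FieldTypeID INTEGER PRIMARY KEY, FieldTypeName TEXT NOT NULL,
                         SQLtype INTEGER);
CREATE TABLE Fields (FieldID INTEGER PRIMARY KEY, FieldName TEXT NOT NULL,
                     FieldTypeID INTEGER NOT NULL REFERENCES FieldTypes(FieldTypeID));
CREATE TABLE FieldValues (EntityID INTEGER NOT NULL REFERENCES Entities(EntityID),
                          FieldID INTEGER NOT NULL REFERENCES Fields(FieldID),
                          Value TEXT,
                          PRIMARY KEY (EntityID, FieldID));
""")

# Invented sample data: one entity type with two fields and two instances.
conn.execute("INSERT INTO EntityTypes VALUES (1, 'Customer')")
conn.executemany("INSERT INTO FieldTypes VALUES (?, ?, ?)", [(1, "Text", 0), (2, "Integer", 1)])
conn.executemany("INSERT INTO Fields VALUES (?, ?, ?)", [(1, "Name", 1), (2, "Age", 2)])
conn.executemany("INSERT INTO Entities VALUES (?, 1)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO FieldValues VALUES (?, ?, ?)",
    [(1, 1, "Ada"), (1, 2, "36"), (2, 1, "Grace"), (2, 2, "45")],
)


def load_entities(conn, entity_type_id, field_names):
    """Return one row per entity with one column per requested field."""
    # Only "?" placeholders are generated here; field names are bound as parameters.
    columns = ", ".join(
        "MAX(CASE WHEN f.FieldName = ? THEN v.Value END)" for _ in field_names
    )
    sql = f"""
        SELECT e.EntityID, {columns}
        FROM Entities e
        LEFT JOIN FieldValues v ON v.EntityID = e.EntityID
        LEFT JOIN Fields f ON f.FieldID = v.FieldID
        WHERE e.EntityTypeID = ?
        GROUP BY e.EntityID
        ORDER BY e.EntityID"""
    return conn.execute(sql, [*field_names, entity_type_id]).fetchall()


rows = load_entities(conn, 1, ["Name", "Age"])
```

Note that every value comes back as text, because the single `Value` column stores text; any type conversion has to happen in the application or via casts driven by the `FieldTypes` table.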

I have read a lot about how bad Key/Value databases are, but for this problem it still seems to be the best.

Now my questions!

  • Can anyone suggest another approach or pattern other than these two options?
  • Would option two be feasible for medium-sized datasets (1 million rows max)?
  • Are there further optimizations for option 2 I could use?

Any direction and advice much appreciated!

Comments (4)

溺ぐ爱和你が 2024-12-29 08:28:38

Personally I would just use a "noSQL" (key/value) database like MongoDB.

But if you need to use a relational database, option 2 is the way to go. A good example of that kind of model is the Alfresco Data Dictionary (Alfresco is an enterprise content management system). Its design is similar to what you describe, although it has multiple columns for field values (one for every simple type available in the database). If you add a good cache system to that (for example Ehcache), it should work fine.
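The "one value column per simple type" idea could look roughly like the sketch below. It is not Alfresco's actual schema; the column names (`StringValue`, `IntValue`, `FloatValue`), the index name and the `set_value` helper are assumptions made for illustration, and SQLite (via Python's sqlite3) is used only so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FieldValues (
    EntityID    INTEGER NOT NULL,
    FieldID     INTEGER NOT NULL,
    StringValue TEXT,
    IntValue    INTEGER,
    FloatValue  REAL,
    PRIMARY KEY (EntityID, FieldID),
    -- at most one typed column may be populated per row
    CHECK ((StringValue IS NOT NULL) + (IntValue IS NOT NULL)
           + (FloatValue IS NOT NULL) <= 1)
)""")
# A typed column can be indexed and compared with its real ordering.
conn.execute("CREATE INDEX IX_FieldValues_Int ON FieldValues (FieldID, IntValue)")

# Fixed mapping from Python type to column, so no user input reaches the SQL text.
COLUMN_FOR = {str: "StringValue", int: "IntValue", float: "FloatValue"}


def set_value(conn, entity_id, field_id, value):
    """Store a value in the column that matches its type, replacing any old row."""
    column = COLUMN_FOR[type(value)]
    conn.execute(
        f"INSERT OR REPLACE INTO FieldValues (EntityID, FieldID, {column}) "
        "VALUES (?, ?, ?)",
        (entity_id, field_id, value),
    )


set_value(conn, 1, 2, 36)
set_value(conn, 2, 2, 45)
set_value(conn, 3, 2, 9)
older_than_10 = [
    r[0]
    for r in conn.execute(
        "SELECT EntityID FROM FieldValues "
        "WHERE FieldID = ? AND IntValue > ? ORDER BY EntityID",
        (2, 10),
    )
]
```

With a single text column, the same range filter would compare strings, so '9' would sort after '10'; typed columns avoid that class of bug and let range queries use an index.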

剩余の解释 2024-12-29 08:28:38

As others have suggested NoSQL, I'm going to say that, in my opinion, schemaless databases are really best suited for use-cases with no schema.

From the description, and the schema you came up with, it looks like your case is not in fact "no schema", but rather it seems to be "user-defined schema".

In fact, the schema you came up with looks very similar to the internal meta-schema of a relational database. (You're sort of building a relational database on top of a relational database, which in my experience is not a good idea, as this "meta-database" will have at least twice the overhead and complexity for any basic operation - tables will get very large, which doesn't scale well, and the data will be difficult to query and update, problems will be difficult to debug, and so on.)

For use-cases like that, you probably want DDL: Data Definition Language.

You didn't say which SQL database you're using, but most SQL databases (such as MySQL, PostgreSQL and MS-SQL) support some dialect of DDL extensions to SQL syntax, which let you manipulate the actual schema.

I've done this successfully for use-cases like yours in the past. It works well for cases where the schema rarely changes, and the data volumes are relatively low for each user. (For high volumes or frequent schema updates, you might want schemaless or some other type of NoSQL database.)

You might need some tables on the side for additional field information that doesn't fit in SQL schema - you may want to duplicate some schema information there as well, as this can be difficult or inefficient to read back from actual schema.

Ensuring atomic updates to your field information tables and the schema probably requires transactions, which may not be supported by your database engine - PostgreSQL at least does support transactional schema updates.
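A minimal sketch of what "atomic update of schema plus field information" can mean in practice is shown below. It uses SQLite through Python's sqlite3 module, because SQLite also allows DDL inside a transaction and the example can be run anywhere; the `FieldInfo` side table, the `add_field` helper and the sample `Customer` table are hypothetical. On another engine, check first whether DDL statements take part in transactions at all.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so the BEGIN/COMMIT/ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE FieldInfo (TableName TEXT, ColumnName TEXT, Label TEXT, "
    "PRIMARY KEY (TableName, ColumnName))"
)
conn.execute('CREATE TABLE "Customer" ("id" INTEGER PRIMARY KEY)')


def add_field(conn, table, column, sql_type, label):
    """Add a column and its side-table row together, or neither.

    table, column and sql_type are assumed to be validated against a
    whitelist by the caller before they reach this function.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {sql_type}')
        conn.execute("INSERT INTO FieldInfo VALUES (?, ?, ?)", (table, column, label))
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


add_field(conn, "Customer", "Age", "INTEGER", "Age in years")

# Simulate a failure after the ALTER TABLE: the metadata key already exists,
# so the INSERT inside add_field fails and the new column is rolled back too.
conn.execute("INSERT INTO FieldInfo VALUES ('Customer', 'Email', 'pre-existing')")
try:
    add_field(conn, "Customer", "Email", "TEXT", "E-mail")
except sqlite3.IntegrityError:
    pass

columns = [r[1] for r in conn.execute('PRAGMA table_info("Customer")')]
```

After the failed call, the table and the side table still agree with each other, which is the property the paragraph above is after.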

You have to be vigilant when it comes to security - you don't want to open yourself up to users creating, storing or deleting things they're not supposed to.

If it suits your use-case, consider using not only separate tables, but separate databases, which can also be created and destroyed on demand using DDL. This could be applicable if each customer has ownership of data collections that can't, shouldn't, or don't need to be queried across customers. (Arguably, these are rare - typically, you want at least analytics or something across customers, but there are cases where each customer "owns" an isolated, hosted wiki, shop or CMS/DMS of some sort.)

(I saw in your comment that you already decided on NoSQL, so just posting this option here for completeness.)

最美不过初阳 2024-12-29 08:28:38

It sounds like this might be a solution in search of a problem. Is there any chance your domain can be refactored? If not - there's still hope.

  • Your scalability for option 2 will depend a lot on the width of the custom objects. How many fields can be created dynamically? 1 million entities when each entity has 100 fields could be a drag... Efficient indexing could make performance bearable.

  • For another option - you could have one data table that has a few string fields, a few double fields, and a few integer fields. For example, a table with String1, String2, String3, Int1, Int2, Int3. A second table would have rows that define a user object and map each field of your "CustomObjectName" => String1, and so on. A stored procedure reading INFORMATION_SCHEMA and some dynamic SQL would be able to read the schema table and return a strongly typed recordset...

  • Yet another option (for recent versions of SQL Server) would be to store a row with an id, a type name, and an XML field that contains an XML document holding the object data. In MS SQL Server this can be queried against directly, and maybe even validated against a schema.
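The generic-columns option from the second bullet could be sketched as follows. This is an illustration under assumptions, not the answerer's implementation: the table names `CustomData` and `CustomFieldMap`, the `fetch_objects` helper and the sample data are invented, the mapping is read from an ordinary table rather than from INFORMATION_SCHEMA, and SQLite (via Python's sqlite3) stands in for SQL Server so the code runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomData (
    RowID INTEGER PRIMARY KEY, ObjectType TEXT NOT NULL,
    String1 TEXT, String2 TEXT, String3 TEXT,
    Int1 INTEGER, Int2 INTEGER, Int3 INTEGER);
CREATE TABLE CustomFieldMap (
    ObjectType TEXT NOT NULL, FieldName TEXT NOT NULL, ColumnName TEXT NOT NULL,
    PRIMARY KEY (ObjectType, FieldName),
    UNIQUE (ObjectType, ColumnName));
""")
# Only these physical columns may ever appear in generated SQL.
PHYSICAL_COLUMNS = {"String1", "String2", "String3", "Int1", "Int2", "Int3"}

conn.executemany(
    "INSERT INTO CustomFieldMap VALUES (?, ?, ?)",
    [("Customer", "Name", "String1"), ("Customer", "Age", "Int1")],
)
conn.execute(
    "INSERT INTO CustomData (ObjectType, String1, Int1) VALUES ('Customer', 'Ada', 36)"
)


def fetch_objects(conn, object_type):
    """Return the rows of one user-defined type as dicts keyed by field name."""
    mapping = conn.execute(
        "SELECT FieldName, ColumnName FROM CustomFieldMap "
        "WHERE ObjectType = ? ORDER BY FieldName",
        (object_type,),
    ).fetchall()
    if not mapping:
        return []
    parts = []
    for field, column in mapping:
        if column not in PHYSICAL_COLUMNS:
            raise ValueError(f"unknown physical column: {column!r}")
        alias = field.replace('"', '""')  # escape quotes inside the alias
        parts.append(f'{column} AS "{alias}"')
    cur = conn.execute(
        f"SELECT {', '.join(parts)} FROM CustomData "
        "WHERE ObjectType = ? ORDER BY RowID",
        (object_type,),
    )
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


customers = fetch_objects(conn, "Customer")
```

Because the physical columns keep their native types, integer fields come back as integers without any casting, at the price of a fixed upper bound on the number of fields of each type per object.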

森林迷了鹿 2024-12-29 08:28:38

Personally, I would take the time to define as many attributes as you can rather than use EAV for everything. Surely you know some of the attributes. Then you only need EAV for the things that are truly client-specific.

But if everything must be EAV, then a NoSQL database is the way to go. Or you can use a relational database for some stuff and a NoSQL database for the rest.
