Dynamic data model


I have a project that requires user-defined attributes for a particular object at runtime (let's say a Person object in this example). The project will have many different users (1000+), each defining their own unique attributes for their own sets of 'Person' objects.

(E.g. user #1 will have a set of defined attributes, which will apply to all Person objects 'owned' by this user. Multiply this by 1000 users, and that's the bottom-line minimum number of users the app will work with.) These attributes will be used to query the Person objects and return results.

I think these are the possible approaches I can use. I will be using C# (and any version of .NET 3.5 or 4), and have free rein regarding what to use for a datastore. (I have MySQL and MSSQL available, although I have the freedom to use any software, as long as it fits the bill.)

Have I missed anything, or made any incorrect assumptions in my assessment?

Out of these choices - what solution would you go for?

  1. Hybrid EAV object model. (Define the database using a normal relational model, and have a 'property bag' table for the Person table.)

    Downsides: many joins per query; poor performance; can hit the limit on the number of joins/tables used in a query.

    I've knocked up a quick sample that has a Subsonic 2.x-esque interface:

    Select().From().Where  ... etc
    

    Which generates the correct joins, then filters and pivots the returned data in C# to return a DataTable configured with correctly typed columns. (A rough sketch of this pivot step is included after the list of options.)

    I have yet to load-test this solution. It's based on the EAV advice in the Microsoft whitepaper "Best Practices for Semantic Data Modeling for Performance and Scalability" (SQL Server 2008 RTM documentation).

  2. Allow the user to dynamically create / alter the object's table at run-time. This solution is what I believe NHibernate does in the background when using dynamic properties, as discussed here:

    http://bartreyserhove.blogspot.com/2008/02/dynamic-domain-mode-using-nhibernate.html

    Downsides:

    As the system grows, the number of columns defined will get very large, and may hit the max number of columns. If there are 1000 users, each with 10 distinct attributes for their 'Person' objects, then we'd need a table holding 10k columns. Not scalable in this scenario.

    I guess I could allow a person attribute table per user, but if there are 1000 users to start, that's 1000 tables plus the other 10 odd in the app.

    I'm unsure if this would be scalable - but it doesn't seem so. Someone please correct me if I am incorrect!

  3. Use a NoSQL datastore, such as CouchDb / MongoDb

    From what I have read, these aren't yet proven in large-scale apps, are string-based, and are very early in their development phase. If I am incorrect in this assessment, can someone let me know?

    http://www.eflorenzano.com/blog/post/why-couchdb-sucks/

  4. Using an XML column in the people table to store attributes

    Drawbacks - no indexing for querying, so every row's XML would need to be retrieved and parsed to return a resultset, resulting in poor query performance.

  5. Serializing an object graph to the database.

    Drawbacks - no indexing for querying, so every row would need to be retrieved and deserialized to return a resultset, resulting in poor query performance.

  6. C# bindings for BerkeleyDB

    From what I read here: http://www.dinosaurtech.com/2009/berkeley-db-c-bindings/

    Berkeley Db has definitely proven to be useful, but as Robert pointed out – there is no easy interface. Your entire OO wrapper has to be hand coded, and all of your indices are hand maintained. It is much more difficult than SQL / linq-to-sql, but that's the price you pay for ridiculous speed.

    Seems a large overhead - however, if anyone can provide a link to a tutorial on how to maintain the indices in C#, it could be a goer.

  7. SQL / RDF hybrid.
    Odd I didn't think of this before. Similar to option 1, but instead of a "property bag" table, just XREF to an RDF store?
    Querying would then involve 2 steps - query the RDF store for people matching the correct attributes, to return the person object(s), then use the IDs of these person objects in the SQL query to return the relational data. Extra overhead, but could be a goer.
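
For reference, here is a minimal sketch of the client-side pivot step mentioned in option 1. The rowset shape (PersonID, AttributeName, AttributeValue), the attribute-type dictionary, and the class names are hypothetical; the point is just turning the narrow EAV rows into a DataTable with one typed column per user-defined attribute:

// Hypothetical sketch: pivot a narrow EAV rowset into a DataTable with one typed
// column per user-defined attribute. Values are stored as strings and converted
// to the attribute's declared CLR type (System.Data only, .NET 3.5-compatible).
using System;
using System.Collections.Generic;
using System.Data;

public class EavRow
{
    public int PersonId;
    public string AttributeName;
    public string AttributeValue;
}

public static class EavPivot
{
    public static DataTable Pivot(IEnumerable<EavRow> rows,
                                  IDictionary<string, Type> attributeTypes)
    {
        var table = new DataTable("People");
        table.Columns.Add("PersonID", typeof(int));
        foreach (var def in attributeTypes)
            table.Columns.Add(def.Key, def.Value);

        var rowsByPerson = new Dictionary<int, DataRow>();
        foreach (var row in rows)
        {
            DataRow dr;
            if (!rowsByPerson.TryGetValue(row.PersonId, out dr))
            {
                dr = table.NewRow();
                dr["PersonID"] = row.PersonId;
                table.Rows.Add(dr);
                rowsByPerson[row.PersonId] = dr;
            }
            // Convert the stored string value to the column's declared type.
            dr[row.AttributeName] = Convert.ChangeType(row.AttributeValue,
                                                       attributeTypes[row.AttributeName]);
        }
        return table;
    }
}

A real implementation would also have to handle missing attributes (DBNull) and per-user type metadata, which is where most of the complexity of this option lives.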


墨落画卷 2024-08-24 05:17:29


The ESENT database engine on Windows is used heavily for this kind of semi-structured data. One example is Microsoft Exchange which, like your application, has thousands of users where each user can define their own set of properties (MAPI named properties). Exchange uses a slightly modified version of ESENT.

ESENT has a lot of features that enable applications with large meta-data requirements: each ESENT table can have about 32K columns defined; tables, indexes and columns can be added at runtime; sparse columns don't take up any record space when not set; and template tables can reduce the space used by the meta-data itself. It is common for large applications to have thousands of tables/indexes.

In this case you can have one table per user and create the per-user columns in the table, creating indexes on any columns that you want to query. That would be similar to the way that some versions of Exchange store their data. The downside of this approach is that ESENT doesn't have a query engine so you will have to hand-craft your queries as MakeKey/Seek/MoveNext calls.

A managed wrapper for ESENT is here:

http://managedesent.codeplex.com/
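
To give a feel for the hand-crafted MakeKey/Seek/MoveNext queries mentioned above, here is a rough sketch against the ManagedEsent Api class. The "Person" table, its "ixAge" secondary index and the id column handle are assumptions, and opening the instance, session, database and table is elided:

// Rough sketch (not production code): seek on a secondary index with ManagedEsent.
// Assumes a "Person" table with a secondary index "ixAge" over an int Age column
// already exists, and that session/table/column handles are supplied by the caller.
using System.Collections.Generic;
using Microsoft.Isam.Esent.Interop;

public static class EsentQuerySketch
{
    // Returns the ids of all persons whose Age column is greater than minAge.
    public static IEnumerable<int> PersonsOlderThan(
        JET_SESID session, JET_TABLEID personTable,
        JET_COLUMNID idColumn, int minAge)
    {
        var results = new List<int>();

        // Switch to the secondary index over Age, then seek to the first record > minAge.
        Api.JetSetCurrentIndex(session, personTable, "ixAge");
        Api.MakeKey(session, personTable, minAge, MakeKeyGrbit.NewKey);
        if (Api.TrySeek(session, personTable, SeekGrbit.SeekGT))
        {
            do
            {
                int? id = Api.RetrieveColumnAsInt32(session, personTable, idColumn);
                if (id.HasValue)
                {
                    results.Add(id.Value);
                }
            }
            while (Api.TryMoveNext(session, personTable));
        }
        return results;
    }
}

Anything richer (multiple predicates, ranges, sorting) has to be composed from the same primitives, which is the trade-off this answer describes.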

合约呢 2024-08-24 05:17:29


In an EAV model you don't have to have many joins, as you can just have the joins you need for the query filtering. For the resultset, return property entries as a separate rowset.
That is what we are doing in our EAV implementation.

For example, a query might return persons with extended property 'Age' > 18:

Properties table:

PropertyID PropertyName
1          Age
2          NickName

First resultset:

PersonID Name
1        John
2        Mary

Second resultset:

PersonID PropertyID Value
1        1         24
1        2         'Neo'
2        1         32
2        2         'Pocahontas'

For the first resultset, you need an inner join on the 'Age' extended property
to query the basic Person object entity part:

select p.ID, p.Name from Persons p
join PersonExtendedProperties pp
on p.ID = pp.PersonID
where pp.PropertyName = 'Age'
and pp.PropertyValue > 18 -- probably need to convert to integer here

For the second resultset, we are making an outer join of the first resultset with PersonExtendedProperties table to get the rest of the extended properties. It's a 'narrow' resultset, we do not pivot the properties in sql, so we don't need multiple joins here.

Actually, we use separate tables for different data types, to avoid data type conversion and to have the extended properties indexed and easily queryable.
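
A sketch of what consuming those two resultsets could look like from C# with plain ADO.NET (the table and column names follow the example above; the IN-list construction and type handling are simplified for brevity, and an open connection is assumed):

// Sketch of the two-resultset pattern above: one query for the filtered Person rows,
// one narrow (PersonID, PropertyID, Value) query for their extended properties.
using System.Collections.Generic;
using System.Data.SqlClient;

public static class EavTwoResultsets
{
    public static void Load(SqlConnection conn)
    {
        // Resultset 1: base Person rows matching the extended-property filter.
        var personIds = new List<int>();
        using (var cmd = new SqlCommand(
            @"select p.ID, p.Name from Persons p
              join PersonExtendedProperties pp on p.ID = pp.PersonID
              where pp.PropertyName = 'Age'
                and cast(pp.PropertyValue as int) > 18", conn))
        using (var r = cmd.ExecuteReader())
        {
            while (r.Read())
                personIds.Add(r.GetInt32(0));   // r.GetString(1) is Name; materialize as needed
        }
        if (personIds.Count == 0) return;

        // Resultset 2: every extended property for those persons, kept as a narrow
        // rowset - no pivoting in SQL. (A real implementation should parameterize.)
        string ids = string.Join(",", personIds.ConvertAll(i => i.ToString()).ToArray());
        var properties = new Dictionary<int, List<KeyValuePair<int, string>>>();
        using (var cmd = new SqlCommand(
            "select PersonID, PropertyID, PropertyValue from PersonExtendedProperties " +
            "where PersonID in (" + ids + ")", conn))
        using (var r = cmd.ExecuteReader())
        {
            while (r.Read())
            {
                int personId = r.GetInt32(0);
                List<KeyValuePair<int, string>> list;
                if (!properties.TryGetValue(personId, out list))
                {
                    list = new List<KeyValuePair<int, string>>();
                    properties[personId] = list;
                }
                // Value is read as a string here; typed value tables would avoid this.
                list.Add(new KeyValuePair<int, string>(r.GetInt32(1), r.GetString(2)));
            }
        }
    }
}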

金兰素衣 2024-08-24 05:17:29


My recommendation:

Allow properties to be marked as indexable. Have a smallish hard limit on the number of indexable properties and on columns per object. Have a large hard limit on the total column types across all objects.

Implement indexes as separate tables (one per index) joined with main table of data (main table has large unique key for object). (Index tables can then be created/dropped as required).

Serialize the data, including the index columns, and also put the indexed properties into first-class relational columns in their dedicated index tables. Use JSON instead of XML to save space in the table. Enforce a short-column-name policy (or a long-display-name, short-stored-name policy) to save space and increase performance.

Use quarks for field identifiers (but only in the main engine, to save RAM and speed up some read operations -- don't rely on quark pointer comparison in all cases).

My thought on your options:

1 is a possibility. Performance will clearly be lower than if the field-ID columns were not stored.

2 is a no in general: DB engines are not all happy about dynamic schema changes. But a possible yes if your DB engine is good at this.

3 is possible.

4 is a yes, though I'd use JSON.

5 seems like 4, only less optimized?

6 sounds good; I would go with it if I were happy to try something new and satisfied about its reliability and performance, but usually I would want to go with more mainstream technology. I'd also like to reduce the number of engines involved in coordinating a transaction to fewer than would be the case here.

Edit: But of course, though I've recommended something, there can be no general right answer here -- profile various data models and approaches with your data to see what runs best for your application.

Edit: Changed last edit wording.
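
As a rough illustration of the write path this recommendation implies (the PersonData and PersonIndex_* table names, the use of Json.NET, and the parameter handling are assumptions, not part of the answer):

// Hypothetical sketch: the full property bag is serialized as JSON into the main row,
// and each property marked as indexable is also written to its own narrow index table.
using System.Collections.Generic;
using System.Data.SqlClient;
using Newtonsoft.Json;   // any JSON serializer would do

public static class IndexedBlobSketch
{
    public static void SavePerson(SqlConnection conn, long personKey,
                                  IDictionary<string, object> properties,
                                  IEnumerable<string> indexableProperties)
    {
        // Main table: one row per object, JSON payload for all properties.
        string json = JsonConvert.SerializeObject(properties);
        using (var cmd = new SqlCommand(
            "insert into PersonData (PersonKey, Payload) values (@k, @p)", conn))
        {
            cmd.Parameters.AddWithValue("@k", personKey);
            cmd.Parameters.AddWithValue("@p", json);
            cmd.ExecuteNonQuery();
        }

        // One index table per indexable property (e.g. PersonIndex_Age), created or
        // dropped as users add/remove indexable properties. The property name is
        // assumed to be validated/whitelisted before being used in a table name.
        foreach (string name in indexableProperties)
        {
            object value;
            if (!properties.TryGetValue(name, out value)) continue;
            using (var cmd = new SqlCommand(
                "insert into PersonIndex_" + name + " (PersonKey, Value) values (@k, @v)", conn))
            {
                cmd.Parameters.AddWithValue("@k", personKey);
                cmd.Parameters.AddWithValue("@v", value);
                cmd.ExecuteNonQuery();
            }
        }
    }
}

The matching read path queries the relevant PersonIndex_* table(s) first, then fetches and deserializes the JSON payloads for the matching keys.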

柠檬色的秋千 2024-08-24 05:17:29


Assuming you can place a limit, N, on how many custom attributes each user can define, just add N extra columns to the Person table. Then have a separate table where you store per-user metadata describing how to interpret the contents of those columns for each user. Similar to #1 once you've read in the data, but no joins are needed to pull in the custom attributes.
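
A hypothetical sketch of the per-user metadata lookup this implies, assuming the fixed columns are named Custom1..CustomN and the metadata lives in a PersonAttributeDefs table (names invented for illustration):

// Hypothetical sketch: load the mapping from a user's logical attribute names to the
// fixed physical columns of the Person table, e.g. { "Age" -> "Custom1", "NickName" -> "Custom2" }.
using System.Collections.Generic;
using System.Data.SqlClient;

public static class FixedColumnSketch
{
    public static IDictionary<string, string> LoadColumnMap(SqlConnection conn, int userId)
    {
        var map = new Dictionary<string, string>();
        using (var cmd = new SqlCommand(
            "select AttributeName, ColumnName from PersonAttributeDefs where UserID = @u", conn))
        {
            cmd.Parameters.AddWithValue("@u", userId);
            using (var r = cmd.ExecuteReader())
            {
                while (r.Read())
                    map[r.GetString(0)] = r.GetString(1);
            }
        }
        return map;
    }
}

A query such as "Age > 18" for that user then translates into a WHERE clause on the mapped physical column (with a cast if the fixed columns are stored as strings).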

源来凯始玺欢你 2024-08-24 05:17:29


For a problem similar to yours, we have used the "XML column" approach (the fourth one in your survey of methods). But you should note that many databases (DBMSs) support indexes on XML values.

I recommend using one table for Person that contains one XML column along with the other common columns. In other words, design the Person table with the columns that are common to all person records, and add a single XML column for the dynamic and differing attributes.

We are using Oracle; it supports indexes on its XML type. Two kinds of index are supported: 1) XMLIndex, for indexing elements and attributes within the XML, and 2) Oracle Text Index, for enabling full-text search in the text fields of the XML.

For example, in Oracle you can create an index such as:

CREATE INDEX index1 ON table_name (XMLCast(XMLQuery ('$p/PurchaseOrder/Reference' 
  PASSING XML_Column AS "p" RETURNING CONTENT) AS VARCHAR2(128)));

and XML queries are supported in SELECT statements:

SELECT count(*) FROM purchaseorder
  WHERE XMLCast(XMLQuery('$p/PurchaseOrder/Reference'
  PASSING OBJECT_VALUE AS "p" RETURNING CONTENT)
  AS INTEGER) = 25;

As far as I know, other databases such as PostgreSQL and MS SQL Server (but not MySQL) support similar index models for XML values.

See also:
http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb_indexing.htm#CHDEADIH
