Using an RDBMS as event sourcing storage

Posted 2024-11-29 12:03:06

If I were using an RDBMS (e.g. SQL Server) to store event sourcing data, what might the schema look like?

I've seen a few variations talked about in an abstract sense, but nothing concrete.

For example, say one has a "Product" entity, and changes to that product could come in the form of: Price, Cost and Description. I'm confused about whether I'd:

  1. Have a "ProductEvent" table that has all the fields for a product, where each change means a new record in that table, plus "who, what, where, why, when and how" (WWWWWH) as appropriate. When cost, price or description are changed, a whole new row is added to represent the Product.
  2. Store product Cost, Price and Description in separate tables joined to the Product table with a foreign key relationship. When changes to those properties occur, write new rows with WWWWWH as appropriate.
  3. Store WWWWWH, plus a serialised object representing the event, in a "ProductEvent" table, meaning the event itself must be loaded, de-serialised and re-played in my application code in order to re-build the application state for a given Product.

Particularly I worry about option 2 above. Taken to the extreme, the product table would be almost one-table-per-property, where to load the Application State for a given product would require loading all events for that product from each product event table. This table-explosion smells wrong to me.

I'm sure "it depends", and while there's no single "correct answer", I'm trying to get a feel for what is acceptable and what is totally not acceptable. I'm also aware that NoSQL can help here, where events could be stored against an aggregate root, meaning only a single request to the database is needed to get the events to rebuild the object from; but we're not using a NoSQL db at the moment, so I'm feeling around for alternatives.

6 Answers

陈独秀 2024-12-06 12:03:06

The event store should not need to know about the specific fields or properties of events. Otherwise every modification of your model would result in having to migrate your database (just as in good old-fashioned state-based persistence). Therefore I wouldn't recommend options 1 and 2 at all.

Below is the schema as used in Ncqrs. As you can see, the table "Events" stores the related data as a CLOB (i.e. JSON or XML). This corresponds to your option 3 (except that there is no "ProductEvents" table, because you only need one generic "Events" table). In Ncqrs the mapping to your aggregate roots happens through the "EventSources" table, where each EventSource corresponds to an actual aggregate root.

Table Events:
    Id [uniqueidentifier] NOT NULL,
    TimeStamp [datetime] NOT NULL,

    Name [varchar](max) NOT NULL,
    Version [varchar](max) NOT NULL,

    EventSourceId [uniqueidentifier] NOT NULL,
    Sequence [bigint], 

    Data [nvarchar](max) NOT NULL

Table EventSources:
    Id [uniqueidentifier] NOT NULL, 
    Type [nvarchar](255) NOT NULL, 
    Version [int] NOT NULL
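
To make the mapping concrete, here is a minimal sketch of how an aggregate might be rehydrated from these two tables (the @AggregateId parameter is an illustrative assumption, not part of Ncqrs):

    -- Load all events for one aggregate root, oldest first, so that
    -- application code can deserialise and re-apply them in order.
    SELECT e.Name, e.Version, e.Sequence, e.Data
    FROM Events AS e
    WHERE e.EventSourceId = @AggregateId   -- FK to EventSources.Id
    ORDER BY e.Sequence;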

The SQL persistence mechanism of Jonathan Oliver's Event Store implementation consists basically of one table called "Commits" with a BLOB field "Payload". This is pretty much the same as in Ncqrs, except that it serializes the event's properties in binary format (which, for instance, adds encryption support).

Greg Young recommends a similar approach, as extensively documented on Greg's website.

The schema of his prototypical "Events" table reads:

Table Events
    AggregateId [Guid],
    Data [Blob],
    SequenceNumber [Long],
    Version [Int]
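
Greg's table above gives enough to sketch an append with an optimistic concurrency check. This is a hedged illustration, not his actual implementation; the parameters (@AggregateId, @ExpectedVersion, @SerializedEvent, @NextSequence) are assumptions:

    -- Reject the write if another writer appended to the stream first.
    BEGIN TRANSACTION;

    DECLARE @currentVersion int;
    SELECT @currentVersion = MAX(Version)
    FROM Events WITH (UPDLOCK, HOLDLOCK)   -- lock the stream for the check
    WHERE AggregateId = @AggregateId;

    IF ISNULL(@currentVersion, 0) <> @ExpectedVersion
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50001, 'Concurrency conflict: aggregate was modified.', 1;
    END

    INSERT INTO Events (AggregateId, Data, SequenceNumber, Version)
    VALUES (@AggregateId, @SerializedEvent, @NextSequence, @ExpectedVersion + 1);

    COMMIT TRANSACTION;
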
关于从前 2024-12-06 12:03:06

The GitHub project CQRS.NET has a few concrete examples of how you could do event stores in a few different technologies. At the time of writing there is an implementation in SQL using Linq2SQL, with a SQL schema to go with it; there's one for MongoDB, one for DocumentDB (CosmosDB if you're in Azure), and one using EventStore (as mentioned above). There are more in Azure, like Table Storage and Blob Storage, which are very similar to flat-file storage.

I guess the main point here is that they all conform to the same principle/contract. They all store information in a single place/container/table, they use metadata to distinguish one event from another, and they 'just' store the whole event as it was (in some cases serialised, in the supporting technology). So whether you pick a document database, a relational database, or even flat files, there are several different ways to reach the same intent of an event store (which is useful if you change your mind at any point and find you need to migrate, or to support more than one storage technology).

As a developer on the project I can share some insights on some of the choices we made.

Firstly, we found (even with unique UUIDs/GUIDs instead of integers) that IDs are often sequential for strategic reasons, so just having an ID wasn't unique enough for a key. We therefore merged our main ID key column with the data/object type to create what should be a truly unique key (in the sense of your application). I know some people say you don't need to store it, but that will depend on whether you are greenfield or having to co-exist with existing systems.

We stuck with a single container/table/collection for maintainability reasons, but we did play around with a separate table per entity/object. We found in practice that this meant either the application needed "CREATE" permissions (which, generally speaking, is not a good idea... generally; there are always exceptions/exclusions), or that each time a new entity/object came into existence or was deployed, new storage containers/tables/collections needed to be made. We found this painfully slow for local development and problematic for production deployments. You may not, but that was our real-world experience.

Another thing to remember is that asking for action X to happen may result in many different events occurring, so knowing all the events generated by a command/event/whatever is useful. They may also span different object types; e.g. pushing "buy" in a shopping cart may trigger account and warehousing events to fire. A consuming application may want to know all of this, so we added a CorrelationId. This meant a consumer could ask for all events raised as a result of their request. You'll see that in the schema.

Specifically with SQL, we found that performance really became a bottleneck if indexes and partitions weren't used adequately. Remember that events will need to be streamed in reverse order if you are using snapshots. We tried a few different indexes and found that, in practice, some additional indexes were needed for debugging real-world applications in production. Again, you'll see that in the schema.
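
Purely as an illustration (the real schema is the one linked above; these table, column, and parameter names are assumptions), the CorrelationId plus the kind of indexes described might look like:

    -- One index to rebuild a single aggregate's stream in order,
    -- one to answer "which events did my request cause?".
    CREATE INDEX IX_Events_Aggregate ON Events (AggregateId, Sequence);
    CREATE INDEX IX_Events_Correlation ON Events (CorrelationId);

    -- All events raised as a result of one request:
    SELECT AggregateId, Sequence, Data
    FROM Events
    WHERE CorrelationId = @RequestCorrelationId;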

Other in-production metadata was useful during production-based investigations; timestamps gave us insight into the order in which events were persisted versus raised. That gave us some assistance on a particularly heavily event-driven system that raised vast quantities of events, giving us information about the performance of things like the network and the system's distribution across it.

分开我的手 2024-12-06 12:03:06

Well, you might want to take a look at Datomic.

Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability, and ACID transactions.

I wrote a detailed answer here.

You can watch a talk by Stuart Halloway explaining the design of Datomic here.

Since Datomic stores facts in time, you can use it for event sourcing use cases, and so much more.

若有似无的小暗淡 2024-12-06 12:03:06

I think solutions 1 and 2 can become a problem very quickly as your domain model evolves. New fields are created, some change meaning, and some stop being used. Eventually your table will have dozens of nullable fields, and loading the events will be a mess.

Also, remember that the event store should be used only for writes; you query it only to load the events, not the properties of the aggregate. They are separate things (that is the essence of CQRS).

Solution 3 is what people usually do; there are many ways to accomplish it.

As an example, EventFlow CQRS, when used with SQL Server, creates a table with this schema:

CREATE TABLE [dbo].[EventFlow](
    [GlobalSequenceNumber] [bigint] IDENTITY(1,1) NOT NULL,
    [BatchId] [uniqueidentifier] NOT NULL,
    [AggregateId] [nvarchar](255) NOT NULL,
    [AggregateName] [nvarchar](255) NOT NULL,
    [Data] [nvarchar](max) NOT NULL,
    [Metadata] [nvarchar](max) NOT NULL,
    [AggregateSequenceNumber] [int] NOT NULL,
 CONSTRAINT [PK_EventFlow] PRIMARY KEY CLUSTERED 
(
    [GlobalSequenceNumber] ASC
)
)

where:

  • GlobalSequenceNumber: simple global identification; may be used for ordering, or for identifying missing events when you create your projection (read model).
  • BatchId: an identification of the group of events that were inserted atomically (TBH, I have no idea why this would be useful).
  • AggregateId: identification of the aggregate.
  • Data: the serialized event.
  • Metadata: other useful information from the event (e.g. the event type used to deserialize, a timestamp, the originator id from the command, etc.).
  • AggregateSequenceNumber: sequence number within the same aggregate (this is useful if you cannot have writes happening out of order, so you can use this field for optimistic concurrency; see the sketch below).

However, if you are creating from scratch I would recommend following the YAGNI principle and starting with the minimal fields required for your use case.
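
A minimal, YAGNI-style starting point might look like the following (an illustrative sketch, not EventFlow's actual schema; all names are assumptions):

    CREATE TABLE Events (
        AggregateId             nvarchar(255) NOT NULL,
        AggregateSequenceNumber int           NOT NULL,
        Data                    nvarchar(max) NOT NULL,  -- serialized event
        Metadata                nvarchar(max) NOT NULL,  -- event type, timestamp, ...
        CONSTRAINT PK_Events
            PRIMARY KEY (AggregateId, AggregateSequenceNumber)
    );

Here the composite primary key doubles as the optimistic concurrency guard: two writers appending the same sequence number for the same aggregate cannot both succeed.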

天暗了我发光 2024-12-06 12:03:06

A possible hint: a design following the "Slowly Changing Dimension" pattern (type 2) should help you cover:

  • the order in which events occurred (via the surrogate key)
  • the validity period of each state (valid from / valid to)

A left-fold function should also be feasible to implement, but you need to think about future query complexity.
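
A hedged sketch of what a type-2 dimension for the Product example could look like (all names are illustrative):

    CREATE TABLE ProductHistory (
        SurrogateKey bigint IDENTITY(1,1) PRIMARY KEY,  -- orders the changes
        ProductId    int           NOT NULL,
        Price        decimal(18,2) NOT NULL,
        Cost         decimal(18,2) NOT NULL,
        Description  nvarchar(max) NOT NULL,
        ValidFrom    datetime2     NOT NULL,
        ValidTo      datetime2     NULL              -- NULL = current state
    );

    -- Current state of one product:
    SELECT * FROM ProductHistory
    WHERE ProductId = @ProductId AND ValidTo IS NULL;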

橙味迷妹 2024-12-06 12:03:06

I reckon this is a late answer, but I would like to point out that using an RDBMS as event sourcing storage is totally possible if your throughput requirement is not high. I'll just show you examples from an event-sourcing ledger I built, to illustrate.

https://github.com/andrewkkchan/client-ledger-service
The above is an event sourcing ledger web service.
https://github.com/andrewkkchan/client-ledger-core-db
In the above I use an RDBMS to compute states, so you can enjoy all the advantages that come with an RDBMS, like transaction support.
https://github.com/andrewkkchan/client-ledger-core-memory
And I have another consumer that processes in memory, to handle bursts.

One could argue that the actual event store above still lives in Kafka, as an RDBMS is slow at inserting, especially when the inserts are always appends.

I hope the code helps give you an illustration, apart from the very good theoretical answers already provided for this question.
