Classes should support an interface, but this requires adding logic to the classes in an intrusive way. Can we prevent this?

Posted 2024-10-01 04:05:20


I have a C++ application that loads lots of data from a database, then executes algorithms on that data (these algorithms are quite CPU- and data-intensive, which is why I load all the data beforehand), then saves all the data that has been changed back to the database.

The database part is nicely separated from the rest of the application. In fact, the application does not need to know where the data comes from. The application could even be run from files (in this case a separate file module loads the files into the application and at the end saves all data back to the files).

Now:

the database layer only wants to save the changed instances back to the database (not the full data), therefore it needs to know what has been changed by the application.
on the other hand, the application doesn't need to know where the data comes from, hence it does not want to feel forced to keep a change-state per instance of its data.

To keep my application and its data structures as separate as possible from the layer that loads and saves the data (which could be a database or files), I don't want to pollute the application data structures with information about whether instances have changed since startup.

But to make the database layer as efficient as possible, it needs a way to determine which data has been changed by the application.

Duplicating all data and comparing the data while saving is not an option since the data could easily fill several GB of memory.

Adding observers to the application data structures is not an option either since performance within the application algorithms is very important (and looping over all observers and calling virtual functions may cause an important performance bottleneck in the algorithms).

Any other solution? Or am I trying to be too 'modular' if I don't want to add logic to my application classes in an intrusive way? Is it better to be pragmatic in these cases?

How do ORM tools solve this problem? Do they also force application classes to keep a kind of change-state, or do they force the classes to have change-observers?


仙女 2024-10-08 04:05:20

If you can't copy the data and compare, then clearly you need some kind of record somewhere of what has changed. The question, then, is how to update those records.

ORM tools can (if they want) solve the problem by keeping flags in the objects, saying whether the data has been changed or not, and if so what. It sounds as though you're making raw data structures available to the application, rather than objects with neatly encapsulated mutators that could update flags.

So an ORM doesn't normally require applications to track changes in any great detail. The application generally has to say which object(s) to save, but the ORM then works out what needs persisting to the DB in order to do that, and might apply optimizations there.

I guess that means that in your terms, the ORM is adding observers to the data structures in some loose sense. It's not an external observer, it's the object knowing how to mutate itself, but of course there's some overhead to recording what has changed.
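As a rough illustration of the flag-keeping idea, here is a minimal C++ sketch. It assumes a made-up `Customer` type with two persisted fields; the class, the field names, and `clearDirty()` are hypothetical and not taken from any particular ORM. Each non-virtual mutator sets one bit, so a persistence layer could tell both whether the object changed and which fields changed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical record type; the field names are illustrative only.
class Customer {
public:
    // One bit per persisted field, so a persistence layer can tell
    // not only whether the object changed but also what changed.
    enum Field : std::uint8_t { Name = 1u << 0, Balance = 1u << 1 };

    Customer(std::string name, double balance)
        : name_(std::move(name)), balance_(balance) {}

    const std::string& name() const { return name_; }
    double balance() const { return balance_; }

    // Non-virtual mutators: recording a change costs one OR on a byte.
    void setName(std::string n) { name_ = std::move(n); dirty_ |= Name; }
    void setBalance(double b) { balance_ = b; dirty_ |= Balance; }

    bool isDirty() const { return dirty_ != 0; }
    bool isDirty(Field f) const { return (dirty_ & f) != 0; }

    // Intended to be called by the persistence layer after a save.
    void clearDirty() { dirty_ = 0; }

private:
    std::string name_;
    double balance_;
    std::uint8_t dirty_ = 0;
};

// Usage sketch: only the touched field is reported as changed.
inline void customerExample() {
    Customer c("Ada", 10.0);
    assert(!c.isDirty());
    c.setBalance(12.5);
    assert(c.isDirty(Customer::Balance) && !c.isDirty(Customer::Name));
}
```

The cost here is one extra byte per object and one extra write per mutation, which is the overhead mentioned above; whether that is acceptable in a hot loop is something only profiling can answer.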

One option would be to provide "slow" mutators for your data structures, which update flags, and also "fast" direct access, plus a function that marks the object dirty. It would then be the application's choice whether to use the potentially slower mutators, which let it ignore the issue, or the potentially faster mutators, which require it to mark the object dirty before it starts (or perhaps after it finishes, depending on what you do about transactions and inconsistent intermediate states).

You would then have two basic situations:

I'm looping over a very large set of objects, conditionally making a single change to a few of them. Use the "slow" mutators, for application simplicity.
I'm making lots of different changes to the same object, and I really care about the performance of the accessors. Use the "fast" mutators, which perhaps directly expose some array in the data. You gain performance in return for knowing more about the persistence model.
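The two situations could be sketched in C++ roughly as follows. Everything here is an assumption made for illustration: the `Series` class, its `set()`/`raw()`/`markDirty()` members, and the `saveChanged()` stand-in for the persistence layer are hypothetical names, not part of the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data object; names are illustrative only.
class Series {
public:
    explicit Series(std::size_t n) : values_(n, 0.0) {}

    std::size_t size() const { return values_.size(); }
    double get(std::size_t i) const {
        assert(i < values_.size());
        return values_[i];
    }

    // "Slow" mutator: updates the flag on every call.
    void set(std::size_t i, double v) {
        assert(i < values_.size());
        values_[i] = v;
        dirty_ = true;
    }

    // "Fast" path: raw access that never touches the flag.
    // The caller is responsible for calling markDirty().
    double* raw() { return values_.data(); }
    void markDirty() { dirty_ = true; }

    bool isDirty() const { return dirty_; }
    void clearDirty() { dirty_ = false; }

private:
    std::vector<double> values_;
    bool dirty_ = false;
};

// Stand-in for the persistence layer: visits only dirty objects
// and returns how many it would have written.
inline std::size_t saveChanged(std::vector<Series>& all) {
    std::size_t saved = 0;
    for (Series& s : all) {
        if (!s.isDirty()) continue;
        // A real layer would write s to the database or file here.
        s.clearDirty();
        ++saved;
    }
    return saved;
}

// Hot loop on the fast path: one markDirty() covers many writes.
inline void scaleAll(Series& s, double factor) {
    s.markDirty();
    double* p = s.raw();
    for (std::size_t i = 0, n = s.size(); i < n; ++i) p[i] *= factor;
}
```

In this sketch `scaleAll()` marks the object dirty before writing; marking it afterwards instead is the other choice mentioned above, and which one is right depends on how transactions and intermediate states are handled. The trade-off is that code using `raw()` now knows a little about the persistence model.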

There are only two hard problems in Computer Science: cache invalidation and naming things.

Phil Karlton
