Pushing vs. pulling data changes in an application

Posted on 2024-09-29 17:19:54

Suppose you have an application that consists of two layers:

  • A: A data layer that stores all the data loaded from a database or from a file
  • B: A layer that shows the data in a nice user interface, e.g. a graphical report

Now suppose data changes in layer A. There are two approaches to make sure that the reports in layer B are updated correctly.

The first approach is the PUSH approach. Layer A notifies layer B via observers so layer B can update its reports.

There are several disadvantages in the PUSH approach:

  • If data is changed multiple times (e.g. during a load, or in algorithms that change a lot of data), the observers are executed many times. This can be solved by introducing a kind of buffering (suppressing observer calls while changes are still in progress), but this can be very tricky, and it is easy to forget to make the right buffering calls.
  • If a lot of data is changed, the observer calls may cause an overhead that is not acceptable in the application.
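
The buffering idea from the first bullet can be sketched as follows. This is a minimal illustration rather than a recommended design; the names `DataLayer`, `batch_update` and `subscribe` are hypothetical. Using a context manager is one way to make the "end of buffering" call hard to forget, because it runs automatically when the block exits.

```python
from contextlib import contextmanager


class DataLayer:
    """Layer A: holds values and pushes change notifications to observers."""

    def __init__(self):
        self._values = {}
        self._observers = []
        self._batch_depth = 0   # > 0 while a batch_update() block is open
        self._pending = set()   # keys changed during the open batch

    def subscribe(self, callback):
        self._observers.append(callback)

    def set(self, key, value):
        self._values[key] = value
        if self._batch_depth:
            self._pending.add(key)   # defer: only remember what changed
        else:
            self._notify({key})

    @contextmanager
    def batch_update(self):
        """Suppress notifications inside the block; send one when it ends."""
        self._batch_depth += 1
        try:
            yield self
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0 and self._pending:
                changed, self._pending = self._pending, set()
                self._notify(changed)

    def _notify(self, changed_keys):
        for callback in self._observers:
            callback(changed_keys)


calls = []
layer_a = DataLayer()
layer_a.subscribe(lambda keys: calls.append(sorted(keys)))

# 1000 changes inside one batch produce a single observer call.
with layer_a.batch_update():
    for i in range(1000):
        layer_a.set(f"row{i % 3}", i)

print(len(calls), calls[0])
```

Outside a `batch_update()` block every `set()` still notifies immediately, so the second disadvantage (overhead when a lot of data changes) remains unless callers actually batch.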

The other approach is the PULL approach. Layer A just remembers which data was changed and sends out no notifications (layer A is flagged dirty). After the action that was executed by the user (could be running an algorithm or loading a file or something else), we check all of our user interface components, and ask them to update themselves.
In this case layer B is asked to update itself. First it checks whether any of its underlying layers (here, layer A) is dirty. If so, it fetches the changes and updates itself. If layer A is not dirty, the report knows it has nothing to do.
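
A minimal sketch of the two-layer PULL approach described above, assuming a single report reads from layer A. The names `DataLayer`, `Report`, `is_dirty` and `take_changes` are made up for the illustration.

```python
class DataLayer:
    """Layer A: records which keys changed instead of notifying anyone."""

    def __init__(self):
        self._values = {}
        self._dirty_keys = set()

    def set(self, key, value):
        self._values[key] = value
        self._dirty_keys.add(key)   # flag as dirty, send nothing

    def get(self, key):
        return self._values[key]

    def is_dirty(self):
        return bool(self._dirty_keys)

    def take_changes(self):
        """Return the changed keys and clear the dirty state."""
        changed, self._dirty_keys = self._dirty_keys, set()
        return changed


class Report:
    """Layer B: refreshes itself only when asked, and only if A is dirty."""

    def __init__(self, data):
        self._data = data
        self.shown = {}
        self.refresh_count = 0

    def update(self):
        if not self._data.is_dirty():
            return False            # nothing to do
        for key in self._data.take_changes():
            self.shown[key] = self._data.get(key)
        self.refresh_count += 1
        return True


data = DataLayer()
report = Report(data)

# The user action changes the same value many times...
for i in range(1000):
    data.set("total", i)

# ...and the UI is asked to update once, after the action has finished.
first = report.update()    # A was dirty, so the report refreshes
second = report.update()   # A is clean now, so this is a cheap no-op
print(first, second, report.shown)
```

Note that `take_changes()` clears the dirty state for everyone, so this sketch only works with one consumer. With several reports, each consumer would need its own record of what it has already seen.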

The best solution depends on the situation. In my situation, the PUSH approach seems much better.

The situation becomes much more difficult if we have more than 2 layers. Suppose we have the following 4 layers:

  • A: A data layer that stores all the data loaded from a database or from a file
  • B: A layer that uses the data layer (layer A), e.g. to filter the data from A using a complex filter function
  • C: A layer that uses layer B, e.g. to aggregate data from layer B into smaller pieces of information
  • D: A report that interprets the results of layer C and presents it in a nice graphical way to the user

In this case, PUSHING the changes will almost certainly introduce a much higher overhead.

On the other hand, PULLING the changes requires that:

  • layer D has to call layer C to ask if it is dirty
  • layer C has to call layer B to ask if it is dirty
  • layer B has to call layer A to ask if it is dirty

If nothing has changed, a rather large number of calls has to be made before you know that nothing has changed and that there is nothing to do. It seems that the performance overhead we tried to avoid by not using PUSH comes back to us in the PULL approach, because of the many calls needed to ask whether anything is dirty.

Are there patterns that solve this kind of problem in a nice and high-performance (low overhead) way?

Comments (2)

三岁铭 2024-10-06 17:19:54

No. No free lunch, no silver bullet. It's all down to careful design. You've pretty much covered the common techniques; it's applying them cleverly that needs care and the avoidance of assumptions.

I query two of your statements:

You imply that the controlling of PUSH notifications is unduly difficult. I would have expected that in many cases you tend to have a master computation engine, which grabs data and does calculations. The engine must surely stop at some point, and at that point it can send the "New Data Ready" event, which can contain finer-grained information about what's changed.
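
To make that concrete, here is a rough sketch of an engine that publishes one event when it stops. The class and field names (`ComputationEngine`, `NewDataReady`, `changed_keys`) are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass

_MISSING = object()   # sentinel so that a stored None still counts as a value


@dataclass(frozen=True)
class NewDataReady:
    """Sent once per engine run; changed_keys is the finer-grained detail."""
    changed_keys: frozenset


class ComputationEngine:
    def __init__(self):
        self.values = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def run(self, updates):
        """Apply all updates, then publish a single event when the run stops."""
        changed = set()
        for key, value in updates:
            if self.values.get(key, _MISSING) != value:
                self.values[key] = value
                changed.add(key)
        if changed:
            event = NewDataReady(frozenset(changed))
            for listener in self._listeners:
                listener(event)


events = []
engine = ComputationEngine()
engine.subscribe(events.append)

engine.run([("a", 1), ("b", 2), ("a", 3)])   # three writes, one event
engine.run([("a", 3)])                       # no real change, so no event
print(len(events), sorted(events[0].changed_keys))
```

The observers never see the intermediate states, and a report that only cares about some keys can inspect `changed_keys` and skip the rest.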

You say that making 4 inter-layer calls is too expensive. What's the basis for that? Compared with what? If you are concerned about the multiplier factor, where (10 D instances) call (5 C instances) call (2 B instances) call (1 A instance) so that A gets hit with 100 calls, then surely we optimise? Each level can say "If I'm currently calling, or I heard the answer recently, there is no need to call again".
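
One possible reading of "I heard the answer recently" is to cache the upstream answer once per refresh cycle. The sketch below assumes that interpretation; the version counter, the `cycle` argument and the class names are all hypothetical. Only the dirty query is modelled, and real layers would also recompute their filter or aggregate.

```python
class Source:
    """Layer A: bumps a version counter on every change."""

    def __init__(self):
        self.version = 0
        self.queries = 0   # counts how often A is actually asked

    def touch(self):
        self.version += 1

    def current_version(self, cycle):
        self.queries += 1
        return self.version


class Derived:
    """Layers B, C and D: ask upstream at most once per refresh cycle."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._cached_cycle = None
        self._cached_version = None
        self._seen_version = None   # version this layer last rebuilt for

    def current_version(self, cycle):
        if self._cached_cycle != cycle:   # answer not heard in this cycle yet
            self._cached_version = self._upstream.current_version(cycle)
            self._cached_cycle = cycle
        return self._cached_version

    def refresh(self, cycle):
        """Return True if this layer had to rebuild."""
        version = self.current_version(cycle)
        if version == self._seen_version:
            return False
        self._seen_version = version
        return True


# 1 A instance, 2 B instances, 5 C per B, 10 D per C (100 D instances).
a = Source()
bs = [Derived(a) for _ in range(2)]
cs = [Derived(b) for b in bs for _ in range(5)]
ds = [Derived(c) for c in cs for _ in range(10)]

a.touch()
rebuilt = sum(d.refresh(cycle=1) for d in ds)
queries_after_cycle_1 = a.queries

idle = sum(d.refresh(cycle=2) for d in ds)   # nothing changed in between
queries_after_cycle_2 = a.queries
print(rebuilt, queries_after_cycle_1, idle, queries_after_cycle_2)
```

In this sketch A is asked once per B instance per cycle rather than once per D instance, and every repeated query within a cycle is answered from a local cache.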

When we consider the scaling benefits of the layers, a few cheap queries may not be excessive.

月寒剑心 2024-10-06 17:19:54

Push via a data manager, and compress changes that occur in less than n nanoseconds.
Data manager implements publish-subscribe.

This means data producers only depend on the data manager, and data consumers only get data.

(there's a dependency reversal for consumers.)

This makes all the dataflow plumbing explicit in your glue code.
The subscriptions can be set up ahead of time, so the consumers don't need to know how that works.

The data manager can use its own thread to call subscriber notifications, which decouples producers from consumers neatly.
You can compress changes easily because the data manager uses only one thread for notifications: it can be woken by a timer, and when it wakes up, it sees only the latest state.
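
A minimal sketch of such a data manager, under a few assumptions: changes are keyed by a topic name, the coalescing window is a timer interval given in seconds rather than nanoseconds, and all names (`DataManager`, `publish`, `flush`, `run`) are invented for the example.

```python
import threading


class DataManager:
    """Publish-subscribe hub that coalesces changes between flushes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = {}   # topic -> list of callbacks
        self._latest = {}        # topic -> most recent undelivered value

    def subscribe(self, topic, callback):
        with self._lock:
            self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, value):
        """Called by producers; only overwrites the pending value."""
        with self._lock:
            self._latest[topic] = value

    def flush(self):
        """Deliver the latest value per topic; older values are dropped."""
        with self._lock:
            pending, self._latest = self._latest, {}
            targets = {t: list(self._subscribers.get(t, ())) for t in pending}
        # Callbacks run outside the lock so they may publish or subscribe.
        for topic, value in pending.items():
            for callback in targets[topic]:
                callback(value)

    def run(self, interval_seconds, stop_event):
        """Notifier loop for a dedicated thread: wake on a timer, then flush."""
        while not stop_event.wait(interval_seconds):
            self.flush()
        self.flush()   # deliver anything still pending at shutdown


manager = DataManager()
received = []
manager.subscribe("prices", received.append)

# 1000 rapid changes are compressed into one notification.
for value in range(1000):
    manager.publish("prices", value)
manager.flush()   # normally the notifier thread does this

# The same manager driven by its own notifier thread.
stop = threading.Event()
worker = threading.Thread(target=manager.run, args=(0.05, stop), daemon=True)
worker.start()
manager.publish("prices", -1)
stop.set()
worker.join()
print(received)
```

Producers only ever touch `publish()`, and consumers only ever receive values through their callbacks, so neither side needs a reference to the other.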
