基于网络应用程序中的交易数据以可扩展的方式计算结果？

发布于 2024-11-28 16:27:17 字数 1040 浏览 1 评论 0原文

我使用存储交易的网络应用程序数据（例如“日期 y 的金额 x”，但更复杂）并根据所有相关交易的详细信息提供计算结果[1]。我们投入了大量时间来确保这些计算高效执行，因为它们是应用程序的交互式部分：即用户单击按钮并等待查看结果。我们有信心，对于当前的数据水平，我们可以优化数据库获取和计算，以在可接受的时间内完成。然而，我担心所花费的时间仍然会随着交易数量的增长而线性增长[2]。我想说的是，我们可以处理更多数量级的事务，而不会导致性能过度下降。

我正在寻找有效的技术、技术、模式或算法，以提高基于交易数据的计算的可扩展性。

然而，任何建议都存在真正且重大的限制：

我们目前必须支持两种高度不兼容的数据库实现：MySQL 和 Oracle。因此，例如，使用数据库特定的存储过程的维护成本大约是两倍。
实际的交易比给出的示例交易更复杂，计算中涉及的业务逻辑也很复杂，并且经常变化。因此，将计算直接存储在 SQL 中并不是我们可以轻松维护的。
之前保存的任何交易都可以随时修改（例如，交易日期可以向前或向后移动一年），并且计算预计会立即更新。这对缓存策略产生连锁反应。
用户可以跨大空间、多维度进行查询。为了解释这一点，请考虑能够计算任何给定日期、任何特定交易类型的结果，其中交易由多个任意条件过滤。这使得预先计算用户想要看到的结果变得困难。
我们的应用程序的一个实例托管在客户的公司网络的硬件上。因此，我们不能轻易投入金钱来解决 CPU 和内存方面的问题（即使这些实际上是瓶颈）。

我意识到这是非常开放和普遍的，但是......

对于实现可扩展的解决方案有什么建议吗？

[1] 其中“相关”可以是：查询的日期；交易类型；用户类型；配方选择；等等
[2] 诚然，这比之前的性能有所改进，之前的 ORM 的 n+1 问题所花费的时间要么呈指数增长，要么至少有更陡峭的梯度。

原文

Possible duplicate:
Database design: Calculating the Account Balance

I work with a web app which stores transaction data (e.g. like "amount x on date y", but more complicated) and provides calculation results based on details of all relevant transactions[1]. We are investing a lot of time into ensuring that these calculations perform efficiently, as they are an interactive part of the application: i.e. a user clicks a button and waits to see the result. We are confident, that for the current levels of data, we can optimise the database fetching and calculation to complete in an acceptable amount of time. However, I am concerned that the time taken will still grow linearly as the number of transactions grow[2]. I'd like to be able to say that we could handle an order of magnitude more transactions without excessive performance degradation.

I am looking for effective techniques, technologies, patterns or algorithms which can improve the scalability of calculations based on transaction data.

There are however, real and significant constraints for any suggestion:

We currently have to support two highly incompatible database implementations, MySQL and Oracle. Thus, for example, using database specific stored procedures have roughly twice the maintenance cost.
The actual transactions are more complex than the example transaction given, and the business logic involved in the calculation is complicated, and regularly changing. Thus having the calculations stored directly in SQL are not something we can easily maintain.
Any of the transactions previously saved can be modified at any time (e.g. the date of a transaction can be moved a year forward or back) and calculations are expected to be updated instantly. This has a knock-on effect for caching strategies.
Users can query across a large space, in several dimensions. To explain, consider being able to calculate a result as it would stand at any given date, for any particular transaction type, where transactions are filtered by several arbitrary conditions. This makes it difficult to pre-calculate the results a user would want to see.
One instance of our application is hosted on a client's corporate network, on their hardware. Thus we can't easily throw money at the problem in terms of CPUs and memory (even if those are actually the bottleneck).

I realise this is very open ended and general, however...

Are there any suggestions for achieving a scalable solution?

[1] Where 'relevant' can be: the date queried for; the type of transaction; the type of user; formula selection; etc.
[2] Admittedly, this is an improvement over the previous performance, where an ORM's n+1 problems saw time taken grow either exponentially, or at least a much steeper gradient.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初与友歌 2024-12-05 16:27:17

我曾针对类似的要求进行过工作，并提出了一些建议。这在很大程度上取决于您的数据的可能性。很难让每种情况都能快速实现，但您可以针对常见情况进行优化，并为其他情况提供足够的硬件资源。

总结

我们每天、每周和每月创建总结。对于我们来说，大部分交易发生在当天。旧的交易也可能发生变化。我们保留一个批次，并在其下保存各个交易记录。每个批次都有一个状态来指示是否可以使用事务摘要（在表 batch_summary 中）。如果汇总批次中的旧事务发生更改，则作为此事务的一部分，批次 会被标记以指示汇总不可信。后台作业稍后将重新计算摘要。

然后，我们的软件会在可能的情况下使用摘要，并在没有摘要的情况下回退到各个交易。

我们尝试了 Oracle 的物化视图，但最终还是采用了我们自己的总结流程。

限制要求

您的要求听起来非常广泛。人们可能会倾向于将所有查询字段放在网页上并让用户选择字段和输出结果的任意组合。这使得优化变得非常困难。我建议更深入地了解他们实际需要做什么，或者过去已经做了什么。在非选择性维度上进行查询可能没有意义。

在我们针对某些查询的应用程序中，将日期范围限制为不超过 1 个月。我们已将一些功能与基于日期的摘要保持一致。例如，您可以获得 2011 年 1 月整个时间的结果，但无法获得 2011 年 1 月 5 日至 20 日的结果。

为缓慢操作提供用户界面反馈

有时，我们发现很难将某些内容优化为比几分钟。我们将工作交给后台服务器，而不是加载非常慢的网页。当我们得到答案时，用户可以发出请求并继续处理他们的事务。