基于网络应用程序中的交易数据以可扩展的方式计算结果?
可能重复:
数据库设计:计算帐户余额
我使用存储交易的网络应用程序数据(例如“日期 y 的金额 x”,但更复杂)并根据所有相关交易的详细信息提供计算结果[1]。我们投入了大量时间来确保这些计算高效执行,因为它们是应用程序的交互式部分:即用户单击按钮并等待查看结果。我们有信心,对于当前的数据水平,我们可以优化数据库获取和计算,以在可接受的时间内完成。然而,我担心所花费的时间仍然会随着交易数量的增长而线性增长[2]。我想说的是,我们可以处理更多数量级的事务,而不会导致性能过度下降。
我正在寻找有效的技术、技术、模式或算法,以提高基于交易数据的计算的可扩展性。
然而,任何建议都存在真正且重大的限制:
- 我们目前必须支持两种高度不兼容的数据库实现:MySQL 和 Oracle。因此,例如,使用数据库特定的存储过程的维护成本大约是两倍。
- 实际的交易比给出的示例交易更复杂,计算中涉及的业务逻辑也很复杂,并且经常变化。因此,将计算直接存储在 SQL 中并不是我们可以轻松维护的。
- 之前保存的任何交易都可以随时修改(例如,交易日期可以向前或向后移动一年),并且计算预计会立即更新。这对缓存策略产生连锁反应。
- 用户可以跨大空间、多维度进行查询。为了解释这一点,请考虑能够计算任何给定日期、任何特定交易类型的结果,其中交易由多个任意条件过滤。这使得预先计算用户想要看到的结果变得困难。
- 我们的应用程序的一个实例托管在客户的公司网络的硬件上。因此,我们不能轻易投入金钱来解决 CPU 和内存方面的问题(即使这些实际上是瓶颈)。
我意识到这是非常开放和普遍的,但是......
对于实现可扩展的解决方案有什么建议吗?
[1] 其中“相关”可以是:查询的日期;交易类型;用户类型;配方选择;等等
[2] 诚然,这比之前的性能有所改进,之前的 ORM 的 n+1 问题所花费的时间要么呈指数增长,要么至少有更陡峭的梯度。
Possible duplicate:
Database design: Calculating the Account Balance
I work with a web app which stores transaction data (e.g. like "amount x on date y", but more complicated) and provides calculation results based on details of all relevant transactions[1]. We are investing a lot of time into ensuring that these calculations perform efficiently, as they are an interactive part of the application: i.e. a user clicks a button and waits to see the result. We are confident, that for the current levels of data, we can optimise the database fetching and calculation to complete in an acceptable amount of time. However, I am concerned that the time taken will still grow linearly as the number of transactions grow[2]. I'd like to be able to say that we could handle an order of magnitude more transactions without excessive performance degradation.
I am looking for effective techniques, technologies, patterns or algorithms which can improve the scalability of calculations based on transaction data.
There are however, real and significant constraints for any suggestion:
- We currently have to support two highly incompatible database implementations, MySQL and Oracle. Thus, for example, using database specific stored procedures have roughly twice the maintenance cost.
- The actual transactions are more complex than the example transaction given, and the business logic involved in the calculation is complicated, and regularly changing. Thus having the calculations stored directly in SQL are not something we can easily maintain.
- Any of the transactions previously saved can be modified at any time (e.g. the date of a transaction can be moved a year forward or back) and calculations are expected to be updated instantly. This has a knock-on effect for caching strategies.
- Users can query across a large space, in several dimensions. To explain, consider being able to calculate a result as it would stand at any given date, for any particular transaction type, where transactions are filtered by several arbitrary conditions. This makes it difficult to pre-calculate the results a user would want to see.
- One instance of our application is hosted on a client's corporate network, on their hardware. Thus we can't easily throw money at the problem in terms of CPUs and memory (even if those are actually the bottleneck).
I realise this is very open ended and general, however...
Are there any suggestions for achieving a scalable solution?
[1] Where 'relevant' can be: the date queried for; the type of transaction; the type of user; formula selection; etc.
[2] Admittedly, this is an improvement over the previous performance, where an ORM's n+1 problems saw time taken grow either exponentially, or at least a much steeper gradient.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我曾针对类似的要求进行过工作,并提出了一些建议。这在很大程度上取决于您的数据的可能性。很难让每种情况都能快速实现,但您可以针对常见情况进行优化,并为其他情况提供足够的硬件资源。
总结
我们每天、每周和每月创建总结。对于我们来说,大部分交易发生在当天。旧的交易也可能发生变化。我们保留一个
批次
,并在其下保存各个交易
记录。每个批次都有一个状态来指示是否可以使用事务摘要(在表batch_summary
中)。如果汇总批次中的旧事务发生更改,则作为此事务的一部分,批次
会被标记以指示汇总不可信。后台作业稍后将重新计算摘要。然后,我们的软件会在可能的情况下使用摘要,并在没有摘要的情况下回退到各个交易。
我们尝试了 Oracle 的物化视图,但最终还是采用了我们自己的总结流程。
限制要求
您的要求听起来非常广泛。人们可能会倾向于将所有查询字段放在网页上并让用户选择字段和输出结果的任意组合。这使得优化变得非常困难。我建议更深入地了解他们实际需要做什么,或者过去已经做了什么。在非选择性维度上进行查询可能没有意义。
在我们针对某些查询的应用程序中,将日期范围限制为不超过 1 个月。我们已将一些功能与基于日期的摘要保持一致。例如,您可以获得 2011 年 1 月整个时间的结果,但无法获得 2011 年 1 月 5 日至 20 日的结果。
为缓慢操作提供用户界面反馈
有时,我们发现很难将某些内容优化为比几分钟。我们将工作交给后台服务器,而不是加载非常慢的网页。当我们得到答案时,用户可以发出请求并继续处理他们的事务。
I have worked against similar requirements, and have some suggestions. Alot of this depends on what is possible with your data. It is difficult to make every case imaginable quick, but you can optimize for the common cases and have enough hardware grunt available for the others.
Summarise
We create summaries on a daily, weekly and monthly basis. For us, most of the transactions happen in the current day. Old transactions can also change. We keep a
batch
and under this the individualtransaction
records. Each batch has a status to indicate if the transaction summary (in tablebatch_summary
) can be used. If an old transaction in a summarised batch changes, as part of this transaction thebatch
is flagged to indicate that the summary is not to be trusted. A background job will re-calculate the summary later.Our software then uses the summary when possible and falls back to the individual transactions where there is no summary.
We played around with Oracle's materialized views, but ended up rolling our own summary process.
Limit the Requirements
Your requirements sound very wide. There can be a temptation to put all the query fields on a web page and let the users choose any combination of fields and output results. This makes it very difficult to optimize. I would suggest digging deeper into what they actually need to do, or have done in the past. It may not make sense to query on very unselective dimensions.
In our application for certain queries it is to limit the date range to not more than 1 month. We have in aligned some features to the date-based summaries. e.g. you can get results for the whole of Jan 2011, but not 5-20 Jan 2011.
Provide User Interface Feedback for Slow Operations
On occasions we have found it difficult to optimize some things to be shorter than a few minutes. We shirt a job off to a background server rather than have a very slow loading web page. The user can fire off a request and go about their business while we get the answer.
我建议使用物化视图。物化视图允许您像存储表一样存储视图。因此,您需要完成的所有复杂查询都是在用户查询之前预先计算的。
当然,棘手的部分是当物化视图所基于的表发生变化时更新物化视图。这里有一篇关于它的好文章: Update Materialized view when urderlying table改变。
如果没有 MySQL 中的插件,物化视图(目前还)不可用,并且实现起来非常复杂。但是,由于您有 Oracle,我建议您查看上面的 链接 了解如何在 Oracle 中添加物化视图。
I would suggest using Materialized Views. Materialized Views allow you to store a View as you would a table. Thus all of the complex queries you need to have done are pre-calculated before the user queries them.
The tricky part is of course updating the Materialized View when the tables it is based on change. There's a nice article about it here: Update materialized view when urderlying tables change.
Materialized Views are not (yet) available without plugins in MySQL and are horribly complicated to implement otherwise. However, since you have Oracle I would suggest checking out the link above for how to add a Materialized View in Oracle.