Can NoSQL be used for reporting in this situation?
The Situation
I am considering building a NoSQL-based application as an alternative to an existing Excel-based financial risk management reporting tool. In short, my question is about the suitability of NoSQL given the following points:
- The main source data (CSV files) comes from another application and consists of reports of current transactions and the associated valuation calculations based on market movements. This is a fixed source and will not change. Report row counts range from a meagre 1.5k rows to over 65k rows. That is not a massive amount of data, but it is growing at a fairly linear rate. There are several other supporting data sources.
- The report formats are fairly consistent; however, the report content can be dynamic, i.e. most reports allow the business to decide which additional columns they would like to see, depending on the business requirement.
- Reporting, as it is done at the moment, involves slicing and dicing the above reports: think pivots, graphs, aggregations, additional calculations, etc. There is some complex logic here that I don't know much about.
- This is not a transactional system but rather a risk management system, so there is an assumed and expected time delay with the source data being used. It will primarily be read-heavy.
- Reporting is typically only relevant for the current day (most important), but a history of previous runs needs to be maintained for every change in the source data (described in the first point) for further analysis.
- This is no simple application, but my feeling is that Excel is not scaling well enough or fast enough (six months ago it was a dream come true, and it genuinely was). There are too many hidden business rules that are known to only a few people, and going through this exercise/rewrite will force all of them to surface. We have too many bus factors from both a business and a development perspective.
- Overall, the solution needs to cater for dynamic reporting, or rather dynamic presentation of the data. Compared to Excel, I don't think speed is really an issue (I'm assuming my solution will be faster); however, if truly dynamic queries were used, they would need to complete in a reasonable time (< 1 minute).
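To make the first, third and fifth points concrete, here is a minimal sketch in plain Python of what "schemaless records plus run history plus pivots" could look like. It assumes each CSV export is loaded as a list of dicts tagged with a run id, so earlier runs are kept rather than overwritten. The column names (`trade_id`, `desk`, `currency`, `mtm`, `counterparty`) and the sample rows are hypothetical, and an in-memory list stands in for whatever store is eventually chosen.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample exports: the second run carries an extra column
# ("counterparty") that the business asked for, without any schema change.
RUN_1 = """trade_id,desk,currency,mtm
T1,Rates,USD,100.0
T2,Rates,EUR,-40.5
T3,FX,USD,25.0
"""
RUN_2 = """trade_id,desk,currency,mtm,counterparty
T1,Rates,USD,110.0,ACME
T2,Rates,EUR,-38.5,GLOBEX
T3,FX,USD,20.0,ACME
T4,FX,EUR,5.0,INITECH
"""


def load_run(store, run_id, csv_text):
    """Append every CSV row as a dict ("document") tagged with its run id.

    Earlier runs are never overwritten, so the history of previous runs is kept.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["mtm"] = float(row["mtm"])
        row["run_id"] = run_id
        store.append(row)


def pivot(store, run_id, row_key, col_key, value_key):
    """Sum value_key for one run, grouped by row_key (rows) and col_key (columns)."""
    table = defaultdict(lambda: defaultdict(float))
    for doc in store:
        if doc["run_id"] == run_id:
            table[doc[row_key]][doc[col_key]] += doc[value_key]
    return {r: dict(cols) for r, cols in table.items()}


store = []
load_run(store, "run-1", RUN_1)
load_run(store, "run-2", RUN_2)
print(pivot(store, "run-2", "desk", "currency", "mtm"))
# {'Rates': {'USD': 110.0, 'EUR': -38.5}, 'FX': {'USD': 20.0, 'EUR': 5.0}}
```

Because every row keeps its `run_id`, comparing today's pivot with a previous run is just a second call with a different id; the open question is how well this holds up once the pivots and calculations become as complex as the ones currently hidden in Excel.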
Why did I consider using NoSQL?
Firstly, I'm a complete noob when it comes to NoSQL, so my current understanding may be underdeveloped. I have tinkered and played around with NoSQL a bit, but nothing at the scale of what I'm currently considering.
The main reason I considered NoSQL was the source data. While the actual format (CSV files) is irrelevant, the dynamic nature of the data, in terms of dynamic columns, makes me think that a SQL-based approach would be severely restricted and inflexible, since table structures are fairly static. NoSQL documents, however, would be able to handle this.
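A minimal sketch of the trade-off described above, using Python's built-in `sqlite3` as a stand-in for the SQL database and a plain list of dicts as a stand-in for a document store (both are assumptions for illustration, not a recommendation of either). The table and column names are hypothetical.

```python
import sqlite3

# Stand-in for the relational side: a table with a fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id TEXT, desk TEXT, mtm REAL)")

# A row from a newer export that carries an extra, business-requested column.
new_row = {"trade_id": "T1", "desk": "Rates", "mtm": 110.0, "counterparty": "ACME"}


def try_insert(conn, row):
    """Insert a dict as a row; return False if the table lacks one of its columns.

    Column names are interpolated into the SQL text, which is only acceptable
    here because they come from a trusted, fixed source in this sketch.
    """
    cols = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    try:
        conn.execute(
            f"INSERT INTO trades ({cols}) VALUES ({placeholders})", list(row.values())
        )
        return True
    except sqlite3.OperationalError as exc:
        print("rejected:", exc)
        return False


accepted_before = try_insert(conn, new_row)  # False: no "counterparty" column yet

# The schema change that would normally go through change management.
conn.execute("ALTER TABLE trades ADD COLUMN counterparty TEXT")
accepted_after = try_insert(conn, new_row)  # True

# Stand-in for the document side: each record simply carries whatever keys it has.
documents = [new_row]
```

The sketch suggests the technical obstacle on the SQL side is the `ALTER TABLE` step and the process that governs it, rather than SQL's ability to store the extra column at all.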
The second reason is that changes to data formats need to be catered for on the fly, on a day-to-day basis. A SQL-based solution forces us to conform to enterprise-level change management processes (for any change to a SQL database), which are laborious and painfully cumbersome. So I guess my objective here is to have enough flexibility in my application and solution to bypass all of that bureaucracy. (If you intend to comment about the wonders and benefits of enterprise change management, don't!)
The last reason, a somewhat selfish one: I want to try something different.
I fully concede that I have not thought this through in full detail, which is why I'm asking: I know I am missing some very relevant aspects. If a SQL-based solution is more appropriate, can you elaborate based on the points listed above?
Right now this is still in a very exploratory phase; I need to get all my ducks in a row before I even consider proposing this type of solution.
Answer
The key question is how the reports will be defined.
If the reports are all custom code, and you can reasonably set up a new custom index or map-reduce query to get a simple table of data for each report, then it may make sense to use NoSQL.
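To show what "a map-reduce query per report" means in practice, here is a framework-agnostic sketch in plain Python. It is not the API of any particular NoSQL product; the documents and field names are hypothetical. Each report is a pair of functions: a mapper that emits (key, value) pairs per document and a reducer that folds the values for each key into one result.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical documents; "counterparty" is an optional, dynamic column.
docs = [
    {"trade_id": "T1", "desk": "Rates", "mtm": 110.0, "counterparty": "ACME"},
    {"trade_id": "T2", "desk": "Rates", "mtm": -38.5},
    {"trade_id": "T3", "desk": "FX", "mtm": 20.0, "counterparty": "ACME"},
    {"trade_id": "T4", "desk": "FX", "mtm": 5.0},
]


def map_by_desk(doc):
    # Emit one (key, value) pair per document.
    yield doc["desk"], doc["mtm"]


def map_by_counterparty(doc):
    # Documents without the optional column simply emit nothing.
    if "counterparty" in doc:
        yield doc["counterparty"], doc["mtm"]


def sum_values(values):
    return reduce(lambda a, b: a + b, values, 0.0)


def map_reduce(docs, mapper, reducer):
    """Group mapper output by key, then apply reducer to each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


report = map_reduce(docs, map_by_desk, sum_values)  # {'Rates': 71.5, 'FX': 25.0}
by_counterparty = map_reduce(docs, map_by_counterparty, sum_values)  # {'ACME': 130.0}
```

The catch is that every new report means writing (and deploying) a new mapper, which works when a developer owns the reports but not when end users want to define their own.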
If you need reports to be defined or configured by end users, you really have no reasonable option other than Excel or a SQL-based reporting tool.
You also need to consider how the dynamic columns will be used: schemaless stores work well for columns that only need to be displayed once you have found a record, but not so well for columns you need to query on. With SQL, all columns are queryable. A lot of NoSQL systems get their performance improvements by knowing that most columns will never be included in a query.
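The point about queryable columns can be sketched with a toy in-memory document store (hypothetical, not modelled on any specific NoSQL product; some real stores do support secondary indexes on arbitrary fields) next to `sqlite3`. The toy store can look up a document directly only by a field that was indexed up front; filtering on a dynamic field falls back to inspecting every document, whereas SQL can filter on any column and can add an index on it later.

```python
import sqlite3
from collections import defaultdict

# Hypothetical records; "counterparty" is a dynamic column missing from T4.
docs = [
    {"trade_id": "T1", "desk": "Rates", "mtm": 110.0, "counterparty": "ACME"},
    {"trade_id": "T2", "desk": "Rates", "mtm": -38.5, "counterparty": "GLOBEX"},
    {"trade_id": "T3", "desk": "FX", "mtm": 20.0, "counterparty": "ACME"},
    {"trade_id": "T4", "desk": "FX", "mtm": 5.0},
]


class TinyDocStore:
    """Toy schemaless store: only fields named in indexed_fields get an index."""

    def __init__(self, docs, indexed_fields):
        self.docs = list(docs)
        self.indexes = {field: defaultdict(list) for field in indexed_fields}
        for pos, doc in enumerate(self.docs):
            for field, index in self.indexes.items():
                if field in doc:
                    index[doc[field]].append(pos)

    def find(self, field, value):
        """Return (matching documents, number of documents touched)."""
        if field in self.indexes:
            positions = self.indexes[field].get(value, [])
            return [self.docs[p] for p in positions], len(positions)
        # No index on this field: every document has to be inspected.
        matches = [d for d in self.docs if d.get(field) == value]
        return matches, len(self.docs)


store = TinyDocStore(docs, indexed_fields=["trade_id"])
by_id, examined_id = store.find("trade_id", "T3")  # index lookup: 1 document touched
by_cp, examined_cp = store.find("counterparty", "ACME")  # full scan: 4 touched

# SQL side: any column can appear in WHERE, and an index can be added later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id TEXT, desk TEXT, mtm REAL, counterparty TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [(d["trade_id"], d["desk"], d["mtm"], d.get("counterparty")) for d in docs],
)
conn.execute("CREATE INDEX idx_trades_counterparty ON trades (counterparty)")
rows = conn.execute(
    "SELECT trade_id FROM trades WHERE counterparty = ? ORDER BY trade_id", ("ACME",)
).fetchall()
```

With a few thousand rows a full scan is cheap either way; the difference starts to matter as the data and the number of ad hoc filters grow.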