返回大量数据时,数据库存储过程是否会出现性能不佳的情况?
在 VoltDB 禁止事项
http://community.voltdb.com/DosAndDonts
下,他们指出
不要创建返回大量数据的查询(例如 SELECT * FROM FOO,没有任何限制)特别是对于多分区 交易。存储返回的数据要保守 程序。
这是我的用例之一。存储过程的某些方面是否使其不适合此类查询,或者是否是 VoltDB 特有的?在这种情况下,性能是否会下降到比 Postgres 等传统 RDBMS 更差的水平?
编辑:我的查询不完全是 select * from foo 但我需要选择特定日期范围内的所有金融交易,这可能超过 1 亿行
Under VoltDB Don'ts
http://community.voltdb.com/DosAndDonts
They state
Don't create queries that return large volumes of data (such as SELECT
* FROM FOO with no constraints) especially for multi-partition
transactions. Be conservative in the data returned by stored
procedures.
This is one of the use cases that I have. Is there some aspect of stored procedures that make them unsuitable for this type of query or is it something specific to VoltDB? Under this scenario, would performance degrade to a level that would be worse than a traditional RDBMS such as Postgres?
Edit: My query is not quite a select * from foo but I will need to select all financial transactions between certain date ranges and this could exceed 100m rows
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
所有数据库都需要支付物化和 I/O 成本才能将大型结果集传输回用户。
不过,我可以具体谈谈 VoltDB。
在VoltDB中,存储过程都是事务。即使选择数据库大部分的结果集也与其他并发过程完全隔离。该结果集中的元组需要在内部暂时缓冲(例如用于跨分区排序或限制),然后返回给用户。
需要对结果保持完全隔离(可能需要许多毫秒(或秒)的 I/O 才能返回给用户),以及在多分区过程的协调节点上发生的聚合,两者结合限制了最大结果集大小。
我怀疑未来的版本将解决此限制 - 许多人的数据访问要求与您所描述的类似。
All databases need to pay the materialization and i/o costs to transfer a large result set back to a user.
However, I can speak specifically to VoltDB.
In VoltDB, stored procedures are all transactions. Even a result set that selects a large portion of the database is a fully isolated from other concurrent procedures. The tuples in that result set need to be momentarily buffered internally (for example for cross-partition ordering or limiting) and then returned to the user.
The combination of needing to maintain full isolation over a result that can take many milliseconds (or seconds) of I/O to return to the user and the aggregation that happens at the coordinating node of a multi-partition procedure limits the maximum result set size.
I suspect a future release will address this limitation - many people have data-access requirements similar to what you describe.
性能问题是大数据传输的问题。对于所有数据库来说通常都是如此。
当数据库需要返回大量数据时,需要使用大量资源(例如内存、CPU、网络IO),从而降低性能。
仅网络 IO 就可能是一个问题,就好像它是大量数据一样,在数据传输完成之前没有其他任何东西可以通过网络。
The performance issue is that of large data transfers. This is generally true for all databases.
When a database needs to return large amounts of data, it needs to use lots of resources (memory, CPU, network IO for example), degrading performance.
The network IO alone can be an issue, as if it is a significant amount of data, nothing else could go through the network till the data transfer has completed.
如果您正在查询 SELECT * FROM FOO 以便将结果显示到用户界面或类似的用例,则分页是限制返回数据大小和事务执行时间的好方法。 a 中有一个分页示例。
如果您尝试从 VoltDB 中提取数据以导出到另一个数据库或系统,更好的方法是使用 VoltDB 的 导出。
If you are querying SELECT * FROM FOO in order to display the results to a user interface, or a similar use case, then paging is a good approach to limit the size of the data returned and the execution time of the transaction. There is a paging example in a.
If you are trying to extract data from VoltDB to export to another database or system, the better approach is use VoltDB's Export.