Large result set on a PostgreSQL query
I'm running a query against a table in a PostgreSQL database. The database is on a remote machine. The table has around 30 sub-tables that use PostgreSQL's partitioning capability.
The query returns a large result set, around 1.8 million rows.
In my code I use Spring JDBC support, specifically the JdbcTemplate.query method, but my RowCallbackHandler is never called.
My best guess is that the PostgreSQL JDBC driver (I use version 8.3-603.jdbc4) is accumulating the entire result in memory before calling my code. I thought the fetchSize configuration could control this, but I tried it and nothing changed. I did this as the PostgreSQL manual recommends.
This query worked fine when I used Oracle XE. But I'm trying to migrate to PostgreSQL because of the partitioning feature, which is not available in Oracle XE.
My environment:
- PostgreSQL 8.3
- Windows Server 2008 Enterprise 64-bit
- JRE 1.6 64-bit
- Spring 2.5.6
- PostgreSQL JDBC Driver 8.3-603
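For reference, a minimal sketch of the setup described above; the class, table, and query are illustrative, not the actual code:

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowCallbackHandler;

    public class LargeResultReader {

        private final JdbcTemplate jdbcTemplate;

        public LargeResultReader(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
            // Ask the driver to stream rows in batches instead of
            // materializing the entire result set in memory.
            this.jdbcTemplate.setFetchSize(50);
        }

        public void readAll() {
            // "measurements" is a hypothetical partitioned table.
            jdbcTemplate.query("SELECT * FROM measurements", new RowCallbackHandler() {
                public void processRow(ResultSet rs) throws SQLException {
                    // Called once per row -- this is the callback that
                    // never fires in the scenario described above.
                }
            });
        }
    }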
4 Answers
In order to use a cursor to retrieve data, you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and set autocommit to false, in addition to setting a fetch size. This is covered in the doc you linked to, but you didn't explicitly mention that you did those steps.
Be careful with PostgreSQL's partitioning scheme. It really does horrible things to the optimizer and can cause massive performance issues where there should not be any (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason it would need to be partitioned based on size alone, given that it is appropriately indexed.
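For reference, a minimal sketch of those three steps in plain JDBC, with placeholder connection details and a hypothetical measurements table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CursorFetchExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost/mydb", "user", "secret");
            try {
                // 1. The driver only uses a cursor outside autocommit mode.
                conn.setAutoCommit(false);

                // 2. The statement must be TYPE_FORWARD_ONLY (the default).
                Statement stmt = conn.createStatement(
                        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

                // 3. A non-zero fetch size tells the driver how many rows
                //    to pull from the server-side cursor per round trip.
                stmt.setFetchSize(50);

                ResultSet rs = stmt.executeQuery("SELECT * FROM measurements");
                while (rs.next()) {
                    // process one row at a time
                }
                rs.close();
                stmt.close();
                conn.commit();
            } finally {
                conn.close();
            }
        }
    }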
I'm betting that there's not a single client of your app that needs 1.8M rows all at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.
That's what Google does. When you do a search there might be millions of hits, but the results come back 25 at a time, on the theory that you'll find what you want on the first page.
If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.
If neither of those applies, you've got a real problem. Time to rethink it.
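As one sketch of the chunking idea, assuming a hypothetical measurements table with a unique id column, pages could be pulled with LIMIT/OFFSET:

    import java.util.List;
    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class PagedReader {

        private final JdbcTemplate jdbcTemplate;

        public PagedReader(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        // Returns one page of rows; ordering by a unique column keeps
        // the pages stable between calls.
        public List<Map<String, Object>> fetchPage(int pageSize, int pageNumber) {
            return jdbcTemplate.queryForList(
                    "SELECT * FROM measurements ORDER BY id LIMIT ? OFFSET ?",
                    new Object[] { pageSize, pageSize * pageNumber });
        }
    }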
After reading the later responses, it sounds to me like this is more of a reporting solution that ought to be crunched in batch, or calculated in real time and stored in tables that are not part of your transactional system. There's no way that bringing 1.8M rows to the middle tier to calculate moving averages can scale.
I'd recommend reorienting yourself: start thinking about it as a reporting solution.
The fetchSize property worked as described in the PostgreSQL manual.
My mistake was that I was setting autocommit = false on a connection from the connection pool that was not the connection actually being used by the prepared statement.
Thanks for all the feedback.
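To illustrate, a sketch of how that mistake can happen with a pooled DataSource (the table name and fetch size are hypothetical):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowCallbackHandler;

    public class AutoCommitPitfall {

        public void brokenStreaming(DataSource pool) throws Exception {
            // BUG: this checks out one connection, disables autocommit
            // on it, and immediately returns it to the pool...
            Connection c = pool.getConnection();
            c.setAutoCommit(false);
            c.close();

            // ...while JdbcTemplate runs the query on a *different*
            // pooled connection that is still in autocommit mode, so
            // the driver buffers the whole result set instead of
            // using a cursor.
            JdbcTemplate jt = new JdbcTemplate(pool);
            jt.setFetchSize(50);
            jt.query("SELECT * FROM measurements", new RowCallbackHandler() {
                public void processRow(ResultSet rs) throws SQLException {
                    // with 1.8M rows buffered first, this may never be reached
                }
            });
        }
    }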
I did everything above, but I needed one last piece: make sure the call is wrapped in a transaction, and set the transaction to read-only so that no rollback state is required.
I added this:
@Transactional(readOnly = true)
Cheers.
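Pulling the whole thread together, a sketch of the pattern that reportedly works, assuming Spring's annotation-driven transaction management is configured and using an illustrative measurements table:

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowCallbackHandler;
    import org.springframework.transaction.annotation.Transactional;

    public class StreamingReportDao {

        private final JdbcTemplate jdbcTemplate;

        public StreamingReportDao(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
            // A non-zero fetch size is required for cursor-based fetching.
            this.jdbcTemplate.setFetchSize(50);
        }

        // The read-only transaction keeps autocommit off and binds one
        // connection for the whole call, so the fetch size takes effect
        // and no rollback state is needed.
        @Transactional(readOnly = true)
        public void streamAllRows() {
            jdbcTemplate.query("SELECT * FROM measurements", new RowCallbackHandler() {
                public void processRow(ResultSet rs) throws SQLException {
                    // handle one row at a time
                }
            });
        }
    }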