CachedRowSet slower than ResultSet?
In my Java code, I access an Oracle database table with a select statement. I receive a lot of rows (about 50,000), so the rs.next() calls need some time to process all of the rows.

using ResultSet, the processing of all rows (rs.next) takes about 30 secs

My goal is to speed up this process, so I changed the code and am now using a CachedRowSet:

using CachedRowSet, the processing of all rows takes about 35 secs

I don't understand why the CachedRowSet is slower than the normal ResultSet, because the CachedRowSet retrieves all data at once, while the ResultSet retrieves the data every time rs.next() is called.
Here is a part of the code:
try {
    stmt = masterCon.prepareStatement(sql);
    rs = stmt.executeQuery();
    // copy all rows from the ResultSet into the in-memory CachedRowSet
    CachedRowSet crset = new CachedRowSetImpl();
    crset.populate(rs);
    // iterate over the cached copy, not the (now exhausted) ResultSet
    while (crset.next()) {
        int countStar = crset.getInt("COUNT");
        ...
    }
} finally {
    //cleanup
}
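The plain-ResultSet version behind the 30-second figure is not shown; presumably it is just the same query iterated directly, roughly like this (mirroring the variables of the snippet above):

try {
    stmt = masterCon.prepareStatement(sql);
    rs = stmt.executeQuery();
    // iterate the ResultSet directly, with no intermediate copy
    while (rs.next()) {
        int countStar = rs.getInt("COUNT");
        ...
    }
} finally {
    //cleanup
}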
5 Answers
CachedRowSet caches the results in memory, i.e. you don't need the connection anymore. Therefore it is "slower" in the first place.
-> http://download.oracle.com/javase/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html
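A minimal sketch of that trade-off, mirroring the question's snippet (dataSource and sql are placeholders, and the standard RowSetProvider factory is used instead of the Sun-internal CachedRowSetImpl): populate() pays the full copy cost up front, and afterwards the rows remain readable with the connection already closed.

CachedRowSet crs;
try (Connection con = dataSource.getConnection();
     PreparedStatement stmt = con.prepareStatement(sql);
     ResultSet rs = stmt.executeQuery()) {
    crs = RowSetProvider.newFactory().createCachedRowSet();
    crs.populate(rs);      // the full copy into memory happens here, up front
}                          // connection, statement and ResultSet are closed...
while (crs.next()) {       // ...yet the cached rows are still readable
    int countStar = crs.getInt("COUNT");
}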
There is an issue with CachedRowSet coupled together with a Postgres JDBC driver. CachedRowSet needs to know the types of the columns so it knows which Java objects to create (god knows what else it fetches from the DB behind the covers!). It therefore makes more round trips to the DB to fetch column metadata. In very high volumes this becomes a real problem. If the DB is on a remote server, this is a real problem as well because of network latency.

We've been using CachedRowSet for years and just discovered this. We now implement our own CachedRowSet, as we never used any of its fancy stuff anyway. We do getString for all types and convert ourselves, as this seems the quickest way. This clearly wasn't an issue with fetch size, as the Postgres driver fetches everything by default.
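The answer does not show the replacement class; a minimal sketch of the idea, assuming a plain list-of-rows cache keyed by column label, where every value is read once via getString() and converted by the caller, might look like this:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-in for CachedRowSet: one metadata call, then
// every value is fetched with getString() and kept in memory as-is.
public class SimpleRowCache {
    private final List<Map<String, String>> rows = new ArrayList<>();

    public SimpleRowCache(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();   // single metadata round trip
        int columns = md.getColumnCount();
        while (rs.next()) {
            Map<String, String> row = new HashMap<>(columns);
            for (int i = 1; i <= columns; i++) {
                row.put(md.getColumnLabel(i), rs.getString(i));
            }
            rows.add(row);
        }
    }

    public List<Map<String, String>> rows() {
        return rows;
    }
}

Callers then convert numeric columns themselves, e.g. Integer.parseInt(row.get("COUNT")).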
What makes you think that ResultSet will retrieve the data each time rs.next() is called? It's up to the implementation exactly how it works - and I wouldn't be surprised if it fetches a chunk at a time; quite possibly a fairly large chunk.

I suspect you're basically seeing the time it takes to copy all the data into the CachedRowSet and then access it all - basically you've got an extra copying operation for no purpose.
Using a normal ResultSet you get more optimization options with RowPrefetch and FetchSize.
These optimize the network transport chunks and the processing in the while loop, so rs.next() always has data to work with.
FetchSize has a default of 10 (in the latest Oracle versions), but as far as I know RowPrefetch is not set. That means the network transport is not optimized at all.
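A short, hedged sketch of those two knobs (the connection URL, credentials and the value 500 are illustrative; the defaultRowPrefetch property name follows Oracle's driver documentation, so verify it against the driver version you actually use):

Properties props = new Properties();
props.setProperty("user", "scott");              // placeholder credentials
props.setProperty("password", "tiger");
// Oracle-specific row prefetch hint (assumed property name, check your driver)
props.setProperty("defaultRowPrefetch", "500");

try (Connection con = DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/service", props);
     PreparedStatement stmt = con.prepareStatement(sql)) {
    stmt.setFetchSize(500);                      // standard JDBC: rows per network round trip
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // most calls read from the driver's buffer; a new round trip
            // to the database happens only about every 500 rows
        }
    }
}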
I've done some performance tests with/without CachedRowSet and with different fetchSize values (using Exasol DB and the latest 7.x.x JDBC driver). My tests and code are very straightforward.

First try (using a CachedRowSet):

Results:

Second try (populating a HashMap rather than a CachedRowSet):

Results:

So indeed, it looks like the CachedRowSet implementation has poor performance compared to using the ResultSet directly, but it provides a generic approach which allows different table structures to be used.

Disclaimer: I used only one type of data set (i.e. a table of 50K simple items). I suggest that you test this on your specific use case before deciding on your approach.
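The original test code and timings are not shown above; as a rough, hypothetical illustration of the two variants being compared (the query and the "id"/"name" columns are assumptions, not the answer's actual code):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class RowSetComparisonSketch {

    // Variant 1: copy everything into a CachedRowSet, then iterate the copy.
    static long viaCachedRowSet(Connection con, String sql) throws Exception {
        long start = System.nanoTime();
        try (PreparedStatement stmt = con.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
            crs.populate(rs);
            while (crs.next()) {
                crs.getInt("id");
                crs.getString("name");
            }
        }
        return (System.nanoTime() - start) / 1_000_000;   // elapsed millis
    }

    // Variant 2: read the ResultSet directly into a HashMap.
    static long viaHashMap(Connection con, String sql) throws Exception {
        long start = System.nanoTime();
        Map<Integer, String> items = new HashMap<>();
        try (PreparedStatement stmt = con.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                items.put(rs.getInt("id"), rs.getString("name"));
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}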