Oracle CLOB performance
I am running queries against an Oracle 10g database with JDBC (using the latest drivers and UCP as the DataSource) in order to retrieve CLOBs (avg. 20k characters). However, the performance seems to be pretty bad: a batch retrieval of 100 LOBs takes 4 s on average. Judging from my observations, the operation is neither I/O-, CPU-, nor network-bound.
My test setup looks like this:
PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();
dataSource.setConnectionFactoryClassName("...");
dataSource.setConnectionPoolName("...");
dataSource.setURL("...");
dataSource.setUser("...");
dataSource.setPassword("...");
// Driver-level prefetch hints: rows per round trip and LOB prefetch size.
dataSource.setConnectionProperty("defaultRowPrefetch", "1000");
dataSource.setConnectionProperty("defaultLobPrefetchSize", "500000");

final LobHandler handler = new OracleLobHandler();
JdbcTemplate j = new JdbcTemplate(dataSource);

j.query("SELECT bigClob FROM ...", new RowCallbackHandler() {
    public void processRow(final ResultSet rs) throws SQLException {
        // Materialize each CLOB into a String.
        String result = handler.getClobAsString(rs, "bigClob");
    }
});
I experimented with the fetch sizes but to no avail. Am I doing something wrong? Is there a way to speed up CLOB retrieval when using JDBC?
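For completeness, "experimented with the fetch sizes" means roughly the following; setFetchSize is the standard JdbcTemplate setting, and the value shown is arbitrary:

// Sketch only (value arbitrary): per-statement row prefetch on the template.
// It affects how many rows come back per round trip, not how LOB content is transferred.
j.setFetchSize(500);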
Answers (4)
Is there an Order By in the query? 10K rows is quite a lot if it has to be sorted.
Also, retrieving the PK is not a fair test versus retrieving the entire CLOB. Oracle stores the table rows with probably many per block, but each of the CLOBs (if they are > 4k) will be stored out of line, each in its own series of blocks. Scanning the list of PKs is therefore going to be fast. Also, there is probably an index on the PK, so Oracle can just quickly scan the index blocks and not even access the table.
4 seconds does seem a little high, but it is 2 MB that possibly needs to be read from disk and transported over the network to your Java program. The network could be an issue. If you perform a SQL trace of the session, it will point you to exactly where the time is being spent (disk reads or network).
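A minimal sketch of enabling extended SQL trace for the test session from JDBC, assuming the usual java.sql imports, a connection taken from the same dataSource as in the question, and the privileges to issue ALTER SESSION SET EVENTS; level 8 of event 10046 includes wait events, so the resulting trace file (formatted with tkprof) shows whether the time goes into disk reads or SQL*Net round trips (exception handling omitted):

Connection con = dataSource.getConnection();
Statement s = con.createStatement();
// 10046 level 8 = SQL trace including wait events (disk vs. network waits).
s.execute("ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'");
// ... run the CLOB query on this same connection, then find the trace file
// in user_dump_dest and format it with tkprof ...
s.execute("ALTER SESSION SET EVENTS '10046 trace name context off'");
s.close();
con.close();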
My past experience of using Oracle LOB-type data to store large data has not been good. It is fine when it is under 4k, since it is stored in-line like a VARCHAR2. Once it goes over 4k, you start seeing performance degrade. Things may have improved since I last tried it a couple of years ago, but here are the things I found in the past, for your information:
As clients need to get LOBs via the Oracle server, you may want to consider the following situations.
- LOB data will compete for the limited SGA cache with other data types if Oracle decides to cache it. As CLOB data is generally big, it may push other data out of the cache.
- LOB data gets read directly from disk if Oracle decides not to cache it, and the data is streamed to the client.
- Fragmentation is probably something that you haven't encountered yet. You will see it if your applications delete LOBs and Oracle tries to reuse the LOB space. I don't know if Oracle supports online defragmentation of LOB segments (it does for indexes, but it took a long time when we tried it previously).
You mentioned 4 s for 100 LOBs of 20k average size, so that is 40 ms per LOB. Remember that each LOB has to be retrieved via a separate LOB locator (it is not in the result set by default). That is an additional round trip for each LOB, I assume (I am not 100% sure on this, since it was a while ago). If that is the case, I would assume at least 5 ms of extra time per round trip, in serial order, right? If so, your performance is already limited by the sequential LOB fetches. You should be able to verify this by tracking the time spent in SQL execution versus LOB content fetching. Or you can verify it by excluding the LOB column, as suggested in the previous answer in this post, which should tell you whether it is LOB-related.
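A rough sketch of that split timing, assuming the usual java.sql imports and a plain Statement on a connection con from the same pool (names and the query placeholder are illustrative): the first bucket accumulates execute/row-fetch time, the second the LOB locator and content transfer time.

Statement stmt = con.createStatement();
long sqlNanos = 0, lobNanos = 0;
long t0 = System.nanoTime();
ResultSet rs = stmt.executeQuery("SELECT bigClob FROM ...");
while (rs.next()) {
    long t1 = System.nanoTime();
    sqlNanos += t1 - t0;                          // execute + row fetch
    Clob clob = rs.getClob(1);
    String content = clob.getSubString(1, (int) clob.length());
    t0 = System.nanoTime();
    lobNanos += t0 - t1;                          // locator round trip + CLOB content
}
System.out.println("sql: " + sqlNanos / 1000000 + " ms, lob: " + lobNanos / 1000000 + " ms");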
Good luck
I had a similar issue and found that the JDBC LOBs were making a network call each time a LOB was accessed.
As of the Oracle 11.2 JDBC driver you can use LOB prefetching.
This sped up access by a factor of 10...
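A sketch of what enabling the prefetch can look like; oracle.jdbc.defaultLobPrefetchSize is the documented connection property (the question's setConnectionProperty call aims at the same mechanism), and a value larger than the typical CLOB lets the driver ship the LOB content together with the row instead of going back through the locator. As noted above, this needs at least the 11.2 driver:

java.util.Properties props = new java.util.Properties();
props.put("user", "...");
props.put("password", "...");
// Value is in characters for CLOBs / bytes for BLOBs; large enough here to cover a 20k CLOB.
props.put("oracle.jdbc.defaultLobPrefetchSize", "40000");
java.sql.Connection con = java.sql.DriverManager.getConnection("jdbc:oracle:thin:@...", props);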
Thanks for all the helpful suggestions. Despite being flagged as the answer to the problem, my answer is that there seems to be no good solution. I tried using parallel statements, different storage characteristics, presorted temp tables and other things. The operation does not seem to be bound by any characteristic visible through traces or explain plans. Even query parallelism seems to be sketchy when CLOBs are involved.
Undoubtedly there would be better options for dealing with large CLOBs (especially compression) in an 11g environment, but at the moment I am stuck with 10g.
I have now opted for an additional round trip to the database in which I preprocess the CLOBs into a size-optimized binary RAW. In previous deployments this has always been a very fast option and will likely be worth the trouble of maintaining an offline computed cache. The cache will be invalidated and updated using a persistent process and AQ until someone comes up with a better idea.
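For reference, the read side of such a cache is what makes it fast: the RAW column is fetched in-line with the row, so no LOB locator round trip is involved. A rough sketch, assuming the usual java.sql imports, a connection con, a hypothetical table bigClobCache with a RAW column rawContent, and (purely for illustration) that the preprocessed bytes are simply UTF-8 encoded; the real decoding depends on how the cache is built:

Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("SELECT rawContent FROM bigClobCache");
while (rs.next()) {
    byte[] raw = rs.getBytes("rawContent");   // plain column value, no extra round trip
    // Decode/expand according to how the cache was populated; UTF-8 assumed here.
    String content = new String(raw, java.nio.charset.StandardCharsets.UTF_8);
}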