Oracle 10g 小 Blob 或 Clob 未内联存储?
根据我读过的文档,CLOB 或 BLOB 的默认存储是内联的,这意味着如果它的大小小于大约 4k,那么它将保存在表中。
但是,当我在 Oracle (10.2.0.1.0) 中的虚拟表上对此进行测试时,Oracle Monitor(由 Allround Automations 提供)的性能和响应表明它与表一起被保留。
这是我的测试场景...
create table clobtest ( x int primary key, y clob, z varchar(100) )
;
insert into clobtest
select object_id, object_name, object_name
from all_objects where rownum < 10001
;
select COLUMN_NAME, IN_ROW
from user_lobs
where table_name = 'CLOBTEST'
;
这显示: Y YES(表明 Oracle 将在行中存储 clob)
select x, y from CLOBTEST where ROWNUM < 1001 -- 8.49 seconds
select x, z from CLOBTEST where ROWNUM < 1001 -- 0.298 seconds
因此在这种情况下,CLOB 值的最大长度为 30 个字符,因此应始终是内联的。如果我运行 Oracle Monitor,它会为返回的每一行显示一个 LOB.Length,后跟一个 LOB.Read(),这再次表明 clob 值保留在表中。
我也尝试创建这样的表
create table clobtest
( x int primary key, y clob, z varchar(100) )
LOB (y) STORE AS (ENABLE STORAGE IN ROW)
,但得到了完全相同的结果。
有谁对我如何强制(说服、鼓励)Oracle 将 clob 值内联存储在表中有任何建议吗? (我希望获得与读取 varchar2 列 z 类似的响应时间)
更新:如果我运行此 SQL
select COLUMN_NAME, IN_ROW, l.SEGMENT_NAME, SEGMENT_TYPE, BYTES, BLOCKS, EXTENTS
from user_lobs l
JOIN USER_SEGMENTS s
on (l.Segment_Name = s. segment_name )
where table_name = 'CLOBTEST'
,则会得到以下结果...
Y YES SYS_LOB0000398621C00002$$ LOBSEGMENT 65536 8 1
According to the documents I've read, the default storage for a CLOB or BLOB is inline, which means that if it is less than approx 4k in size then it will be held in the table.
But when I test this on a dummy table in Oracle (10.2.0.1.0) the performance and response from Oracle Monitor (by Allround Automations) suggest that it is being held outwith the table.
Here's my test scenario ...
create table clobtest ( x int primary key, y clob, z varchar(100) )
;
insert into clobtest
select object_id, object_name, object_name
from all_objects where rownum < 10001
;
select COLUMN_NAME, IN_ROW
from user_lobs
where table_name = 'CLOBTEST'
;
This shows: Y YES (suggesting that Oracle will store the clob in the row)
select x, y from CLOBTEST where ROWNUM < 1001 -- 8.49 seconds
select x, z from CLOBTEST where ROWNUM < 1001 -- 0.298 seconds
So in this case, the CLOB values will have a maximum length of 30 characters, so should always be inline. If I run Oracle Monitor, it shows a LOB.Length followed by a LOB.Read() for each row returned, again suggesting that the clob values are held outwith the table.
I also tried creating the table like this
create table clobtest
( x int primary key, y clob, z varchar(100) )
LOB (y) STORE AS (ENABLE STORAGE IN ROW)
but got exactly the same results.
Does anyone have any suggestions how I can force (persuade, encourage) Oracle to store the clob value in-line in the table? (I'm hoping to achieve similar response times to reading the varchar2 column z)
UPDATE: If I run this SQL
select COLUMN_NAME, IN_ROW, l.SEGMENT_NAME, SEGMENT_TYPE, BYTES, BLOCKS, EXTENTS
from user_lobs l
JOIN USER_SEGMENTS s
on (l.Segment_Name = s. segment_name )
where table_name = 'CLOBTEST'
then I get the following results ...
Y YES SYS_LOB0000398621C00002$ LOBSEGMENT 65536 8 1
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
Oracle LOB 的行为如下。
在以下情况下,LOB 会内联存储:
在以下情况下,LOB 会在行外存储:
现在,这并不是唯一可能影响性能的问题。
如果 LOB 最终没有内联存储,Oracle 的默认行为是避免缓存它们(只有内联 LOB 与行的其他字段一起缓存在缓冲区高速缓存中)。要告诉 Oracle 也缓存非内联 LOB,定义 LOB 时应使用 CACHE 选项。
默认行为是 ENABLE STORAGE IN ROW 和 NOCACHE,这意味着小型 LOB 将被内联,大型 LOB 不会(并且不会被缓存)。
最后,还存在通信协议级别的性能问题。典型的 Oracle 客户端将为每个 LOB 执行 2 次额外的往返来获取它们:
- 用于检索 LOB 的大小并相应地分配内存
- 获取数据本身(假设 LOB 很小)
即使使用数组接口来检索结果,也会执行这些额外的往返。如果您检索 1000 行并且数组大小足够大,则您将支付 1 次往返费用来检索行,并支付 2000 次往返费用来检索 LOB 的内容。
请注意,它不取决于 LOB 是否内联存储。它们是完全不同的问题。
为了在协议级别进行优化,Oracle 提供了一种新的 OCI 动词,用于在一次往返中获取多个 LOB (OCILobArrayRead)。我不知道 JDBC 是否存在类似的东西。
另一种选择是在客户端绑定 LOB,就像它是一个大的 RAW/VARCHAR2 一样。仅当可以定义 LOB 的最大大小时,这才有效(因为必须在绑定时提供最大大小)。这个技巧避免了额外的往返:LOB 只是像 RAW 或 VARCHAR2 一样进行处理。我们在 LOB 密集型应用程序中大量使用它。
一旦优化了往返次数,就可以在网络配置中调整数据包大小 (SDU) 的大小,以更好地适应情况(即有限数量的大往返次数)。它倾向于减少“SQL*Net more data to client”和“SQL*Net more data from client”等待事件。
The behavior of Oracle LOBs is the following.
A LOB is stored inline when:
A LOB is stored out-of-row when:
Now this is not the only issue which may impact performance.
If the LOBs are finally not stored inline, the default behavior of Oracle is to avoid caching them (only inline LOBs are cached in the buffer cache with the other fields of the row). To tell Oracle to also cache non inlined LOBs, the CACHE option should be used when the LOB is defined.
The default behavior is ENABLE STORAGE IN ROW, and NOCACHE, which means small LOBs will be inlined, large LOBs will not (and will not be cached).
Finally, there is also a performance issue at the communication protocol level. Typical Oracle clients will perform 2 additional roundtrips per LOBs to fetch them:
- one to retrieve the size of the LOB and allocate memory accordingly
- one to fetch the data itself (provided the LOB is small)
These extra roundtrips are performed even if an array interface is used to retrieve the results. If you retrieve 1000 rows and your array size is large enough, you will pay for 1 roundtrip to retrieve the rows, and 2000 roundtrips to retrieve the content of the LOBs.
Please note it does not depend on the fact the LOB is stored inline or not. They are complete different problems.
To optimize at the protocol level, Oracle has provided a new OCI verb to fetch several LOBs in one roundtrips (OCILobArrayRead). I don't know if something similar exists with JDBC.
Another option is to bind the LOB on client side as if it was a big RAW/VARCHAR2. This only works if a maximum size of the LOB can be defined (since the maximum size must be provided at bind time). This trick avoids the extra rountrips: the LOBs are just processed like RAW or VARCHAR2. We use it a lot in our LOB intensive applications.
Once the number of roundtrips have been optimized, the packet size (SDU) can be resized in the net configuration to better fit the situation (i.e. a limited number of large roundtrips). It tends to reduce the "SQL*Net more data to client" and "SQL*Net more data from client" wait events.
如果您“希望获得与读取 varchar2 列 z 相似的响应时间”,那么在大多数情况下您会感到失望。
如果您使用 CLOB,我想您需要存储超过 4,000 个字节,对吧?那么如果您需要读取更多字节,则需要更长的时间。
但是,如果您有这样的情况,您使用 CLOB,但您(在某些情况下)只对列的前 4,000 个字节(或更少)感兴趣,那么您就有机会获得类似的性能。
如果您对表使用 DBMS_LOB.SUBSTR 和 ENABLE STORAGE IN ROW CACHE 子句之类的内容,Oracle 似乎可以优化检索。示例:
在我对 10.2.0.4 和 8K 块进行的测试中,这两个查询给出了非常相似的性能:
来自 SQL*Plus 的示例(我多次运行查询以删除物理 IO):
如您所见,一致性获取相当高,但最后两个查询中 SQL*Net 往返和字节几乎相同,这显然在执行时间上产生了很大差异!
但有一个警告:如果您有较大的结果集,一致获取的差异可能会成为一个更可能的性能问题,因为您无法将所有内容保留在缓冲区缓存中,并且最终会产生非常昂贵的物理读取......
祝你好运!
干杯
If you're "hoping to achieve similar response times to reading the varchar2 column z", then you'll be disappointed in most cases.
If you're using a CLOB I suppose you need to store more than 4,000 bytes, right? Then if you need to read more bytes that's going to take longer.
BUT if you have a case where yes, you use a CLOB, but you're interested (in some instances) only in the first 4,000 bytes of the column (or less), then you have a chance of getting similar performance.
It looks like Oracle can optimize the retrieval if you use something like DBMS_LOB.SUBSTR and ENABLE STORAGE IN ROW CACHE clause with your table. Example:
In my tests on 10.2.0.4 and 8K block, these two queries give very similar performance:
Sample from SQL*Plus (I ran the queries multiple times to remove physical IO's):
As you can see, consistent gets are quite higher, but SQL*Net roundtrips and bytes are nearly identical in the last two queries, and that apparently makes a big difference in execution time!
One warning though: the difference in consistent gets might become a more likely performance issue if you have large result sets, as you won't be able to keep everything in buffer cache and you'll end up with very expensive physical reads...
Good luck!
Cheers
事实上,它存储在行内。您可能正在处理使用 LOB 而不是 varchar 的简单开销。没有什么是免费的。 DB 可能不会提前知道在哪里找到该行,因此它可能仍然“遵循指针”并执行额外的工作,以防 LOB 很大。如果你可以使用 varchar,那么你应该这样做。即使是像 2 个 varchars 这样处理 8000 个字符的老技巧也可能以更高的性能解决您的业务案例。
LOBS 速度慢、查询困难等。从积极的方面来说,它们可以是 4G。
有趣的尝试是将超过 4000 字节的内容放入该 clob 中,然后看看性能如何。也许速度差不多?这会告诉您,这是开销,会减慢您的速度。
警告,在某些时候,PC 的网络流量会减慢此类测试的速度。
通过包装计数来最小化这种情况,这将工作与服务器隔离:
select count(*) from (select x,y from clobtest where rownum<1001)
您可以使用“set autot trace”实现类似的效果,但会有也追踪开销。
Indeed, it is stored within the row. You are likely dealing with the simple overhead of using a LOB instead of a varchar. Nothing is free. The DB probably doesn't know ahead of time where to find the row, so it probably still "follows a pointer" and does extra work just in case the LOB is big. If you can get by with a varchar, you should. Even old hacks like 2 varchars to deal with 8000 characters might solve your business case with higher performance.
LOBS are slow, difficult to query, etc. On the positive, they can be 4G.
What would be interesting to try is to shove something just over 4000 bytes into that clob, and see what the performance looks like. Maybe it is about the same speed? This would tell you that it's overhead slowing you down.
Warning, at some point network traffic to your PC slows you down on these kind of tests.
Minimize this by wrapping in a count, this isolates the work to the server:
select count(*) from (select x,y from clobtest where rownum<1001)
You can achieve similar effects with "set autot trace", but there will be tracing overhead too.
对于 CLOB 和 BLOB,有两种间接方式:
LOB 值可能存储在与行的其余部分不同的数据库段中。
当您查询该行时,结果集中仅包含非 LOB 字段,并且访问 LOB 字段需要在客户端和服务器之间进行一次或多次额外的往返(每行!)。
我不太清楚您如何测量执行时间,而且我从未使用过 Oracle Monitor,但您可能主要受到第二个间接的影响。根据您使用的客户端软件,可以减少往返次数。例如,当您使用 ODP.NET 时,该参数称为“InitialLobFetchSize”。
更新:
一一判断两个间接寻址中哪一个相关,您可以运行包含 1000 行的 LOB 查询两次。如果时间从第一次运行到第二次运行显着下降,则为间接 1。在第二次运行时,缓存得到了回报,并且对单独数据库段的访问不再非常相关。如果时间保持大致相同,则这是第二个间接,即客户端和服务器之间的往返行程,在两次运行之间无法改善。
在一个非常简单的查询中,1000 行的时间超过 8 秒表明它是间接 2,因为 1000 行的 8 秒不能真正用磁盘访问来解释,除非您的数据非常分散并且磁盘系统负载很重。
There are two indirections when it comes to CLOBs and BLOBs:
The LOB value might be stored in a different database segment than the rest of the row.
When you query the row, only the non-LOB fields are contained in the result set and accessing the LOB-fields requries one or more additional round trips between the client and the server (per row!).
I don't quite know how you measure the execution time and I've never used Oracle Monitor, but you might primarily be affected by the second indirection. Depending on the client software you use, it is possible to reduce the round trips. E.g. when you use ODP.NET, the parameter is called InitialLobFetchSize.
Update:
One one to tell which of the two indirections is relevant, you can run your LOB query with 1000 rows twice. If the time drops significantly from the first to the second run, it's indirection 1. On the second run, the caching pays off and access to the separate database segment isn't very relevant anymore. If the time stays about the same, it's the second indirection, namely the round trips between the client and the server, which cannot improve between two runs.
The time of more than 8 seconds for 1000 rows in a very simple query indicate it's indirection 2 because 8 seconds for 1000 rows can't really be explained with disk access unless your data is very scattered and your disk system under heavy load.
这是关键信息(如何在没有额外往返的情况下读取LOB),我认为Oracle的文档中没有提供这些信息:
我在加载带有一个 blob 列(14KB => 数千行)的简单表(几 GB)时遇到问题,我对此进行了很长时间的研究,尝试了很多 lob 存储调整(新表空间的 DB_BLOCK_SIZE、lob 存储规范) - CHUNK )、sqlnet.ora 设置、客户端预取属性,但这(在客户端使用 OCCI ResultSet->setBufferData 将 BLOB 视为 LONG RAW)是最重要的事情(说服oracle立即发送blob列,而不是首先发送lob定位器并根据lob定位器单独加载每个lob。
现在我可以获得甚至〜500Mb/s的吞吐量(列<3964B)。
我们的 14KB blob 将被分成多个列 - 因此它将存储在行中,以便从 HDD 获得几乎顺序的读取。对于一个 14KB blob(一列),由于非顺序读取(iostat:合并读取请求量较低),我得到约 150Mbit/s。
注意:不要忘记设置 lob 预取大小/长度:
err = OCIAttrSet(session, (ub4) OCI_HTYPE_SESSION, (void *) &default_lobprefetch_size, 0, (ub4) OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE ,errhp);
但我不知道如何使用 ODBC 连接器实现相同的获取吞吐量。我正在尝试但没有成功。
This is the key information (how to read LOB without extra roundtrips), which is not available in Oracle's documentation I think:
I had problem with loading simple table (few GB) with one blob column ( 14KB => thousands of rows) and I was investigating it for a long time, tried a lot of lob storage tunings (DB_BLOCK_SIZE for new tablespace, lob storage specification - CHUNK ), sqlnet.ora settings, client prefetching attributes, but this (treat BLOB as LONG RAW with OCCI ResultSet->setBufferData on client side) was the most important thing (persuade oracle to send blob column immediately without sending lob locator at first and loading each lob separately based on lob locator.
Now I can get even ~ 500Mb/s throughput (with columns < 3964B).
Our 14KB blob will be separated into multiple columns - so it'll be stored in row to get almost sequential reads from HDD. With one 14KB blob (one column) I get ~150Mbit/s because of non-sequential reads (iostat: low amount of merged read requests).
NOTE: don't forget to set also lob prefetch size/length:
err = OCIAttrSet(session, (ub4) OCI_HTYPE_SESSION, (void *) &default_lobprefetch_size, 0, (ub4) OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE, errhp);
But I don't know how is it possible to achieve the same fetching throughput with ODBC connector. I was trying it without any success.