I ran into an issue which I can't figure out. Here is the definition of the problem:
I have some data in a Blob column in a Db2/Linux environment. The Blob was written into DB2 after the byte[] was compressed using JDK compression (the code that does this runs in the Linux environment).
I am trying to write a simple program that reads some of this data, decompresses it (using the JDK), and creates a String from the decompressed byte array in a Windows environment (my development environment). The issue is that after I decompress the Blob (byte[]), the decompressed byte array is usually 1-3 bytes longer than expected. By "expected" I mean that offset and length fields are also stored in the database, and the decompressed byte array is usually a few bytes longer than the length stored there. So if I create one String from the decompressed byte array, and another String by calling substring(offset, length) with the offset and length fields from the database, the second String (the one from substring) is shorter than the first.
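One way to narrow this down is to look at the decompressed bytes before any String is involved. The sketch below assumes "JDK compression" means java.util.zip.Deflater with default settings (if the writer used GZIPOutputStream, read with GZIPInputStream instead). The class and method names are hypothetical, and deflate() only stands in for the Linux writer so the example has input. Deflate/inflate is lossless and byte-exact, so if the byte counts line up, the extra characters come from decoding rather than from decompression.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateCheck {

    // Hypothetical helper: decompress a zlib (Deflater-format) stream into raw bytes.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Truncated or dictionary-based input: fail instead of looping forever.
                    throw new DataFormatException("incomplete zlib stream");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Stand-in for the Linux writer, only so this example has something to inflate.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "caf\u00e9 ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflate(deflate(original));
        // Compare byte counts before building any String.
        System.out.println("original bytes: " + original.length); // original bytes: 6000
        System.out.println("restored bytes: " + restored.length); // restored bytes: 6000
    }
}
```

With the real data, comparing decompressedByte[].length against the stored length at this point shows whether the stored value counts bytes or characters.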
An example:
Database record contains a blob; offset: 0, length: 260,409
After decompressing the blob:
compressedByte[].length - 71,212
decompressedByte[].length - 260,412
new String(decompressedByte[]).length() - 260,412
new String(decompressedByte[]).substring(0, 260409).length() - 260,409
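The numbers above fit one possible explanation, which is an assumption: the stored length is a character count of text that was encoded as UTF-8 on Linux, and the bytes are then decoded on Windows with a single-byte default charset, so the extra bytes come from characters that needed more than one byte in UTF-8. The sketch below builds hypothetical text of 260,409 characters, three of them non-ASCII, to reproduce the same 3-character gap. ISO-8859-1 stands in for a single-byte Windows default such as windows-1252; the class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LengthMismatch {

    // Decode the same bytes with a given charset and report the character count.
    static int charCount(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // Hypothetical text: ASCII padding plus three 2-byte (in UTF-8) characters.
        String text = "x".repeat(260406) + "\u00e9\u00fc\u00f1";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("chars in original text: " + text.length());                             // 260409
        System.out.println("UTF-8 bytes:            " + utf8.length);                               // 260412
        System.out.println("decoded as UTF-8:       " + charCount(utf8, StandardCharsets.UTF_8));      // 260409
        // ISO-8859-1 maps every byte to exactly one char, like a single-byte Windows default.
        System.out.println("decoded as ISO-8859-1:  " + charCount(utf8, StandardCharsets.ISO_8859_1)); // 260412
    }
}
```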
For some other input records, the difference I see is anywhere from 1 to 3 bytes.
I am somewhat puzzled by this issue and wondering if anyone could suggest tips for further debugging. Could this be related to how bytes are stored/written in the Linux environment and how they are read in Windows? Thanks for your help.
Comments:
I suspect the default encoding is different between the two systems.
Or at least find out what the default encoding on the Linux box is (normally UTF-8) and skip step 1.
A really good explanation of what could be happening here is on Joel on Software: joelonsoftware.com/articles/Unicode.html
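Following the first comment's suggestion, a small program like the sketch below (hypothetical class name) can be run on both machines to compare what each JVM uses as its default encoding. On JDK 18 and later the default charset is UTF-8 unless overridden, and the native.encoding property (added in JDK 17, null on older JDKs) reports the operating system's own encoding.

```java
import java.nio.charset.Charset;

public class ShowEncoding {

    // Summarize the JVM's view of the platform encoding.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", native.encoding=" + System.getProperty("native.encoding"); // null before JDK 17
    }

    public static void main(String[] args) {
        // Run on both the Linux box and the Windows machine, then compare the output.
        System.out.println(describe());
    }
}
```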
A String is not a general holder for bytes. You will undoubtedly have different default character encodings in your Db2/Linux environment and your Windows environment, and that difference makes the conversion from bytes to characters and back behave differently on each side.
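A minimal sketch of the fix this answer points to: name the charset explicitly on both sides instead of relying on the platform default. It assumes the Linux writer's default was UTF-8 (which the comments suggest is typical); if it was something else, decode the existing data with that charset instead. The helper names are hypothetical. Java's substring also takes an end index rather than a length, so the stored fields map to substring(offset, offset + length), which equals substring(offset, length) only when offset is 0.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {

    // Writer side (Linux): turn text into bytes with a fixed, named charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Reader side (Windows): decode with the same charset, never the platform default.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: extract a stored field; substring's second argument is an end index.
    static String field(byte[] bytes, int offset, int length) {
        return decode(bytes).substring(offset, offset + length);
    }

    public static void main(String[] args) {
        String original = "na\u00efve caf\u00e9";
        byte[] bytes = encode(original);
        String restored = decode(bytes);
        // prints "12 bytes, 10 chars, equal=true"
        System.out.println(bytes.length + " bytes, " + restored.length() + " chars, equal=" + original.equals(restored));
    }
}
```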