将 BLOB 从一个数据库读取到另一个数据库
我正在尝试从 mysql 集转移到 postgres 集,而我在移动时遇到问题的字段之一是 mysql LBLOB。我正在尝试将其移至 postgres 中的 LargeObject 类型,但遇到了一些速度问题。我正在 Java/Groovy 中执行此操作,坦率地说,流媒体业务让我感到困惑。
我尝试了两种方法:将 LBLOB 保存在内存中并将其直接写入 LO,然后将 LBLOB 写入磁盘(毕竟它是一个文件),然后将文件读回 LO。
第二种方法快了很多很多倍,我不明白为什么,因为我认为它仍然太慢了。
这是第一种方法。
InputStream ins = rs.getBinaryStream(1);
def b
while ((b = ins.read()) > -1) {
obj.write(b.toInteger().byteValue())
}
“ins”是一个 ByteArrayInputStream,所以我将其读入(到一个 int),然后将其写入“obj”,即 LO。对于 1MB 的文件,这大约需要 7 分钟。我的直觉告诉我这应该比第二个更有效,但事实并非如此。
我将保留写入文件版本的代码片段,但它非常基本。它以相同的方式从数据库读取,但随后将输出写入磁盘上的文件。然后我从磁盘读取文件并将其写入 LO。对于同一文件,该方法大约需要 8 秒。
到底是怎么回事?
I am trying to move from a mysql set to a postgres one, and one of the fields that I'm having trouble moving is a mysql LBLOB. I am trying to move it into a LargeObject type in postgres and I'm having some speed issues. I'm doing this in Java/Groovy and, frankly, the streaming business has me confused.
I've tried two approaches: Hold the LBLOB in memory and write it directly to the LO, and write the LBLOB to disk (it is a file after all) and then read the file back in to the LO.
The second approach is many many times faster, and I can't figure out why because I think it is still entirely too slow.
Here is the first approach.
InputStream ins = rs.getBinaryStream(1);
def b
while ((b = ins.read()) > -1) {
obj.write(b.toInteger().byteValue())
}
"ins" is a ByteArrayInputStream so I read that in (to an int) and then write it to "obj", the LO. This takes about 7 minutes for a 1MB file. My gut tells me this should be more efficient than the second one, but it is not.
I'll spare the code snippet for the write to file version but it is pretty basic. It reads from the db the same way, but then writes the output to a file on disk. Then I go read the file from disk and write it to the LO. That approach takes about 8 seconds for the same file.
What is going on?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这种方法看起来效率不太高。
尝试缓冲 I/O:
适当调整缓冲区大小。
如果文件很大,您可能会发现操作系统在执行磁盘缓存(即磁盘 I/O 以及内存管理)时出现问题。从互联网的这一端很难确定。尝试分析和比较这两种方法(至少,您可以使用
-Xprof
命令行开关)。This approach doesn't look very efficient.
Try buffering the I/O:
Tune the buffer size as appropriate.
If the file is large, you may be seeing issues with the operating system doing disk caching (so disk I/O, plus memory management). It is difficult to say for sure from this end of the internet. Try profiling and comparing both approaches (at a bare minimum, you can use the
-Xprof
command line switch).