Perl DBI performance on nested SQL
I have a question about the performance of processing SQL in a nested form in Perl DBI.

Basically, what the script does is:

connect to DB1
connect to DB2
fetch data from DB1
inside a while loop,
  use the data from DB1 to create an update or insert statement
  execute the SQL in DB2
disconnect from DB2
disconnect from DB1

Now the question is: is this the best practice as far as performance goes, or is the method below the better alternative?

fetch data from DB1
write the data to a staging file
read the data back from the staging file
process the data to update DB2

Edit: I have added "connect to DB2" after "connect to DB1", but that is not the point I am trying to make. This is more a question about the performance of the I/O and system calls the server makes to read the data from the database server and from the disk where the data is stored.

If you look at the first approach, inside the loop the script gets one row of data from DB1 and then waits until the returned data has been processed in DB2.

If there are thousands of records, what is the impact on performance when you are making thousands of system calls to the server to read the data from disk?
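For reference, a minimal sketch of what that first, row-by-row approach looks like with DBI; the DSNs, credentials, and table/column names below are made up for illustration only:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical DSNs, credentials, and table/column names, just to make the
    # row-by-row approach concrete.
    my $dbh1 = DBI->connect('dbi:mysql:DB1', 'user1', 'pass1', { RaiseError => 1 });
    my $dbh2 = DBI->connect('dbi:mysql:DB2', 'user2', 'pass2', { RaiseError => 1 });

    my $sth = $dbh1->prepare('SELECT id, name FROM source_table');
    $sth->execute;

    # One row at a time: fetch from DB1, then wait while DB2 processes it.
    while (my $row = $sth->fetchrow_arrayref) {
        my ($id, $name) = @$row;
        my $updated = $dbh2->do(
            'UPDATE target_table SET name = ? WHERE id = ?', undef, $name, $id);
        # DBI returns "0E0" (numerically zero) when no row matched, so insert instead.
        $dbh2->do('INSERT INTO target_table (id, name) VALUES (?, ?)', undef, $id, $name)
            if $updated == 0;
    }

    $dbh2->disconnect;
    $dbh1->disconnect;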
Answers (3)
I only have moderate experience with Perl, but the work I have done with it involved databases. As I understand it, it is typically best to connect to databases outside of loops -- that way you minimize the overhead of connecting, disconnecting, and reconnecting repeatedly.
So...
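Something along these lines (a sketch only; the DSNs and table names are hypothetical): connect and prepare before the loop, and let execute() be the only call that runs per row.

    use strict;
    use warnings;
    use DBI;

    # Connect once, before any looping; hypothetical DSNs and table names.
    my $src = DBI->connect('dbi:mysql:DB1', 'user1', 'pass1', { RaiseError => 1 });
    my $dst = DBI->connect('dbi:mysql:DB2', 'user2', 'pass2', { RaiseError => 1 });

    # Prepare once as well; re-preparing (or worse, reconnecting) per row is the
    # overhead you want to avoid.
    my $fetch  = $src->prepare('SELECT id, qty FROM orders');
    my $update = $dst->prepare('UPDATE orders_copy SET qty = ? WHERE id = ?');

    $fetch->execute;
    while (my $row = $fetch->fetchrow_arrayref) {
        $update->execute($row->[1], $row->[0]);   # only execute() runs per row
    }

    $dst->disconnect;
    $src->disconnect;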
Thousands of records is not much, given typical small or average record sizes. You don't know whether the data is actually read from the disk or is sitting in buffers in memory, so you can't be sure of the number of system calls. And why would you bother?
If performance matters to you, I'd suggest you test both methods in a typical, real-life scenario.
If you're dealing with a locking database and a long-running process (for whatever reason), it might be preferable to get all the data you need at once (staging file method) so you don't hold the locks on the origin and the destination server for the whole duration of the processing.
Your choice might also depend on your familiarity with tools or APIs to do bulk import of data. For example,
bcp.exe
for MSSQL is a handy tool for exporting and importing data. Most other databases have similar tools.我建议并行执行此操作,例如在脚本中使用
pipe()
和fork()
。否则大多数时候你将等待网络 IO。所以它就像(SELECT、序列化、打印) | (读取行、反序列化、INSERT_OR_UPDATE)
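If you do go the staging-file route, a rough sketch of the export half (all names, including the staging.tsv filename, are placeholders) is to dump DB1 to a flat file in one pass and then hand that file to a bulk loader, or read it back and apply it in batches:

    use strict;
    use warnings;
    use DBI;

    # Dump everything from DB1 into a flat file first, so neither server is
    # held up while the other one is being worked on.
    my $dbh1 = DBI->connect('dbi:mysql:DB1', 'user1', 'pass1', { RaiseError => 1 });
    my $sth  = $dbh1->prepare('SELECT id, name FROM source_table');
    $sth->execute;

    open my $staging, '>', 'staging.tsv' or die "cannot write staging.tsv: $!";
    while (my $row = $sth->fetchrow_arrayref) {
        # Naive serialization; assumes the data contains no tabs or newlines.
        print {$staging} join("\t", @$row), "\n";
    }
    close $staging;
    $dbh1->disconnect;

    # staging.tsv can now be bulk-loaded into DB2 with whatever the target
    # database provides (bcp for MSSQL, LOAD DATA INFILE for MySQL, ...),
    # or read back and applied in batches from Perl.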
I would suggest doing that in parallel, e.g. using pipe() and fork() in your script. Otherwise you'll spend most of the time waiting for network IO. So it's like: (SELECT, serialize, print) | (read line, deserialize, INSERT_OR_UPDATE)
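A rough sketch of that pipeline, with placeholder DSNs and tables and a simple tab-separated serialization (not a drop-in implementation):

    use strict;
    use warnings;
    use DBI;

    # Rough shape of (SELECT, serialize, print) | (read line, deserialize, upsert).
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                        # child: reads from DB1 and writes to the pipe
        close $reader;
        my $dbh1 = DBI->connect('dbi:mysql:DB1', 'user1', 'pass1', { RaiseError => 1 });
        my $sth  = $dbh1->prepare('SELECT id, name FROM source_table');
        $sth->execute;
        while (my $row = $sth->fetchrow_arrayref) {
            print {$writer} join("\t", @$row), "\n";    # serialize and print
        }
        close $writer;
        $dbh1->disconnect;
        exit 0;
    }

    close $writer;                          # parent: reads the pipe and applies rows to DB2
    my $dbh2 = DBI->connect('dbi:mysql:DB2', 'user2', 'pass2', { RaiseError => 1 });
    my $upd  = $dbh2->prepare('UPDATE target_table SET name = ? WHERE id = ?');
    my $ins  = $dbh2->prepare('INSERT INTO target_table (id, name) VALUES (?, ?)');
    while (my $line = <$reader>) {
        chomp $line;
        my ($id, $name) = split /\t/, $line;            # deserialize
        $ins->execute($id, $name) if $upd->execute($name, $id) == 0;
    }
    close $reader;
    $dbh2->disconnect;
    waitpid($pid, 0);

Each process opens its own database handle after the fork, so no DBI handle is shared across processes.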