Perl DBI 在嵌套 sql 上的性能

发布于 2024-11-04 06:46:30 字数 665 浏览 17 评论 0原文

我有一个关于 Perl dbi 中嵌套形式的 SQL 性能处理的问题。

基本上,脚本正在做的是

连接到 DB1

连接到 DB2

从DBI获取数据

在 while 循环内,

使用 DB1 中的数据并创建更新或插入语句

在DB2中执行sql

与 DB2 断开

断开DB1的连接

现在的问题是,这是否是最佳实践性能或以下方法是最好的选择。

从 DB1 获取日期

将数据写入暂存文件

从暂存文件中获取数据

 处理数据以在 DB2 中更新。 

编辑:我在连接到 DB1 后添加了到 DB2 的连接。但这不是我想要表达的重点。这更多地涉及服务器从数据库服务器和存储数据的磁盘读取数据的 I/O 和系统调用的性能问题。

如果你看看第一种方法。

在一个循环内。 该脚本从 DB1 获取一行数据 并等待返回的数据在 DB2 中得到处理。

如果有数千条记录,当您对服务器进行数千次系统调用以从磁盘读取数据时,对性能有何影响

i have a question on a performance processing sql in a nested forms in perl dbi.

bascially, what the script is doing is

connect to DB1

connect to DB2

fetch data from DBI

inside a while loop,

use data from DB1 and create an update or insert statement

execute the sql in DB2

disconenct from DB2

disconnect DB1

now the question is, is this the best practice when it comes to the performance or below method is the best alternative.

get date from DB1

write the data in a staging file

get data from the staging file

  process the data to udpate in DB2. 

Edit: i have added connect to DB2 after connect to DB1. but that is not the point i am trying to make. this is more on the question on the performance of I/O and system call that server is making to read from the database server and disk where the data is stored.

if you look at the first approach.

inside a loop.
the script gets one line of data from DB1
and wait until the returned data get processed in DB2.

if there is thousands of records, what is the impact on the performance when you are making thousnds of system calls to the server to read the data from the disk

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

爱的十字路口 2024-11-11 06:46:30

我对 Perl 只有中等的经验,但那项工作涉及 Perl 和数据库。据我了解,通常最好在循环之外连接到数据库——这样可以最大限度地减少重复连接、断开连接和重新连接的开销。

所以...

连接到 DB1

连接到 DB2

从 DB1 获取数据

在 while 循环内

使用 DB1 中的数据并创建更新
或插入语句

在DB2中执行sql

断开与 DB1 的连接

断开与 DB2 的连接

I only have moderate experience with Perl, but that work dealt with Perl and databases. As I understand it, it is typically best to connect to databases outside of loops -- that way you minimize the overhead of connecting, disconnecting, and reconnecting repeatedly.

So...

connect to DB1

connect to DB2

fetch data from DB1

inside a while loop

use data from DB1 and create an update
or insert statement

execute the sql in DB2

disconnect from DB1

disconnect from DB2

往日 2024-11-11 06:46:30

如果有数千条记录,当您对服务器进行数千次系统调用以从磁盘读取数据时,对性能有何影响

,对性能的影响是什么?考虑到典型的小或平均记录大小,数千条记录并不算多。您不知道数据实际上是从磁盘读取的,还是驻留在内存的缓冲区中。所以你无法确定系统调用的次数。你为什么要费心呢?

如果性能对您很重要,我建议您在典型的现实场景中测试这两种方法。

如果您正在处理锁定数据库和长时间运行的进程(无论出于何种原因),最好立即获取所需的所有数据(临时文件方法),这样您就不会持有源上的锁以及整个处理期间的目标服务器。

您的选择还可能取决于您对批量导入数据的工具或 API 的熟悉程度。例如,用于 MSSQL 的 bcp.exe 是一个用于导出和导入数据的便捷工具。大多数其他数据库都有类似的工具。

if there is thousands of records, what is the impact on the performance when you are making thousnds of system calls to the server to read the data from the disk

Thousands of records, that's not much, given typical small or average record sizes. You don't know whether the data is actually read from the disk, or rather residing in buffers in memory. So you can't be sure of the number of system calls. And why would you bother?

If performance matters to you, I'd suggest you test both methods in a typical, real-life scenario.

If you're dealing with a locking database and a long-running process (for whatever reason), it might be preferable to get all the data you need at once (staging file method) so you don't hold the locks on the origin and the destination server for the whole duration of the processing.

Your choice might also depend on your familiarity with tools or APIs to do bulk import of data. For example, bcp.exe for MSSQL is a handy tool for exporting and importing data. Most other databases have similar tools.

像你 2024-11-11 06:46:30

我建议并行执行此操作,例如在脚本中使用 pipe()fork() 。否则大多数时候你将等待网络 IO。所以它就像

(SELECT、序列化、打印) | (读取行、反序列化、INSERT_OR_UPDATE)

I would suggest doing that in parallel, e.g. using pipe() and fork() in your script. Otherwise most of the time the you'll be waiting for the network IO. So it's like

(SELECT, serialize, print) | (read line, deserialize, INSERT_OR_UPDATE)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文