PostgreSQL 中的快速批处理执行

发布于 2024-12-10 07:40:34 字数 2632 浏览 2 评论 0原文

我有很多数据,我想在最短的时间内插入数据库。我做了一些测试。我在 PostgreSQL 中创建了一个表(使用下面的脚本):

CREATE TABLE test_table
(
  id serial NOT NULL,
  item integer NOT NULL,
  count integer NOT NULL,
  CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;

我编写了测试代码,创建了 1000 个随机值,并以两种不同的方式插入到 test_table 中。首先,使用QSqlQuery::exec()

int insert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    query.bindValue(":item", qrand());
    query.bindValue(":count", qrand());

    if (!query.exec()) {
        qDebug() << query.lastQuery();
        qDebug() << query.lastError();
    }

} //end of for i

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

其次使用QSqlQuery::execBatch

int batchInsert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QVariantList itemList;
QVariantList CountList;

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    itemList.append(qrand());
    CountList.append(qrand());

} //end of for i

query.addBindValue(itemList);
query.addBindValue(CountList);

if (!query.execBatch())
    qDebug() << query.lastError();

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

我发现它们之间没有区别:

int main() {
qDebug() << insert() << batchInsert();
return 1;}

结果:

14270 14663 (milliseconds)

我该如何改进它?

http://doc.qt.io/qt-5/qsqlquery.html# execBatch 已被引用:

如果数据库不支持批量执行,驱动程序将 使用传统的 exec() 调用来模拟它。

我不确定我的 DBMS 是否支持批量执行? 我该如何测试它?

I have a lots of data and I want to insert to DB in the least time. I did some tests. I created a table (using the below script) in PostgreSQL:

CREATE TABLE test_table
(
  id serial NOT NULL,
  item integer NOT NULL,
  count integer NOT NULL,
  CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;

I wrote test code, created 1000 random values and insert to test_table in two different ways. First, using QSqlQuery::exec()

int insert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    query.bindValue(":item", qrand());
    query.bindValue(":count", qrand());

    if (!query.exec()) {
        qDebug() << query.lastQuery();
        qDebug() << query.lastError();
    }

} //end of for i

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

Second using QSqlQuery::execBatch:

int batchInsert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");

db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");

if (!db.open()) {
    qDebug() << "can not open DB";
    return -1;
}

QString queryString = QString("INSERT INTO test_table (item, count)"
        " VALUES (:item, :count)");

QSqlQuery query;
query.prepare(queryString);

QVariantList itemList;
QVariantList CountList;

QDateTime start = QDateTime::currentDateTime();

for (int i = 0; i < 1000; i++) {

    itemList.append(qrand());
    CountList.append(qrand());

} //end of for i

query.addBindValue(itemList);
query.addBindValue(CountList);

if (!query.execBatch())
    qDebug() << query.lastError();

QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}

I found that there is no difference between them:

int main() {
qDebug() << insert() << batchInsert();
return 1;}

Result:

14270 14663 (milliseconds)

How can I improve it?

In http://doc.qt.io/qt-5/qsqlquery.html#execBatch has been cited:

If the database doesn't support batch executions, the driver will
simulate it using conventional exec() calls.

I'm not sure my DBMS support batch executions or not?
How can I test it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

已下线请稍等 2024-12-17 07:40:34

不确定 qt 驱动程序的作用,但 PostgreSQL 可以支持在一个事务中运行多个语句。只需手动执行即可,而不是尝试使用驱动程序的内置功能。

尝试将 SQL 语句更改为

BEGIN TRANSACTION;

对于循环的每次迭代都运行插入语句。

INSERT HERE;

一旦所有 1000 条记录发生循环结束,就会发出此消息。在您的同一连接上。

COMMIT TRANSACTION;

此外,1000 行并不足以进行测试,您可能需要尝试 100,000 行或更多,以确保 qt 批处理确实没有帮助。

In not sure what the qt driver does, but PostgreSQL can support running multiple statements in one transaction. Just do it manually instead of trying to use the built in feature of the driver.

Try changing your SQL statement to

BEGIN TRANSACTION;

For every iteration of loop run an insert statement.

INSERT HERE;

Once end of loop happens for all 1000 records issue this. On your same connection.

COMMIT TRANSACTION;

Also 1000 rows is not much to test with, you might want to try 100,000 or more to make sure the qt batch really wasn't helping.

苍景流年 2024-12-17 07:40:34

通过发出 1000 个插入语句,您可以对数据库进行 1000 次往返。这需要相当长的时间(网络和调度延迟)。所以尽量减少insert语句的数量!

假设你想:

insert into test_table(item, count) values (1000, 10);
insert into test_table(item, count) values (1001, 20);
insert into test_table(item, count) values (1002, 30);

将其转换为单个查询,查询将需要不到一半的时间:

insert into test_table(item, count) values (1000, 10), (1001, 20), (1002, 30);

在 PostgreSQL 中,还有另一种编写方式:

insert into test_table(item, count) values (
  unnest(array[1000, 1001, 1002])
  unnest(array[10, 20, 30]));

我提出第二种方式的原因是你可以传递所有内容单个参数中的大数组(在 C# 中使用数据库驱动程序“Npgsql”进行测试):

insert into test_table(item, count) values (unnest(:items), unnest(:counts));
  • items 是一个查询参数,其值为 int[]{100, 1001, 1002}
  • counts 是一个查询参数,值为 int[]{10, 20, 30}

今天,我减少了 10,000 次插入的运行时间使用这种技术,C# 的时间从 80 秒缩短到 550 毫秒。这很容易。此外,事务也不存在任何麻烦,因为单个语句永远不会拆分为多个事务。

我希望这也适用于 Qt PostgreSQL 驱动程序。在服务器端,您需要 PostgreSQL >= 8.4.,因为旧版本不提供 unnest(但可能有解决方法)。

By issuing 1000 insert statements, you have 1000 round trips to the database. This takes quite some time (network and scheduling latency). So try to reduce the number of insert statements!

Let's say you want to:

insert into test_table(item, count) values (1000, 10);
insert into test_table(item, count) values (1001, 20);
insert into test_table(item, count) values (1002, 30);

Transform it into a single query and the query will need less than half of the time:

insert into test_table(item, count) values (1000, 10), (1001, 20), (1002, 30);

In PostgreSQL, there is another way to write it:

insert into test_table(item, count) values (
  unnest(array[1000, 1001, 1002])
  unnest(array[10, 20, 30]));

My reason for presenting the second way is that you can pass all the content of a big array in a single parameter (tested with in C# with the database driver "Npgsql"):

insert into test_table(item, count) values (unnest(:items), unnest(:counts));
  • items is a query parameter with the value int[]{100, 1001, 1002}
  • counts is a query parameter with the value int[]{10, 20, 30}

Today, I have cut down the running time of 10,000 inserts in C# from 80s to 550ms with this technique. It's easy. Furthermore, there is not any hassle with transactions, as a single statement is never split into multiple transactions.

I hope this works with the Qt PostgreSQL driver, too. On the server side, you need PostgreSQL >= 8.4., as older versions do not provide unnest (but there may be work arounds).

野心澎湃 2024-12-17 07:40:34

您可以使用 QSqlDriver::hasFeature 和参数 QSqlDriver::BatchOperations

在 4.8 源代码中,我发现只有 oci (oracle) 支持 BatchOperations。不知道为什么不在 psql 驱动程序中使用 postgresql 的 COPY 语句。

You can use QSqlDriver::hasFeature with argument QSqlDriver::BatchOperations

In the 4.8 sources, I found that only oci (oracle) support the BatchOperations. Don't know why not use the COPY statement for postgresql in the psql driver.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文