PostgreSQL 中的快速批处理执行
我有很多数据,我想在最短的时间内插入数据库。我做了一些测试。我在 PostgreSQL 中创建了一个表(使用下面的脚本):
CREATE TABLE test_table
(
id serial NOT NULL,
item integer NOT NULL,
count integer NOT NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;
我编写了测试代码,创建了 1000 个随机值,并以两种不同的方式插入到 test_table
中。首先,使用QSqlQuery::exec()
int insert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");
if (!db.open()) {
qDebug() << "can not open DB";
return -1;
}
QString queryString = QString("INSERT INTO test_table (item, count)"
" VALUES (:item, :count)");
QSqlQuery query;
query.prepare(queryString);
QDateTime start = QDateTime::currentDateTime();
for (int i = 0; i < 1000; i++) {
query.bindValue(":item", qrand());
query.bindValue(":count", qrand());
if (!query.exec()) {
qDebug() << query.lastQuery();
qDebug() << query.lastError();
}
} //end of for i
QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}
其次使用QSqlQuery::execBatch
:
int batchInsert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");
if (!db.open()) {
qDebug() << "can not open DB";
return -1;
}
QString queryString = QString("INSERT INTO test_table (item, count)"
" VALUES (:item, :count)");
QSqlQuery query;
query.prepare(queryString);
QVariantList itemList;
QVariantList CountList;
QDateTime start = QDateTime::currentDateTime();
for (int i = 0; i < 1000; i++) {
itemList.append(qrand());
CountList.append(qrand());
} //end of for i
query.addBindValue(itemList);
query.addBindValue(CountList);
if (!query.execBatch())
qDebug() << query.lastError();
QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}
我发现它们之间没有区别:
int main() {
qDebug() << insert() << batchInsert();
return 1;}
结果:
14270 14663 (milliseconds)
我该如何改进它?
在 http://doc.qt.io/qt-5/qsqlquery.html# execBatch 已被引用:
如果数据库不支持批量执行,驱动程序将 使用传统的 exec() 调用来模拟它。
我不确定我的 DBMS 是否支持批量执行? 我该如何测试它?
I have a lots of data and I want to insert to DB in the least time. I did some tests. I created a table (using the below script) in PostgreSQL:
CREATE TABLE test_table
(
id serial NOT NULL,
item integer NOT NULL,
count integer NOT NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;
I wrote test code, created 1000 random values and insert to test_table
in two different ways. First, using QSqlQuery::exec()
int insert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");
if (!db.open()) {
qDebug() << "can not open DB";
return -1;
}
QString queryString = QString("INSERT INTO test_table (item, count)"
" VALUES (:item, :count)");
QSqlQuery query;
query.prepare(queryString);
QDateTime start = QDateTime::currentDateTime();
for (int i = 0; i < 1000; i++) {
query.bindValue(":item", qrand());
query.bindValue(":count", qrand());
if (!query.exec()) {
qDebug() << query.lastQuery();
qDebug() << query.lastError();
}
} //end of for i
QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}
Second using QSqlQuery::execBatch
:
int batchInsert() {
QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
db.setHostName("127.0.0.1");
db.setDatabaseName("TestDB");
db.setUserName("postgres");
db.setPassword("1234");
if (!db.open()) {
qDebug() << "can not open DB";
return -1;
}
QString queryString = QString("INSERT INTO test_table (item, count)"
" VALUES (:item, :count)");
QSqlQuery query;
query.prepare(queryString);
QVariantList itemList;
QVariantList CountList;
QDateTime start = QDateTime::currentDateTime();
for (int i = 0; i < 1000; i++) {
itemList.append(qrand());
CountList.append(qrand());
} //end of for i
query.addBindValue(itemList);
query.addBindValue(CountList);
if (!query.execBatch())
qDebug() << query.lastError();
QDateTime end = QDateTime::currentDateTime();
int diff = start.msecsTo(end);
return diff;
}
I found that there is no difference between them:
int main() {
qDebug() << insert() << batchInsert();
return 1;}
Result:
14270 14663 (milliseconds)
How can I improve it?
In http://doc.qt.io/qt-5/qsqlquery.html#execBatch has been cited:
If the database doesn't support batch executions, the driver will
simulate it using conventional exec() calls.
I'm not sure my DBMS support batch executions or not?
How can I test it?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
不确定 qt 驱动程序的作用,但 PostgreSQL 可以支持在一个事务中运行多个语句。只需手动执行即可,而不是尝试使用驱动程序的内置功能。
尝试将 SQL 语句更改为
对于循环的每次迭代都运行插入语句。
一旦所有 1000 条记录发生循环结束,就会发出此消息。在您的同一连接上。
此外,1000 行并不足以进行测试,您可能需要尝试 100,000 行或更多,以确保 qt 批处理确实没有帮助。
In not sure what the qt driver does, but PostgreSQL can support running multiple statements in one transaction. Just do it manually instead of trying to use the built in feature of the driver.
Try changing your SQL statement to
For every iteration of loop run an insert statement.
Once end of loop happens for all 1000 records issue this. On your same connection.
Also 1000 rows is not much to test with, you might want to try 100,000 or more to make sure the qt batch really wasn't helping.
通过发出 1000 个插入语句,您可以对数据库进行 1000 次往返。这需要相当长的时间(网络和调度延迟)。所以尽量减少insert语句的数量!
假设你想:
将其转换为单个查询,查询将需要不到一半的时间:
在 PostgreSQL 中,还有另一种编写方式:
我提出第二种方式的原因是你可以传递所有内容单个参数中的大数组(在 C# 中使用数据库驱动程序“Npgsql”进行测试):
items
是一个查询参数,其值为int[]{100, 1001, 1002}
counts
是一个查询参数,值为int[]{10, 20, 30}
今天,我减少了 10,000 次插入的运行时间使用这种技术,C# 的时间从 80 秒缩短到 550 毫秒。这很容易。此外,事务也不存在任何麻烦,因为单个语句永远不会拆分为多个事务。
我希望这也适用于 Qt PostgreSQL 驱动程序。在服务器端,您需要 PostgreSQL >= 8.4.,因为旧版本不提供 unnest(但可能有解决方法)。
By issuing 1000 insert statements, you have 1000 round trips to the database. This takes quite some time (network and scheduling latency). So try to reduce the number of insert statements!
Let's say you want to:
Transform it into a single query and the query will need less than half of the time:
In PostgreSQL, there is another way to write it:
My reason for presenting the second way is that you can pass all the content of a big array in a single parameter (tested with in C# with the database driver "Npgsql"):
items
is a query parameter with the valueint[]{100, 1001, 1002}
counts
is a query parameter with the valueint[]{10, 20, 30}
Today, I have cut down the running time of 10,000 inserts in C# from 80s to 550ms with this technique. It's easy. Furthermore, there is not any hassle with transactions, as a single statement is never split into multiple transactions.
I hope this works with the Qt PostgreSQL driver, too. On the server side, you need PostgreSQL >= 8.4., as older versions do not provide
unnest
(but there may be work arounds).您可以使用 QSqlDriver::hasFeature 和参数 QSqlDriver::BatchOperations
在 4.8 源代码中,我发现只有 oci (oracle) 支持 BatchOperations。不知道为什么不在 psql 驱动程序中使用 postgresql 的 COPY 语句。
You can use QSqlDriver::hasFeature with argument QSqlDriver::BatchOperations
In the 4.8 sources, I found that only oci (oracle) support the BatchOperations. Don't know why not use the COPY statement for postgresql in the psql driver.