PostgreSQL bulk insert performance problem (compared to MySQL)
I have a lot of data and I want to insert it into the database in as little time as possible, so I ran some tests.
I created a table with 21 columns (using the script below, *1): one column is an int and the other 20 are strings. There are no indexes. I wrote test code that generates random values and inserts them into the database (using the INSERT statement *2). Before running the SQL command I call conn.setAutoCommit(false), and afterwards conn.commit(). This operation takes about 6-7 seconds. The official documentation (*3) says to use the COPY command for bulk inserts, so I created a similar ASCII file and re-ran the test; it finished in about 5 seconds.
Using the same test code on the same machine to insert the same data into MySQL, the test takes less than 1 second. I was really surprised by such a big gap compared to the 6-7 seconds. Does this difference really exist, or am I overlooking something?
Thanks for your help.
My test configuration is Solaris 10 with PostgreSQL 9.0.2 and MySQL 5.0.85.
(*1) PostgreSQL table creation script
CREATE TABLE tablo
(
id integer,
column1 character varying(50),
column2 character varying(50),
column3 character varying(50),
....
column20 character varying(50)
)
WITH (
OIDS=FALSE
);
ALTER TABLE tablo OWNER TO pgadmin;
(*2) Insert query
INSERT INTO tablo values (1,'column67062724628797','column26007603757271','column73982294239806','column43213154421324','column97722282440805','column79000889379973','column10680880337755','column14322827996050','column80720842739399','column22777514445036','column77771307997926','column92799724462613','column89992937353110','column61693061355353','column43804223262229','column62209656630047','column52150955786400','column85726157993572','column33358888005133','column77743799989746'),(2,'column77383691774831','column67841193885377','column36149612452454','column51161680852595','column91649734476301','column57283307765550','column14997046117948','column29457857794726','column91157683305554','column44413196495111','column40702778794938','column24744999726868','column38356057278249','column16808618337554','column64362413535503','column19577167594144','column72943639162993','column46830376244427','column01942608599939','column66479131355003'),
....
10K lines
(*3) Official PostgreSQL documentation:
http://www.postgresql.org/docs/8.3/interactive/populate.html
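A simplified sketch of the kind of JDBC harness described above: autocommit is turned off, one multi-row INSERT (shaped like *2) is executed, and a single commit follows. The JDBC URL, credentials, class name, and random-value scheme are illustrative assumptions, not the actual test code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Random;

public class PgBulkInsertTest {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders, not taken from the question.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "pgadmin", "secret")) {

            conn.setAutoCommit(false);                  // one transaction for the whole load

            // Build a single multi-row INSERT, similar in shape to (*2)
            Random rnd = new Random();
            StringBuilder sql = new StringBuilder("INSERT INTO tablo VALUES ");
            for (int row = 1; row <= 10000; row++) {
                if (row > 1) sql.append(',');
                sql.append('(').append(row);
                for (int col = 0; col < 20; col++) {
                    sql.append(",'column").append(1000000 + rnd.nextInt(9000000)).append('\'');
                }
                sql.append(')');
            }

            long start = System.currentTimeMillis();
            try (Statement st = conn.createStatement()) {
                st.execute(sql.toString());             // one statement, many value lists
            }
            conn.commit();                              // commit once at the end
            System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));
        }
    }
}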
Comments (3)
Seems odd that you're not seeing a speedup with things like using COPY. I generated a script to create a similar table and populate it with 10,000 rows, and found that:
Methods 2 and 3 were about 4 times faster than method 1. Method 4 was about 10 times faster than 2 or 3.
If I import the same data into mysql on my machine, it takes about half the time as methods 2 or 3. Dumping and reloading it, same. Dumping with -e and reloading it, same. Using InnoDB bumped the time up to being the same as methods 2 or 3.
So at least on my hardware/OS combination the speeds between the two are comparable... although of course I look after postgresql's settings better, but for a small table like this I wouldn't expect things like the buffer cache size to matter much?
Now, as to how good the JDBC support for doing batch inserts is, I have no idea. I did all these things using just the command-line clients.
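For reference, the standard JDBC batching API (PreparedStatement.addBatch / executeBatch) looks roughly like the sketch below; the connection details and the chunk size of 1,000 rows are arbitrary choices, and this variant has not been benchmarked against the figures above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchInsert {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "pgadmin", "secret")) {

            conn.setAutoCommit(false);

            // 21 parameters: id plus the 20 varchar columns
            StringBuilder placeholders = new StringBuilder("?");
            for (int i = 0; i < 20; i++) placeholders.append(",?");
            String sql = "INSERT INTO tablo VALUES (" + placeholders + ")";

            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int row = 1; row <= 10000; row++) {
                    ps.setInt(1, row);
                    for (int col = 2; col <= 21; col++) {
                        ps.setString(col, "column" + row + "_" + col);
                    }
                    ps.addBatch();                      // queue this row
                    if (row % 1000 == 0) {
                        ps.executeBatch();              // ship queued rows to the server
                    }
                }
                ps.executeBatch();                      // flush any remainder
            }
            conn.commit();
        }
    }
}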
There are two major considerations here:
So, unless bulk inserts from a single connection are the norm for your application, this test really doesn't show anything useful... It is more likely that you will be using dozens of connections simultaneously to insert, query and/or update small chunks of data.
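To make that concrete, a more representative micro-benchmark might look like the sketch below: several threads, each with its own connection and autocommit left on, issuing small individual writes. The thread count, row counts, and connection details are arbitrary assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentInsertTest {
    public static void main(String[] args) throws Exception {
        int threads = 24;                               // number of concurrent connections
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                // Each worker opens its own connection; details are placeholders.
                try (Connection conn = DriverManager.getConnection(
                            "jdbc:postgresql://localhost:5432/testdb", "pgadmin", "secret");
                     PreparedStatement ps = conn.prepareStatement(
                            "INSERT INTO tablo (id, column1) VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {
                        ps.setInt(1, worker * 1000 + i);
                        ps.setString(2, "worker" + worker + "_row" + i);
                        ps.executeUpdate();             // autocommit: each row is its own transaction
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}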
This way I have achieved 400,000-500,000 inserts per second, with index creation, on a 10-column table (2 Xeons, 24 cores, 24 GB of memory, SSD).
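This answer does not say which load path was used to reach those numbers. One common way to get high single-connection throughput from Java is the PostgreSQL JDBC driver's CopyManager, which drives the same COPY protocol mentioned in (*3); the sketch below illustrates that API, with the table, connection details, and CSV layout as assumptions.

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyInsertTest {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "pgadmin", "secret")) {

            // Build CSV rows in memory; streaming from a file works just as well.
            StringBuilder csv = new StringBuilder();
            for (int row = 1; row <= 10000; row++) {
                csv.append(row);
                for (int col = 1; col <= 20; col++) {
                    csv.append(",column").append(row).append('_').append(col);
                }
                csv.append('\n');
            }

            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn(
                    "COPY tablo FROM STDIN WITH (FORMAT csv)",
                    new StringReader(csv.toString()));
            System.out.println("Copied " + rows + " rows");
        }
    }
}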