CLUSTER does not shrink the table size while other transactions are running with Psycopg2
We are running a Python process which calls this stored procedure, which imports files from a certain directory into the Postgres database. These files are first imported into an in-memory table and then into the disk table. The actual size of the in-memory table should never really grow beyond 30 MB. As this table is constantly updated, the size of the table grows (because of dead tuples). To keep things in check, we need to perform a CLUSTER operation on the table. I am using the psycopg2 module to run the stored procedure and CLUSTER the table, but if the import process is running, the size of the table never goes down. However, if I stop the import process and run CLUSTER, then the size of the table does go down. For performance reasons, I should be able to run the CLUSTER command without stopping the import process.
I tried a manual commit and ISOLATION_LEVEL_AUTOCOMMIT, but none of this worked.
Below is sample code of the process -
import os
import psycopg2

while True:
    # get the filenames in the directory
    for filepath in filenames:
        conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
        cursor = conn.cursor()
        # Calls a PostgreSQL function that reads the file and imports it into
        # a table via INSERT statements, and DELETEs any records that have the
        # same unique key as any of the records in the file.
        cursor.execute("SELECT import('%s', '%s');" % (filepath, str(db_timestamp)))
        conn.commit()
        cursor.close()
        conn.close()
        os.remove(get_media_path(fname))
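As a side note, the same call can be made with psycopg2's parameter passing instead of string interpolation, so the file path and timestamp are quoted safely; a minimal sketch using the same import() function and variables as above -

# psycopg2 adapts and quotes the bound values itself
cursor.execute("SELECT import(%s, %s);", (filepath, str(db_timestamp)))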
Using a similar conn object, I want to run the CLUSTER command once an hour -
conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
cursor = conn.cursor()
cursor.execute("CLUSTER table_name")
conn.commit()
cursor.close()
conn.close()
Also, I tried setting -
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
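For reference, that constant lives in psycopg2.extensions, and the isolation level has to be set before anything is executed on the connection; a rough sketch of how that attempt is wired up -

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)  # every execute() now commits on its own
cursor = conn.cursor()
cursor.execute("CLUSTER table_name")                  # no conn.commit() needed in this mode
cursor.close()
conn.close()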
Another piece of information -
I have all of this running inside a Django environment. I could not use Django connection objects to do the task because Django could not release the connections opened by my threading code, and soon the database stopped accepting connections. Might this mixed environment have an effect on psycopg?
A few observations -
- Running the CLUSTER command while the import process is running - the size doesn't go down
- When I stop the import process and then run CLUSTER - the size does go down
- When I stop the import process and start the import process back up, and after that run the CLUSTER command - the size does go down
Any thoughts on the problem?
From the manual:

"When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished."

Are you sure you have to CLUSTER every hour? With a better fillfactor and autovacuum, your table won't grow that much and you won't have dead tuples in the table.
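For example, both can be set per table; a minimal sketch, assuming the table from the question is named table_name (the numbers are only illustrative and need tuning) -

# fillfactor leaves free space in each heap page so updates can reuse it,
# and the per-table autovacuum settings make dead tuples get cleaned up sooner.
cursor.execute("""
    ALTER TABLE table_name SET (
        fillfactor = 70,
        autovacuum_vacuum_scale_factor = 0.05,
        autovacuum_vacuum_threshold = 50
    );
""")
conn.commit()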
OK - I found the culprit.
The problem was that somehow CLUSTER or VACUUM was not deleting the dead tuples, because some weird interaction was happening when we were using psycopg2 directly in the Django environment. After isolating the psycopg code and removing the Django-related code from the import process, everything worked fine. This solved the problem and now I can VACUUM or CLUSTER the table without stopping the import process.
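For anyone hitting the same thing, the maintenance step now amounts to something like this on its own psycopg2 connection, completely outside Django (a rough sketch, not the exact code; VACUUM cannot run inside a transaction block, so autocommit is enabled first) -

import time
import psycopg2

def hourly_maintenance():
    # dedicated connection, independent of Django's connection handling
    conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
    conn.autocommit = True      # VACUUM refuses to run inside a transaction block
    cursor = conn.cursor()
    cursor.execute("VACUUM table_name")
    cursor.close()
    conn.close()

while True:
    hourly_maintenance()
    time.sleep(3600)            # once an hour, as in the question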