CLUSTER does not shrink the table size when other transactions are run via Psycopg2

We are running a Python process that runs a stored procedure which imports files from a certain directory into the Postgres database. The files are first imported into an in-memory table and then into the disk table. The actual size of the in-memory table should never really grow beyond 30 MB. Because this table is constantly updated, its size grows (due to dead tuples). To keep things in check, we need to perform a CLUSTER operation on the table. I am using the psycopg2 module to run the stored procedure and to CLUSTER the table, but if the import process is running, the size of the table never goes down. If I stop the import process and run CLUSTER, the table size does go down. For performance reasons, I should be able to run the CLUSTER command without stopping the import process.

I have tried manual commits and ISOLATION_LEVEL_AUTOCOMMIT, but none of this has worked.
Below is sample code for the process -

import os
import psycopg2

while True:
    # Get the filenames in the directory.
    for filepath in filenames:
        conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
        cursor = conn.cursor()
        # Calls a PostgreSQL function that reads a file and imports it into
        # a table via INSERT statements and DELETEs any records that have the
        # same unique key as any of the records in the file.
        cursor.execute("SELECT import(%s, %s);", (filepath, str(db_timestamp)))
        conn.commit()
        cursor.close()
        conn.close()
        os.remove(get_media_path(filepath))

Using a similar conn object, I want to run the CLUSTER command once an hour -

conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
cursor = conn.cursor()
cursor.execute("CLUSTER table_name")
conn.commit()
cursor.close()
conn.close()

Also, I tried setting -

from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
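
For what it's worth, a minimal sketch of how the hourly maintenance run could combine autocommit with a size check, so the effect of CLUSTER can be observed directly; the cluster_and_report helper and the hard-coded connection string are illustrative, not part of the original code -

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

def cluster_and_report(table_name):
    # Dedicated connection used only for maintenance work.
    conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()

    # pg_total_relation_size() reports table + index + TOAST size in bytes.
    cursor.execute("SELECT pg_total_relation_size(%s::regclass);", (table_name,))
    size_before = cursor.fetchone()[0]

    # CLUSTER takes an ACCESS EXCLUSIVE lock on the table for its duration.
    # The table name cannot be passed as a bound parameter, only values can.
    cursor.execute("CLUSTER " + table_name)

    cursor.execute("SELECT pg_total_relation_size(%s::regclass);", (table_name,))
    size_after = cursor.fetchone()[0]

    cursor.close()
    conn.close()
    print("size before: %d bytes, after: %d bytes" % (size_before, size_after))

If the reported size stays the same while the import loop is running, that suggests a concurrent transaction can still see the dead rows, so CLUSTER is not allowed to discard them.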

Another piece of information -
All of this runs inside a Django environment. I could not use Django connection objects to do the task because Django could not release the connections with my threading code, and soon the database stopped accepting connections. Might this mixed environment have an effect on psycopg?

A few observations -

  1. Running the CLUSTER command while the import process is running - the size doesn't go down
  2. When I stop the import process and then run CLUSTER - the size does go down
  3. When I stop the import process and start it back up, and after that run the CLUSTER command - the size does go down

Any thoughts on the problem?

2 Answers

荭秂 2024-10-18 13:34:32

From the manual:

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.

Are you sure you have to CLUSTER every hour? With a better fillfactor and autovacuum, your table won't grow that much and you won't have dead tuples in the table.
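
A rough sketch of what that tuning could look like from psycopg2, assuming the table is named table_name and a PostgreSQL version (8.4 or later) that accepts per-table autovacuum storage parameters; the specific values are illustrative only -

import psycopg2

conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
cursor = conn.cursor()

# Leave some free space in every page so updates can reuse it in place
# instead of piling up dead tuples; this only affects pages written after
# the change (existing pages keep their layout until rewritten).
cursor.execute("ALTER TABLE table_name SET (fillfactor = 70)")

# Ask autovacuum to visit this table more aggressively than the default,
# so dead tuples are reclaimed between imports.
cursor.execute("""
    ALTER TABLE table_name SET (
        autovacuum_vacuum_scale_factor = 0.05,
        autovacuum_vacuum_threshold = 50
    )
""")

conn.commit()
cursor.close()
conn.close()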

ι不睡觉的鱼゛ 2024-10-18 13:34:32

OK - I found the culprit.

The problem was that somehow the cluster or vacuum was not deleting the dead tuples, because of some weird interaction that happened when we used psycopg2 directly inside the Django environment. After isolating the psycopg code and removing the Django-related code from the import process, everything worked fine. This solved the problem, and now I can vacuum or cluster the table without stopping the import process.
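
The usual mechanism behind this symptom is a connection left sitting idle in an open transaction (here, the Django-managed connections), which keeps PostgreSQL from reclaiming dead tuples even when CLUSTER or VACUUM runs. A quick diagnostic sketch, assuming PostgreSQL 9.2 or later where pg_stat_activity exposes a state column -

import psycopg2

conn = psycopg2.connect("dbname='dbname' user='user' password='password'")
cursor = conn.cursor()

# Sessions sitting "idle in transaction" pin old row versions and prevent
# CLUSTER/VACUUM from removing dead tuples newer than their snapshots.
cursor.execute("""
    SELECT pid, state, xact_start, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_start
""")
for pid, state, xact_start, query in cursor.fetchall():
    print(pid, state, xact_start, query)

cursor.close()
conn.close()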
