Postgres: INSERT if it does not already exist

Posted on 2024-09-30 06:44:27

I'm using Python to write to a postgres database:

sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)

But because some of my rows are identical, I get the following error:

psycopg2.IntegrityError: duplicate key value  
  violates unique constraint "hundred_pkey"

How can I write an 'INSERT unless this row already exists' SQL statement?

I've seen complex statements like this recommended:

IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF

But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?

Comments (21)

你如我软肋 2024-10-07 06:44:27

Postgres 9.5 (released 2016-01-07) offers an "upsert" command, also known as an ON CONFLICT clause to INSERT:

INSERT ... ON CONFLICT DO NOTHING/UPDATE

It avoids many of the subtle problems that the approaches proposed in some other answers can run into under concurrent operation.
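
As a rough sketch of how this could be used from Python: the helper below (`insert_hundred` is a made-up name, not part of psycopg2) passes the values as parameters instead of concatenating them into the SQL string. It assumes a psycopg2-style DB-API cursor with `%s` placeholders and the `hundred` table from the question, and it has not been run against a live server.

```python
# Sketch only: insert_hundred is a hypothetical helper, not part of psycopg2.
INSERT_IGNORE_SQL = (
    "INSERT INTO hundred (name, name_slug, status) "
    "VALUES (%s, %s, %s) "
    "ON CONFLICT DO NOTHING"
)


def insert_hundred(cursor, name, name_slug, status):
    """Insert one row; return True if inserted, False if it was skipped.

    `cursor` is assumed to be a DB-API cursor that uses %s placeholders
    (as psycopg2 does) and is connected to Postgres 9.5 or later.
    """
    cursor.execute(INSERT_IGNORE_SQL, (name, name_slug, status))
    # For INSERT, rowcount is the number of rows actually written, so a
    # skipped duplicate is expected to report 0.
    return cursor.rowcount == 1
```

Because the conflict is resolved inside the single INSERT statement, there is no separate "check, then insert" step for a concurrent writer to slip between.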

跨年 2024-10-07 06:44:27

How can I write an 'INSERT unless this row already exists' SQL statement?

There is a nice way of doing conditional INSERT in PostgreSQL:

INSERT INTO example_table
    (id, name)
SELECT 1, 'John'
WHERE
    NOT EXISTS (
        SELECT id FROM example_table WHERE id = 1
    );

CAVEAT This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
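
To try the statement without a Postgres server, here is a small sketch that runs the same INSERT ... WHERE NOT EXISTS against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in here, and `insert_unless_exists` is a made-up helper name; with psycopg2 the placeholders would be `%s` rather than `?`.

```python
import sqlite3

CONDITIONAL_INSERT = (
    "INSERT INTO example_table (id, name) "
    "SELECT ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM example_table WHERE id = ?)"
)


def insert_unless_exists(conn, row_id, name):
    """Run the conditional insert; return how many rows were written."""
    cur = conn.execute(CONDITIONAL_INSERT, (row_id, name, row_id))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY, name TEXT)")

first = insert_unless_exists(conn, 1, "John")   # row is new
second = insert_unless_exists(conn, 1, "John")  # row already exists, skipped
rows = conn.execute("SELECT id, name FROM example_table").fetchall()
```

A single-connection demo like this cannot show the race described in the caveat; that only appears when two sessions run the statement at the same time.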

永不分离 2024-10-07 06:44:27

One approach would be to create a non-constrained (no unique indexes) table to insert all your data into and do a select distinct from that to do your insert into your hundred table.

At a high level it would look like this. I assume all three columns together are what must be unique in my example, so for step 4 change the NOT EXISTS join to join only on the unique columns in the hundred table.

  1. Create a temporary table with the same column types as hundred but none of its constraints. See the PostgreSQL docs for CREATE TABLE AS.

     CREATE TEMPORARY TABLE temp_data AS SELECT name, name_slug, status FROM hundred WITH NO DATA;
    
  2. INSERT the data into the temp table.

     INSERT INTO temp_data(name, name_slug, status) VALUES (...);
    
  3. Add any indexes to the temp table.

  4. Do the main table insert, selecting from the temp table only the rows that hundred does not contain yet.

     INSERT INTO hundred(name, name_slug, status) 
         SELECT DISTINCT name, name_slug, status
         FROM temp_data
         WHERE NOT EXISTS (
             SELECT 'X' 
             FROM hundred
             WHERE 
                 temp_data.name          = hundred.name
                 AND temp_data.name_slug = hundred.name_slug
                 AND temp_data.status    = hundred.status
         );
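
The staging-table idea can be sketched end to end in Python. The example below uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere; the column types and sample rows are assumptions made for illustration. It reads from the staging table and checks each row against `hundred`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question does not show the real column types.
conn.execute(
    "CREATE TABLE hundred ("
    " name INTEGER, name_slug TEXT, status INTEGER,"
    " PRIMARY KEY (name, name_slug, status))"
)
conn.execute("INSERT INTO hundred VALUES (1, 'one', 0)")  # pre-existing row

# Steps 1 and 2: an unconstrained staging table that accepts duplicates.
conn.execute(
    "CREATE TEMP TABLE temp_data (name INTEGER, name_slug TEXT, status INTEGER)"
)
incoming = [(1, "one", 0), (2, "two", 0), (2, "two", 0), (3, "three", 1)]
conn.executemany("INSERT INTO temp_data VALUES (?, ?, ?)", incoming)

# Step 4: copy over only the distinct rows that hundred does not have yet.
conn.execute(
    "INSERT INTO hundred (name, name_slug, status) "
    "SELECT DISTINCT name, name_slug, status FROM temp_data "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM hundred"
    "  WHERE hundred.name = temp_data.name"
    "    AND hundred.name_slug = temp_data.name_slug"
    "    AND hundred.status = temp_data.status)"
)
result = conn.execute(
    "SELECT name, name_slug, status FROM hundred ORDER BY name"
).fetchall()
```

The duplicate `(2, 'two', 0)` rows collapse through DISTINCT, and the row that was already in `hundred` is filtered out by NOT EXISTS.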
    
玩套路吗 2024-10-07 06:44:27

This is exactly the problem I faced, and my version is 9.5.

I solved it with the SQL query below.

INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name FROM example_table
WHERE NOT EXISTS(
            SELECT id FROM example_table WHERE id = 1
    )
LIMIT 1;

Hope that will help someone who has the same issue with version >= 9.5.

Thanks for reading.

失而复得 2024-10-07 06:44:27

Before version 9.5, PostgreSQL supported neither MERGE nor ON DUPLICATE KEY UPDATE, so you had to do it in two statements:

UPDATE  invoices
SET     billed = 'TRUE'
WHERE   invoiceid = '12345';

INSERT
INTO    invoices (invoiceid, billed)
SELECT  '12345', 'TRUE'
WHERE   '12345' NOT IN
        (
        SELECT  invoiceid
        FROM    invoices
        );

You can wrap it into a function:

CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
        UPDATE  invoices
        SET     billed = $2
        WHERE   invoiceid = $1;

        INSERT
        INTO    invoices (invoiceid, billed)
        SELECT  $1, $2
        WHERE   $1 NOT IN
                (
                SELECT  invoiceid
                FROM    invoices
                );
$$
LANGUAGE sql;

and just call it:

SELECT  fn_upd_invoices('12345', 'TRUE')
故事灯 2024-10-07 06:44:27

You can make use of VALUES - available in Postgres:

INSERT INTO person (name)
    SELECT name FROM person
    UNION 
    VALUES ('Bob')
    EXCEPT
    SELECT name FROM person;
烟酉 2024-10-07 06:44:27

I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:

Create Function ignore_dups() Returns Trigger
As $$
Begin
    If Exists (
        Select
            *
        From
            hundred h
        Where
            -- Assuming all three fields are primary key
            h.name = NEW.name
            And h.name_slug = NEW.name_slug
            And h.status = NEW.status
    ) Then
        -- Returning NULL from a Before Insert trigger skips the row
        Return NULL;
    End If;
    Return NEW;
End;
$$ Language plpgsql;

Create Trigger ignore_dups
    Before Insert On hundred
    For Each Row
    Execute Procedure ignore_dups();

Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:

sql = "Insert Into hundred (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))

Note that as @Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.

烂柯人 2024-10-07 06:44:27

There is a nice way of doing a conditional INSERT in PostgreSQL using a WITH query, like this:

WITH a as(
select 
 id 
from 
 schema.table_name 
where 
 column_name = your_identical_column_value
)
INSERT into 
 schema.table_name
(col_name1, col_name2)
SELECT
    col_value1, col_value2
WHERE NOT EXISTS (
     SELECT
         id
     FROM
         a
        )
  RETURNING id 
红尘作伴 2024-10-07 06:44:27

We can simplify the query using upsert:

insert into invoices (invoiceid, billed) 
  values ('12345', 'TRUE') 
  on conflict (invoiceid) do 
    update set billed=EXCLUDED.billed;
┼── 2024-10-07 06:44:27

INSERT .. WHERE NOT EXISTS is a good approach, and the race condition can be avoided by wrapping it in a transaction "envelope":

BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
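
A sketch of that envelope from Python might look like the following. It assumes a psycopg2-style connection (a transaction is opened implicitly by the first `execute`, and placeholders are `%s`), and it assumes `name` alone identifies a row; `locked_insert` is a made-up helper name.

```python
LOCK_SQL = "LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE"
INSERT_SQL = (
    "INSERT INTO hundred (name, name_slug, status) "
    "SELECT %s, %s, %s "
    "WHERE NOT EXISTS (SELECT 1 FROM hundred WHERE name = %s)"
)


def locked_insert(conn, name, name_slug, status):
    """Take the table lock, then run the conditional insert, in one transaction."""
    cursor = conn.cursor()
    try:
        cursor.execute(LOCK_SQL)
        cursor.execute(INSERT_SQL, (name, name_slug, status, name))
        conn.commit()      # ending the transaction releases the lock
    except Exception:
        conn.rollback()    # rolling back releases it as well
        raise
```

The trade-off is that the lock makes writers to `hundred` take turns, so this suits occasional loads better than busy tables.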
谎言月老 2024-10-07 06:44:27

You can also use MERGE, introduced in Postgres 15.

It is not as straightforward as INSERT … ON CONFLICT DO NOTHING, though.

旧人 2024-10-07 06:44:27

It's easy with rules:

CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING

But it fails with concurrent writes ...

当梦初醒 2024-10-07 06:44:27

The approach with the most upvotes (from John Doe) does more or less work for me, but in my case I get only 180 rows out of an expected 422.
I couldn't find anything wrong and there are no errors at all, so I looked for a different, simple approach.

Using IF NOT FOUND THEN after a SELECT works perfectly for me.

(described in PostgreSQL Documentation)

Example from documentation:

SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
  RAISE EXCEPTION 'employee % not found', myname;
END IF;
唐婉 2024-10-07 06:44:27

psycopg2's cursor class has the attribute rowcount:

This read-only attribute specifies the number of rows that the last
execute*() produced (for DQL statements like SELECT) or affected (for
DML statements like UPDATE or INSERT).

So you could try UPDATE first and INSERT only if rowcount is 0.

But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
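
Here is a minimal sketch of that UPDATE-then-INSERT flow. It borrows the `invoices` table from the other answers and runs against an in-memory SQLite database (a stand-in, so the example works without a server); the column types and the `update_or_insert` name are assumptions. With psycopg2 the placeholders would be `%s`.

```python
import sqlite3


def update_or_insert(conn, invoiceid, billed):
    """Try UPDATE first; INSERT only when no row was updated."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE invoices SET billed = ? WHERE invoiceid = ?",
        (billed, invoiceid),
    )
    if cur.rowcount == 0:
        # Another writer could insert the same key right here; this gap
        # is the race condition described above.
        cur.execute(
            "INSERT INTO invoices (invoiceid, billed) VALUES (?, ?)",
            (invoiceid, billed),
        )
        return "inserted"
    return "updated"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoiceid TEXT PRIMARY KEY, billed TEXT)")
first = update_or_insert(conn, "12345", "FALSE")
second = update_or_insert(conn, "12345", "TRUE")
stored = conn.execute("SELECT invoiceid, billed FROM invoices").fetchall()
```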

迷荒 2024-10-07 06:44:27

Your column "hundred" seems to be defined as the primary key and therefore must be unique, which is not the case. The problem is with your data.

I suggest you add an id column of type serial to serve as the primary key.

安人多梦 2024-10-07 06:44:27

If many of your rows are identical, you will end up checking many times. Instead, you can send them all and let the database decide whether to insert each one, using the ON CONFLICT clause as follows:

  sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
  sql_string += hundred + ", '" + hundred_slug + "', " + status
  sql_string += ") ON CONFLICT ON CONSTRAINT hundred_pkey DO NOTHING;"
  cursor.execute(sql_string)
凡间太子 2024-10-07 06:44:27

In Postgres version 9.5 or higher you can use ON CONFLICT to avoid constraint errors, as @Arie mentioned above. For more options related to this INSERT query, refer to the Postgres docs.
An alternative solution is to use try/except to handle the error at runtime.
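
The exception-handling alternative could be sketched as below. The demo uses SQLite from the standard library so it runs without a server, and `try_insert` plus the table definition are assumptions for illustration.

```python
import sqlite3


def try_insert(conn, name, name_slug, status):
    """Attempt the insert; return False instead of raising on a duplicate key."""
    try:
        conn.execute(
            "INSERT INTO hundred (name, name_slug, status) VALUES (?, ?, ?)",
            (name, name_slug, status),
        )
        return True
    except sqlite3.IntegrityError:
        # With psycopg2 the equivalent exception is psycopg2.IntegrityError,
        # and the failed transaction has to be rolled back before the
        # connection can be used again.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hundred (name INTEGER PRIMARY KEY, name_slug TEXT, status INTEGER)"
)
outcomes = [try_insert(conn, 1, "one", 0), try_insert(conn, 1, "one", 0)]
```

On Postgres the rollback requirement makes this approach awkward inside a larger transaction, which is one reason ON CONFLICT is usually the simpler choice.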

夏尔 2024-10-07 06:44:27
INSERT INTO invoices (invoiceid, billed) (
    SELECT '12345','TRUE' WHERE NOT EXISTS (
        SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
        )
)
烟雨扶苏 2024-10-07 06:44:27

I was looking for a similar solution, trying to find SQL that would work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.

sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
2024-10-07 06:44:27

Here is a generic python function that given a tablename, columns and values, generates the upsert equivalent for postgresql.

import json

def upsert(table_name, id_column, other_columns, values_hash):

    template = """
    WITH new_values ($ALL_COLUMNS$) as (
      values
         ($VALUES_LIST$)
    ),
    upsert as
    (
        update $TABLE_NAME$ m
            set
                $SET_MAPPINGS$
        FROM new_values nv
        WHERE m.$ID_COLUMN$ = nv.$ID_COLUMN$
        RETURNING m.*
    )
    INSERT INTO $TABLE_NAME$ ($ALL_COLUMNS$)
    SELECT $ALL_COLUMNS$
    FROM new_values
    WHERE NOT EXISTS (SELECT 1
                      FROM upsert up
                      WHERE up.$ID_COLUMN$ = new_values.$ID_COLUMN$)
    """

    all_columns = [id_column] + other_columns
    all_columns_csv = ",".join(all_columns)
    all_values_csv = ','.join([query_value(values_hash[column_name]) for column_name in all_columns])
    set_mappings = ",".join([ c+ " = nv." +c for c in other_columns])

    q = template
    q = q.replace("$TABLE_NAME$", table_name)
    q = q.replace("$ID_COLUMN$", id_column)
    q = q.replace("$ALL_COLUMNS$", all_columns_csv)
    q = q.replace("$VALUES_LIST$", all_values_csv)
    q = q.replace("$SET_MAPPINGS$", set_mappings)

    return q


def query_value(value):
    # Render a Python value as a SQL literal, doubling single quotes.
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'%s'" % value.replace("'", "''")
    if isinstance(value, dict):
        return "'%s'" % json.dumps(value).replace("'", "''")
    # bool, int and anything else: use the plain string form
    return str(value)


if __name__ == "__main__":

    my_table_name = 'mytable'
    my_id_column = 'id'
    my_other_columns = ['field1', 'field2']
    my_values_hash = {
        'id': 123,
        'field1': "john",
        'field2': "doe"
    }
    print(upsert(my_table_name, my_id_column, my_other_columns, my_values_hash))
日暮斜阳 2024-10-07 06:44:27

The solution is simple, but not immediately obvious.
If you want to use this instruction, you must make one change to the db:

ALTER USER user SET search_path to 'name_of_schema';

After this change, "INSERT" will work correctly.
