Postgresql - 如果记录不存在则插入记录,如果存在则更新记录的干净方法
这是我的情况。我有一个包含一堆 URL 的表并进行爬行 与它们相关的日期。当我的程序处理 URL 时,我想要 插入带有爬网日期的新行。如果 URL 已经存在,我 想要将爬网日期更新为当前日期时间。使用 MS SQL 或 Oracle 我可能会为此使用 MERGE 命令。使用 mySQL 我会 可能使用 ON DUPLICATE KEY UPDATE 语法。
我可以在我的程序中执行多个查询,这可能是也可能不是 线程安全。我可以编写一个包含各种 IF...ELSE 的 SQL 函数 逻辑。然而,为了尝试 Postgres 功能,我 以前从未使用过,我正在考虑创建一个 INSERT 规则 - 像这样的东西:
CREATE RULE Pages_Upsert AS ON INSERT TO Pages
WHERE EXISTS (SELECT 1 from Pages P where NEW.Url = P.Url)
DO INSTEAD
UPDATE Pages SET LastCrawled = NOW(), Html = NEW.Html WHERE Url = NEW.Url;
这似乎确实效果很好。它可能会失去一些分数 “代码可读性”的观点,就像有人查看我的代码一样 第一次必须神奇地知道这个规则,但我 我猜这可以通过良好的代码注释来解决 文档。
这个想法还有其他缺点吗,或者可能是“你的想法” 太糟糕了,你应该这样做/这个/方式”评论?我在 PG 9.0 如果 这很重要。
更新:查询计划,因为有人想要它:)
"Insert (cost=2.79..2.81 rows=1 width=0)"
" InitPlan 1 (returns $0)"
" -> Seq Scan on pages p (cost=0.00..2.79 rows=1 width=0)"
" Filter: ('http://www.foo.com'::text = lower((url)::text))"
" -> Result (cost=0.00..0.01 rows=1 width=0)"
" One-Time Filter: ($0 IS NOT TRUE)"
""
"Update (cost=2.79..5.46 rows=1 width=111)"
" InitPlan 1 (returns $0)"
" -> Seq Scan on pages p (cost=0.00..2.79 rows=1 width=0)"
" Filter: ('http://www.foo.com'::text = lower((url)::text))"
" -> Result (cost=0.00..2.67 rows=1 width=111)"
" One-Time Filter: $0"
" -> Seq Scan on pages (cost=0.00..2.66 rows=1 width=111)"
" Filter: ((url)::text = 'http://www.foo.com'::text)"
Here's my situation. I have a table with a bunch of URLs and crawl
dates associated with them. When my program processes a URL, I want
to INSERT a new row with a crawl date. If the URL already exists, I
want to update the crawl date to the current datetime. With MS SQL or
Oracle I'd probably use a MERGE command for this. With mySQL I'd
probably use the ON DUPLICATE KEY UPDATE syntax.
I could do multiple queries in my program, which may or may not be
thread safe. I could write a SQL function which has various IF...ELSE
logic. However, for the sake of trying out Postgres features I've
never used before, I'm thinking about creating an INSERT rule -
something like this:
CREATE RULE Pages_Upsert AS ON INSERT TO Pages
WHERE EXISTS (SELECT 1 from Pages P where NEW.Url = P.Url)
DO INSTEAD
UPDATE Pages SET LastCrawled = NOW(), Html = NEW.Html WHERE Url = NEW.Url;
This seems to actually work great. It probably loses some points on
the "code readability" standpoint, as someone looking at my code for
the first time would have to magically know about this rule, but I
guess that could be solved with good code commenting and
documentation.
Are there any other drawbacks to this idea, or maybe a "your idea
sucks, you should do it /this/ way instead" comment? I'm on PG 9.0 if
that matters.
UPDATE: Query plan since someone wanted it :)
"Insert (cost=2.79..2.81 rows=1 width=0)"
" InitPlan 1 (returns $0)"
" -> Seq Scan on pages p (cost=0.00..2.79 rows=1 width=0)"
" Filter: ('http://www.foo.com'::text = lower((url)::text))"
" -> Result (cost=0.00..0.01 rows=1 width=0)"
" One-Time Filter: ($0 IS NOT TRUE)"
""
"Update (cost=2.79..5.46 rows=1 width=111)"
" InitPlan 1 (returns $0)"
" -> Seq Scan on pages p (cost=0.00..2.79 rows=1 width=0)"
" Filter: ('http://www.foo.com'::text = lower((url)::text))"
" -> Result (cost=0.00..2.67 rows=1 width=111)"
" One-Time Filter: $0"
" -> Seq Scan on pages (cost=0.00..2.66 rows=1 width=111)"
" Filter: ((url)::text = 'http://www.foo.com'::text)"
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
好的,我成功创建了一个测试用例。结果是更新部分总是被执行,即使是在新插入时也是如此。 COPY 似乎绕过了规则系统。
[为了清楚起见,我已将其放入单独的回复中]
结果:
更新:只是为了证明它可以完成:
结果:
该视图对于防止重写系统进入递归是必要的。
DELETE 规则的构造留给读者作为练习。
Ok, I managed to create a testcase. The result is that the update part is always executed, even on a fresh insert. COPY seems to bypass the rule system.
[For clarity I have put this into a separate reply]
The result:
UPDATE: Just to prove it can be done:
Result:
The view is necessary to prevent the rewrite system to go into recursion.
Construction of a DELETE rule is left as an exercise to the reader.
来自应该了解这一点或非常接近这样的人的一些好观点;-)
PostgreSQL RULE 有何用处?
短篇故事:
SERIAL
配合使用效果好吗?BIGSERIAL
?INSERT
和UPDATE
的RETURNING
子句配合得很好吗?所有这些事情都归结为一个事实,即规则系统不是行驱动的,而是以您从未想象过的方式转换您的语句。
帮自己和团队成员一个忙,不要再用角色来做类似的事情了。
编辑:您的问题在 PostgreSQL 社区中得到了充分讨论。搜索关键字为:
MERGE
、UPSERT
。Some good points from someone who should know it or be very near to someone like that ;-)
What are PostgreSQL RULEs good for?
Short story:
SERIAL
andBIGSERIAL
?RETURNING
clauses ofINSERT
andUPDATE
?random()
?All these things boils down to the fact, that the rule system is not row driven but transforms your statements in a way you never imagine.
Do yourself and your team mates a favour and stop using roles for things like that.
Edit: Your problem is well discussed in the PostgreSQL community. Search keywords are:
MERGE
,UPSERT
.我不知道这是否太主观,但我对您的解决方案的看法是:这都是关于语义的。当我进行插入时,我期望插入,而不是一些可能执行插入但可能不执行的奇特逻辑。事实上,这就是函数的用途。
首先,我会尝试检查程序中的 URL,然后选择是否插入或更新。如果结果太慢,我会使用一个函数。如果您将其命名为
insert_or_update_url
,您将自动免费获得一些文档。重写规则要求你有一些隐性知识,我通常会尽量避免这种情况。从好的方面来说:如果有人复制数据但忘记了规则和函数,您的解决方案可能会悄无声息地崩溃(但这可能取决于其他约束),但缺少的函数会尖叫着崩溃。不要误会我的意思,我认为你的解决方案非常有创意和聪明。只是对我的口味来说有点太晦涩了。
I don't know if this gets too subjective but what I think about your solution is: It's all about semantics. When I do an insert, I expect an insert and not some fancy logic that maybe does an insert but maybe not. Indeed that's what functions are for.
At first I'd try checking for the URL in your program and then choosing whether to insert or update. If that turned out to be too slow, I'd use a function. If you name it like
insert_or_update_url
, you automatically get some documentation for free. The rewrite rule requires you to have some implicit knowledge and I generally try to avoid that.On the plus side: If someone copies the data but forgets rules and functions, your solution might break silently (but that may depend on other constraints), but a missing function goes down screaming. Don't get me wrong, I think your solution is very creative and smart. Just a bit too obscure for my taste.
有一个实现upsert的示例 / 合并使用Postgres文档中的简单函数。
永远不要使用规则——它们是邪恶的。
There's an example of implementing upsert / merge using simple function in Postgres documentation.
Never use rules — they're evil.
您不能在规则资格中引用除旧表和新表之外的其他表。
您应该在规则正文中执行此操作。
这都是因为规则只是一种通知重写系统应该执行和不应该执行哪些转换的方法。规则不是针对每一行执行的触发器,但它们为查询计划程序提供了良好的信息,并要求它很好地重写计划。
来自文档:
You cannot refer to other tables than old an new in the rule qualification.
You should instead do this in the rule body.
This is all because the rule is just a way to inform the rewrite system about what transformations it should and should not perform. Rules are not triggers, executing for every row, but they give the query planner a fine massage and ask it nicely to rewrite the plan.
From the docs: