Best practice for skipping duplicate entries in MySQL
I have written a feed aggregator before but am trying to optimize it a bit. In the past, using SimplePie (a PHP class) to parse the feeds, I have used the get_id() function on each feed item to return a hash (an md5 of link + title). I store this "id" as "remote_id" in MySQL. However, to ensure that I have no duplicates, I've been running a SELECT query for each feed item to check that the "remote_id" does not already exist. This seems inefficient considering I am looking at thousands of feeds.
Is it most efficient to just turn remote_id into a unique key and then let the database fail to write the new record on each pass? Is there a better way to engineer this?
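For concreteness, making remote_id a unique key amounts to a one-time schema change along these lines (the table name `feed_items` and the connection details are placeholders, not taken from the question):

```php
<?php
// Hypothetical one-off migration script; only the column name "remote_id"
// comes from the question, everything else here is assumed.
$pdo = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// A UNIQUE index makes MySQL itself reject duplicate items,
// replacing the per-item SELECT check.
$pdo->exec('ALTER TABLE feed_items ADD UNIQUE KEY uniq_remote_id (remote_id)');
```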
Comments (1)
Yes, if a key should be unique in MySQL, it's generally a good idea to define it as a unique key.
When inserting possible duplicates, you can use PDO with try {} catch () {} blocks to filter them out: the duplicate inserts will throw an exception, so you don't have to check beforehand.
I use something like this in a similar situation (pseudocode alert):
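A minimal sketch of that pattern, assuming `$pdo` is a PDO connection in exception mode, `$feed` is an already-parsed SimplePie object, and the table name `feed_items` and columns other than `remote_id` are placeholders:

```php
<?php
// Minimal sketch, not the answerer's exact code. Assumes $pdo (PDO with
// ERRMODE_EXCEPTION) and $feed (a parsed SimplePie instance) already exist,
// and that feed_items.remote_id carries a UNIQUE index.
$stmt = $pdo->prepare(
    'INSERT INTO feed_items (remote_id, title, link) VALUES (:remote_id, :title, :link)'
);

foreach ($feed->get_items() as $item) {
    try {
        $stmt->execute([
            ':remote_id' => $item->get_id(),        // md5 of link + title, per the question
            ':title'     => $item->get_title(),
            ':link'      => $item->get_permalink(),
        ]);
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation, i.e. the UNIQUE
        // index rejected a duplicate remote_id: silently skip this item.
        if ($e->getCode() !== '23000') {
            throw $e;   // anything else is a real error, so re-throw it
        }
    }
}
```

If you prefer to keep the duplicate handling entirely in SQL, MySQL's INSERT IGNORE (or INSERT ... ON DUPLICATE KEY UPDATE) achieves the same skip without going through the exception handler.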