Tricky MySQL Batch Design
I have a scraper that visits many sites and finds upcoming events, and another script that is supposed to put them in the database. Currently, inserting into the database is my bottleneck, and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables that have keys to each other. To insert a single event, I insert the location or get the existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some events repeat weekly, etc.), and finally insert the date with the location and event ids.
I can't use REPLACE INTO because it will orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query; the TL;DR is that I have to check which keys already exist, preallocate those that don't, and then make a single insert for each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch is processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this? If so, I could go back to that solution. The batches have to be able to work in parallel.
What I have right now is that I simply turn off indexing for the duration of the batch and insert each event separately, but I need something faster. Any ideas on this rather tricky problem would be helpful. (The tables are InnoDB now. Could transactions help solve any of this?)
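To make the collision concrete, here is a rough sketch of the preallocation step for one table, in Python rather than PHP. The function name `plan_inserts`, the location names, and the idea of passing in `next_free_id` are all hypothetical and only illustrate the approach described above; they are not taken from the earlier question.

```python
def plan_inserts(existing_ids, batch_keys, next_free_id):
    """Split a batch into reused ids and newly preallocated ids.

    existing_ids: dict mapping a natural key (e.g. a location name) to its id,
                  as it might come back from one SELECT ... WHERE key IN (...).
    batch_keys:   natural keys seen in this batch (duplicates allowed).
    next_free_id: first id this batch believes it may hand out.

    Returns (key_to_id, new_rows); new_rows holds (id, key) pairs that could
    be sent to the table in a single multi-row INSERT.
    """
    key_to_id = dict(existing_ids)
    new_rows = []
    for key in batch_keys:
        if key not in key_to_id:
            key_to_id[key] = next_free_id
            new_rows.append((next_free_id, key))
            next_free_id += 1
    return key_to_id, new_rows


# Two batches that both looked at the table before either one inserted:
existing = {"Town Hall": 1}
a_map, a_new = plan_inserts(existing, ["Town Hall", "Library"], next_free_id=2)
b_map, b_new = plan_inserts(existing, ["Museum"], next_free_id=2)
# a_new is [(2, "Library")] and b_new is [(2, "Museum")]: both claimed id 2.
```

Each batch is correct in isolation; the clash only appears because both read the same starting id before either wrote anything.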
Comments (1)
I'd recommend starting with MySQL's LOCK TABLES, which you can use to prevent other sessions from writing to the tables while you insert your data.
For example, you might do something similar to this:
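A minimal sketch of that pattern, written in Python rather than PHP, might look like the following. It assumes a DB-API style cursor connected to MySQL (for example from PyMySQL), an integer `id` primary key, and hypothetical table and column names supplied by the caller. `first_entry_index` plays the role of `$firstEntryIndex`.

```python
def insert_with_reserved_ids(cursor, table, columns, rows):
    """Insert `rows` with explicit ids while holding a WRITE lock on `table`.

    `columns` names the non-id columns and each row is a tuple in that order.
    Returns the ids assigned to the rows, in row order.
    """
    if not rows:
        return []
    # Other sessions cannot read or write the table until UNLOCK TABLES runs.
    cursor.execute(f"LOCK TABLES `{table}` WRITE")
    try:
        # First key value this batch will use (1 if the table is empty).
        cursor.execute(f"SELECT COALESCE(MAX(id), 0) + 1 FROM `{table}`")
        first_entry_index = cursor.fetchone()[0]
        ids = list(range(first_entry_index, first_entry_index + len(rows)))

        column_list = ", ".join(f"`{c}`" for c in ["id", *columns])
        placeholders = ", ".join(["%s"] * (len(columns) + 1))
        cursor.executemany(
            f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})",
            [(new_id, *row) for new_id, row in zip(ids, rows)],
        )
        return ids
    finally:
        # Always release the lock, even if the insert fails.
        cursor.execute("UNLOCK TABLES")
```

Table and column names are interpolated directly into the SQL here, so they should come from your own code, never from scraped input. While a session holds table locks it can only access the tables it has locked, so a batch that touches all three tables would need to name all of them in one LOCK TABLES statement.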
The above does two things. First, it locks the table, preventing other sessions from writing to it until you are finished and the unlock statement is run. Second, it obtains $firstEntryIndex, which is the first key value to be used in any subsequent insert queries.