Remove duplicates from a Google App Engine model?
I have two Google App Engine models. I ran my cron jobs a few times, and now there are duplicate entries in my datastore. If it were easy to just delete my entire datastore and upload my data again, I would, but the last upload took 4 hours. Is there a quick way to delete entries that have duplicate values in the "title" field of a model?
Comments (1)
Quick? Probably not.
If you did want to delete dupes, my approach would be to write a remote_api script. Query the model for all entities, sorted by title, and fetch them in batches of 100. Keep a local Python dictionary of the titles seen so far. If you encounter a new title, add it to the dictionary. If you encounter a known title, add the entity to a delete batch, and flush the deletes before moving on to the next query batch.
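A minimal sketch of that loop, kept independent of the datastore so it can be run and checked locally. The names `delete_duplicate_titles`, `in_batches`, `delete`, and `get_title` are hypothetical, and a `set` stands in for the dictionary since only the keys matter. In a real remote_api script the batches would presumably come from a title-ordered query fetched 100 entities at a time, and `delete` would be the datastore's batch delete; both are assumptions about your setup, not something the question confirms.

```python
def delete_duplicate_titles(batches, delete, get_title=lambda e: e["title"]):
    """Keep the first entity seen for each title and delete the rest.

    batches   -- iterable of lists of entities (e.g. 100 per list)
    delete    -- callable that deletes a list of entities in one call
                 (assumption: on App Engine this would wrap a batch delete)
    get_title -- callable returning the title of one entity
    Returns the number of entities passed to delete().
    """
    seen_titles = set()  # titles already encountered
    deleted = 0
    for batch in batches:
        dupes = []
        for entity in batch:
            title = get_title(entity)
            if title in seen_titles:
                dupes.append(entity)    # known title: queue for deletion
            else:
                seen_titles.add(title)  # new title: remember it, keep entity
        if dupes:
            delete(dupes)               # flush before the next query batch
            deleted += len(dupes)
    return deleted


def in_batches(items, size=100):
    """Yield consecutive slices of a list; a local stand-in for paged queries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

To try it locally, pass `in_batches(list_of_dicts, 100)` as `batches` and something like `removed.extend` as `delete`. The set of titles lives in the script's memory, so this only suits models whose distinct titles fit there.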
Probably an excessive amount of work when you can just wipe out your datastore and re-import instead.