AWS S3 Replication: DstObjectHardDeleted error during replication

Posted on 2025-01-17 08:12:30


Background: We are currently trying to cut over from one AWS account to another. This includes getting a full copy of the S3 buckets into the new account (including all historical versions and timestamps). We first initiated replication to the new account's S3 buckets, ran a batch job to copy the historical data, and then tested against it. Afterward, we emptied the bucket to remove the data added during testing, and then tried to redo the replication/batch job.
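For reference, since the destination bucket is versioned, "emptying" it meant hard-deleting every object version rather than just adding delete markers, roughly along these lines (placeholder bucket name, no pagination or delete-marker handling):

    # Placeholder bucket name; this hard-deletes EVERY object version it finds.
    DST_BUCKET="DESTINATION-BUCKET"

    # List all object versions and delete each one by version ID
    # (delete markers and pagination are not handled in this sketch).
    aws s3api list-object-versions --bucket "$DST_BUCKET" \
      --query 'Versions[].[Key,VersionId]' --output text | \
    while read -r key version_id; do
      aws s3api delete-object --bucket "$DST_BUCKET" --key "$key" --version-id "$version_id"
    done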

Now it seems AWS will not replicate the objects because it sees they did at one point exist in the bucket. Looking at the batch job's output, every object shows this:

{bucket} {key} {version} failed 500 DstObjectHardDeleted Currently object can't be replicated if this object previously existed in the destination but was recently deleted. Please try again at a later time

After seeing this, I deleted the destination bucket completely and recreated it, in the hope that it would flush out any previous traces of the data, and then retried. The same error occurred.

I cannot find any information on this error, or even an acknowledgement in the AWS docs that this is expected behavior or a known issue.

Can anyone tell me how long we have to wait before replicating again? An hour? 24?

Is there any documentation on this error in AWS?

Is there any way to get around this limitation?


Update: I retried periodically throughout the day and never got an upload to replicate. I also tried replicating to a third bucket instead, and then initiating replication from that new bucket to the original target. It throws the same error.


Update 2: This post was made on a Friday. I retried the jobs today (the following Monday), and the error remains unchanged.


Update 3: Probably the last update. The short version is that I gave up and made a different bucket to replicate to. If anyone has information on this, I'm still interested; I just can't waste any more time on it.


Comments (3)

只有影子陪我不离不弃 2025-01-24 08:12:30


Batch Replication does not support re-replicating objects that were hard-deleted (deleted along with their object versions) from the destination bucket.

Below are possible workarounds for this limitation:

  • Copy the source objects in place with a Batch Copy job. Copying those objects in place creates new versions of the objects in the source bucket and automatically initiates replication to the destination. You may also use a custom script to do an in-place copy in the source bucket (see the sketch after this list).

  • Re-replicate these source objects to a different/new destination bucket.

  • Run the aws s3 sync command. It copies objects to the destination bucket with new version IDs (the version IDs will differ between the source and destination buckets). If you are syncing a large number of objects, run it at the prefix level and estimate how long the full copy will take based on your network throughput. Append "&" to run the command in the background. You can also do a dry run before the actual copy; refer to the AWS CLI documentation for more options.

    aws s3 sync s3://SOURCE-BUCKET/prefix1 s3://DESTINATION-BUCKET/prefix1 --dryrun > output.txt

    aws s3 sync s3://SOURCE-BUCKET/prefix1 s3://DESTINATION-BUCKET/prefix1 > output.txt &
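As a rough sketch of the first workaround (a scripted in-place copy instead of a Batch Copy job), the loop below copies each object onto itself so that S3 assigns a new version ID, which live replication can then pick up. The bucket and prefix names are placeholders, pagination is not handled, and --metadata-directive REPLACE will drop any user-defined metadata unless it is passed again via --metadata:

    # Placeholder names; adjust before use. Handles a single page of results only.
    SRC_BUCKET="SOURCE-BUCKET"
    PREFIX="prefix1/"

    # Copy every current object onto itself; the REPLACE directive forces S3 to
    # accept the self-copy, creating a new version that replication will pick up.
    aws s3api list-objects-v2 --bucket "$SRC_BUCKET" --prefix "$PREFIX" \
      --query 'Contents[].Key' --output text | tr '\t' '\n' | \
    while read -r key; do
      aws s3api copy-object \
        --bucket "$SRC_BUCKET" \
        --key "$key" \
        --copy-source "$SRC_BUCKET/$key" \
        --metadata-directive REPLACE
    done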

In summary, you can do an S3 Batch Copy or S3 Replication into an existing destination bucket only for objects with new version IDs. To replicate the source bucket's existing version-ID objects, you will have to use a different/new destination bucket.

七度光 2025-01-24 08:12:30


We encountered the same thing and tried the same process you outlined. We did get some of the buckets to succeed in the second account's replication batch job, but the largest that succeeded was just under 2 million objects. We have had to use the AWS CLI to sync the data or use the DataSync service (this process is still ongoing and may have to run many times, breaking the records up).

It appears that when large buckets in the first account are deleted, the metadata about them hangs around for a long time. We moved about 150 buckets with varying amounts of data. Only about half made it to the second account using the two-step replication. So the lesson I learned is: if you can control the names of your buckets and change them during the move, do that.

输什么也不输骨气 2025-01-24 08:12:30


We encountered the same issue.
For us, it was quite important to keep the bucket name as-is, and we also needed the original creation dates of the files.
We decided to delete the destination bucket, hoping that after the two-week Christmas break AWS would have deleted the metadata about the bucket.
And indeed that worked. After the bucket had been deleted for roughly two weeks, we recreated it, and the replication was able to succeed again.
