I have a service A which constantly updates a set of files in an S3 bucket.
More or less, it is equivalent to something like this:
while true
do
generate file
aws s3 cp <file> s3://<bucket>/<file>
sleep a little
done
I have a service B which reads that file once in a while to update the data inside itself. I want a single instance of service A while service B runs 100 instances.
So service B is equivalent to:
while true
do
aws s3 cp s3://<bucket>/<file> <file>
update variable holding this data
sleep a little
done
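A minimal Python sketch of service B's loop, assuming the boto3 library and AWS credentials; the bucket/key names and the `update` callback are placeholders, not from the original post. Polling the object's ETag lets each of the 100 instances detect that the object was replaced and re-download only when it actually changed:

```python
def should_refresh(last_etag, current_etag):
    """True when the S3 object differs from the copy we last downloaded."""
    return last_etag is None or last_etag != current_etag


def run_service_b(bucket, key, update, interval_s=30):
    """Polling loop for service B. `update` is a hypothetical callback that
    receives the new file contents. Requires boto3 and AWS credentials."""
    import time
    import boto3  # third-party dependency, assumed installed

    s3 = boto3.client("s3")
    last_etag = None
    while True:
        # HEAD is cheap; the ETag changes whenever the object is replaced.
        etag = s3.head_object(Bucket=bucket, Key=key)["ETag"]
        if should_refresh(last_etag, etag):
            # A single GET returns one consistent version of the object.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            update(body)
            last_etag = etag
        time.sleep(interval_s)
```

If the object embeds its own serial number or timestamp (as the answer below suggests as an alternative), the ETag check can be skipped and the version checked after download instead.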
At the moment, the <file> name always remains the same. I'm wondering whether this can cause issues. What happens when I upload a new version of the file? Is the old version still available until the copy in service B is done, or does the file get overwritten by service A?
i.e. under all operating systems I know of, if I write to a file, a read at the same location sees the new data, not the old one. In other words, with a standard OS file, a read that overlaps a write may see mangled data (a mix of old and new data).
Are S3 objects the same as standard OS files in this respect, or are they safer, i.e. is the old object left intact until the new upload completes?
Note: I'm particularly interested in official S3 documentation about how this specific case works. My searches have, so far, come up empty.
The answer is in the Amazon S3 User Guide, in the section on the Amazon S3 data consistency model.
Here is the pertinent paragraph:
"Updates to a single key are atomic. For example, if you make a PUT request to an existing key from one thread and perform a GET request on the same key from a second thread concurrently, you will get either the old data or the new data, but never partial or corrupt data."
This clearly says that the data you GET will not be overwritten mid-read as it could be with a standard file. You also won't know whether it is the old or the new instance (unless you define some metadata, or put a date or serial number in the file). However, when dealing with large files, the AWS CLI automatically switches to multipart transfers, and a transfer split across several requests could straddle an overwrite, leaving you with parts of the old file and parts of the new file. To avoid the issue, you must make sure the copy is done without multipart transfers.
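One way to follow that advice in Python with boto3 (a sketch; the function names and bucket/key parameters are illustrative, and boto3 plus AWS credentials are assumed): raise the transfer manager's multipart threshold above the file size, so the upload goes out as a single atomic PutObject. Note that a single non-multipart PUT is capped at 5 GiB.

```python
def fits_single_put(size_bytes, limit_bytes=5 * 1024**3):
    """Amazon S3 caps a single (non-multipart) PUT at 5 GiB."""
    return 0 <= size_bytes <= limit_bytes


def upload_single_part(bucket, key, path):
    """Upload `path` as one atomic PutObject by keeping the transfer
    below the multipart threshold. Names here are placeholders."""
    import os
    import boto3  # third-party dependency, assumed installed
    from boto3.s3.transfer import TransferConfig

    size = os.path.getsize(path)
    if not fits_single_put(size):
        raise ValueError("file exceeds the 5 GiB single-PUT limit")
    # With the threshold above the file size, upload_file issues one
    # PutObject call instead of a multipart upload.
    cfg = TransferConfig(multipart_threshold=size + 1)
    boto3.client("s3").upload_file(path, bucket, key, Config=cfg)
```

The equivalent knob for the plain AWS CLI is the `multipart_threshold` setting in the CLI's S3 configuration.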