How do I lock files on AWS S3?

Posted 2025-02-07 16:52:37


By locking, I don't mean the Object Lock S3 makes available. I'm talking about the following situation:

I have multiple (Python) processes that read and write to a single file hosted on S3; maybe the file is an index of sorts that needs to be updated periodically.

The processes run in parallel, so I want to make sure only a single process can write to the file at any given time (to avoid concurrent writes clobbering the data).

If I were writing this to a shared filesystem, I could just use flock to synchronize access to the file, but I can't do that on S3, AFAICT.

What is the easiest way to lock files on AWS S3?


Comments (1)

鲸落 2025-02-14 16:52:37


Unfortunately, AWS S3 does not offer a native way of locking objects - there's no flock analogue, as you pointed out. Instead, you have a few options:

Use a database

For example, Postgres offers advisory locks. When setting this up, you will need to do the following:

  1. Make sure all processes can access the database.
  2. Make sure the database can handle the incoming connections (if you're running some type of large processing grid, you may want to put your Postgres instance behind PgBouncer).
  3. Be careful that you do not close the session from the client before you're done with the lock.
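The steps above can be sketched with psycopg2 (an assumed driver choice; the DSN and lock names below are placeholders, not anything from the answer). Postgres advisory locks take a signed 64-bit integer key, so a string lock name has to be hashed down to one first:

```python
# Sketch: serializing writes to an S3 object with a Postgres advisory lock.
# Assumes a reachable Postgres instance; the DSN below is a placeholder.
# Requires the psycopg2 driver: pip install psycopg2-binary
import hashlib
from contextlib import contextmanager

def lock_key(name: str) -> int:
    """Map a lock name to the signed 64-bit key pg_advisory_lock expects."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

@contextmanager
def advisory_lock(conn, name: str):
    """Hold a session-level advisory lock for the duration of the block."""
    key = lock_key(name)
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield
    finally:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))

def update_index(bucket: str, key: str) -> None:
    import psycopg2  # deferred so the helpers above work without the driver
    conn = psycopg2.connect("dbname=locks")  # placeholder DSN
    with advisory_lock(conn, f"s3://{bucket}/{key}"):
        # Safe to read-modify-write the S3 object here: only one process
        # at a time can hold the lock for this key.
        pass
    conn.close()
```

Note that the lock lives on the database session, which is why caveat 3 matters: if the connection drops, Postgres releases the lock even if your process is still mid-write.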

There are a few other caveats you need to consider when using advisory locks - from the Postgres documentation:

Both advisory locks and regular locks are stored in a shared memory pool whose size is defined by the configuration variables max_locks_per_transaction and max_connections. Care must be taken not to exhaust this memory or the server will be unable to grant any locks at all. This imposes an upper limit on the number of advisory locks grantable by the server, typically in the tens to hundreds of thousands depending on how the server is configured.

In certain cases using advisory locking methods, especially in queries involving explicit ordering and LIMIT clauses, care must be taken to control the locks acquired because of the order in which SQL expressions are evaluated.

Use an external service

I've seen people use something like lockable (https://lockable.dev) to solve this issue. From their docs, they seem to have a Python library:

$ pip install lockable-dev

from lockable import Lock

with Lock('my-lock-name'):
    # do stuff

If you're not using Python, you can still use their service by hitting some HTTP endpoints:

curl https://api.lockable.dev/v1/acquire/my-lock-name
curl https://api.lockable.dev/v1/release/my-lock-name
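One wrinkle with the two bare curl calls is that the release never happens if the work in between fails. The acquire/release pair can be wrapped so the lock is always released; this is a sketch in Python against the endpoints shown above (the `opener` parameter is only there so the flow can be exercised without a network connection, and the error handling is deliberately minimal):

```python
# Sketch: wrap lockable's acquire/release HTTP endpoints so that release
# always runs, even if the guarded block raises. Endpoint paths are the
# ones from the curl examples above.
import urllib.request
from contextlib import contextmanager

BASE = "https://api.lockable.dev/v1"

@contextmanager
def http_lock(name, opener=urllib.request.urlopen):
    # `opener` is injectable so the flow can be tested without the network.
    with opener(f"{BASE}/acquire/{name}") as resp:
        resp.read()
    try:
        yield
    finally:
        with opener(f"{BASE}/release/{name}") as resp:
            resp.read()
```

The try/finally mirrors what flock-style code gets for free when a process exits: the guarded section can fail, but the release call still goes out.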