Atomic state storage in Python?


I'm working on a project on an unreliable system which I'm assuming can fail at any point. What I want to guarantee is that if I write_state and the machine fails mid-operation, read_state will either read a valid state or no state at all. I've implemented something below that I think will work -- I'm interested in criticism of it, or in alternative solutions if anyone knows of one.

My idea:

import hashlib, cPickle, os

def write_state(logname, state):
    state_string = cPickle.dumps(state, cPickle.HIGHEST_PROTOCOL)
    state_string += hashlib.sha224(state_string).hexdigest()

    handle = open('%s.1' % logname, 'wb')
    handle.write(state_string)
    handle.close()

    handle = open('%s.2' % logname, 'wb')
    handle.write(state_string)
    handle.close()

def get_state(logname):
    def read_file(name):
        try:
            f = open(name,'rb')
            data = f.read()
            f.close()
            return data
        except IOError:
            return ''
    def parse(data):
        if len(data) < 56:
            return (None, False)
        hash = data[-56:]
        data = data[:-56]
        valid = hashlib.sha224(data).hexdigest() == hash
        try:
            parsed = cPickle.loads(data)
        except cPickle.UnpicklingError:
            parsed = None
        return (parsed, valid)

    data1, valid1 = parse(read_file('%s.1' % logname))
    data2, valid2 = parse(read_file('%s.2' % logname))

    if valid1 and valid2:
        return data1
    elif valid1 and not valid2:
        return data1
    elif valid2 and not valid1:
        return data2
    elif not valid1 and not valid2:
        raise Exception('Theoretically, this never happens...')

e.g.:

write_state('test_log', {'x': 5})
print get_state('test_log')

Comments (5)

若无相欠,怎会相见 2024-10-10 15:28:02

Your two copies won't work. The filesystem can reorder things so that both files have been truncated before either has been written to disk.

There are a few filesystem operations that are guaranteed to be atomic: renaming a file over another is one, insofar as the file will be in either one place or the other. However, as far as POSIX is concerned, there is no guarantee that the file contents have hit the disk before the rename does, so on its own the rename only gives you the atomic name switch, not durability.

Linux filesystems have traditionally enforced that file contents hit the disk before the atomic move does (though not synchronously), so this does what you want. ext4 broke that assumption for a short while, making those files actually more likely to end up empty. This was widely regarded as a dick move, and has since been remedied.

Anyway, the proper way to do this is: create a temporary file in the same directory (so it's on the same filesystem); write the new data; fsync the temporary file; rename it over the previous version. This is as atomic as the OS can guarantee. It also gives you durability, at the cost of spinning up the disks, which is why app developers prefer skipping fsync and blacklisting the offending ext4 versions instead.
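
A minimal sketch of that recipe in Python (the function name and the choice to take raw bytes are illustrative assumptions; os.rename atomically replaces the destination on POSIX when both names are on the same filesystem):

import os
import tempfile

def atomic_write(path, data):
    # Replace path with data so a reader sees either the old file or the new one.
    dirname = os.path.dirname(os.path.abspath(path))
    # Stage the data in a temp file in the same directory, so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the new contents to disk first
        os.rename(tmp_path, path)    # then atomically switch the name over
    except Exception:
        os.unlink(tmp_path)
        raise

write_state could then pickle and checksum the state as before and hand the resulting bytes to atomic_write(logname, state_string). For full durability of the rename itself you would additionally fsync the containing directory.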

世界如花海般美丽 2024-10-10 15:28:02

I will add a heretical response: what about using sqlite? Or possibly bsddb, though that seems to be deprecated and you would have to use a third-party module.
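
For illustration, a minimal sketch with the standard-library sqlite3 module (the table layout and the single-row key are made up here); SQLite's journal is what makes the update crash-safe:

import pickle
import sqlite3

def write_state(dbname, state):
    con = sqlite3.connect(dbname)
    try:
        con.execute('CREATE TABLE IF NOT EXISTS state (id INTEGER PRIMARY KEY, blob BLOB)')
        # The insert runs inside a transaction: after a crash the row is
        # either the old state or the new one, never a torn write.
        con.execute('INSERT OR REPLACE INTO state (id, blob) VALUES (1, ?)',
                    (sqlite3.Binary(pickle.dumps(state)),))
        con.commit()
    finally:
        con.close()

def get_state(dbname):
    con = sqlite3.connect(dbname)
    try:
        row = con.execute('SELECT blob FROM state WHERE id = 1').fetchone()
        return pickle.loads(bytes(row[0])) if row else None
    finally:
        con.close()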

盗梦空间 2024-10-10 15:28:02

My vague recollection from the way databases work is this. It involves three files. A control file, the target database file and a pending transaction log.

The control file has a global transaction counter and a hash or other checksum. This is a small file that's one physical block in size. One OS-level write.

Have a global transaction counter in your target file with the real data, plus a hash or other checksum.

Have a pending transaction log that just grows or is a circular queue of a finite size, or perhaps rolls over. It doesn't much matter.

  1. Log all pending transactions to the simple log. There's a sequence number and the content of the change.

  2. Update the transaction counter, update the hash in the control file. One write, flushed. If this fails, then nothing has changed. If this succeeds, the control file and target file don't match, indicating a transaction was started but not finished.

  3. Do the expected update on the target file. Seek to the beginning and update the counter and the checksum. If this fails, the control file has a counter one more than the target file. The target file is damaged. When this works, the last logged transaction, the control file and the target file all agree on the sequence number.

You can recover by replaying the log, since you know the last good sequence number.
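
A rough Python sketch of that counter handshake (the file layout, the struct packing, and the helper names are invented here purely to make the steps concrete; replaying the pending log after a crash is left out):

import hashlib
import json
import os
import struct

def _write_block(path, seq, payload):
    # sequence counter + checksum + data, written and flushed as one small file
    digest = hashlib.sha224(payload).hexdigest().encode('ascii')
    with open(path, 'wb') as f:
        f.write(struct.pack('>Q', seq) + digest + payload)
        f.flush()
        os.fsync(f.fileno())    # "one write, flushed"

def commit(logname, seq, change):
    payload = json.dumps(change).encode('utf-8')
    # 1. append the pending change (sequence number + content) to the log
    with open(logname + '.log', 'ab') as log:
        log.write(struct.pack('>QI', seq, len(payload)) + payload)
        log.flush()
        os.fsync(log.fileno())
    # 2. bump the counter and checksum in the small control file
    _write_block(logname + '.ctl', seq, payload)
    # 3. apply the update to the target file
    _write_block(logname + '.dat', seq, payload)

def needs_recovery(logname):
    # the control and target files disagree exactly when step 3 didn't finish
    def read_seq(path):
        try:
            with open(path, 'rb') as f:
                return struct.unpack('>Q', f.read(8))[0]
        except (IOError, struct.error):
            return None
    return read_seq(logname + '.ctl') != read_seq(logname + '.dat')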

被翻牌 2024-10-10 15:28:02

Under UNIX-like systems the usual answer is to do the link dance. Create the file under a unique name (use the tempfile module), then use the os.link() function to create a hard link to the destination name after you have synchronized the contents into the desired (publication) state.

Under this scheme your readers don't see the file until the state is sane. The link operation is atomic. You can unlink the temporary name after you've successfully linked to the "ready" name. There are some additional wrinkles to handle if you need to guarantee semantics over old versions of NFS without depending on the locking daemons.
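
A minimal sketch of that dance (the function name is made up; note that os.link fails with EEXIST if the destination already exists, so replacing an already-published file would need a rename instead):

import os
import tempfile

def publish(destination, data):
    dirname = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.staging-')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())       # contents are sane before anyone can see them
        os.link(tmp_path, destination)   # atomic: readers see a complete file or nothing
    finally:
        os.unlink(tmp_path)              # drop the temporary name; the data lives on under the link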

调妓 2024-10-10 15:28:02

I think you can simplify a few things:

def read_file(name):
    try:
        with open(name, 'rb') as f:
            return f.read()
    except IOError:
        return ''

if valid1:
    return data1
elif valid2:
    return data2
else:
    raise Exception('Theoretically, this never happens...')

You probably don't need to write both files all the time; just write file2 and rename it over file1 (see the sketch below).

I think there is still a chance that a hard reset (e.g. a power cut) could cause both files not to be written to disk properly, due to delayed writes.
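
For instance, write_state could keep the question's pickling and checksum but stage the data and rename it into place (a sketch reusing the question's Python 2 imports; without an fsync before the rename, a power cut can still leave an empty file, as noted in the answer about ext4 above):

import hashlib, cPickle, os

def write_state(logname, state):
    state_string = cPickle.dumps(state, cPickle.HIGHEST_PROTOCOL)
    state_string += hashlib.sha224(state_string).hexdigest()

    tmp_name = '%s.2' % logname            # staging copy
    with open(tmp_name, 'wb') as handle:
        handle.write(state_string)
    os.rename(tmp_name, '%s.1' % logname)  # readers only ever see a complete '%s.1'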
