Random-offset binary writes to a raw disk without any caching

Posted on 2025-01-23 22:04:55


For my application, I am trying to determine whether a data backup system missed any writes. I do this by writing an incrementing integer counter to a 1GB virtual disk; to check that no writes were missed, I look at the reverted snapshot for gaps (e.g. if I see 1, 2, 3, 0, 0, 6, 7, I know the backup did not capture writes 4 and 5 correctly). This is all on a CentOS 7 VM, with mostly Python 2.7 scripts for the writes and reads (speed isn't a huge concern).

A big part of my problem has been caching: since I'm simulating random I/O, writes are often flushed from caches and written to disk out of order. This makes every test report a false positive, since some data appears to be missing at the time of the snapshot. Again, I don't care about efficiency at all, so I don't mind really slow writes. Reads can use caching; that's not a problem, but it also doesn't matter much one way or the other.

Here are the things I have done to try to disable caching:

Disabled the disk write cache with sudo hdparm -W 0 /dev/sdb, where /dev/sdb is the disk I write to.
Wrote to a raw disk with no filesystem, so there is no filesystem caching.
Set the buffering argument of open in the Python script to 0 (no Python write buffering).

Is it basically an impossible task to make sure that my writes get put on the disk in sequential order? All I need is write #(n) to happen before write #(n+1), and #(n+1) before #(n+2), etc.

This is the Python script I'm using to write to disk (SIZE and PRIME change based on the size of the disk and a random seed):

from struct import pack, unpack
import sys
SIZE,PRIME = [x],[x]
# random I/O traversal iterator
def rand_index_generator(a,b):
    ctr=0
    while True:
        yield (ctr%b)
        ctr+=a

with open('/dev/sdb', 'rb+', buffering=0) as f:
    index_gen = rand_index_generator(PRIME, SIZE)
    # random traversal using iterator above, write counter to file
    for counter in xrange(1, SIZE-16):
        f.seek(index_gen.next()*4)
        f.write(pack('>I', counter))

Then, to validate, I traverse in the same order and watch for gaps of unwritten data. This happens after reverting the VM back to the snapshot. I know the traversal and the writes themselves work, since validation runs smoothly with no missed writes before reverting, but I think some "written" data dies in RAM and never makes it to disk.
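The validation pass described above could be sketched roughly as follows. This is a minimal illustration, not the actual validator: the helper names stride_indices and find_missing_writes are made up here, a BytesIO object stands in for /dev/sdb, and the demo sizes (8 slots, step 3) are arbitrary. It uses range and next() so it runs on both Python 2.7 and Python 3.

```python
from io import BytesIO
from struct import pack, unpack


def stride_indices(step, size):
    """Yield slot indices 0, step, 2*step, ... modulo size (same walk as the writer)."""
    ctr = 0
    while True:
        yield ctr % size
        ctr += step


def find_missing_writes(f, step, size, last_counter):
    """Replay the writer's traversal and return counters whose slot holds another value."""
    missing = []
    index_gen = stride_indices(step, size)
    for counter in range(1, last_counter + 1):
        f.seek(next(index_gen) * 4)
        raw = f.read(4)
        if len(raw) < 4 or unpack('>I', raw)[0] != counter:
            missing.append(counter)
    return missing


# Demo on an in-memory "disk" of 8 four-byte slots, with step 3 (coprime with 8).
SIZE, STEP = 8, 3
disk = BytesIO(b'\x00' * (SIZE * 4))
gen = stride_indices(STEP, SIZE)
for counter in range(1, 8):
    slot = next(gen)
    if counter in (4, 5):
        continue  # simulate two writes that never reached the disk
    disk.seek(slot * 4)
    disk.write(pack('>I', counter))

missing = find_missing_writes(disk, STEP, SIZE, 7)
print(missing)  # [4, 5]
```

On a real snapshot, every counter written after the snapshot was taken will also be reported, so only gaps below the highest counter actually found on disk point to a lost or reordered write.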

I'll take any suggestions for guaranteeing the write order I need for this application.


烟若柳尘 2025-01-30 22:04:55


I found the answer to this question. I misunderstood the effect of writing to a raw disk: it did not eliminate OS caching, since I was still calling the OS to write to my raw disk. Oops.

To bypass the OS caches, you should use os.open and pass the os.O_DIRECT and os.O_SYNC flags to make sure writes happen in the correct sequence and are not stuck in volatile memory. I used mmap and OS file descriptors, but you could also use normal file handles.

Page size is specific to your operating system and hardware. On Linux it is typically 4096 bytes.
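Rather than hard-coding the value, the page size can be read at run time from the standard library. A small sketch: mmap.PAGESIZE is a real module constant, while the helper name page_base_and_offset is invented here for illustration and simply mirrors the offset arithmetic used in the write loop.

```python
import mmap

PAGESIZE = mmap.PAGESIZE  # page size reported by the running system


def page_base_and_offset(byte_offset, pagesize=PAGESIZE):
    """Split a byte offset into (page-aligned base, offset within that page)."""
    within = byte_offset % pagesize
    return byte_offset - within, within


print(PAGESIZE)
print(page_base_and_offset(3 * PAGESIZE + 20))
```

The page-aligned base matters because mmap.mmap requires its offset argument to be a multiple of mmap.ALLOCATIONGRANULARITY, which equals the page size on Unix systems.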

The top section of the original script stayed the same, but here is the write loop:

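import mmap
import os

PAGESIZE = 4096
index_gen = rand_index_generator(PRIME, SIZE)
filedesc = os.open('/dev/sdb', os.O_DIRECT|os.O_SYNC|os.O_RDWR)
for counter in xrange(1, SIZE-16):
    write_loc = index_gen.next()*4
    # split the byte offset into a page-aligned base and an offset within that page
    page_dist = (write_loc%PAGESIZE)
    offset = write_loc - page_dist
    # map only the page that holds this counter; mmap needs a page-aligned offset
    bytemap = mmap.mmap(filedesc, PAGESIZE, offset=offset)
    bytemap[page_dist:(page_dist+4)] = pack('>I', counter)
    # flush the mapped page before moving on to the next write
    bytemap.flush()
    bytemap.close()
os.close(filedesc)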
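The "normal file handle" route mentioned in the answer might look roughly like the following. This is a sketch under assumptions, not the answerer's code: the helper name write_counter_sync is hypothetical, a temporary file stands in for /dev/sdb, and only os.O_SYNC is used. os.O_DIRECT is left out on purpose, because it can impose alignment requirements on the buffer address, length, and file offset that a plain 4-byte os.write does not meet.

```python
import os
import tempfile
from struct import pack, unpack


def write_counter_sync(fd, slot, counter):
    """Write one big-endian 32-bit counter at the given 4-byte slot of fd."""
    os.lseek(fd, slot * 4, os.SEEK_SET)
    written = os.write(fd, pack('>I', counter))
    if written != 4:
        raise IOError('short write for counter %d' % counter)


# A temporary file stands in for the raw device in this demo.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'\x00' * 32)
tmp.close()

# O_SYNC asks the kernel to complete each write to the underlying storage
# before os.write returns, so write n finishes before write n+1 starts.
fd = os.open(tmp.name, os.O_RDWR | os.O_SYNC)
try:
    for counter, slot in enumerate([0, 3, 6, 1], 1):
        write_counter_sync(fd, slot, counter)
finally:
    os.close(fd)

with open(tmp.name, 'rb') as f:
    values = unpack('>8I', f.read(32))
os.remove(tmp.name)
print(values)  # (1, 4, 0, 2, 0, 0, 3, 0)
```

Because the loop is single-threaded and each call blocks until the write completes, ordering follows from the program order; whether the device itself honors that still depends on its own write cache, which is why the hdparm step in the question remains relevant.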
