Windows Explorer directory as a bundle

Posted on 2024-07-09 21:15:14 · 654 words · 8 views · 0 comments

For some time now I have been looking for a way to prevent my users from accidentally entering my application's data directory.

My application uses a folder to store a structured project. The folder's internal structure is critical and should not be messed up. I would like my users to see this folder as a single unit and not be able to open it (like a Mac bundle).

Is there a way to do that on Windows?

Edit from current answers

Of course I am not trying to prevent my users from accessing their data, just protecting them from accidentally destroying the data integrity. So encryption or password protection are not needed.

Thank you all for your .Net answers, but unfortunately this is mainly a C++ project without any dependency on the .Net framework.

The data I am talking about are not small: they are images acquired from an electron microscope. They can be huge (~100 MiB to ~1 GiB), so loading everything into memory is not an option. The storage must therefore provide a way to read the data incrementally, one file at a time, without loading the whole archive into memory.

Besides, the application is mainly legacy, with some components we are not even responsible for. A solution that allows me to keep the current IO code is preferable.

Shell Extension looks interesting, I will investigate the solution further.

LarryF, can you elaborate on Filter Driver or DefineDOSDevice? I am not familiar with these concepts.

Comments (10)

绅刃 2024-07-16 21:15:14

There are a couple things that you could do:

One thing is that you could create a FolderView Windows Shell Extension that would create a custom view for your critical folder. By creating a custom FolderView you could make the folder just blank white with one line of text, "Nothing to see here", or you could do something more complicated, like the GAC viewer, which uses this same method. This method would be fairly complex, but the complexity can be mitigated by using something like this CodeProject article's library as a base.

Another solution would be a ZIP virtual filesystem. This would require you to replace any code that uses System.IO directly with something else. ASP.NET 2.0 did this for this exact reason and you could build on top of that pretty easily; take a look at this MSDN article on implementing a VirtualPathProvider.

岁月蹉跎了容颜 2024-07-16 21:15:14

Structured Storage was designed for the scenario that you describe:

Structured Storage provides file and data persistence in COM by handling a single file as a structured collection of objects known as storages and streams.

A "storage" is analogous to a folder, and a "stream" is analogous to a file. Basically you have a single file that, when accessed using the Structured Storage APIs, behaves and looks like a complete, self-contained file system.

Take note, though, that:

A solid understanding of COM technologies is prerequisite to the developmental use of Structured Storage.

冬天旳寂寞 2024-07-16 21:15:14

Inside, or outside of your program?

There are ways, but none of them easy. You are probably going to be looking at a Filter Driver on the file system.

瑕疵 2024-07-16 21:15:14

If you take the ZIP file approach (which I considered for you, but didn't mention), I would suggest using the deflate algorithm, but with your own file system... Look at something like the TAR format. Then, just write your code to pass ALL I/O through the Inflate/Deflate algorithms as it gets written to disk. I wouldn't use the ZIP "FORMAT", as it's far too easy to look at the file, find the PK as the first two bytes, and unzip your file....

I like Joshperry's suggestions best.

Of course, you can also write a device driver that stores all your data inside a single file, but again, we're looking at a driver. (I'm not certain you could implement it outside of a driver.. You PROBABLY can, and inside your program call DefineDOSDevice, giving it a name that only your code has access to, and it will be seen as a normal file system.). I'll play with some ideas, and if they work, I'll shoot you a sample. Now you got me interested.

携余温的黄昏 2024-07-16 21:15:14

You could wrap your project directory into a .zip file and store your data there, much like a .jar is used (I know a .jar is pretty much read only, it's for sake of the example). Make a non-standard extension so that a double-click has no immediate effect, done. ;-)

Of course this means you would have to wrap all your file IO to use the .zip instead, depending on how your program is built this could be tedious. It's been done already for Java: TrueZip. Maybe you can use that as an inspiration?

If you have been tempted - I would not recommend fiddling with folder permissions, for obvious reasons this is not going to help.

回忆那么伤 2024-07-16 21:15:14

You could use Isolated Storage.

http://www.ondotnet.com/pub/a/dotnet/2003/04/21/isolatedstorage.html

It doesn't solve all of the problems, but it does put app data well out of harm's way.

无所谓啦 2024-07-16 21:15:14

Keep in mind: if you store it in the file system, the user will ALWAYS be able to see it. Tamper with Explorer and I'll use cmd.exe instead. Or Total Commander. Or anything else.

If you don't want people to mess with your files, I'd recommend

  • encrypting them to prevent tampering with the files
  • putting them in an archive (e.g. ZIP), possibly password-protecting it, and then compressing/uncompressing at runtime (I would look up algorithms that are fast at modifying archives)

That is not full protection of course, but it's rather straightforward to implement, does not require you to install funky stuff inside the operating system and should keep away most curious users.

Of course, you will never ever be able to fully control files on a user's computer without controlling the computer itself.

颜漓半夏 2024-07-16 21:15:14

I've seen software (Visual Paradigm's Agilian) that used Tomalak's suggestion of a zip archive as the 'project file'. Zip files are well understood, and the use of a non-standard file extension does prevent the casual user from messing with the 'file'. One big advantage to this is that in the event of corruption, standard tools can be used to fix the problem, and you don't have to worry about creating special tools to support your main application.

寒江雪… 2024-07-16 21:15:14

Looks like some Windows ports of FUSE are starting to appear. I think this would be the best solution since it would allow me to keep the legacy code (which is quite large) untouched.

迷你仙 2024-07-16 21:15:14

I'm glad to hear you are doing this in C++. It seems that no one sees C++ as "necessary" anymore. It's all C# this, and ASP.NET that... Even I work in an all C# house, when I swore I would never switch, as C++ does everything I'd ever need to do and then some. I'm adult enough to clean up my own memory! heh.. Anyways, back to the issue at hand...

DefineDOSDevice() (spelled DefineDosDevice in the Win32 API) is the function you use to assign drive letters and port names (LPT1, COM1, etc). You pass it a name, some flags and a "path" that handles this device. But don't let that fool you: it's not a file system path, it's an NT object path. I'm sure you've seen them as "\Device\HardDisk0", etc. You can use WinObj.exe from Sysinternals to see what I mean. Anyways, you can create a device driver, point an MS-DOS symlink at it, and you are off and running. But granted, that seems like a lot of work for what the original problem is.

How many of these meg to gigabyte files are in a typical directory? You might be best off just sticking all the files inside of one giant file, and store an index file right next to it, (or a header to each file) that points to the next "File" inside your "virtual FileSystem" file.
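The "one giant file plus an index" idea can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: the names `Extent`, `Index`, `container_create`, `container_append` and `container_read` are hypothetical and not part of any existing library, and the index is kept in memory here rather than written to a side file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical index entry: where one logical file lives inside the container.
struct Extent {
    std::uint64_t offset;
    std::uint64_t size;
};

using Index = std::map<std::string, Extent>;

// Create (or truncate) an empty container file.
inline bool container_create(const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    return static_cast<bool>(out);
}

// Append one blob to the end of the container and record its extent in the index.
inline bool container_append(const std::string& path, Index& index,
                             const std::string& name, const std::vector<char>& data) {
    assert(!name.empty() && "entries need a name to be looked up later");
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    out.seekp(0, std::ios::end);
    const std::streamoff end = out.tellp();
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    if (!out) return false;
    index[name] = Extent{static_cast<std::uint64_t>(end),
                         static_cast<std::uint64_t>(data.size())};
    return true;
}

// Stream one logical file out of the container in fixed-size chunks, so that
// at most `chunk_size` bytes of image data are held in memory at a time.
template <typename Sink>
bool container_read(const std::string& path, const Index& index,
                    const std::string& name, std::size_t chunk_size, Sink sink) {
    const auto it = index.find(name);
    if (it == index.end() || chunk_size == 0) return false;

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(static_cast<std::streamoff>(it->second.offset));

    std::vector<char> buffer(chunk_size);
    std::uint64_t left = it->second.size;
    while (left > 0) {
        const std::size_t n =
            static_cast<std::size_t>(std::min<std::uint64_t>(left, chunk_size));
        if (!in.read(buffer.data(), static_cast<std::streamsize>(n))) return false;
        sink(buffer.data(), n);  // hand the chunk to the caller
        left -= n;
    }
    return true;
}
```

A caller would pass a lambda as `sink` that decodes or displays each chunk as it arrives. The user only ever sees one opaque file on disk, while the application still reaches each image by name.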

A good example might be to look at the Microsoft MSN Archive format. I reversed this archive format when I was working for an AV company, and it's actually pretty creative, yet VERY simple. It can be done all in one file, and if you want to get fancy, you COULD store the data across 3 files in a RAID 5 type configuration, so if any one of the 3 files gets hosed, you COULD rebuild it from the other two. Plus, the users would just see 3 VERY large files in a directory, and would not be able to access the individual (inner) files.

I have provided you with code that unpacks one of these MSN Archive formats. I don't have code that CREATES one, but from the extract source, you'd be able to construct/write one with no problem. If files are deleted, and/or renamed often, that might pose a problem with used space in the file that would have to be trimmed from time to time.

This format even supports CRC fields, so you can test if you got the file out OK. I was never able to fully reverse the algorithm that Microsoft used to CRC the data, but I have a pretty good idea.

You wouldn't be able to keep your current I/O routines as they are, meaning CreateFile() could not simply open any file inside the archive. However, with the uber-coolness of C++, you could route those calls through your own wrapper around CreateFile that implements your archive format.

If you want some help with this, and it's a big enough issue, perhaps we could talk off-line and find a solution for you.

I'm not opposed to writing you a FileSystemDriver, but for that, we'd have to start talking about compensation. I would be more than happy to give you direction and ideas for free, just as I'm doing now.

I'm not sure if it's kosher for me to give you my email address on here, I'm not sure about SO's policies on this, since we could be talking about potential work/solicitation, but that's not my sole intention. I'd rather help you find your own solutions first.

Before you look into a device driver, download the WinDDK. It has driver samples all over it.

If you wonder why I care so much about this, it is because I've had on my slate for years to write a driver similar to this, that had to be Windows AND OSX compatible, that would allow users to secure drive volumes (USB Keys, removable volumes) WITHOUT installing any drivers, or complicated (and bulky, sometimes annoying) software. In recent years, a lot of the hardware manufacturers have done similar things, but I don't think the security is all that secure. I'm looking at using RSA and AES, exactly the same way GPG, and PGP work. Originally I was contacted about it for what (I believe, but have no proof) was going to be used to secure MP3 files. Since they'd be stored in encrypted format, they simply wouldn't work without the correct passphrase. But, I saw other uses for it too. (This was back when a 16 meg (yes MEG) USB Key cost in excess of $100 or so).

This project also went along with my Oil and Gas industry PC security system that used something similar to Smart Cards, just much easier to use, re-use/re-issue, impossible (read: VERY difficult, and unlikely) to hack, and I could use it on my own kids at home! (Since there is always fighting over who gets time on the computer, and who got the most, and on, and on, and on, and...)

Phew.. I think I got way off topic here. Anyways, here is an example of the Microsoft MSN archive format. See if you may be able to use something like this, knowing that you can always "Skip" right to a file by following the offsets in the file as you parse/search for the requested file in the master file; or in the pre-parsed data held in memory. And since you would not be loading the raw binary file data in memory, your only limit would probably be the 4gb file limit on 32 bit machines.

The MARC (Microsoft MSN Archive) format is laid out (loosely) like this:

  • 12 Byte Header (only one)
    • File Magic
    • MARC version
    • Number of files (in the following table)
  • 68 Byte File Table headers (1 to Header.NumFiles of these)
    • File name
    • File Size
    • Checksum
    • offset to raw file data

Now, in the 68 Byte File Table entries, 32 bits are used for file lengths and offsets. For your VERY large files, you may have to up that to 48 or 64 bit integers.
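As a rough sketch of what a widened table entry could look like, the following keeps the same four fields but uses 64-bit size and offset. The layout (76 bytes, little-endian) and the names `WideEntry`, `encode_entry` and `decode_entry` are assumptions made for illustration; they are not part of the MARC format. The fields are serialized byte by byte so the on-disk layout does not depend on compiler padding or on the width of `long`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical widened entry: 56-byte name + 8-byte size + 4-byte checksum
// + 8-byte offset = 76 bytes on disk.
constexpr std::size_t kNameLen = 56;
constexpr std::size_t kEntrySize = kNameLen + 8 + 4 + 8;

struct WideEntry {
    std::string   name;          // at most kNameLen - 1 characters
    std::uint64_t size = 0;
    std::uint32_t checksum = 0;
    std::uint64_t offset = 0;
};

// Store the `bytes` low-order bytes of `v` in little-endian order.
inline void put_le(unsigned char* dst, std::uint64_t v, std::size_t bytes) {
    for (std::size_t i = 0; i < bytes; ++i)
        dst[i] = static_cast<unsigned char>(v >> (8 * i));
}

// Read `bytes` little-endian bytes back into an integer.
inline std::uint64_t get_le(const unsigned char* src, std::size_t bytes) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<std::uint64_t>(src[i]) << (8 * i);
    return v;
}

inline std::array<unsigned char, kEntrySize> encode_entry(const WideEntry& e) {
    assert(e.name.size() < kNameLen && "name must leave room for the terminator");
    std::array<unsigned char, kEntrySize> raw{};  // zero-filled, so the name is NUL-padded
    std::memcpy(raw.data(), e.name.data(), e.name.size());
    put_le(raw.data() + kNameLen, e.size, 8);
    put_le(raw.data() + kNameLen + 8, e.checksum, 4);
    put_le(raw.data() + kNameLen + 12, e.offset, 8);
    return raw;
}

inline WideEntry decode_entry(const std::array<unsigned char, kEntrySize>& raw) {
    WideEntry e;
    // The name ends at the first NUL, or at kNameLen if none is present.
    const unsigned char* end =
        std::find(raw.data(), raw.data() + kNameLen, static_cast<unsigned char>(0));
    e.name.assign(raw.data(), end);
    e.size     = get_le(raw.data() + kNameLen, 8);
    e.checksum = static_cast<std::uint32_t>(get_le(raw.data() + kNameLen + 8, 4));
    e.offset   = get_le(raw.data() + kNameLen + 12, 8);
    return e;
}
```

With 64-bit fields, an entry can describe data that starts beyond the 4 GiB mark, which a 32-bit offset cannot express.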

Here is some code I wrote up to handle these.

#define MARC_FILE_MAGIC         0x4352414D // In Little Endian
#define MARC_FILENAME_LEN       56 //(You'll notice this is rather small)
#define MARC_HEADER_SIZE        12
#define MARC_FILE_ENT_SIZE      68

#define MARC_DATA_SIZE          (1024 * 128) // 128k Read Buffer should be enough. Parenthesized so the macro expands safely inside expressions.

#define MARC_ERR_OK              0      // No error
#define MARC_ERR_OOD             314    // Out of data error
#define MARC_ERR_OS              315    // Error returned by the OS
#define MARC_ERR_CRC             316    // CRC error

struct marc_file_hdr
{
    ULONG            h_magic;
    ULONG            h_version;
    ULONG            h_files;
    int              h_fd;
    struct marc_dir *h_dir;
};

struct marc_file
{
    char            f_filename[MARC_FILENAME_LEN];
    long            f_filesize;
    unsigned long   f_checksum;
    long            f_offset;
};

struct marc_dir
{
    struct marc_file       *dir_file;
    ULONG                   dir_filenum;
    struct marc_dir        *dir_next;
};

That gives you an idea of the headers I wrote for them, and here is the open function. Yes, it's missing all the support calls, err routines, etc, but you get the idea. Please excuse the C and C++ code style mixture. Our scanner was a cluster of many different problems like this... I used the antique calls like open(), fopen(), to keep standards with the rest of the code base.

struct marc_file_hdr *marc_open(char *filename)
{
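    struct marc_file_hdr *fhdr  = (struct marc_file_hdr*)malloc(sizeof(marc_file_hdr));
    if(fhdr == NULL)
        return NULL;
    fhdr->h_dir = NULL;

#if defined(_MSC_VER)
    // _sopen_s is a function, not a macro, so test for the compiler instead.
    // It takes a pointer to the descriptor and returns an error code.
    if(_sopen_s(&fhdr->h_fd, filename, _O_BINARY | _O_RDONLY, _SH_DENYWR, _S_IREAD) != 0)
        fhdr->h_fd = -1;
#else
    fhdr->h_fd = open(filename, _O_BINARY | _O_RDONLY);
#endif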
    if(fhdr->h_fd < 0)
    {
        marc_close(fhdr);
        return NULL;
    }

    //Once we have the file open, read all the file headers, and populate our main headers linked list.
    if(read(fhdr->h_fd, fhdr, MARC_HEADER_SIZE) != MARC_HEADER_SIZE)
    {
        errmsg("MARC: Could not read MARC header from file %s.\n", filename);
        marc_close(fhdr);
        return NULL;
    }

    // Verify the file magic
    if(fhdr->h_magic != MARC_FILE_MAGIC)
    {
        errmsg("MARC: Incorrect file magic %x found in MARC file.", fhdr->h_magic);
        marc_close(fhdr);
        return NULL;
    }

    if(fhdr->h_files <= 0)
    {
        errmsg("MARC: No files found in archive.\n");
        marc_close(fhdr);
        return NULL;
    }

    // Get all the file headers from this archive, and link them to the main header.
    struct marc_dir *lastdir = NULL, *curdir = NULL;
    curdir = (struct marc_dir*)malloc(sizeof(marc_dir));
    // Keep the list well-formed at all times so marc_close() can walk it on any error path.
    curdir->dir_file = NULL;
    curdir->dir_next = NULL;
    fhdr->h_dir = curdir;

    for(ULONG x = 0;x < fhdr->h_files;x++)
    {
        if(lastdir)
        {
            lastdir->dir_next = (struct marc_dir*)malloc(sizeof(marc_dir));
            lastdir->dir_next->dir_file = NULL;
            lastdir->dir_next->dir_next = NULL;
            curdir = lastdir->dir_next;
        }

        curdir->dir_file = (struct marc_file*)malloc(sizeof(marc_file));
        curdir->dir_filenum = x + 1;

        if(read(fhdr->h_fd, curdir->dir_file, MARC_FILE_ENT_SIZE) != MARC_FILE_ENT_SIZE)
        {
            errmsg("MARC: Could not read file header for file %lu\n", x);
            marc_close(fhdr);
            return NULL;
        }
        // LEF: Just a little extra insurance...
        // The last valid index is MARC_FILENAME_LEN - 1; writing at MARC_FILENAME_LEN would clobber f_filesize.
        curdir->dir_file->f_filename[MARC_FILENAME_LEN - 1] = '\0';

        lastdir = curdir;
    }
    lastdir->dir_next = NULL;

    return fhdr;
}

Then, you have the simple extract method. Keep in mind this was strictly for virus scanning, so there are no search routines, etc. This was designed to simply dump out a file, scan it, and move on. Below that is the CRC code routine that I BELIEVE Microsoft used, but I'm not sure WHAT exactly they CRC'ed. It might include header data + file data, etc. I just haven't cared enough to go back and try to reverse it. Anyway, as you can see, there is no compression on this archive format, but it is VERY easy to add. Full source can be provided if you want. (I think all that's left is the close() routine, and the code that calls and extracts each file, etc.!!)

bool marc_extract(struct marc_file_hdr *marc, struct marc_file *marcfile, char *file, int &err)
{
    // Create the file from marcfile, in *file's location, return any errors.
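    int ofd = -1;
    err = MARC_ERR_OK;
#if defined(_MSC_VER)
    // _sopen_s takes a pointer to the descriptor; the output path is the 'file' parameter.
    if(_sopen_s(&ofd, file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _SH_DENYNO, _S_IREAD | _S_IWRITE) != 0)
        ofd = -1;
#else
    // open() needs a permission mode whenever _O_CREAT is given.
    ofd = open(file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _S_IREAD | _S_IWRITE);
#endif
    if(ofd < 0)
    {
        errmsg("MARC: Could not create output file %s.\n", file);
        err = MARC_ERR_OS;
        return false;
    }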

    // Seek to the offset of the file to extract
    if(lseek(marc->h_fd, marcfile->f_offset, SEEK_SET) != marcfile->f_offset)
    {
        errmsg("MARC: Could not seek to offset 0x%04lx for file %s.\n", marcfile->f_offset, marcfile->f_filename);
        close(ofd);
        err = MARC_ERR_OS; // Get the last error from the OS.
        return false;
    }

    unsigned char *buffer = (unsigned char*)malloc(MARC_DATA_SIZE);

    long bytesleft = marcfile->f_filesize;
    long readsize = MARC_DATA_SIZE >= marcfile->f_filesize ? marcfile->f_filesize : MARC_DATA_SIZE;
    unsigned long crc = 0;

    while(bytesleft > 0) // a negative f_filesize from a corrupt entry must not loop forever
    {
        if(read(marc->h_fd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to extract data from MARC archive.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OOD;
            return false;
        }

        crc = marc_checksum(buffer, readsize, crc);

        if(write(ofd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to write data to file.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OS; // Get the last error from the OS.
            return false;
        }
        bytesleft -= readsize;
        readsize = MARC_DATA_SIZE >= bytesleft ? bytesleft : MARC_DATA_SIZE;
    }

    // LEF:  I can't quite figure out how the checksum is computed, but I think it has to do with the file header, PLUS the data in the file, or it's checked on the data in the file
    //       minus any BOM's at the start...  So, we'll just rem out this code for now, but I wrote it anyways.
    //if(crc != marcfile->f_checksum)
    //{
    //    warningmsg("MARC: File CRC does not match.  File could be corrupt, or altered.  CRC=0x%08X, expected 0x%08X\n", crc, marcfile->f_checksum);
    //    err = MARC_ERR_CRC;
    //}

    free(buffer);
    close(ofd);

    return true;
}

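Here is MY assumed CRC routine (I may have stolen this from Stuart Caie and libmspack, I can't recall). Since marc_extract() calls it and it is static, it has to be declared or defined ahead of marc_extract() in the actual source file: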

static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed)
{
    int count = cb / 4;
    unsigned long csum = seed;
    BYTE *p = (BYTE*)pv;
    unsigned long ul;

    while(count-- > 0)
    {
        ul = *p++;
        ul |= (((unsigned long)(*p++)) <<  8);
        ul |= (((unsigned long)(*p++)) << 16);
        ul |= (((unsigned long)(*p++)) << 24);
        csum ^= ul;
    }

    ul = 0;
    switch(cb % 4)
    {
        case 3: ul |= (((unsigned long)(*p++)) << 16);
        case 2: ul |= (((unsigned long)(*p++)) <<  8);
        case 1: ul |= *p++;
        default: break;
    }
    csum ^= ul;

    return csum;
}
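One property of this XOR checksum that the extract loop relies on is that it can be computed chunk by chunk, feeding each result back in as the next seed, as long as every chunk except the last is a multiple of 4 bytes (the 128k buffer is). Here is a portable restatement using fixed-width types to illustrate that; `xor_checksum` and `xor_checksum_chunked` are hypothetical names, and this is a sketch of the routine above, not a confirmed description of what Microsoft computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR together little-endian 32-bit words; leftover 1 to 3 bytes are folded
// in with the first leftover byte in the highest position, as in the routine above.
inline std::uint32_t xor_checksum(const unsigned char* p, std::size_t cb, std::uint32_t seed) {
    std::uint32_t csum = seed;
    for (std::size_t words = cb / 4; words > 0; --words, p += 4) {
        csum ^= static_cast<std::uint32_t>(p[0])
              | (static_cast<std::uint32_t>(p[1]) << 8)
              | (static_cast<std::uint32_t>(p[2]) << 16)
              | (static_cast<std::uint32_t>(p[3]) << 24);
    }
    // Equivalent to the fall-through switch: 3 bytes give b0<<16 | b1<<8 | b2.
    std::uint32_t tail = 0;
    for (std::size_t i = 0; i < cb % 4; ++i)
        tail = (tail << 8) | p[i];
    return csum ^ tail;
}

// Checksum a buffer in chunks by chaining the seed from one chunk to the next.
inline std::uint32_t xor_checksum_chunked(const std::vector<unsigned char>& data,
                                          std::size_t chunk) {
    assert(chunk > 0 && chunk % 4 == 0 && "chunk size must be a multiple of 4");
    std::uint32_t csum = 0;
    for (std::size_t pos = 0; pos < data.size(); pos += chunk) {
        const std::size_t n = std::min(chunk, data.size() - pos);
        csum = xor_checksum(data.data() + pos, n, csum);
    }
    return csum;
}
```

If the chunk size were not a multiple of 4, the tail handling would kick in mid-stream and the chunked result would generally differ from the checksum of the whole buffer.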

Well, I think this post is long enough now... Contact me if you need help, or have questions.
