Windows Explorer directory as a bundle

Posted 2024-07-09 21:15:14

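For some time now I have been looking for a way to keep my users from accidentally entering my application's data directory.

My application uses a folder to store a structured project. The folder's internal structure is critical and should not be messed with. I would like my users to see this folder as a single unit and not be able to open it (like a Mac bundle).

Is there a way to do that on Windows?

Edit based on the current answers

Of course I am not trying to prevent my users from accessing their data, only to protect them from accidentally destroying its integrity. So encryption or password protection is not needed.

Thanks to everyone for the .Net answers, but unfortunately this is mainly a C++ project with no dependency on the .Net framework.

The data I am talking about is not light: it consists of images acquired from an electron microscope. A data set can be huge (~100 MiB to ~1 GiB), so loading everything into memory is not an option. The storage must therefore provide a way to read the data incrementally, one file at a time, without loading the whole archive into memory.

Besides, the application is mostly legacy code, with some components we are not even responsible for. A solution that lets me keep the current IO code is preferable.

The Shell extension looks interesting; I will investigate that solution further.

LarryF, can you elaborate on Filter Driver or DefineDOSDevice? I am not familiar with these concepts.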

绅刃 2024-07-16 21:15:14

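There are a couple of things you could do:

One is to create a FolderView Windows Shell extension that provides a custom view for your critical folder. With a custom FolderView you could make the folder appear blank white with a single line of text, "Nothing to see here", or you could do something more elaborate, like the GAC viewer, which uses this same method. This approach is fairly complex, but the complexity can be reduced by using something like the library from this CodeProject article as a base.

Another solution is a ZIP virtual file system, which would require you to replace any code that uses System.IO directly with something else. ASP.NET 2.0 did this for exactly this reason, and you could build on top of that pretty easily; take a look at this MSDN article on implementing a VirtualPathProvider.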

岁月蹉跎了容颜 2024-07-16 21:15:14

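Structured Storage was designed for the scenario you describe:

Structured Storage provides file and data persistence in COM by handling a single file as a structured collection of objects known as storages and streams.

A "storage" is analogous to a folder, and a "stream" is analogous to a file. Basically, you have a single file that, when accessed through the Structured Storage APIs, behaves and looks like a complete, self-contained file system.

Do note, though:

A solid understanding of COM technologies is a prerequisite for development that uses Structured Storage.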

冬天旳寂寞 2024-07-16 21:15:14

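Inside or outside of your program?

There are ways, but none of them are easy. You are probably going to be looking at a Filter Driver on the file system.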

瑕疵 2024-07-16 21:15:14

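If you take the ZIP file approach (which I considered for you, but didn't mention), I'd suggest using the deflate algorithm but with your own file system... Look at something like the TAR format. Then just write your code so that ALL I/O passes through the Inflate/Deflate algorithms as it is written to disk. I would not use the ZIP "format" itself, because it is too easy to look at the file, find PK as the first two bytes, and unzip it...

I like joshperry's suggestion best.

Of course, you could also write a device driver that stores all your data inside one single file, but again, we're looking at a driver. (I'm not sure you could implement it outside of a driver. You probably can: call DefineDOSDevice inside your program, give it a name that only your code has access to, and it will be seen as a normal file system.) I'll play with some ideas, and if they work I'll shoot you a sample. Now you have me interested.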

携余温的黄昏 2024-07-16 21:15:14

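You could wrap the project directory in a .zip file and store your data there, much like one uses a .jar (I know a .jar is pretty much read-only; it's just for the sake of example). Give it a non-standard extension so that a double click has no immediate effect, and you're done. ;-)

Of course this means you would have to wrap all your file IO to go through the .zip instead, which could be tedious depending on how your program is built. It has already been done for Java: TrueZip. Maybe you can use that as inspiration?

In case you were tempted: I would not recommend fiddling with folder permissions. For obvious reasons, that is not going to help.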

回忆那么伤 2024-07-16 21:15:14

You could use Isolated Storage.

http://www.ondotnet.com/pub/a/dotnet/2003/04/21/isolatedstorage.html

It doesn't solve every problem, but it does put app data solidly out of harm's way.

无所谓啦 2024-07-16 21:15:14

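Keep in mind: if you store it in the file system, the user will always be able to see it. Tamper with Explorer and I'll use cmd.exe instead. Or Total Commander. Or anything else.

If you don't want people to mess with your files, I'd recommend:

- encrypting them, to prevent tampering with the files
- putting them into an archive (e.g. ZIP), possibly password-protected, and then compressing/uncompressing at runtime (I'd look into algorithms that are fast at modifying archives)

That is not full protection, of course, but it is fairly straightforward to implement, doesn't require you to install funky stuff inside the operating system, and should keep most curious users away.

Of course, you will never be able to fully control files on the user's computer without controlling the computer itself.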

颜漓半夏 2024-07-16 21:15:14

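I've seen software (Visual Paradigm's Agilian) that uses Tomalak's suggestion of a zip archive as the "project file". Zip files are well understood, and a non-standard file extension does keep the casual user from messing with the "file". One big advantage is that in the event of corruption, standard tools can be used to fix the problem, and you don't have to worry about creating special tools to support your main application.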

寒江雪… 2024-07-16 21:15:14

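It looks like some Windows ports of FUSE are starting to appear. I think that would be the best solution, since it would let me keep the (fairly large) legacy code untouched.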

迷你仙 2024-07-16 21:15:14

I'm glad to hear you are doing this in C++. It seems that no one sees C++ as "necessary" anymore. It's all C# this, and ASP.NET that... Even I work in an all C# house, when I swore I would never switch, as C++ does everything I'd ever need to do and then some. I'm adult enough to clean up my own memory! heh.. Anyways, back to the issue at hand...

DefineDosDevice() (that is the actual Win32 spelling) is the function you use to assign drive letters and port names (LPT1, COM1, etc). You pass it a name, some flags and a "path" that handles this device. But don't let that fool you: it's not a file system path, it's an NT object path. I'm sure you've seen them as "\Device\HardDisk0", etc. You can use WinObj.exe from Sysinternals to see what I mean. Anyway, you can create a device driver, then point an MS-DOS symlink at it, and you are off and running. Granted, that seems like a lot of work for what the original problem is.

How many of these meg to gigabyte files are in a typical directory? You might be best off just sticking all the files inside of one giant file, and store an index file right next to it, (or a header to each file) that points to the next "File" inside your "virtual FileSystem" file.

A good example might be to look at the Microsoft MSN Archive format. I reversed this archive format when I was working for an AV company, and it's actually pretty creative, yet VERY simple. It can be done all in one file, and if you want to get fancy, you COULD store the data across 3 files in a RAID 5 type configuration, so if any one of the 3 files gets hosed, you COULD rebuild it from the other two. Plus, the users would just see 3 VERY large files in a directory, and would not be able to access the individual (inner) files.
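
To make the three-file idea a bit more concrete, here is a minimal sketch of the XOR parity trick it relies on. It is only an illustration under my own assumptions: the names (Block, make_parity, rebuild_missing) are made up, and the parity always goes into the third file, whereas real RAID 5 rotates it from stripe to stripe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stripe's worth of bytes from one of the three files (hypothetical type).
using Block = std::vector<std::uint8_t>;

// XOR two equally sized blocks byte by byte.
Block xor_blocks(const Block &a, const Block &b)
{
    assert(a.size() == b.size());
    Block out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    return out;
}

// Parity block to store in the third file for a stripe made of d0 and d1.
Block make_parity(const Block &d0, const Block &d1)
{
    return xor_blocks(d0, d1);
}

// Rebuild whichever block was lost from the two that survived.
// Because a ^ b ^ b == a, the same XOR works for any of the three.
Block rebuild_missing(const Block &survivor0, const Block &survivor1)
{
    return xor_blocks(survivor0, survivor1);
}
```

The cost is one extra file roughly the size of each data file, and every write has to update the parity block as well.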

I have provided you with code that unpacks one of these MSN Archive formats. I don't have code that CREATES one, but from the extract source, you'd be able to construct/write one with no problem. If files are deleted, and/or renamed often, that might pose a problem with used space in the file that would have to be trimmed from time to time.
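
For what it's worth, a writer could look roughly like the sketch below. It builds the archive in memory using the layout described further down (12-byte header, 68-byte entries, then the raw data). Everything here is an assumption for illustration: the helper names, the version value of 1, and the checksum field being left at 0 because the real algorithm was never confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint32_t kMagic = 0x4352414D;   // "MARC" when stored little endian
constexpr std::size_t kHeaderSize = 12;
constexpr std::size_t kEntrySize = 68;
constexpr std::size_t kNameLen = 56;

using Bytes = std::vector<std::uint8_t>;
using Member = std::pair<std::string, Bytes>;   // file name and its contents

// Append a 32-bit value in little-endian byte order.
void put_u32(Bytes &out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Serialize header, file table and data, in that order.
Bytes build_archive(const std::vector<Member> &members)
{
    Bytes out;
    put_u32(out, kMagic);
    put_u32(out, 1);   // version: assumed value
    put_u32(out, static_cast<std::uint32_t>(members.size()));

    // The first member's data starts right after the header and the whole table.
    std::uint32_t offset =
        static_cast<std::uint32_t>(kHeaderSize + kEntrySize * members.size());
    for (const Member &m : members)
    {
        assert(m.first.size() < kNameLen);   // leave room for the terminator
        Bytes name(kNameLen, 0);
        std::memcpy(name.data(), m.first.data(), m.first.size());
        out.insert(out.end(), name.begin(), name.end());
        put_u32(out, static_cast<std::uint32_t>(m.second.size()));
        put_u32(out, 0);   // checksum left at 0: the real algorithm is unknown
        put_u32(out, offset);
        offset += static_cast<std::uint32_t>(m.second.size());
    }
    for (const Member &m : members)
        out.insert(out.end(), m.second.begin(), m.second.end());
    return out;
}
```

For images in the hundreds of megabytes you would stream each member to disk instead of collecting everything in one vector; the offset bookkeeping stays the same.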

This format even supports CRC fields, so you can test if you got the file out OK. I was never able to fully reverse the algorithm that Microsoft used to CRC the data, but I have a pretty good idea.

You wouldn't be able to keep current I/O routines, meaning CreateFile() would not just be able to open up any file in the archive, however, with the uber-coolness of C++, you could override the CreateFile call to implement your archive format.
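
One way to picture that replacement, without touching CreateFile itself, is a small wrapper that makes a region of the archive behave like a standalone file. The sketch below is portable C++ over std::istream rather than Win32 handles, and MemberReader is a hypothetical name, so treat it as an illustration of the idea and not a drop-in for legacy code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Presents one archive member (offset + size) as if it were its own file.
class MemberReader
{
public:
    MemberReader(std::istream &archive, std::uint64_t offset, std::uint64_t size)
        : archive_(archive), begin_(offset), size_(size), pos_(0) {}

    // Read up to n bytes, never past the end of the member.
    // Returns the number of bytes actually read.
    std::size_t read(char *dst, std::size_t n)
    {
        assert(dst != nullptr);
        const std::uint64_t left = size_ - pos_;
        const std::size_t want =
            static_cast<std::size_t>(std::min<std::uint64_t>(n, left));
        if (want == 0)
            return 0;
        archive_.clear();   // drop a stale EOF flag from an earlier read
        archive_.seekg(static_cast<std::streamoff>(begin_ + pos_));
        archive_.read(dst, static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(archive_.gcount());
        pos_ += got;
        return got;
    }

    // Move the read position inside the member, clamped to its size.
    void seek(std::uint64_t pos) { pos_ = std::min(pos, size_); }

    std::uint64_t size() const { return size_; }

private:
    std::istream &archive_;
    std::uint64_t begin_;
    std::uint64_t size_;
    std::uint64_t pos_;
};
```

Only the chunk being read is ever in memory, which matches the requirement of reading one image incrementally. For a quick experiment it can be pointed at a std::istringstream; in practice it would sit on a std::ifstream opened in binary mode.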

If you want some help with this, and it's a big enough issue, perhaps we could talk off-line and find a solution for you.

I'm not opposed to writing you a FileSystemDriver, but for that, we'd have to start talking about compensation. I would be more than happy to give you direction and ideas for free, just as I'm doing now.

I'm not sure if it's kosher for me to give you my email address on here, I'm not sure about SO's policies on this, since we could be talking about potential work/solicitation, but that's not my sole intention. I'd rather help you find your own solutions first.

Before you look into a device driver, download the WinDDK. It has driver samples all over it.

If you wonder why I care so much about this, it is because I've had on my slate for years to write a driver similar to this, that had to be Windows AND OSX compatible, that would allow users to secure drive volumes (USB Keys, removable volumes) WITHOUT installing any drivers, or complicated (and bulky, sometimes annoying) software. In recent years, a lot of the hardware manufacturers have done similar things, but I don't think the security is all that secure. I'm looking at using RSA and AES, exactly the same way GPG, and PGP work. Originally I was contacted about it for what (I believe, but have no proof) was going to be used to secure MP3 files. Since they'd be stored in encrypted format, they simply wouldn't work without the correct passphrase. But, I saw other uses for it too. (This was back when a 16 meg (yes MEG) USB Key cost in excess of $100 or so).

This project also went along with my Oil and Gas industry PC security system that used something similar to Smart Cards, just much easier to use, re-use/re-issue, impossible (read: VERY difficult, and unlikely) to hack, and I could use it on my own kids at home! (Since there is always fighting over who gets time on the computer, and who got the most, and on, and on, and on, and...)

Phew.. I think I got way off topic here. Anyways, here is an example of the Microsoft MSN archive format. See if you may be able to use something like this, knowing that you can always "Skip" right to a file by following the offsets in the file as you parse/search for the requested file in the master file; or in the pre-parsed data held in memory. And since you would not be loading the raw binary file data in memory, your only limit would probably be the 4gb file limit on 32 bit machines.
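
The "pre-parsed data held in memory" part can be as simple as a name-to-entry map built once when the archive is opened. A rough sketch, with made-up names (Entry, ArchiveIndex) and 64-bit fields chosen by me rather than taken from the format:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// What we keep in memory per inner file: just where it lives, not its data.
struct Entry
{
    std::string name;
    std::uint64_t offset;   // absolute offset of the raw data in the archive
    std::uint64_t size;     // length of the raw data in bytes
};

// Built once from the parsed file table; lookups then avoid rescanning it.
class ArchiveIndex
{
public:
    explicit ArchiveIndex(const std::vector<Entry> &entries)
    {
        for (const Entry &e : entries)
            by_name_[e.name] = e;
    }

    // Returns the entry for name, or nullptr if the archive has no such file.
    const Entry *find(const std::string &name) const
    {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Entry> by_name_;
};
```

With the entry in hand, "skipping" to a file is a single seek to entry->offset followed by reads of at most entry->size bytes.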

The MARC (Microsoft MSN Archive) format is laid out (loosely) like this:

12 Byte Header (only one)
- File Magic
- MARC version
- Number of files (in the following table)
68 Byte File Table headers (1 to Header.NumFiles of these)
- File name
- File Size
- Checksum
- offset to raw file data

Now, in the 68 Byte File Table entries, 32 bits are used for file lengths and offsets. For your VERY large files, you may have to up that to 48 or 64 bit integers.
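
If you do widen the fields, an entry might be decoded along these lines. This is a hypothetical variant of my own, not part of the MSN format: the 76-byte layout (56-byte name, 64-bit size, 32-bit checksum, 64-bit offset) and all names are assumptions. Decoding byte by byte keeps it independent of struct padding and of the host's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kNameLen = 56;
// Hypothetical widened entry: name, 64-bit size, 32-bit checksum, 64-bit offset.
constexpr std::size_t kWideEntrySize = kNameLen + 8 + 4 + 8;   // 76 bytes

struct WideEntry
{
    std::string name;
    std::uint64_t size;
    std::uint32_t checksum;
    std::uint64_t offset;
};

// Decode an unsigned little-endian integer of `bytes` bytes starting at p.
std::uint64_t get_le(const std::uint8_t *p, std::size_t bytes)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}

// Decode one entry from exactly kWideEntrySize raw bytes.
WideEntry parse_wide_entry(const std::vector<std::uint8_t> &raw)
{
    assert(raw.size() == kWideEntrySize);
    WideEntry e;
    // The name is NUL padded: stop at the first NUL or at the field width.
    std::size_t len = 0;
    while (len < kNameLen && raw[len] != 0)
        ++len;
    e.name.assign(reinterpret_cast<const char *>(raw.data()), len);
    e.size = get_le(raw.data() + kNameLen, 8);
    e.checksum = static_cast<std::uint32_t>(get_le(raw.data() + kNameLen + 8, 4));
    e.offset = get_le(raw.data() + kNameLen + 12, 8);
    return e;
}
```

Bump the version field in the header if you change the entry size, so old readers can refuse the file instead of misparsing it.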

Here is some code I wrote up to handle these.

#define MARC_FILE_MAGIC         0x4352414D // In Little Endian
#define MARC_FILENAME_LEN       56 //(You'll notice this is rather small)
#define MARC_HEADER_SIZE        12
#define MARC_FILE_ENT_SIZE      68

#define MARC_DATA_SIZE          (1024 * 128) // 128k Read Buffer should be enough.

#define MARC_ERR_OK              0      // No error
#define MARC_ERR_OOD             314    // Out of data error
#define MARC_ERR_OS              315    // Error returned by the OS
#define MARC_ERR_CRC             316    // CRC error

struct marc_file_hdr
{
    ULONG            h_magic;
    ULONG            h_version;
    ULONG            h_files;
    int              h_fd;
    struct marc_dir *h_dir;
};

struct marc_file
{
    char            f_filename[MARC_FILENAME_LEN];
    long            f_filesize;
    unsigned long   f_checksum;
    long            f_offset;
};

struct marc_dir
{
    struct marc_file       *dir_file;
    ULONG                   dir_filenum;
    struct marc_dir        *dir_next;
};

That gives you an idea of the headers I wrote for them, and here is the open function. Yes, it's missing all the support calls, err routines, etc, but you get the idea. Please excuse the C and C++ code style mixture. Our scanner was a cluster of many different problems like this... I used the antique calls like open(), fopen(), to keep standards with the rest of the code base.

struct marc_file_hdr *marc_open(char *filename)
{
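    struct marc_file_hdr *fhdr  = (struct marc_file_hdr*)malloc(sizeof(marc_file_hdr));
    if(fhdr == NULL)
        return NULL;
    fhdr->h_dir = NULL;

#if defined(_MSC_VER)
    // _sopen_s is a function, not a macro, so test for the compiler instead.
    // It takes a pointer to the descriptor and returns an error code.
    if(_sopen_s(&fhdr->h_fd, filename, _O_BINARY | _O_RDONLY, _SH_DENYWR, _S_IREAD | _S_IWRITE) != 0)
        fhdr->h_fd = -1;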
#else
    fhdr->h_fd = open(filename, _O_BINARY | _O_RDONLY);
#endif
    if(fhdr->h_fd < 0)
    {
        marc_close(fhdr);
        return NULL;
    }

    //Once we have the file open, read all the file headers, and populate our main headers linked list.
    if(read(fhdr->h_fd, fhdr, MARC_HEADER_SIZE) != MARC_HEADER_SIZE)
    {
        errmsg("MARC: Could not read MARC header from file %s.\n", filename);
        marc_close(fhdr);
        return NULL;
    }

    // Verify the file magic
    if(fhdr->h_magic != MARC_FILE_MAGIC)
    {
        errmsg("MARC: Incorrect file magic %lx found in MARC file.\n", fhdr->h_magic);
        marc_close(fhdr);
        return NULL;
    }

    if(fhdr->h_files == 0)   // h_files is unsigned, so it can never be negative
    {
        errmsg("MARC: No files found in archive.\n");
        marc_close(fhdr);
        return NULL;
    }

    // Get all the file headers from this archive, and link them to the main header.
    struct marc_dir *lastdir = NULL, *curdir = NULL;
    curdir = (struct marc_dir*)malloc(sizeof(marc_dir));
    curdir->dir_file = NULL;   // keep the list well-formed in case a read fails below
    curdir->dir_next = NULL;
    fhdr->h_dir = curdir;

    for(ULONG x = 0;x < fhdr->h_files;x++)
    {
        if(lastdir)
        {
            lastdir->dir_next = (struct marc_dir*)malloc(sizeof(marc_dir));
            lastdir->dir_next->dir_next = NULL;
            curdir = lastdir->dir_next;
        }

        curdir->dir_file = (struct marc_file*)malloc(sizeof(marc_file));
        curdir->dir_filenum = x + 1;

        if(read(fhdr->h_fd, curdir->dir_file, MARC_FILE_ENT_SIZE) != MARC_FILE_ENT_SIZE)
        {
            errmsg("MARC: Could not read file header for file %lu\n", x);
            marc_close(fhdr);
            return NULL;
        }
        // LEF: Just a little extra insurance...
        // The last valid index is MARC_FILENAME_LEN - 1; writing at MARC_FILENAME_LEN
        // would clobber the first byte of f_filesize.
        curdir->dir_file->f_filename[MARC_FILENAME_LEN - 1] = '\0';

        lastdir = curdir;
    }
    lastdir->dir_next = NULL;

    return fhdr;
}

Then, you have the simple extract method. Keep in mind this was strictly for virus scanning, so there are no search routines, etc. This was designed to simply dump out a file, scan it, and move on. Below that is the CRC code routine that I BELIEVE Microsoft used, but I'm not sure WHAT exactly they CRC'ed. It might include header data + file data, etc. I just haven't cared enough to go back and try to reverse it. Anyway, as you can see, there is no compression on this archive format, but it is VERY easy to add. Full source can be provided if you want. (I think all that's left is the close() routine, and the code that calls and extracts each file, etc.!!)

bool marc_extract(struct marc_file_hdr *marc, struct marc_file *marcfile, char *file, int &err)
{
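    // Create the file from marcfile, in *file's location, return any errors.
    int ofd = -1;
    err = MARC_ERR_OK;
#if defined(_MSC_VER)
    // _sopen_s is a function, not a macro, so test for the compiler instead.
    if(_sopen_s(&ofd, file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _SH_DENYNO, _S_IREAD | _S_IWRITE) != 0)
        ofd = -1;
#else
    // With _O_CREAT, open() needs the permission argument as well.
    ofd = open(file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _S_IREAD | _S_IWRITE);
#endif
    if(ofd < 0)
    {
        errmsg("MARC: Could not create output file %s.\n", file);
        err = MARC_ERR_OS; // Get the last error from the OS.
        return false;
    }

    // Seek to the offset of the file to extract
    if(lseek(marc->h_fd, marcfile->f_offset, SEEK_SET) != marcfile->f_offset)
    {
        errmsg("MARC: Could not seek to offset 0x%08lx for file %s.\n", marcfile->f_offset, marcfile->f_filename);
        close(ofd);
        err = MARC_ERR_OS; // Get the last error from the OS.
        return false;
    }

    unsigned char *buffer = (unsigned char*)malloc(MARC_DATA_SIZE);
    if(buffer == NULL)
    {
        close(ofd);
        err = MARC_ERR_OS;
        return false;
    }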

    long bytesleft = marcfile->f_filesize;
    long readsize = MARC_DATA_SIZE >= marcfile->f_filesize ? marcfile->f_filesize : MARC_DATA_SIZE;
    unsigned long crc = 0;

    while(bytesleft)
    {
        if(read(marc->h_fd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to extract data from MARC archive.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OOD;
            return false;
        }

        crc = marc_checksum(buffer, readsize, crc);

        if(write(ofd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to write data to file.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OS; // Get the last error from the OS.
            return false;
        }
        bytesleft -= readsize;
        readsize = MARC_DATA_SIZE >= bytesleft ? bytesleft : MARC_DATA_SIZE;
    }

    // LEF:  I can't quite figure out how the checksum is computed, but I think it has to do with the file header, PLUS the data in the file, or it's checked on the data in the file
    //       minus any BOM's at the start...  So, we'll just rem out this code for now, but I wrote it anyways.
    //if(crc != marcfile->f_checksum)
    //{
    //    warningmsg("MARC: File CRC does not match.  File could be corrupt, or altered.  CRC=0x%08X, expected 0x%08X\n", crc, marcfile->f_checksum);
    //    err = MARC_ERR_CRC;
    //}

    free(buffer);
    close(ofd);

    return true;
}

Here is MY assumed CRC routine (I may have stolen this from Stuart Caie and libmspack, I can't recall). Since it is static and used by marc_extract(), it has to be declared before that function:

static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed)
{
    int count = cb / 4;
    unsigned long csum = seed;
    BYTE *p = (BYTE*)pv;
    unsigned long ul;

    while(count-- > 0)
    {
        ul = *p++;
        ul |= (((unsigned long)(*p++)) <<  8);
        ul |= (((unsigned long)(*p++)) << 16);
        ul |= (((unsigned long)(*p++)) << 24);
        csum ^= ul;
    }

    ul = 0;
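    // Fold in the 1 to 3 trailing bytes; the cases fall through on purpose.
    switch(cb % 4)
    {
        case 3: ul |= (((unsigned long)(*p++)) << 16); // fall through
        case 2: ul |= (((unsigned long)(*p++)) <<  8); // fall through
        case 1: ul |= *p++;
        default: break;
    }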
    csum ^= ul;

    return csum;
}
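
One property of this routine that marc_extract() relies on: because it only XORs 32-bit words together, feeding the data in chunks gives the same result as one pass, as long as every chunk except the last is a multiple of 4 bytes (the 128k read buffer is). Here is a portable restatement to play with. The name xor_checksum and the fixed-width types are mine, and this is still only the assumed algorithm, not a confirmed match for what Microsoft computes.

```cpp
#include <cstddef>
#include <cstdint>

// Same XOR folding as marc_checksum, restated with fixed-width types so it
// behaves identically where unsigned long is 64 bits wide.
std::uint32_t xor_checksum(const std::uint8_t *p, std::size_t n, std::uint32_t seed)
{
    std::uint32_t sum = seed;

    // Whole 32-bit words, assembled little endian.
    for (std::size_t words = n / 4; words > 0; --words)
    {
        const std::uint32_t w = static_cast<std::uint32_t>(p[0])
                              | (static_cast<std::uint32_t>(p[1]) << 8)
                              | (static_cast<std::uint32_t>(p[2]) << 16)
                              | (static_cast<std::uint32_t>(p[3]) << 24);
        sum ^= w;
        p += 4;
    }

    // Trailing 1 to 3 bytes: the first leftover byte lands highest, exactly
    // as the fall-through switch in marc_checksum packs them.
    const std::size_t rest = n % 4;
    std::uint32_t tail = 0;
    for (std::size_t i = 0; i < rest; ++i)
        tail |= static_cast<std::uint32_t>(p[i]) << (8 * (rest - 1 - i));

    return sum ^ tail;
}
```

Note that this is a plain XOR checksum rather than a true CRC, so it will not notice two errors that cancel each other out in the same bit position.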

Well, I think this post is long enough now... Contact me if you need help, or have questions.
