
MD5 implementation in C for an XML file

Posted 2024-08-19 04:03:16

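I need to implement an MD5 checksum in order to verify the MD5 checksum embedded in an XML file that we receive from our client. The checksum covers the whole file, including all XML tags. The received MD5 checksum is 32 hexadecimal digits long (32 bytes of text).

Before the checksum is calculated, the MD5 checksum field in the received XML file must be set to 0. We then have to calculate the checksum ourselves and verify it against the value that was in the received file.

Our application is implemented in C. Please advise me on how to implement this.

Thanks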


自我难过 2024-08-26 04:03:16

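This depends directly on the library used for XML parsing. It is tricky, however, because you cannot embed the MD5 of the whole file in the file itself: embedding the checksum changes the file, and therefore its checksum. That only works if you checksum specific elements, or blank out the checksum field first. As far as I understand, you receive the MD5 independently? Is it calculated from the whole file, or only from the tags/content?

Public-domain MD5 code: http://www.fourmilab.ch/md5/
XML library for C: http://xmlsoft.org/

The exact solution depends on the code used.

Based on your comment, you need to perform the following steps:

Load the XML file (possibly even as plain text) and read the MD5.
Replace the MD5 in the file with zeros and write the file out (or, better, keep it in memory).
Run MD5 on the raw file data and compare the result with the value stored earlier.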
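The first two steps can be sketched in plain C without an XML parser, assuming the file has already been read into a writable memory buffer. The function name `extract_and_zero_md5` and the `marker` parameter (the text that immediately precedes the hash, for example `MD5='`) are hypothetical and chosen only for illustration. Computing the MD5 itself is left to whichever library you pick.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MD5_HEX_LEN 32

/*
 * Find `marker` (hypothetical, e.g. "MD5='") in buf, copy the 32 hex digits
 * that follow it into `stored` (NUL-terminated), and overwrite them in place
 * with '0' characters. Returns 0 on success, -1 if no valid hash was found.
 */
int extract_and_zero_md5(char *buf, size_t len, const char *marker,
                         char stored[MD5_HEX_LEN + 1])
{
    assert(buf != NULL && marker != NULL && stored != NULL);
    size_t mlen = strlen(marker);
    if (mlen == 0 || len < mlen + MD5_HEX_LEN)
        return -1;

    for (size_t i = 0; i + mlen + MD5_HEX_LEN <= len; i++) {
        if (memcmp(buf + i, marker, mlen) != 0)
            continue;
        char *hash = buf + i + mlen;
        size_t j;
        for (j = 0; j < MD5_HEX_LEN; j++)
            if (!isxdigit((unsigned char)hash[j]))
                break;
        if (j != MD5_HEX_LEN)
            continue;               /* marker matched, but no hash followed */
        memcpy(stored, hash, MD5_HEX_LEN);
        stored[MD5_HEX_LEN] = '\0';
        memset(hash, '0', MD5_HEX_LEN);
        return 0;
    }
    return -1;
}
```

After a successful call, the buffer holds exactly the bytes to feed to the MD5 routine, and `stored` holds the value to compare the result against. Working on the raw bytes means nothing else in the file is altered.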


旧情别恋 2024-08-26 04:03:16

You should use a public-domain implementation of MD5 rather than writing your own. I hear that Colin Plumb's version is widely used.


场罚期间 2024-08-26 04:03:16

Don't reinvent the wheel; use a proven existing solution: http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

Incidentally, this came from my Google search for "md5 c implementation".


小女人ら 2024-08-26 04:03:16

This is rather nasty. The approach suggested seems to imply you need to parse the XML document into something like a DOM tree, find the MD5 checksum and store it for future reference. Then you would replace the checksum with 0 before re-serializing the document and calculating its MD5 hash. This all sounds doable but potentially tricky. The major difficulty I see is that your new serialization of the document may not be byte-for-byte the same as the original one. Differences that are irrelevant to XML, such as single versus double quotes around attribute values, added line breaks or even a different encoding, will cause the hashes to differ. If you go down this route you'll need to make sure your app and the procedure used to create the document in the first place make the same choices. For this sort of problem canonical XML is the standard solution (http://www.w3.org/TR/xml-c14n).

However, I would do something different. With any luck it should be quite easy to write a regular expression that locates the MD5 hash in the file. You can then use it both to grab the hash and to replace it with zeros in the XML text before recalculating the hash. This sidesteps all the possible issues with parsing, changing and re-serializing the XML document. To illustrate I'm going to assume the hash '33d4046bea07e89134aecfcaf7e73015' lives in the XML file like this:

<docRoot xmlns='some-irrelevant-uri'>
  <myData>Blar blar</myData>
  <myExtraData number='1'/>
  <docHash MD5='33d4046bea07e89134aecfcaf7e73015' />
  <evenMoreOfMyData number='34'/>
</docRoot>

(I've called this file hash.xml.) I'll also assume that the sender computed the hash with the MD5 attribute set to 32 zeros, so that is what must be substituted before rehashing. The procedure is illustrated on a shell command line using perl, md5 and bash. (Hopefully translating this into C won't be too hard given the existence of regular expression and hashing libraries.)

Breaking down the problem, you first need to be able to find the hash that is in the file:

perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml

This works by looking for the start of the MD5 attribute of the docHash element, allowing for possible other attributes before it, and then grabbing the next 32 hex characters. If it finds them it puts them in the magic $_ variable; if not, it sets $_ to the empty string. Because of -p, the value of $_ is then printed for each line, so the only output is the string "33d4046bea07e89134aecfcaf7e73015".
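A rough C counterpart of this search can be written with the POSIX `<regex.h>` functions `regcomp` and `regexec`. This is only a sketch under some assumptions: the function name `find_md5_offset` is hypothetical, the pattern accepts either quote style, and it uses `[^>]*` rather than `.+` so the match stays inside the docHash tag. `<regex.h>` is POSIX rather than ISO C, so a non-POSIX platform would need a different regex library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Locate the 32 hex digits of the MD5 attribute in a NUL-terminated XML
 * string. On success returns 0 and stores the offset of the first digit in
 * *offset; returns -1 if the pattern does not match or fails to compile.
 */
int find_md5_offset(const char *xml, size_t *offset)
{
    assert(xml != NULL && offset != NULL);
    /* Accepts single or double quotes around the attribute value. */
    static const char pattern[] =
        "<docHash[^>]*MD5=[\"']([0-9a-fA-F]{32})[\"']";
    regex_t re;
    regmatch_t m[2];    /* m[0] = whole match, m[1] = the 32 hex digits */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, xml, 2, m, 0) == 0 && m[1].rm_so >= 0) {
        *offset = (size_t)m[1].rm_so;
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

With the offset in hand, the 32 characters starting there can be copied out for later comparison and then overwritten with zeros in a writable copy of the text.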

Then you need to calculate the hash of the file with the hash replaced by zeros:

perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#$1 . "0" x 32#e' hash.xml | md5

Here the regular expression is almost the same, but this time the hex characters are replaced by zeros and the whole file is printed. The MD5 of the result is then calculated by piping it through an md5 hashing program. Putting this together with a bit of bash gives:

if [ "$(perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml)" = "$(perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#$1 . "0" x 32#e' hash.xml | md5)" ] ; then echo OK; else echo ERROR; fi

This executes those two small commands, compares their output, and prints "OK" if the outputs match or "ERROR" if they don't. Obviously this is just a simple prototype, and it is in the wrong language, but I think it illustrates the most straightforward solution.
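One detail to watch for when moving this to C: MD5 libraries normally return the digest as 16 raw bytes, whereas the file carries 32 hex characters, so the final comparison needs a conversion. A minimal sketch follows; the function name `md5_digest_matches_hex` is hypothetical, and the digest is assumed to come from whichever MD5 implementation you have chosen.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MD5_DIGEST_LEN 16

/* Return 1 if the 16-byte digest equals the 32-digit hex string, else 0.
 * The comparison ignores the case of the hex digits. */
int md5_digest_matches_hex(const unsigned char digest[MD5_DIGEST_LEN],
                           const char *hex)
{
    static const char digits[] = "0123456789abcdef";
    assert(digest != NULL && hex != NULL);

    for (size_t i = 0; i < MD5_DIGEST_LEN; i++) {
        char hi = digits[digest[i] >> 4];
        char lo = digits[digest[i] & 0x0f];
        /* Stop at the first mismatch; a short string fails on its NUL. */
        if (tolower((unsigned char)hex[2 * i]) != hi)
            return 0;
        if (tolower((unsigned char)hex[2 * i + 1]) != lo)
            return 0;
    }
    /* Reject strings that continue past 32 digits. */
    return hex[2 * MD5_DIGEST_LEN] == '\0';
}
```

Comparing case-insensitively is a design choice here: it avoids a spurious failure if the sender writes the hash in upper case.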

Incidentally, why do you put the hash inside the XML document? As far as I can see it doesn't have any advantage compared to passing the hash along on a side channel (even something as simple as in a second file called documentname.md5) and makes the hash validation more difficult.
