How to cryptographically hash a JSON object?
The following question is more complex than it may first seem.
Assume that I've got an arbitrary JSON object, one that may contain any amount of data, including other nested JSON objects. What I want is a cryptographic hash/digest of the JSON data, without regard to the actual JSON formatting itself (e.g. ignoring newlines and spacing differences between the JSON tokens).
The last part is a requirement, as the JSON will be generated/read by a variety of (de)serializers on a number of different platforms. I know of at least one JSON library for Java that completely removes formatting when reading data during deserialization. As such it will break the hash.
The arbitrary data clause above also complicates things, as it prevents me from taking known fields in a given order and concatenating them prior to hashing (think roughly of how Java's non-cryptographic hashCode() method works).
Lastly, hashing the entire JSON String as a chunk of bytes (prior to deserialization) is not desirable either, since there are fields in the JSON that should be ignored when computing the hash.
I'm not sure there is a good solution to this problem, but I welcome any approaches or thoughts =)
Comments (9)
The problem is a common one when computing hashes for any data format where flexibility is allowed. To solve this, you need to canonicalize the representation.
For example, the OAuth1.0a protocol, which is used by Twitter and other services for authentication, requires a secure hash of the request message. To compute the hash, OAuth1.0a says you need to first alphabetize the fields, separate them by newlines, remove the field names (which are well known), and use blank lines for empty values. The signature or hash is computed on the result of that canonicalization.
XML DSIG works the same way - you need to canonicalize the XML before signing it. There is a W3C Recommendation (Canonical XML) covering this, because it's such a fundamental requirement for signing. Some people call it c14n.
I don't know of a canonicalization standard for JSON. It's worth researching.
If there isn't one, you can certainly establish a convention for your particular application usage. A reasonable start might be:
You may also want to think about how to pass that signature in the JSON object - possibly establish a well-known property name, like "nichols-hmac" or something, that gets the base64 encoded version of the hash. This property would have to be explicitly excluded by the hashing algorithm. Then, any receiver of the JSON would be able to check the hash.
The canonicalized representation does not need to be the representation you pass around in the application. It only needs to be easily produced given an arbitrary JSON object.
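To make the idea of an application-level convention concrete, here is a minimal Python sketch. It is not a standard and not the answerer's missing list of rules; it simply assumes a convention of "sort keys at every level, no whitespace between tokens, UTF-8 output, skip the top-level signature property", and it uses HMAC-SHA256 with the "nichols-hmac" property name mentioned above. The function names are hypothetical. Float formatting in particular can differ between platforms, so a real convention would need a rule for numbers.

```python
import base64
import hashlib
import hmac
import json

SIGNATURE_FIELD = "nichols-hmac"  # property name borrowed from the text above


def canonical_bytes(obj: dict) -> bytes:
    """Serialize obj under the assumed convention, ignoring the signature property."""
    unsigned = {k: v for k, v in obj.items() if k != SIGNATURE_FIELD}
    # sort_keys orders keys at every nesting level; separators removes all spacing.
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def _mac(obj: dict, key: bytes) -> bytes:
    raw = hmac.new(key, canonical_bytes(obj), hashlib.sha256).digest()
    return base64.b64encode(raw)


def sign(obj: dict, key: bytes) -> dict:
    """Return a copy of obj carrying the Base64 HMAC in the signature property."""
    return {**obj, SIGNATURE_FIELD: _mac(obj, key).decode("ascii")}


def verify(obj: dict, key: bytes) -> bool:
    """Recompute the HMAC over the canonical form and compare in constant time."""
    claimed = obj.get(SIGNATURE_FIELD)
    if not isinstance(claimed, str):
        return False
    return hmac.compare_digest(_mac(obj, key), claimed.encode("utf-8"))
```

Because the hash is computed over the canonical form rather than the transmitted text, a receiver can parse the JSON with any library, re-serialize it however it likes, and still call `verify` on the parsed object.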
Instead of inventing your own JSON normalization/canonicalization, you may want to use bencode. Semantically it is close to JSON (a composition of integers, byte strings, lists and dicts, though without floats, booleans or null), but it has the property of unambiguous encoding that is necessary for cryptographic hashing.
Bencode is the encoding used for torrent files, so every BitTorrent client contains an implementation.
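For illustration, a bencode encoder is small enough to sketch in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name is hypothetical, text strings are assumed to be encoded as UTF-8 byte strings, dict keys are assumed to be strings (as they are in parsed JSON), and values that bencode cannot represent (floats, booleans, null) are rejected rather than mapped to something else.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in bencode form."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is carried as UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are emitted in sorted order of their raw bytes, which is what
        # makes the encoding of a given dict unique.
        items = sorted(((k.encode("utf-8"), v) for k, v in value.items()),
                       key=lambda pair: pair[0])
        body = b"".join(bencode(k) + bencode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


encoded = bencode({"b": 1, "a": [1, "x"]})
digest = hashlib.sha256(encoded).hexdigest()
```

The price of this approach is that any JSON document using floats, booleans or null needs an application-specific mapping before it can be encoded.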
This is the same issue that causes problems with S/MIME signatures and XML signatures. That is, there are multiple equivalent representations of the data to be signed.
For example in JSON:
vs.
Or depending on your application, this may even be equivalent:
Canonicalization could solve that problem, but it's a problem you don't need at all.
The easy solution if you have control over the specification is to wrap the object in some sort of container to protect it from being transformed into an "equivalent" but different representation.
I.e. avoid the problem by not signing the "logical" object but signing a particular serialized representation of it instead.
For example, JSON Objects -> UTF-8 Text -> Bytes. Sign the bytes as bytes, then transmit them as bytes e.g. by base64 encoding. Since you are signing the bytes, differences like whitespace are part of what is signed.
Instead of trying to do this:
Just do this:
I.e. don't sign the JSON, sign the bytes of the encoded JSON.
Yes, it means the signature is no longer transparent.
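A rough Python sketch of the container approach follows. It is only an illustration: the envelope member names `payload` and `sig` and the function names are made up for this example, and HMAC-SHA256 with a shared key stands in for whatever signature scheme is actually used.

```python
import base64
import hashlib
import hmac
import json


def wrap(obj, key: bytes) -> str:
    """Serialize obj once, sign those exact bytes, and ship them Base64-encoded."""
    payload = json.dumps(obj).encode("utf-8")
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    envelope = {
        "payload": base64.b64encode(payload).decode("ascii"),  # hypothetical name
        "sig": base64.b64encode(sig).decode("ascii"),          # hypothetical name
    }
    return json.dumps(envelope)


def unwrap(envelope_text: str, key: bytes):
    """Verify the signature over the transported bytes, then parse them."""
    envelope = json.loads(envelope_text)
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.b64decode(envelope["sig"])):
        raise ValueError("signature mismatch")
    return json.loads(payload)
```

Intermediaries can reformat the outer envelope freely, because the signed bytes travel inside a Base64 string that no JSON serializer will rewrite. Note that this does not by itself address the question's requirement that some fields be ignored when hashing; those fields would have to live outside the signed payload.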
JSON-LD can do normalization.
You will have to define your context.
RFC 7638: JSON Web Key (JWK) Thumbprint includes a type of canonicalization. Although RFC7638 expects a limited set of members, we would be able to apply the same calculation for any member.
https://www.rfc-editor.org/rfc/rfc7638#section-3
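As a sketch of what applying the same calculation to other members could look like in Python: RFC 7638 builds a JSON object containing only the required members, ordered lexicographically by member name, with no whitespace, and hashes its UTF-8 bytes. The function below follows that recipe for a caller-chosen list of members. The function name and the choice of SHA-256 with unpadded base64url output are assumptions for this example, and the RFC itself only deals with string-valued members, so nested or numeric values go beyond what it specifies.

```python
import base64
import hashlib
import json
from typing import Iterable


def thumbprint(obj: dict, members: Iterable[str]) -> str:
    """Hash only the named members of obj, RFC 7638 style."""
    # Keep just the members that should take part in the hash.
    selected = {name: obj[name] for name in members}
    # Lexicographic member order and no whitespace between tokens.
    text = json.dumps(selected, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # base64url without padding, as is common for JWK thumbprints.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Selecting members by name also covers the requirement that some fields be left out of the hash: anything not listed simply does not participate.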
What would be ideal is if JavaScript itself defined a formal hashing process for JavaScript Objects.
Yet we do have RFC 8785, the JSON Canonicalization Scheme (JCS), which hopefully can be implemented in most JSON libraries and, in particular, added to the popular JavaScript JSON object. With this canonicalization done, it is just a matter of applying your preferred hashing algorithm.
If JCS becomes available in browsers and other tools and libraries, it becomes reasonable to expect most JSON on the wire to be in this common canonicalized form. Consistent application and verification of standards like this can go some way toward pushing back against trivial security threats.
I would do all fields in a given order (alphabetically, for example). Why does arbitrary data make a difference? You can just iterate over the properties (à la reflection).
Alternatively, I would look into converting the raw JSON string into some well-defined canonical form (removing all superfluous formatting) and hashing that.
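The first suggestion, walking the properties in a fixed order, can be sketched in Python as a recursive digest over the parsed value, so no canonical string is ever produced. This is an illustration under stated assumptions: the function name, the one-byte type tags and the `ignore` parameter (a set of property names skipped at every level) are all invented for this example, and integers and floats are deliberately hashed differently, so `1` and `1.0` do not collide unless the application normalizes numbers first.

```python
import hashlib
import json


def digest(value, ignore=frozenset()) -> bytes:
    """Return a SHA-256 digest of a parsed JSON value, independent of formatting."""
    h = hashlib.sha256()
    if value is None:
        h.update(b"n")
    elif isinstance(value, bool):  # must be checked before int
        h.update(b"t" if value else b"f")
    elif isinstance(value, int):
        h.update(b"i" + str(value).encode("ascii"))
    elif isinstance(value, float):
        h.update(b"d" + repr(value).encode("ascii"))
    elif isinstance(value, str):
        h.update(b"s" + value.encode("utf-8"))
    elif isinstance(value, list):
        h.update(b"l")
        for item in value:  # array order is significant
            h.update(digest(item, ignore))
    elif isinstance(value, dict):
        h.update(b"o")
        for key in sorted(k for k in value if k not in ignore):
            # Child digests have a fixed length, so concatenation is unambiguous.
            h.update(digest(key, ignore))
            h.update(digest(value[key], ignore))
    else:
        raise TypeError(f"not a JSON value: {type(value).__name__}")
    return h.digest()


first = json.loads('{"a": 1, "b": [true, null], "ts": "x"}')
second = json.loads('{\n  "ts": "y",\n  "b": [ true, null ],\n  "a": 1\n}')
same = digest(first, {"ts"}) == digest(second, {"ts"})
```

Here `first` and `second` differ in key order, whitespace and the ignored `ts` property, and the comparison still treats them as the same data.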
We encountered a simple issue with hashing JSON-encoded payloads.
In our case we use the following methodology:
Advantages of using this solution:
Disadvantages
I am not sure why no library has been mentioned here, but you could use something like https://www.npmjs.com/package/@tufjs/canonical-json as a first step and then apply any hash algorithm of your choice.