I need an example to understand implicit tagging in ASN.1

Posted on 2024-09-11 02:17:52 · Word count 180 · Views 7 · Comments 0

I have been going through the following tutorial:

http://www.obj-sys.com/asn1tutorial/node12.html

Can you help me understand implicit tagging with an example?

江湖正好 2024-09-18 02:17:52

In ASN.1, tagging in fact serves two purposes: typing and naming. Typing means it tells an en-/decoder what kind of data type a value has (a string, an integer, a boolean, a set, etc.). Naming means that if there are multiple fields of the same type and some (or all) of them are optional, it tells the en-/decoder which field the value belongs to.

If you compare ASN.1 to, let's say, JSON, and you look at the following JSON data:

"Image": {
    "Width":  800,
    "Height": 600,
    "Title":  "View from 15th Floor"
}

You'll notice that in JSON every field is always explicitly named ("Image", "Width", "Height", "Title") and either explicitly or implicitly typed: "Title" is a string because its value is surrounded by quotes; "Width" is an integer because it has no quotes, consists only of digits, is not "null", "true" or "false", and has no decimal point.

In ASN.1 this piece of data would be:

Image ::= SEQUENCE { 
    Width  INTEGER,
    Height INTEGER,
    Title  UTF8String
 }

This works without any special tagging; only the universal tags are required. Universal tags don't name data, they just type it, so the en-/decoder knows that the first two values are integers and the last one is a string. That the first integer is Width and the second one is Height doesn't need to be encoded in the byte stream; it is defined by their order (sequences have a fixed order, sets don't; the page you referred to uses sets).

Now change the schema as follows:

Image ::= SEQUENCE { 
    Width  INTEGER OPTIONAL,
    Height INTEGER OPTIONAL,
    Title  UTF8String
 }

Okay, now we have a problem. Assume that the following data is received:

INTEGER(750), UTF8String("A funny kitten")

What is 750? Width or Height? It could be Width (with Height missing) or Height (with Width missing); both look the same as a binary stream. In JSON that would be clear, as every piece of data is named; in ASN.1 it isn't. Now a type alone isn't enough; we also need a name. That's where the non-universal tags come into play. Change it to:
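To make the ambiguity concrete, here is a minimal sketch in Python. The helper names (`tlv`, `integer`, `utf8_string`, `encode_image`) are made up for illustration and are not from any ASN.1 library; the sketch assumes DER-style tag-length-value encoding with short lengths and non-negative integers only.

```python
def tlv(tag: int, content: bytes) -> bytes:
    """One tag-length-value triple; short-form length only (< 128 bytes)."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are omitted in this sketch")
    return bytes([tag, len(content)]) + content


def integer(value: int) -> bytes:
    """Universal INTEGER (tag 0x02); non-negative values only in this sketch."""
    if value < 0:
        raise ValueError("negative integers are omitted in this sketch")
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def utf8_string(text: str) -> bytes:
    """Universal UTF8String (tag 0x0C)."""
    return tlv(0x0C, text.encode("utf-8"))


def encode_image(width, height, title):
    """SEQUENCE (tag 0x30) with both integers OPTIONAL and NO extra tags."""
    fields = b""
    if width is not None:
        fields += integer(width)
    if height is not None:
        fields += integer(height)
    fields += utf8_string(title)
    return tlv(0x30, fields)


only_width = encode_image(750, None, "A funny kitten")
only_height = encode_image(None, 750, "A funny kitten")

print(only_width.hex(" "))
print(only_width == only_height)  # True: the receiver cannot tell them apart
```

Both calls produce exactly the same bytes, so the information about which field was sent is simply not on the wire.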

Image ::= SEQUENCE { 
    Width  [0] INTEGER OPTIONAL,
    Height [1] INTEGER OPTIONAL,
    Title  UTF8String
 }

And if you receive the following data:

[1]INTEGER(750), UTF8String("A funny kitten")

You know that 750 is the Height and not the Width (there simply is no Width). Here you declare a new tag (in this case a context-specific one) that serves two purposes: it tells the en-/decoder that this is an integer value (typing), and it tells it which integer value that is (naming).
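On the wire, a tag like [1] is just one identifier octet (for tag numbers up to 30). As a rough sketch of how that octet is built in BER/DER, with the function name `identifier_octet` and the constants being my own, not a library API:

```python
# Tag classes occupy the top two bits of the identifier octet.
UNIVERSAL, APPLICATION, CONTEXT, PRIVATE = 0, 1, 2, 3


def identifier_octet(tag_class: int, constructed: bool, number: int) -> int:
    """Build a single-octet BER/DER identifier (tag numbers 0..30 only)."""
    if not 0 <= number <= 30:
        raise ValueError("high tag numbers need the multi-octet form")
    return (tag_class << 6) | (int(constructed) << 5) | number


print(hex(identifier_octet(UNIVERSAL, False, 2)))   # 0x2  -> INTEGER
print(hex(identifier_octet(UNIVERSAL, True, 16)))   # 0x30 -> SEQUENCE
print(hex(identifier_octet(CONTEXT, False, 1)))     # 0x81 -> [1], primitive
print(hex(identifier_octet(CONTEXT, True, 1)))      # 0xa1 -> [1], constructed
```

The primitive form (0x81) is what you see when [1] replaces the INTEGER tag, and the constructed form (0xA1) is what you see when [1] wraps a complete inner value.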

But what is the difference between implicit and explicit tagging? With implicit tagging the tag only names the data, and the en-/decoder has to know the type for that name implicitly (from the schema). With explicit tagging the data is both named and explicitly typed.

If tagging is explicit, the data will be sent as:

[1]INTEGER(xxx), UTF8String(yyy)

so even if a decoder has no idea that [1] means Height, it knows that the bytes "xxx" are to be parsed/interpreted as an integer value. Another important advantage of explicit tagging is that the type can be changed in the future without changing the tag. E.g.

Length ::= [0] INTEGER

can be changed to

Length ::= [0] CHOICE { 
    integer INTEGER,
    real    REAL 
}

Tag [0] still means length, but now length can either be an integer or a floating point value. Since the type is encoded explicitly, decoders will always know how to correctly decode the value and this change is thus forward and backward compatible (at least at decoder level, not necessarily backward compatible at application level).
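Here is a small sketch of what "the decoder knows the type anyway" looks like for an explicitly tagged value. It assumes DER with short lengths; `read_tlv` and `UNIVERSAL_NAMES` are hypothetical helpers for illustration only.

```python
# A tiny lookup of universal tags; just the two used in this thread.
UNIVERSAL_NAMES = {0x02: "INTEGER", 0x0C: "UTF8String"}


def read_tlv(data: bytes):
    """Return (tag, content) of the first TLV; short-form length only."""
    tag, length = data[0], data[1]
    if length & 0x80:
        raise NotImplementedError("long-form lengths omitted in this sketch")
    return tag, data[2:2 + length]


# [1] EXPLICIT INTEGER 750: the context tag wraps a complete INTEGER TLV.
explicit_height = bytes.fromhex("A1 04 02 02 02 EE")

outer_tag, outer_content = read_tlv(explicit_height)
inner_tag, inner_content = read_tlv(outer_content)

# No schema needed: the inner universal tag says how to read the bytes.
type_name = UNIVERSAL_NAMES[inner_tag]
value = int.from_bytes(inner_content, "big", signed=True)
print(hex(outer_tag), type_name, value)  # 0xa1 INTEGER 750
```

The decoder still doesn't know that [1] means Height, but it can decode the value as an integer.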

If tagging is implicit, the data will be sent as:

[1](xxx), UTF8String(yyy)

A decoder that doesn't know what [1] is will not know the type of "xxx" and thus cannot parse/interpret that data correctly. Unlike JSON, values in ASN.1 are just bytes. So "xxx" may be one, two, three or maybe four bytes, and how to decode those bytes depends on their data type, which is not provided in the data stream itself. Changing the type of [1] will also break existing decoders for sure.
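As a sketch of the implicit case: the decoder below only gets a typed value if it is handed a schema table. The table `IMAGE_SCHEMA` and the function `decode_implicit` are invented for this illustration (assuming DER, short lengths, and [0]/[1] as IMPLICIT INTEGER).

```python
# Hypothetical schema knowledge: context tag octet -> (field name, type).
IMAGE_SCHEMA = {
    0x80: ("Width", "INTEGER"),   # [0] IMPLICIT INTEGER
    0x81: ("Height", "INTEGER"),  # [1] IMPLICIT INTEGER
}


def decode_implicit(data: bytes, schema: dict):
    """Decode one implicitly tagged TLV; returns (name, value)."""
    tag, length = data[0], data[1]
    content = data[2:2 + length]
    if tag not in schema:
        # Without the schema, the content is just opaque bytes.
        return None, content
    name, asn1_type = schema[tag]
    if asn1_type == "INTEGER":
        return name, int.from_bytes(content, "big", signed=True)
    raise NotImplementedError(asn1_type)


# [1] IMPLICIT INTEGER 750: the context tag REPLACES the INTEGER tag.
implicit_height = bytes.fromhex("81 02 02 EE")

print(decode_implicit(implicit_height, IMAGE_SCHEMA))  # ('Height', 750)
print(decode_implicit(implicit_height, {}))            # (None, b'\x02\xee')
```

With an empty schema the two content bytes could equally be an integer, part of a string, or anything else.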

Okay, but why would anyone use implicit tagging? Isn't it better to always use explicit tagging? With explicit tagging, the type must also be encoded in the data stream, and this requires at least two additional bytes per tagged value (one tag octet and one length octet). For data transmissions containing several thousand (maybe even millions of) tags, where every single byte may count (very slow connection, tiny packets, high packet loss, very weak processing devices) and where both sides know all custom tags anyway, why waste bandwidth, memory, storage and/or processing time on encoding, transmitting and decoding unnecessary type information?
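The two-byte difference is easy to check by hand. A quick sketch, assuming DER with short lengths and using a made-up `tlv` helper; the one million values are only an illustrative number:

```python
def tlv(tag: int, content: bytes) -> bytes:
    """One tag-length-value triple; short-form length only (< 128 bytes)."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are omitted in this sketch")
    return bytes([tag, len(content)]) + content


value_bytes = (750).to_bytes(2, "big")       # 02 EE

explicit = tlv(0xA1, tlv(0x02, value_bytes))  # A1 04 02 02 02 EE
implicit = tlv(0x81, value_bytes)             # 81 02 02 EE

overhead = len(explicit) - len(implicit)
print(len(explicit), len(implicit), overhead)  # 6 4 2

# Illustrative only: extra bytes for one million such tagged values.
print(overhead * 1_000_000)                    # 2000000
```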

Keep in mind that ASN.1 is a rather old standard, intended to achieve a highly compact representation of data at a time when network bandwidth was very expensive and processors were several hundred times slower than today. If you look at all the XML and JSON data transfers of today, it seems ridiculous to even think about saving two bytes per tag.

流绪微梦 2024-09-18 02:17:52

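I found this thread clear enough; it also contains (small) examples, even though they are quite "extreme" ones. More "realistic" examples using implicit tagging can be found on this page.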

仅此而已 2024-09-18 02:17:52

Using the accepted answer as an example of encoding:

Image ::= SEQUENCE { 
    Width  INTEGER,
    Height INTEGER,
    Title  UTF8String
}

An example of encoding would be:

The internal sequence breaks down into:
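As a rough sketch of what such an encoding looks like for the untagged schema, the snippet below DER-encodes an Image by hand. The values (800, 600, "View from 15th Floor") are an assumption borrowed from the JSON example in the accepted answer, and the helpers `tlv`, `integer` and `utf8_string` are hypothetical, not a library API (short lengths and non-negative integers only).

```python
def tlv(tag: int, content: bytes) -> bytes:
    """One tag-length-value triple; short-form length only (< 128 bytes)."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are omitted in this sketch")
    return bytes([tag, len(content)]) + content


def integer(value: int) -> bytes:
    """Universal INTEGER (tag 0x02); non-negative values only in this sketch."""
    if value < 0:
        raise ValueError("negative integers are omitted in this sketch")
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def utf8_string(text: str) -> bytes:
    """Universal UTF8String (tag 0x0C)."""
    return tlv(0x0C, text.encode("utf-8"))


# Assumed values; only universal tags appear, fields are told apart by order.
width = integer(800)                         # 02 02 03 20
height = integer(600)                        # 02 02 02 58
title = utf8_string("View from 15th Floor")  # 0C 14 + 20 text bytes

encoded = tlv(0x30, width + height + title)  # SEQUENCE, 30 content bytes
print(encoded.hex(" ").upper())
print(len(encoded))  # 32
```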

Explicit Optional

If you then have EXPLICIT OPTIONAL values:

Image ::= SEQUENCE { 
    Width  [0] EXPLICIT INTEGER OPTIONAL,
    Height [1] EXPLICIT INTEGER OPTIONAL,
    Title  UTF8String
}

The encoded sequence might be:

SEQUENCE 30 16 A1 04 02 02 02 EE 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E (22 content bytes)

And the internal sequence breaks down into:

CONTEXT[1] wrapping INTEGER: A1 04 02 02 02 EE 750 (2-byte value)
UTF8STRING: 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E "A funny kitten" (14 bytes)
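A breakdown like this can be double-checked with a few lines of Python. The walker below is only a sketch (the name `walk` is mine, and it handles short-form lengths only). It takes the DER bytes for Height = 750 under an explicit [1] tag, where the wrapper's length octet is 04 because it has to cover the whole inner INTEGER TLV (02 02 02 EE).

```python
def walk(data: bytes, depth: int = 0):
    """Yield (depth, tag, content) for each TLV, descending into constructed ones."""
    offset = 0
    while offset < len(data):
        tag, length = data[offset], data[offset + 1]
        if length & 0x80:
            raise NotImplementedError("long-form lengths omitted in this sketch")
        content = data[offset + 2:offset + 2 + length]
        yield depth, tag, content
        if tag & 0x20:  # constructed bit set: the content is itself TLVs
            yield from walk(content, depth + 1)
        offset += 2 + length


encoded = bytes.fromhex(
    "30 16 A1 04 02 02 02 EE 0C 0E "
    "41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E"
)

rows = list(walk(encoded))
for depth, tag, content in rows:
    print("  " * depth + f"tag={tag:02X} length={len(content)}")

height = int.from_bytes(rows[2][2], "big", signed=True)
title = rows[3][2].decode("utf-8")
print(height, title)  # 750 A funny kitten
```

If the lengths were inconsistent, the walker would run off into the string bytes instead of landing on the 0C tag.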