MIME RFC "Content-Type" parameter confusion? Unclear RFC specification

Posted on 2024-09-06 07:09:46

I'm trying to implement a basic MIME parser for multipart/related in C++/Qt.

So far I've been writing some basic parser code for headers, and I'm reading the RFCs to get an idea how to do everything as close to the specification as possible. Unfortunately there is a part in the RFC that confuses me a bit:

From RFC 822 Section 3.1.1:

Each header field can be viewed as a single, logical line of
ASCII characters, comprising a field-name and a field-body.
For convenience, the field-body portion of this conceptual
entity can be split into a multiple-line representation; this
is called "folding". The general rule is that wherever there
may be linear-white-space (NOT simply LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead be
inserted. Thus, the single line

Alright, so I simply parse a header field, and if a CRLF is followed by linear whitespace, I join the continuation line onto the previous one to get a single logical header line. Let's proceed...
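The unfolding step described above could be sketched roughly like this in plain standard C++ (no Qt types, to keep it self-contained). The function name `unfoldHeaders` is made up for illustration, and the sketch assumes the input uses CRLF line endings as on the wire:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: unfold an RFC 822 header block.
// A CRLF immediately followed by a space or horizontal tab (an LWSP-char)
// is a fold: the CRLF is dropped and the whitespace after it is kept.
// Any other CRLF is copied through unchanged and still ends a header field.
std::string unfoldHeaders(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const bool crlf = raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n';
        const bool folded = crlf && i + 2 < raw.size()
                            && (raw[i + 2] == ' ' || raw[i + 2] == '\t');
        if (folded) {
            ++i;        // skip the LF too; the LWSP-char is copied on the next pass
            continue;
        }
        out.push_back(raw[i]);
    }
    return out;
}
```

The leading whitespace of each continuation line is kept on purpose, so two tokens that were separated only by the fold do not run together.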

From RFC2045 Section 5.1:

In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

 content := "Content-Type" ":" type "/" subtype
            *(";" parameter)
            ; Matching of media type and subtype
            ; is ALWAYS case-insensitive.

[...]

 parameter := attribute "=" value
 attribute := token
              ; Matching of attributes
              ; is ALWAYS case-insensitive.
 value := token / quoted-string
 token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
             or tspecials>

Okay. So it seems that if you want to specify a Content-Type header with parameters, you simply write it like this:

Content-Type: multipart/related; foo=bar; something=else

... and a folded version of the same header would look like this:

Content-Type: multipart/related;
    foo=bar;
    something=else

Correct? Good. As I kept reading the RFCs, I came across the following in RFC2387 Section 5.1 (Examples):

 Content-Type: Multipart/Related; boundary=example-1
         start="<[email protected]>";
         type="Application/X-FixedRecord"
         start-info="-o ps"

 --example-1
 Content-Type: Application/X-FixedRecord
 Content-ID: <[email protected]>

 [data]
 --example-1
 Content-Type: Application/octet-stream
 Content-Description: The fixed length records
 Content-Transfer-Encoding: base64
 Content-ID: <[email protected]>

 [data]

 --example-1--

Hmm, this is odd. Do you see the Content-Type header? It has a number of parameters, but not all of them are separated by a ";" delimiter.

Maybe I just didn't read the RFCs correctly, but if my parser works strictly as the specification defines, the type and start-info parameters would end up as a single string or, worse, cause a parser error.
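To make the concern concrete, here is a minimal sketch of a parser that follows the RFC 2045 grammar quoted above literally. It is written in plain standard C++ rather than Qt, all names (`ContentType`, `HeaderCursor`, `parseContentTypeStrict`) are invented for this example, and it assumes the field body has already been unfolded and contains no RFC 822 comments:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Parsed Content-Type field body. Type, subtype and attribute names are
// lower-cased because RFC 2045 makes their matching case-insensitive.
struct ContentType {
    std::string type;
    std::string subtype;
    std::map<std::string, std::string> params;
};

// Hypothetical helper: a cursor over an already-unfolded field body.
class HeaderCursor {
public:
    explicit HeaderCursor(const std::string& s) : s_(s), pos_(0) {}

    void skipWs() {
        while (pos_ < s_.size() && (s_[pos_] == ' ' || s_[pos_] == '\t')) ++pos_;
    }
    bool atEnd() { skipWs(); return pos_ >= s_.size(); }
    bool eat(char c) {
        skipWs();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // token := 1*<any CHAR except SPACE, CTLs, or tspecials>
    std::string token(bool lower) {
        static const std::string tspecials = "()<>@,;:\\\"/[]?=";
        skipWs();
        std::string out;
        while (pos_ < s_.size()) {
            const unsigned char c = static_cast<unsigned char>(s_[pos_]);
            if (c <= 32 || c >= 127 || tspecials.find(s_[pos_]) != std::string::npos) break;
            out.push_back(lower ? static_cast<char>(std::tolower(c)) : s_[pos_]);
            ++pos_;
        }
        return out;
    }
    // value := token / quoted-string
    bool value(std::string& out) {
        skipWs();
        if (pos_ >= s_.size() || s_[pos_] != '"') {
            out = token(false);
            return !out.empty();
        }
        out.clear();
        for (++pos_; pos_ < s_.size(); ++pos_) {
            if (s_[pos_] == '"') { ++pos_; return true; }
            if (s_[pos_] == '\\' && ++pos_ >= s_.size()) break;  // dangling backslash
            out.push_back(s_[pos_]);
        }
        return false;  // unterminated quoted-string
    }
private:
    const std::string& s_;
    std::size_t pos_;
};

// Returns false as soon as the input deviates from the grammar.
bool parseContentTypeStrict(const std::string& body, ContentType& out)
{
    HeaderCursor c(body);
    ContentType ct;
    ct.type = c.token(true);
    if (ct.type.empty() || !c.eat('/')) return false;
    ct.subtype = c.token(true);
    if (ct.subtype.empty()) return false;
    while (!c.atEnd()) {
        if (!c.eat(';')) return false;  // grammar: every parameter needs a leading ";"
        const std::string attr = c.token(true);
        std::string val;
        if (attr.empty() || !c.eat('=') || !c.value(val)) return false;
        ct.params[attr] = val;
    }
    out = ct;
    return true;
}
```

Fed the unfolded header from the RFC 2387 example, this sketch stops right after `boundary=example-1`, because the next thing it sees is `start` rather than a ";", and reports a parse error. The same header with the semicolons restored parses into four parameters.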

Guys, what's your thought on this? Just a typo in the RFCs? Or did I miss something?

Thanks!


Comments (2)

清秋悲枫 2024-09-13 07:09:46

It is a typo in the examples. Parameters must always be delimited with semicolons correctly, even when folded. The folding is not meant to change the semantics of a header, only to allow for readability and to account for systems that have line length restrictions.

海螺姑娘 2024-09-13 07:09:46

Quite possibly a typo, but in general (and from experience) you should be able to handle this kind of thing "in the wild" as well. In particular, mail clients vary wildly in their ability to generate valid messages and follow all of the relevant specifications (if anything, it's even worse in the email/SMTP world than it is in the WWW world!)
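One possible way to be tolerant here, sketched in plain standard C++: treat any run of ";" and whitespace as a parameter separator, so a missing semicolon no longer breaks parsing. The function name `parseParamsLenient` is invented for this example, and the input is assumed to be an already-unfolded Content-Type field body. This deliberately accepts more than the RFC 2045 grammar allows, so it is a robustness trade-off rather than a conforming parser:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical lenient parser: collects the parameters of an unfolded
// Content-Type field body, accepting ";" and/or whitespace between them.
// Attribute names are lower-cased; input without "=" at the end is ignored.
std::map<std::string, std::string> parseParamsLenient(const std::string& body)
{
    const std::string delims = " \t;";
    std::map<std::string, std::string> params;

    std::size_t pos = body.find_first_not_of(" \t");
    if (pos == std::string::npos) return params;
    pos = body.find_first_of(delims, pos);            // end of "type/subtype"

    while (pos != std::string::npos) {
        pos = body.find_first_not_of(delims, pos);    // skip any run of ";" / whitespace
        if (pos == std::string::npos) break;
        const std::size_t eq = body.find('=', pos);
        if (eq == std::string::npos) break;

        // Attribute: the last word before "=", lower-cased.
        std::string attr = body.substr(pos, eq - pos);
        attr.erase(attr.find_last_not_of(" \t") + 1);
        const std::size_t lastDelim = attr.find_last_of(delims);
        if (lastDelim != std::string::npos) attr = attr.substr(lastDelim + 1);
        std::transform(attr.begin(), attr.end(), attr.begin(),
                       [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });

        pos = body.find_first_not_of(" \t", eq + 1);  // start of the value
        std::string value;
        if (pos != std::string::npos && body[pos] == '"') {
            // quoted-string: copy up to the closing quote, honouring backslash escapes
            ++pos;
            while (pos < body.size() && body[pos] != '"') {
                if (body[pos] == '\\' && pos + 1 < body.size()) ++pos;
                value.push_back(body[pos++]);
            }
            if (pos < body.size()) ++pos;             // step over the closing quote
        } else if (pos != std::string::npos) {
            // unquoted value: runs until the next ";" or whitespace
            const std::size_t end = body.find_first_of(delims, pos);
            value = body.substr(pos, end == std::string::npos ? end : end - pos);
            pos = end;
        }
        if (!attr.empty()) params[attr] = value;
    }
    return params;
}
```

With the unfolded RFC 2387 example as input, this yields all four parameters (boundary, start, type, start-info) even though two of the semicolons are missing. A reasonable design is to try a strict parse first and fall back to something like this only when the strict parse fails.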
