C#: Class for decoding Quoted-Printable encoding?
Is there an existing class in C# that can convert Quoted-Printable encoding to String? Click on the above link to get more information on the encoding.
The following is quoted from the above link for your convenience.
Any 8-bit byte value may be encoded with 3 characters, an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely if byte values 13 and 10 have meanings other than end of line then they must be encoded as =0D and =0A.
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.
Comments (15)
There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream. This class defines a method called DecodeBytes which does what you want. The method appears to be used by only one method, which is used to decode MIME headers. This method is also internal, but is called fairly directly in a couple of places, e.g., the Attachment.Name setter. A demonstration:
Produces the output:
You may have to do some testing to ensure carriage returns, etc. are treated correctly, although in a quick test I did they seem to be. However, it may not be wise to rely on this functionality unless your use-case is close enough to decoding of a MIME header string that you don't think it will be broken by any changes made to the library. You might be better off writing your own quoted-printable decoder.
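For anyone who does go the "write your own" route, here is a minimal sketch of what such a decoder might look like. The class name `QuotedPrintable` and its `Decode` method are made up for illustration and are not part of the framework. It assumes the input is a quoted-printable body (not a MIME header), collects raw bytes first, and only converts to a string at the end with the encoding you pass in.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not a framework class): decodes a quoted-printable body.
public static class QuotedPrintable
{
    public static string Decode(string input, Encoding encoding)
    {
        var bytes = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                // Literal character; assumed to be 7-bit ASCII in well-formed input.
                bytes.Add((byte)c);
                continue;
            }
            // Soft line break: "=" followed by CRLF (or a bare LF) produces no output.
            if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 2;
                continue;
            }
            if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 1;
                continue;
            }
            // Escaped byte: "=" followed by two hexadecimal digits.
            if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }
            // Malformed escape: keep the "=" unchanged rather than throwing.
            bytes.Add((byte)'=');
        }
        // Convert all bytes at once so multi-byte characters are decoded correctly.
        return encoding.GetString(bytes.ToArray());
    }
}
```

With this sketch, `QuotedPrintable.Decode("caf=C3=A9", Encoding.UTF8)` returns "café". Keeping malformed "=" characters as-is is a design choice of the sketch; a stricter decoder could throw instead.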
I extended the solution of Martin Murphy and I hope it will work in every case.
I wrote this up real quick.
I was looking for a dynamic solution and spent 2 days trying different solutions. This solution will support Japanese characters and other standard character sets.
Then you can call the function with
This was originally found here
If you are decoding quoted-printable with UTF-8 encoding, be aware that you cannot decode each quoted-printable sequence one at a time, as the others have shown, when there are runs of quoted-printable characters together.
For example, if you have the sequence =E2=80=99 and decode it using UTF8 one at a time, you get three "weird" characters. If you instead build an array of three bytes and convert those three bytes with the UTF8 encoding, you get a single apostrophe.
Obviously, if you are using ASCII encoding then one at a time is no problem; however, decoding runs means your code will work regardless of the text encoder used.
Oh, and don't forget =3D is a special case that means you need to decode whatever you have one more time... That is a crazy gotcha!
Hope that helps
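To make the difference concrete, here is a small sketch that decodes the same run both ways. The class and method names (`RunDecodingDemo`, `RunToBytes`, `DecodeOneAtATime`, `DecodeRun`) are invented for this illustration, and the input is assumed to be a well-formed run of "=XX" escapes with nothing else in it.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RunDecodingDemo
{
    // Hypothetical helper: turns a run such as "=E2=80=99" into its raw bytes.
    public static byte[] RunToBytes(string run)
    {
        return run.Split(new[] { '=' }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(hex => Convert.ToByte(hex, 16))
                  .ToArray();
    }

    // Problematic for multi-byte encodings: each byte is decoded in isolation,
    // so a three-byte UTF-8 sequence never reaches the decoder as a whole.
    public static string DecodeOneAtATime(string run, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in RunToBytes(run))
        {
            sb.Append(encoding.GetString(new[] { b }));
        }
        return sb.ToString();
    }

    // Decodes the whole run in one call, so multi-byte sequences stay intact.
    public static string DecodeRun(string run, Encoding encoding)
    {
        return encoding.GetString(RunToBytes(run));
    }
}
```

With `Encoding.UTF8`, `DecodeRun("=E2=80=99", ...)` yields the single character U+2019, while `DecodeOneAtATime` does not, because none of the three bytes is valid UTF-8 on its own.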
This Quoted Printable Decoder works great!
The only one that worked for me.
http://sourceforge.net/apps/trac/syncmldotnet/wiki/Quoted%20Printable
If you just need to decode QP, copy these three functions from the link above into your code:
And then just:
Enjoy
Better solution
Sometimes a string in an EML file is composed of several encoded parts. This is a function that applies Dave's method in those cases:
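As an illustration of the multi-part case, here is a rough sketch that does not rely on framework internals. It is not Dave's method; the `EncodedWords` class and its regular expressions are assumptions made for this example. It only handles Q-encoded words of the form =?charset?Q?text?= (no Base64 "B" words), drops whitespace between adjacent encoded words, and decodes each word with its own charset.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes headers made of several Q-encoded words.
public static class EncodedWords
{
    // Matches one Q-encoded word: =?charset?Q?text?=
    private static readonly Regex Word =
        new Regex(@"=\?([^?\s]+)\?[Qq]\?([^?\s]*)\?=");

    // Whitespace that sits between two adjacent encoded words.
    private static readonly Regex Gap = new Regex(@"(?<=\?=)\s+(?==\?)");

    public static string Decode(string header)
    {
        string joined = Gap.Replace(header, "");
        // Note: charsets outside the built-in set may need an encoding
        // provider to be registered on newer .NET runtimes.
        return Word.Replace(joined, m =>
            DecodeQ(m.Groups[2].Value, Encoding.GetEncoding(m.Groups[1].Value)));
    }

    private static string DecodeQ(string text, Encoding encoding)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                // In the header "Q" encoding, "_" stands for a space.
                bytes.Add((byte)' ');
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return encoding.GetString(bytes.ToArray());
    }
}
```

For example, `EncodedWords.Decode("=?utf-8?Q?caf=C3=A9?= =?utf-8?Q?_au_lait?=")` returns "café au lait". Text outside the encoded words is passed through unchanged.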
Please note:
Solutions using "input.Replace" are all over the Internet, and they are still not correct.
If you decode ONE symbol and then call "Replace", ALL occurrences of that symbol in "input" are replaced, and all subsequent decoding is broken.
A more correct solution:
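One way to avoid the problem is a single left-to-right pass in which decoded output is never scanned again. The sketch below is an illustration under that assumption, not the code originally posted with this answer; `SinglePassQp` is a made-up name. It uses `Regex.Replace` with a match evaluator, and matches whole runs of escapes so that multi-byte characters are decoded together.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes quoted-printable text in a single pass.
public static class SinglePassQp
{
    // Soft line breaks ("=" at the end of a line) are dropped before decoding.
    private static readonly Regex SoftBreak = new Regex(@"=\r?\n");

    // One or more consecutive escapes, e.g. "=E2=80=99", matched as one run.
    private static readonly Regex EscapeRun = new Regex(@"(?:=[0-9A-Fa-f]{2})+");

    public static string Decode(string input, Encoding encoding)
    {
        string unfolded = SoftBreak.Replace(input, "");
        // Each run is visited once, left to right; the replacement text
        // is not rescanned, so a decoded "=" cannot start a new escape.
        return EscapeRun.Replace(unfolded, m =>
        {
            string run = m.Value;
            var bytes = new byte[run.Length / 3];
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = Convert.ToByte(run.Substring(i * 3 + 1, 2), 16);
            }
            return encoding.GetString(bytes);
        });
    }
}
```

Take the input "=3D41 =41". A Replace-based decoder that first turns "=3D" into "=" ends up with "=41 =41" and then produces "A A". The single pass above returns "=41 A", because the "=" that came from "=3D" is never treated as the start of another escape.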
I know it's an old question, but this should help
A slightly improved version of the (non-working) code from Martin Murphy:
Starting from @Dave's solution, this decodes quoted-printable strings made up of more than one encoded part, for example
"=?utf-8?Q?Firststring?=\t=?utf-8?Q?_-_1.250=2C50_=E2=82=AC=5F1000=5F2646.pdf?="