Where can I find a list of escaped characters in MSIL string constants?
I've written a program (in C#) that reads and manipulates MSIL programs that have been generated from C# programs. I had mistakenly assumed that the syntax rules for MSIL string constants are the same as for C#, but then I ran into the following situation:
This C# statement
string s = "Do you wish to send anyway?";
gets compiled into (among other MSIL statements) this
IL_0128: ldstr "Do you wish to send anyway\?"
I wasn't expecting the backslash that is used to escape the question mark. Now I can obviously take this backslash into account as part of my processing, but mostly out of curiosity I'd like to know if there is a list somewhere of which characters get escaped when the C# compiler converts C# constant strings to MSIL constant strings.
Thanks.
Update

Based on experimentation with the C# compiler and ildasm.exe: perhaps the reason there is no list of escaped characters is that there are so few, precisely six.

Going by the IL that ildasm generates from C# programs compiled with Visual Studio 2010:

\t : 0x09 : (tab)
\n : 0x0A : (newline)
\r : 0x0D : (carriage return)
\" : 0x22 : (double quote)
\? : 0x3F : (question mark)
\\ : 0x5C : (backslash)

Example 1: ASCII above 0x7E: A simple accented é (U+00E9)
C#: Either
"é"
or
"\u00E9"
becomes a byte array (E9 byte comes first)

Example 2: UTF-16: Summation symbol ∑ (U+2211)
C#: Either
"∑"
or
"\u2211"
becomes a byte array (11 byte comes first)

Example 3: UTF-32: Double-struck mathematical 𝔸 (U+1D538)
C#: Either
"𝔸"
or the UTF-16 surrogate pair
"\uD835\uDD38"
becomes a byte array (bytes within each char reversed, but the double-byte chars in their overall order)

Example 4: The byte array conversion applies to an entire string that contains a non-ASCII character
C#:
"In the last decade, the German word \"über\" has come to be used frequently in colloquial English."
becomes a byte array for the whole string.

Directly, "you can't" (find a list of MSIL string escapes), but here are some helpful tidbits...
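The byte order described in Examples 1 to 3 (low byte of each 16-bit char first, chars in their normal order) matches UTF-16 little-endian, which .NET exposes as Encoding.Unicode. The following is a minimal sketch under that assumption, not a reproduction of ildasm's exact output format; the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: converts between a string and the hex bytes of its
// UTF-16 little-endian encoding (assumed to be what the byte arrays contain).
static class IlByteArrayDemo
{
    // String -> space-separated two-digit hex bytes, low byte of each char first.
    public static string ToHexBytes(string s)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(s); // Encoding.Unicode is UTF-16 LE
        return string.Join(" ", bytes.Select(b => b.ToString("X2")));
    }

    // Space-separated hex bytes -> string, the reverse of ToHexBytes.
    public static string FromHexBytes(string hex)
    {
        byte[] bytes = hex
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(h => Convert.ToByte(h, 16))
            .ToArray();
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(ToHexBytes("\u00E9"));       // prints: E9 00
        Console.WriteLine(ToHexBytes("\u2211"));       // prints: 11 22
        Console.WriteLine(ToHexBytes("\uD835\uDD38")); // prints: 35 D8 38 DD
    }
}
```

A tool that reads IL could use something like FromHexBytes to turn such a byte array back into the original C# string, once the hex digits have been pulled out of the surrounding IL text.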
ECMA-335, which contains the strict definition of CIL, does not specify which characters must be escaped in QSTRING literals, only that they may be escaped using the backslash \ character. The most important notes are:

- [...] \042, not \u0022).
- [...] \ character -- see below

The only explicitly mentioned escapes are tab \t, linefeed \n, and octal numeric escapes. This is a bit annoying for your purposes since C# does not have an octal literal -- you'll have to do your own extraction and conversion, such as by using the Convert.ToInt32([string], 8) method.

Beyond that, the choice of escapes is "implementation-specific" to the "hypothetical IL assembler" described in the spec. So your question rightly asks about the rules for MSIL, which is Microsoft's strict implementation of CIL. As far as I can tell, MS has not documented their choice of escapes. It could be helpful at least to ask the Mono folks what they use. Beyond that, it may be a matter of generating the list yourself -- make a program that declares a string literal for every character from \u0000 up to whatever, and see what the compiled ldstr statements are. If I get to it first, I'll be sure to post my results.

Additional notes:
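As a practical illustration of the escapes discussed so far, here is a minimal sketch of undoing them in C#. It handles only the six escapes observed in ildasm output plus octal escapes (converted with Convert.ToInt32(..., 8)). Treating any other escaped character as standing for itself is an assumption, as is accepting one to three octal digits, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: undoes backslash escapes in the body of an IL string
// literal (the text between the quotes).
static class IlStringUnescaper
{
    public static string Unescape(string body)
    {
        var sb = new StringBuilder(body.Length);
        for (int i = 0; i < body.Length; i++)
        {
            char c = body[i];
            // Ordinary character, or a lone backslash at the very end: copy as-is.
            if (c != '\\' || i + 1 >= body.Length)
            {
                sb.Append(c);
                continue;
            }

            char next = body[++i];
            switch (next)
            {
                case 't': sb.Append('\t'); break;
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        // Octal escape such as \042: take up to three octal digits.
                        int start = i;
                        while (i + 1 < body.Length && i - start < 2 &&
                               body[i + 1] >= '0' && body[i + 1] <= '7')
                        {
                            i++;
                        }
                        string digits = body.Substring(start, i - start + 1);
                        sb.Append((char)Convert.ToInt32(digits, 8));
                    }
                    else
                    {
                        // Covers \" and \? and \\ (assumption: also any unlisted escape).
                        sb.Append(next);
                    }
                    break;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // The literal from the question, as it appears in the IL.
        Console.WriteLine(Unescape("Do you wish to send anyway\\?"));
        // prints: Do you wish to send anyway?
    }
}
```

This covers character escapes only.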
To properly parse *IL string literals -- known as QSTRINGS or SQSTRINGS -- you will have to account for more than just character escapes. Take in-code string concatenation, for example (and this is verbatim from Partition II::5.2):