Newest Delphi compiler versions and String type compatibility
I'm trying to make some string-processing routines compatible with the newest Delphi version. I'm using Delphi 2005 and 2007, but I'm not totally sure about compatibility.
Here are a few samples. Are they compatible with both the old and the new string type? (I'll use an imaginary STRING_UNICODE directive.)
A type definition:
    {$IFNDEF UNICODE_STRING}
    TextBuffer = Array[0..13] Of Char;
    {$ELSE}
    TextBuffer = Array[0..13] Of WideChar;
    {$ENDIF}
Useless or not? Does the Char type become what used to be WideChar once strings are Unicode, or is there still a difference?
A function:
    Function RemoveBlanks(Text: String): String;
    Var
      i: integer;
    Begin
      result := '';
      For i := 0 To Length(Text) Do
      Begin
        {$IFNDEF UNICODE_STRING}
        If Byte(Text[i]) < 21 Then Continue;
        {$ELSE}
        If Word(Text[i]) < 21 Then Continue;
        {$ENDIF}
        If Text[i] = ' ' Then Continue;
        Result := Result + Text[i];
      End;
Is the Word() cast OK?
Here there is also the ' ' problem. How is the space handled in the Unicode version? Should I also use the directive to differentiate ' ' and ' ', or will ' ' be automatically handled as a 2-byte blank?
A line break:
    NewLineBegin := CanReadText( aPTextBuffer, #13#10 );
How is the second argument (#13#10) interpreted in the Unicode version? Is it compatible? Will it be translated to the byte block 00130010? If not, should the directive be used instead, with the constant #0013#0010?
Answers (3)
The first thing to do is read Marco Cantú's paper on Unicode: http://edn.embarcadero.com/article/38980
Question 1
Just use Char everywhere, with no conditional code, and it will work in both old and new versions.
Char is a special type: it is an 8-bit type in old versions of Delphi and a 16-bit type in the new Unicode versions (Delphi 2009 and later).
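As a minimal sketch of that point, the buffer from the question can be declared once with Char and no directive. The program name is made up for illustration, and the sizes in the comments assume a Delphi compiler where Char is 1 byte before Delphi 2009 and 2 bytes afterwards.

```pascal
program CharBufferDemo;
{$APPTYPE CONSOLE}

type
  // Same shape as the question's TextBuffer, without any directive.
  // Char maps to AnsiChar before Delphi 2009 and to WideChar from 2009 on.
  TextBuffer = array[0..13] of Char;

var
  Buf: TextBuffer;
begin
  // The element count is identical in every compiler version.
  WriteLn('Elements:  ', Length(Buf));   // 14
  // The byte size follows the width of Char: 14 (Ansi) or 28 (Unicode).
  WriteLn('Bytes:     ', SizeOf(Buf));
  WriteLn('Char size: ', SizeOf(Char));
end.
```

Code that needs a byte count should derive it from SizeOf(Char) rather than assume one byte per element.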
Question 2
Char is an ordinal type, so you can write if s[i] < #21. You also need to start loops at 1 for strings, since they use 1-based indexing.
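Putting both remarks together, the question's function could look roughly like the sketch below. It keeps the asker's threshold of 21 unchanged, and the test string in the main block is only an illustration.

```pascal
program RemoveBlanksDemo;
{$APPTYPE CONSOLE}

// Sketch of the question's RemoveBlanks without conditional compilation.
// The threshold #21 is taken from the question as-is.
function RemoveBlanks(const Text: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(Text) do      // strings are 1-based
  begin
    if Text[i] < #21 then Continue;  // ordinal comparison, no cast needed
    if Text[i] = ' ' then Continue;  // ' ' is a Char literal in both versions
    Result := Result + Text[i];
  end;
end;

begin
  WriteLn(RemoveBlanks('a b'#9'c'#13#10'd')); // prints abcd
end.
```

The same source compiles whether Char is 8 or 16 bits wide, because the comparison is done on the Char values themselves.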
Question 3
Writing #0013 is not needed; #13 is fine.
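A small sketch of why no directive is needed here: a #13#10 literal is two characters long in either kind of build. The constant name CRLF is made up for this example.

```pascal
program NewLineConstDemo;
{$APPTYPE CONSOLE}

const
  CRLF = #13#10; // two characters in Ansi and Unicode builds alike

var
  S: string;
begin
  S := 'line1' + CRLF + 'line2';
  WriteLn(Length(CRLF));                // 2
  WriteLn(Length(S));                   // 12
  WriteLn(Ord(S[6]), ' ', Ord(S[7]));   // 13 10
end.
```

Only the number of bytes behind each character changes between versions; the character codes and the string length stay the same.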
In short, almost all well-written code will need no changes.
Compiler Directives
In general, I'd advise you to be very wary of compiler directives. They serve their purpose, but for general use, they should probably be avoided altogether.
The first problem is that you have to compile and test your app twice, because the program is fundamentally or subtly different depending on whether a directive is on or off.
This situation gets worse with each additional directive, because you usually have to test every combination:
D1 On, D2 On
D1 On, D2 Off
D1 Off, D2 On
D1 Off, D2 Off
3 directives means 8 combinations, and so on.
Unicode Strings
Please see: Get ready for Delphi 2009 and up when developing with Delphi 7?
It has some nice answers for you to consider.
Question 1
As said, I advise against it. I also advise against it for other reasons, given in my answer to the above-mentioned question.
More specifically:
Question 2
Not only is this ill-advised for the same reasons as in Question 1, but it also has some subtle problems.
The more precise type of Text (String) is determined by your Delphi version.
Also, there are some special considerations, and new support classes, for 'special' characters. You'll want to look into those. Refer to: How to identify unicode keys on key press?
Question 3
I'm pretty sure that #13 will be treated as a single character, so in Delphi >= 2009, where Char = WideChar, that character will take up 2 bytes.
However, look for the line-break constants in Delphi. System.sLineBreak was probably introduced back in the Kylix days.
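For illustration, here is a minimal sketch that uses sLineBreak instead of a hard-coded #13#10. The value in the comment assumes a Windows target; on other platforms the constant may hold a different sequence.

```pascal
program LineBreakDemo;
{$APPTYPE CONSOLE}

var
  S: string;
begin
  // sLineBreak is declared in the System unit, so no uses clause is needed.
  // On Windows it holds #13#10.
  S := 'first' + sLineBreak + 'second';
  WriteLn(S);
  WriteLn(Length(sLineBreak)); // 2 on Windows
end.
```

Using the constant keeps the platform's line ending in one place, and it behaves the same whether strings are Ansi or Unicode.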
The generic type Char becomes either the fundamental type AnsiChar or the fundamental type WideChar (read up on generic vs. fundamental types). BTW, there is a UNICODE symbol already $DEFINEd for you; however, there is no need to branch at all until a specific byte size is required.
The second part smells; scratch it completely. It is an abuse of typecasts and artificially creates a need for conditional compilation. To get the unsigned integer character code of a given Char, use the Ord() function instead (or, as said in the other answer, use the ordinal traits of the Char type).
For the third part, character constants are already of the generic type Char. Again, there is no need to worry: #13 becomes either the byte-sized $0D or the word-sized $000D, which is stored in memory as the bytes 0D 00 (remember little-endianness).
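The Ord() suggestion can be sketched as follows. IsBlank is a hypothetical helper name, the threshold 21 is taken from the question, and the UNICODE branch is there only to report which Char the compiler uses, not because the logic needs it.

```pascal
program OrdDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: True for the characters the question wants to skip.
// Ord() yields the character code for AnsiChar and WideChar alike,
// so no Byte()/Word() cast and no directive is required.
function IsBlank(C: Char): Boolean;
begin
  Result := (Ord(C) < 21) or (C = ' ');
end;

var
  C: Char;
begin
  C := #13;
  WriteLn(Ord(C));     // 13, regardless of the width of Char
  WriteLn(SizeOf(C));  // 1 before Delphi 2009, 2 from Delphi 2009 on
  WriteLn(IsBlank(C)); // TRUE
  {$IFDEF UNICODE}
  WriteLn('Char is WideChar');
  {$ELSE}
  WriteLn('Char is AnsiChar');
  {$ENDIF}
end.
```

Branching on UNICODE only becomes necessary when the code depends on the byte size itself, for example when writing raw buffers to a file.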