Delphi XE - RawByteString vs AnsiString
I asked a related question here: Delphi XE - should I use String or AnsiString? After deciding that ANSI strings are the right choice for a (large) library of mine, I realized that I could actually use RawByteString instead of AnsiString. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it converts between them. However, it looks like using RawByteString would let me get rid of those conversions.
Please let me know your opinion about it.
Thanks.
Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.
    procedure TForm1.FormCreate(Sender: TObject);
    var
      x1, x2: RawByteString;
      s: string;
    begin
      x1 := 'a';
      x2 := 'b';
      x1 := x1 + x2;
      s := x1;  { <------- Implicit string cast from 'RawByteString' to 'string' }
    end;
I think it does some internal work (such as copying data), so my code will not be much faster, and I will still have to add lots of typecasts to silence the compiler.
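A minimal sketch of what the update observes, assuming a Unicode-enabled Delphi (2009 or later) console program; the program name is made up for illustration. An explicit string() cast removes the warning, but the conversion itself still happens:

```delphi
program ExplicitCastSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  x1, x2: RawByteString;
  s: string; // UnicodeString in Delphi 2009 and later
begin
  x1 := 'a';
  x2 := 'b';
  x1 := x1 + x2;

  // The explicit cast tells the compiler the conversion is intended, so the
  // "Implicit string cast" warning goes away. The 8-bit payload is still
  // decoded and copied into a new 16-bit string at run time.
  s := string(x1);

  Writeln(s);
  Writeln('bytes per AnsiChar: ', SizeOf(AnsiChar));
  Writeln('bytes per Char:     ', SizeOf(Char));
end.
```

The cast only documents intent. It does not make the assignment cheaper, which matches the concern raised in the update.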
Comments (2)
RawByteString is an AnsiString with no code page set by default.
When you assign an 8-bit string to a RawByteString variable, the variable takes on the code page of the source string. Assigning to or from a string (UnicodeString) still includes a conversion. Sorry.
But there is another use of RawByteString: storing plain byte content (e.g. the content of a database BLOB field, just like an array of byte).
To summarize:
- RawByteString should be used as a "code page agnostic" parameter to a method or function;
- RawByteString can be used as a variable type to store some BLOB data.
If you want to reduce conversions and would rather use 8-bit char strings in your application, you had better:
- avoid the generic AnsiString type, which depends on the current system code page and through which you will lose data;
- rely on an encoding that does not lose any data when converted to or from UnicodeString.
That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel, in part because WideString was not an option: it is dead slow and has the same problem of implicit conversions.
But, to achieve the best speed, we wrote some optimized functions to handle our custom string type, and we reserved the RawByteString type for handling BLOB data.
Source code is available in our repository. In that unit, the UTF-8 related functions were deeply optimized, with both Pascal and asm versions for better speed. We sometimes overloaded default functions (like Pos) to avoid conversions. More information about how we handle text in the framework is available separately.
Last word:
If you are sure that your application will only ever contain 7-bit content (no accented characters), you may use the default AnsiString type in your program. But in this case, you should add the AnsiStrings unit to your uses clause to get overloaded string functions that avoid most unwanted conversions.
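As a rough sketch of the two roles the first answer gives RawByteString and a dedicated UTF-8 type, assuming Delphi 2009 or later: TUtf8Text and PayloadCodePage are hypothetical names invented here, not the framework's actual declarations. The function takes a RawByteString parameter, so 8-bit strings with different code pages can be passed in without first being converted to a common code page:

```delphi
program RawByteParamSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

type
  // Hypothetical custom type: an AnsiString whose declared code page is UTF-8.
  TUtf8Text = type AnsiString(65001);

// "Code page agnostic" parameter: the argument keeps its own code page.
function PayloadCodePage(const S: RawByteString): Integer;
begin
  Result := StringCodePage(S);
end;

var
  W: string;
  U: TUtf8Text;
  A: AnsiString;
begin
  W := 'caf'#$00E9;   // UnicodeString with one accented character
  U := TUtf8Text(W);  // explicit conversion to UTF-8
  A := AnsiString(W); // explicit conversion to the system ANSI code page

  // Each call reports the code page the argument arrived with.
  Writeln('UTF-8 argument: ', PayloadCodePage(U));
  Writeln('ANSI argument:  ', PayloadCodePage(A));
end.
```

Making both conversions explicit keeps the compiler quiet about implicit casts and shows exactly where data is re-encoded. The call into PayloadCodePage itself adds no conversion.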
RawByteString is still an AnsiString. It is best described as a "universal receiver": it takes on whatever code page the source string has at the point of assignment, without forcing a code page conversion. RawByteString was intended to be used only as a function parameter so that, as you've discovered, you do not incur a conversion between AnsiStrings with differing code page affinities when calling utility functions that take AnsiStrings.
However, in the case above, you're assigning what is essentially an AnsiString to a UnicodeString, which incurs a conversion. It must convert because the RawByteString has a payload of 8-bit characters, whereas a string (UnicodeString) has a payload of 16-bit characters.
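A small sketch of the "universal receiver" behaviour, again assuming Delphi 2009 or later and using made-up program and variable names. The assignment from UTF8String to RawByteString keeps the bytes and the code page; only the assignments that cross between 8-bit and 16-bit strings convert:

```delphi
program UniversalReceiverSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  W, S: string;
  U: UTF8String;
  R: RawByteString;
begin
  W := 'caf'#$00E9;   // UnicodeString with one accented character
  U := UTF8String(W); // 16-bit -> 8-bit: a real conversion to UTF-8

  R := U;             // 8-bit -> 8-bit: R takes over U's code page
  Writeln('code page of R:       ', StringCodePage(R));
  Writeln('length of U in bytes: ', Length(U));
  Writeln('length of R in bytes: ', Length(R));

  S := string(R);     // 8-bit -> 16-bit: the payload must be converted
  Writeln('length of S in chars: ', Length(S));
end.
```

Because the accented character takes more than one byte in UTF-8, the byte length of R and the character length of S can differ, which is one visible sign that the last assignment is not a plain copy.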