boost::lexical_cast performance is extremely slow
Windows XP SP3. Core 2 Duo 2.0 GHz.
I'm finding boost::lexical_cast performance to be extremely slow and wanted to find ways to speed up the code. Using /O2 optimizations on Visual C++ 2008 and comparing with Java 1.6 and Python 2.6.2, I see the following results.
Integer casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
s = boost::lexical_cast<std::string>(i);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
s = new Integer(i).toString();
}
python:
for i in xrange(1,10000000):
s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
c++ is as slow as python and 6 times slower than java.
Double casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
double d = i*1.0;
s = boost::lexical_cast<std::string>(d);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
double d = i*1.0;
s = new Double(d).toString();
}
python:
for i in xrange(1,10000000):
d = i*1.0
s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles c++ is actually half the speed of python, and 20 times slower than the java solution! Any ideas on improving the boost::lexical_cast performance? Does this stem from a poor stringstream implementation, or can we expect a general 10x decrease in performance from using the boost libraries?
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to boost 1.49, but I do remember making my code faster on an older version. So I guess:
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library, either with criticism or with code. I guess you missed the point of lexical_cast's real value...
Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).
In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.
Comparing apples.
If you want to compare the Java Integer.toString method, then you should compare it with either the C sprintf or the C++ ostream facilities.
The C++ stream solution would be 6 times faster (on my g++) than lexical_cast, and quite less extensible.
The C sprintf solution would be 8 times faster (on my g++) than lexical_cast, but a lot less safe.
Both solutions are either as fast as or faster than your Java solution (according to your data).
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with the equivalent Java pseudo code: a generic conversion whose Source and Target can be of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.
Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, from a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B. This means that as long as your object can be output into a stream and input from a stream, you'll be able to use lexical_cast on it, without touching a single line of code.
So, what are the uses of lexical_cast?
Among the main uses of lexical casting, the most important one here is that we have one and only one interface/function to cast a value of one type into an equal or similar value of another type.
This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently while still keeping the lexical_cast ease-of-use feature.
It took me some minutes to look at the lexical_cast source and come up with a viable solution. Add the following code to your C++ code:
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 times faster on my g++ compiler, which means that, according to your data, its performance should be similar to Java's.
And it took me 10 minutes of looking at the boost code to write a remotely efficient and correct 32-bit version. With some work, it could probably be made faster and safer (if we had direct write access to the std::string internal buffer, we could avoid a temporary external buffer, for example).
You could specialize lexical_cast for the int and double types, and use strtod and strtol in your specializations.
This variant will be faster than the default implementation, because the default implementation constructs heavy stream objects. And it should be a little faster than printf, because printf has to parse the format string.
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out and then back in, to and from a temporary stream) ends up being slower than specific routines.
(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])
Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.
[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
The runtimes, using JDK 1.6.0-14, server VM:
And in client VM:
Even though, theoretically, escape analysis may permit allocation on the stack, and inlining may introduce all code (including copying) into the local method, permitting elimination of redundant copying, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in the code cache that don't justify themselves in real code, as opposed to microbenchmarks like the one seen here.
What lexical cast is doing in your code can be simplified to this:
There is unfortunately a lot going on every time you call Cast(): a temporary stringstream is constructed, the value is streamed in and parsed back out, and the stream is destroyed.
Then, in your own code, the assignment involves further allocations, and deallocations are performed. You may be able to reduce this slightly by constructing the string directly from the result instead of assigning to it.
However, if performance is really important to you, you should consider using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread-safe, but that might not matter for your specific needs.
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:
- it does the formatting work that sprintf does (though sprintf won't safely handle buffer overruns)
- lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures
lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort, and because it has a uniform prototype. I don't personally see why either of these properties necessitates slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).
Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
lexical_cast may or may not be as slow in relation to Java and Python as your benchmarks indicate, because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical_cast, or the iostream methods it uses, are measured by your benchmarks, because C++ doesn't defer these operations. In the case of Java and Python, however, the associated deallocations may in fact simply have been deferred to a future garbage-collection cycle and missed by the benchmark measurements (unless a GC cycle happens to occur while the benchmark is in progress, in which case you'd be measuring too much). So it's hard to know for sure, without examining the specifics of the Java and Python implementations, how much "cost" should be attributed to the deferred GC burden that may (or may not) eventually be imposed.
This kind of issue obviously may apply to many other C++ vs garbage collected language benchmarks.
As Barry said, lexical_cast is very general; you should use a more specific alternative, for example check out itoa (int -> string) and atoi (string -> int).
If speed is a concern, or you are just interested in how fast such casts can be in C++, there's an interesting thread regarding it.
Boost.Spirit 2.1 (which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents (strtol(), atoi(), etc.).
I use this very fast solution for POD types...