Improving the performance of System.String to std::wstring conversion?
I'm evaluating ADO.NET for a C++ application that currently uses plain old ADO. Given that we're redoing the whole database interaction, we'd like to determine whether the more modern and actively developed ADO.NET would be beneficial.
After some measurements it appears that for certain test queries that retrieve a lot of rows with few columns that all contain strings, ADO.NET is actually about 20% slower for us than using plain ADO. Our profiler suggests that the conversion of System.String results into the std::wstring used by the application is one of the bottlenecks. I can't switch any of the upper layers of the application to using System.String, so we are stuck with this particular conversion.
A rough outline of the code looks like this:
    // marshal_as<std::wstring> needs <msclr/marshal_cppstd.h>
    System::Data::SqlClient::SqlCommand^ sqlCmd =
        gcnew System::Data::SqlClient::SqlCommand(cmd, m_DBConnection.get());
    System::Data::SqlClient::SqlDataReader^ reader = sqlCmd->ExecuteReader();
    if (reader->HasRows)
    {
        using namespace msclr::interop;
        while (reader->Read())
        {
            // One vector of converted column values per row
            std::vector<std::wstring> results;
            for (int i = 0; i < reader->FieldCount; ++i)
            {
                std::wstring col_data;
                // Type lookup is repeated for every column of every row
                TypeCode type = Type::GetTypeCode(reader->GetFieldType(i));
                switch (type)
                {
                    // ... omit lots of different types
                    case TypeCode::String:
                    {
                        System::String^ tmp = reader->GetString(i);
                        col_data = marshal_as<std::wstring>(tmp);
                    }
                    break;
                    // ... more type conversion code removed
                }
                results.push_back(col_data);
            }
            // NOTE: Callback into native result processing code
            ResultsCallback(results);
        }
    } // end if (reader->HasRows); this brace was missing from the outline
I've spent a lot of time reading up on the various ways of getting a std::wstring out of a System.String and measured most of them. They all seem to perform roughly the same; we're talking decimal points in the percentage of CPU usage. In the end I simply settled on marshal_as<std::wstring>, as it's the most readable and appears to be as performant as the other solutions (i.e., using PtrToStringChars or the method described on MSDN).
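For reference, here is a minimal native-only sketch of what a pointer-based conversion boils down to once the managed string's buffer is available. The function names are made up for illustration. In real C++/CLI code the pointer would come from PtrToStringChars (pinned with pin_ptr) and the length from String::Length; here both are plain arguments so the snippet compiles without the CLR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: 'chars' and 'length' stand in for the pinned buffer
// and the String::Length of a System::String.
std::wstring copy_utf16_buffer(const wchar_t* chars, std::size_t length)
{
    // Single copy with an explicit length: no scan for a terminator,
    // and embedded null characters are preserved.
    return std::wstring(chars, length);
}

// Hypothetical variant that writes into an existing string, so a caller
// looping over many values can reuse one buffer.
void copy_utf16_buffer_into(std::wstring& out,
                            const wchar_t* chars, std::size_t length)
{
    out.assign(chars, length);
}
```

Either way the work is essentially one copy, which fits the observation that all the measured approaches perform about the same. Whether reusing a destination string makes a measurable difference would have to be profiled.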
Using the DataReader works very well from a conceptual point of view as most of the processing we do on the data is row oriented anyway.
The only other slightly unexpected bottleneck I noticed is the retrieval of the TypeCode for the result columns; I'm already planning to move that outside the main results processing loop and only retrieve the type codes once per query result.
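A rough sketch of that hoisting, written in portable C++ so it can be compiled without the CLR. ColumnType, FakeReader and read_all are hypothetical stand-ins for TypeCode, SqlDataReader and the loop above; they are not part of ADO.NET.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for System::TypeCode.
enum class ColumnType { String, Int32 };

// Hypothetical stand-in for SqlDataReader; counts schema lookups.
struct FakeReader {
    std::vector<ColumnType> schema;
    std::vector<std::vector<std::wstring>> rows;
    std::size_t type_lookups = 0;

    std::size_t field_count() const { return schema.size(); }
    ColumnType field_type(std::size_t i) { ++type_lookups; return schema[i]; }
};

std::vector<std::vector<std::wstring>> read_all(FakeReader& reader)
{
    // Query the column types once per result set, before the row loop.
    std::vector<ColumnType> types;
    types.reserve(reader.field_count());
    for (std::size_t i = 0; i < reader.field_count(); ++i)
        types.push_back(reader.field_type(i));

    std::vector<std::vector<std::wstring>> out;
    out.reserve(reader.rows.size());
    for (const auto& row : reader.rows) {
        std::vector<std::wstring> results;
        results.reserve(types.size());
        for (std::size_t i = 0; i < types.size(); ++i) {
            switch (types[i]) {          // cached lookup, no reader call
            case ColumnType::String:
            case ColumnType::Int32:      // real code would convert per type
                results.push_back(row[i]);
                break;
            }
        }
        out.push_back(std::move(results));
    }
    return out;
}
```

With the real reader, the cached vector would be filled from Type::GetTypeCode(reader->GetFieldType(i)) once before the while (reader->Read()) loop.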
After this lengthy introduction, can anybody recommend a less costly way to convert the string data from a System.String to a std::wstring, or am I already looking at the optimum performance here? I'm obviously looking more for slightly out-of-the-ordinary approaches, given that I've already tried all the ordinary ones...
EDIT: Looks like I fell into a trap of my own making here. Yes, the code above is about 20% slower than the equivalent plain ADO code in Debug mode. However, after switching to Release mode, the bottleneck is still measurable, but the ADO.NET code above is suddenly almost 50% faster than the older ADO code. So while I'm still a little concerned about the cost of the string conversion, it's not as big in Release mode as it first appeared.
Comments (1)
I don't see there being any way to optimize that, since the implementation of marshal_as<std::wstring> just grabs the internal C string and assigns it to a std::wstring. You can't get much more efficient than that.
The only solution I can see is splitting up your rows and having N threads process them in parallel. The only issue is that you would need to reserve enough space in your vector to prevent a resize from taking place during processing, but that looks easy enough.
If you're using Visual Studio 2010, I think the C++0x threading library would be sufficient for this task, though I'm not sure how much (if any) is implemented in Visual Studio so far.
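A minimal sketch of the split-across-threads idea, using std::thread from standard C++11. It assumes the raw row values have already been collected into a container, since a data reader itself is read sequentially. convert_row and convert_parallel are hypothetical names, and std::u16string stands in for the managed string data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-value conversion; stands in for the marshal_as call.
std::wstring convert_row(const std::u16string& raw)
{
    return std::wstring(raw.begin(), raw.end());
}

std::vector<std::wstring> convert_parallel(
    const std::vector<std::u16string>& rows, unsigned thread_count)
{
    // Size the output up front so no thread can trigger a reallocation.
    std::vector<std::wstring> out(rows.size());
    if (thread_count == 0) thread_count = 1;
    const std::size_t chunk =
        (rows.size() + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(rows.size(), t * chunk);
        const std::size_t end = std::min(rows.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        // Each worker writes only to its own disjoint index range.
        workers.emplace_back([&rows, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = convert_row(rows[i]);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

The sketch sizes the vector rather than calling reserve, because each thread assigns to existing elements by index. Whether the thread start-up cost pays off for short strings is something to measure rather than assume.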