Convert .NET System::String to UTF8 bytes stored in char*

Posted 2024-11-18 17:27:23

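I am wrapping some unmanaged C++ code inside a .NET project. For this I need to convert a System::String to UTF8 bytes stored in a char*.

I am unsure whether this is the best, or even a correct, way to do this, and I'd appreciate it if someone could take a look and provide feedback.

Thanks,

/David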

// Copy into blank VisualStudio C++/CLR command line solution.
#include "stdafx.h"
#include <stdio.h>

using namespace System;
using namespace System::Text;
using namespace System::Runtime::InteropServices;

// Test for calling with char* argument.
void MyTest(const char* buffer)
{
    printf_s("%s\n", buffer);
    return;
}

int main()
{

   // Create a UTF-8 encoding.
   UTF8Encoding^ utf8 = gcnew UTF8Encoding;

   // A Unicode string with two characters outside an 8-bit code range.
   String^ unicodeString = L"This unicode string contains two characters with codes outside an 8-bit code range, Pi (\u03a0) and Sigma (\u03a3).";
   Console::WriteLine(unicodeString);

   // Encode the string.
   array<Byte>^encodedBytes = utf8->GetBytes(unicodeString);

   // Get pointer to unmanaged char array
   int size = Marshal::SizeOf(encodedBytes[0]) * encodedBytes->Length;
   IntPtr pnt = Marshal::AllocHGlobal(size);
   Marshal::Copy(encodedBytes, 0, pnt, encodedBytes->Length);

   // Ugly, but necessary?
   char *charPnt= (char *)pnt.ToPointer();
   MyTest(charPnt);
   Marshal::FreeHGlobal(pnt);

}


疏忽 2024-11-25 17:27:23

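You don't need to create an encoder instance; you can use the static instances.
If the called function doesn't expect a pointer to the HGlobal heap, you can use plain C/C++ memory allocation (new or malloc) for the buffer.
In your example the function doesn't take ownership, so you don't need a copy at all; just pin the buffer.

Something like: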

// Encode the text as UTF8
array<Byte>^ encodedBytes = Encoding::UTF8->GetBytes(unicodeString);

// prevent GC moving the bytes around while this variable is on the stack
pin_ptr<Byte> pinnedBytes = &encodedBytes[0];

// Call the function, typecast from byte* -> char* is required.
// Note: this passes the byte count too, so it needs a MyTest variant that accepts a length.
MyTest(reinterpret_cast<char*>(pinnedBytes), encodedBytes->Length);
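
The MyTest in the question takes only a const char*, so a buffer without a terminator needs a receiver that knows the byte count. A minimal sketch of such a receiver in plain C++ follows; the names MyTestWithLength and CopyUtf8 are hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds a std::string from exactly `length` bytes.
// It never looks for a '\0', so the source buffer need not be terminated.
std::string CopyUtf8(const char* buffer, std::size_t length)
{
    return std::string(buffer, length);
}

// Hypothetical length-aware variant of MyTest from the question.
void MyTestWithLength(const char* buffer, std::size_t length)
{
    // "%.*s" prints at most `length` bytes; the precision argument must be an int.
    std::printf("%.*s\n", static_cast<int>(length), buffer);
}
```

With a receiver like this, the pinned array can be passed as is, because nothing reads beyond the stated length.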

Or, if you need the string zero-terminated like most C functions (including the example in the OP), then you should probably add a zero byte.

// Encode the text as UTF8, making sure the array is zero terminated
array<Byte>^ encodedBytes = Encoding::UTF8->GetBytes(unicodeString + "\0");

// prevent GC moving the bytes around while this variable is on the stack
pin_ptr<Byte> pinnedBytes = &encodedBytes[0];

// Call the function, typecast from byte* -> char* is required
MyTest(reinterpret_cast<char*>(pinnedBytes));
