Creating a System::String object from a BSTR in Managed C++ - is this approach a good idea?

Posted on 2024-08-21 00:38:30

My co-worker is filling a System::String object with double-byte characters from an unmanaged library by the following method:

RFC_PARAMETER aux;
Object* target;
RFC_UNICODE_TYPE_ELEMENT* elm;
elm = &(m_coreObject->m_pStructMeta->m_typeElements[index]);
aux.name = NULL;
aux.nlen = 0;
aux.type = elm->type;
aux.leng = elm->c2_length;
aux.addr = m_coreObject->m_rfcWa + elm->c2_offset;

GlobalFunctions::CreateObjectForRFCField(target,aux,elm->decimals);
GlobalFunctions::ReadRFCField(target,aux,elm->decimals);

Where GlobalFunctions::CreateObjectForRFCField creates a System::String object filled with spaces (for padding) to what the unmanaged library states the max length should be:

static void CreateObjectForRFCField(Object*& object, RFC_PARAMETER& par, unsigned dec)
{
    switch (par.type)
    {
        case TYPC:
            object = new String(' ',par.leng / sizeof(_TCHAR));
            break;
        // unimportant afterwards.
    }
}

And GlobalFunctions::ReadRFCField() copies the data from the library into the created String object and preserves the space padding:

static void ReadRFCField(String* target, RFC_PARAMETER& par)
{
    int lngt;
    _TCHAR* srce;
    switch (par.type)
    {
        case TYPC:
        case TYPDATE:
        case TYPTIME:
        case TYPNUM:
            lngt = par.leng / sizeof(_TCHAR);
            srce = (_TCHAR*)par.addr;
            break;

        case RFCTYPE_STRING:
            lngt = (*(_TCHAR**)par.addr != NULL) ? (int)_tcslen(*(_TCHAR**)par.addr) : 0;
            srce = *(_TCHAR**)par.addr;
            break;

        default:
            throw new DotNet_Incomp_RFCType2;
    }

    if (lngt > target->Length) lngt = target->Length;

    GCHandle gh = GCHandle::Alloc(target,GCHandleType::Pinned);
    wchar_t* buff = reinterpret_cast<wchar_t*>(gh.AddrOfPinnedObject().ToPointer());
    _wcsnset(buff,' ',target->Length);
    _snwprintf(buff,lngt,_T2WFSP,srce);
    gh.Free();
}

Now, on occasion, we see access violations getting thrown in the _snwprintf call. My question really is: is it appropriate to create a string padded to a length (ideally to pre-allocate the internal buffer), and then to modify the String using GCHandle::Alloc and the mess above?

And yes, I know that System::String objects are supposed to be immutable - I'm looking for a definitive "This is WRONG and here is why".

Thanks,
Eli.

Comments (2)

醉殇 2024-08-28 00:38:31

Actually, the issue was not with the .NET string as an output buffer, but with the input buffer instead.

The sprintf("%s") family of functions (including wsprintf and so on) performs a strlen-type operation on any string argument, EVEN if it is snwprintf. The "n" part only limits the amount WRITTEN to the output, NOT the number of characters read from the input buffer.
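
To illustrate the difference between limiting the output and limiting the read, here is a minimal standard C++ sketch (not the code from the question). It assumes the portable swprintf rather than _snwprintf, and the helper name format_bounded is hypothetical. Passing an explicit precision with "%.*ls" tells the formatter to read at most that many characters from the source, so a field without a terminator is not scanned past its end; a plain "%ls" has no such bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: formats at most `len` wide characters from `src`
// into a std::wstring. `src` does not need to be null-terminated, because
// the precision (".*") caps how many characters the formatter reads.
std::wstring format_bounded(const wchar_t* src, std::size_t len)
{
    assert(src != nullptr);
    std::wstring out(len + 1, L'\0');   // room for len characters plus terminator
    int written = std::swprintf(&out[0], out.size(), L"%.*ls",
                                static_cast<int>(len), src);
    if (written < 0) return std::wstring();   // formatting failed
    out.resize(static_cast<std::size_t>(written));
    return out;
}
```

Whether a given C runtime honours the precision exactly this way for its own "%s" variants is worth checking against its documentation before relying on it.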

Turns out the input buffer was never guaranteed to be null terminated. Oftentimes we got lucky, because if the data returned was small, the scan would hit SOMETHING null before reaching bad memory.

However, if the data in there was big enough, it would run to the end of the memory page. When the strlen keeps going, it walks off the page, and Access Violation city!

Luckily I found this when testing something else with a native mode debugger attached and ready with all the debug symbols from MS for the C runtime!

We switched the snwprintf to a wcsncpy() instead (the snwprintf was a legacy operation from when we had to do ANSI->Unicode conversion; why he didn't do MultiByteToWideChar() instead, I'll never know).
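
A bounded copy along these lines might look like the sketch below. It is written in standard C++ and is not the actual fix from our code base: the helper name copy_padded and its parameters are hypothetical, and a std::wstring stands in for the destination. The source length is passed explicitly, so nothing is ever read beyond the field, and the copy stops early at an embedded null so that wcsncpy does not overwrite the space padding with nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: copies a fixed-length field that may lack a null
// terminator into a space-padded string of exactly `width` characters.
std::wstring copy_padded(const wchar_t* src, std::size_t src_len, std::size_t width)
{
    assert(src != nullptr);
    std::wstring out(width, L' ');                 // pre-filled with padding
    std::size_t limit = std::min(src_len, width);  // never read past the field
    // Stop at an embedded null, if any, so wcsncpy copies exactly n
    // characters and does not pad the remainder with nulls.
    const wchar_t* end = std::find(src, src + limit, L'\0');
    std::size_t n = static_cast<std::size_t>(end - src);
    if (n > 0) std::wcsncpy(&out[0], src, n);
    return out;
}
```

The explicit length is the important part: unlike "%s" formatting, the read is bounded by src_len regardless of what follows the field in memory.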

Thanks for the advice, at any rate.

一影成城 2024-08-28 00:38:30

I'm amazed this ever appears to work. If I understand it, you pin a String object, get the address of it, and then cast it to a buffer of characters. It's not a buffer of characters. CLR objects start with an 8-byte header (in 32-bit, anyway). You're probably trashing the internal data used by the CLR in garbage collection.

Why not allocate a native buffer (std::vector<wchar_t> would be great) to pass to the native API, and then safely construct a CLR String from that buffer?
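
As a rough sketch of that idea in standard C++ (so it can be compiled outside Managed C++): fake_native_read is a hypothetical stand-in for the unmanaged library call, read_field is a hypothetical wrapper, and std::wstring stands in for the CLR String. In the real code the last step would construct a System::String from the buffer pointer and length instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the unmanaged library: writes up to `capacity`
// characters into `dst`, returns how many it wrote, and deliberately does
// not write a null terminator.
std::size_t fake_native_read(wchar_t* dst, std::size_t capacity)
{
    static const wchar_t data[3] = {L'S', L'A', L'P'};
    std::size_t n = std::min<std::size_t>(capacity, 3);
    std::copy(data, data + n, dst);
    return n;
}

// Fills a writable native buffer first, then builds the immutable string
// from it in one step; no managed object is modified after creation.
std::wstring read_field(std::size_t width)
{
    std::vector<wchar_t> buffer(width, L' ');   // space-padded native buffer
    std::size_t written = fake_native_read(buffer.data(), buffer.size());
    assert(written <= buffer.size());
    // In Managed C++, a System::String would be constructed here from
    // buffer.data() and buffer.size(); std::wstring is only a stand-in.
    return std::wstring(buffer.begin(), buffer.end());
}
```

This costs one extra copy per field, but the string is never touched after construction, so its immutability is not violated and no pinning is needed.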

Update:

Okay, here's a reference: http://www.drdobbs.com/cpp/184403869

Turns out that the pinning API being used has special knowledge of the layout of String, and knows how to find and return the raw internal character buffer. Yeesh!

But to quote that article:

One last important point: in some of these examples, I’m showing how pin_cast can be used to access the private data buffer of managed Strings and Arrays, possibly in a non-const manner. Given these are sealed types, and that the entire implementation is unknown, it would be bad to assume you could safely modify the contents of these buffers, even if the memory is pinned.

Interestingly, the API documentation doesn't mention the special behaviour for strings.
