Ruby-FFI (Ruby 1.8): reading UTF-16LE encoded strings

Posted on 2025-01-06 08:52:23

I am working with Ruby-FFI on Ruby 1.8 to wrap a library that uses UTF-16LE strings. The library has a C function that returns such a string.

Whether I wrap the function with

attach_function :getVersion, [], :pointer

and call read_string on the returned pointer, or wrap it with

attach_function :getVersion, [], :string

I get back only the first character. The second byte is null (\000), so FFI stops reading the string there, apparently because it assumes it is dealing with a normal string terminated by a single null byte.

Is there something I need to do, perhaps in initialization of my Ruby program or FFI or otherwise, to make it know that I expect strings to be UTF-16LE encoded? How else can I get around this?
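
The truncation described above can be reproduced without the library. The following is a minimal sketch, assuming a made-up version string "1.2" (the real return value is not shown in the question). It builds the UTF-16LE bytes by hand and then reads them the way a C string is read, with unpack("Z*") standing in for a null-terminated read such as read_string.

```ruby
# Hypothetical version text; the library's actual return value is not known.
text = "1.2"

# Encode as UTF-16LE by hand: each code point becomes one 16-bit little-endian
# unit. unpack("U*") / pack("v*") are available on Ruby 1.8 as well as later
# versions; this simple form only covers the Basic Multilingual Plane.
utf16le = text.unpack("U*").pack("v*") + "\000\000"

# Read the buffer the way a C string is read: stop at the first null byte.
# unpack("Z*") is only a stand-in for FFI's null-terminated read.
as_c_string = utf16le.unpack("Z*").first

puts utf16le.length       # three 2-byte units plus a 2-byte terminator
puts as_c_string.inspect  # only the first character survives
```

Every ASCII character in UTF-16LE is followed by a null high byte, so any reader that treats a single null byte as the terminator will stop after one character.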

拧巴小姐 2025-01-13 08:52:23

OK, this is the (inelegant) workaround I have so far. It involves adding a method to FFI::Pointer. It should be safe to call in the context of my library, because all strings are supposed to be UTF-16LE encoded. Elsewhere it may not be safe, because it might never encounter a double null and would just carry on reading past the bounds of the string in memory.

module FFI
  class Pointer

    # Read string until we encounter a double-null terminator
    def read_string_dn
      offset = 0
      # Step one 16-bit code unit (2 bytes) at a time, so that the null high
      # byte of one character followed by a null low byte of the next is not
      # mistaken for the terminator
      until get_bytes(offset, 2) == "\000\000"
        offset += 2
      end
      # Return the raw bytes, including the 2-byte terminator
      get_bytes(0, offset + 2)
    end

  end

end
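
The concern about never encountering a double null can be limited by capping how far the scan is allowed to go. Below is a rough sketch of that idea, not part of Ruby-FFI: the helper name read_utf16le_bounded and its max_bytes parameter are hypothetical, and FakePointer is only a stand-in so the example runs without the ffi gem or the wrapped library. With a real FFI::Pointer, the cap would have to come from whatever the library guarantees about the buffer size.

```ruby
# Read a UTF-16LE string from anything that responds to get_bytes(offset, length),
# such as an FFI::Pointer, giving up after max_bytes instead of scanning forever.
# The helper name and the max_bytes parameter are hypothetical.
def read_utf16le_bounded(ptr, max_bytes)
  offset = 0
  while offset + 2 <= max_bytes
    # Return the bytes before the terminator as soon as a null code unit is found
    return ptr.get_bytes(0, offset) if ptr.get_bytes(offset, 2) == "\000\000"
    offset += 2
  end
  raise ArgumentError, "no UTF-16 terminator within #{max_bytes} bytes"
end

# Stand-in for FFI::Pointer so the sketch runs without the ffi gem.
class FakePointer
  def initialize(bytes)
    @bytes = bytes
  end

  def get_bytes(offset, length)
    @bytes[offset, length]
  end
end

ptr = FakePointer.new("1\000.\0002\000\000\000")
puts read_utf16le_bounded(ptr, 64).inspect
```

Raising an error when the cap is reached is one choice among several; returning nil or the bytes read so far would also work, depending on how the caller wants to treat a malformed string.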

不必在意 2025-01-13 08:52:23

A more elegant solution based on the same idea. It also handles the encoding.

module FFI
  class Pointer
    def read_wstring
      offset = 0
      while get_bytes(offset, 2) != "\x00\x00"
        offset += 2
      end
      get_bytes(0, offset).force_encoding('utf-16le').encode('utf-8')
    end
  end
end
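
String#force_encoding and String#encode were introduced in Ruby 1.9, so on Ruby 1.8 (the version in the question) the conversion step needs another route. One possibility is to decode the 16-bit units with unpack and re-encode them with pack, both of which exist on 1.8. This is a sketch only: the helper name utf16le_to_utf8 is hypothetical, and it expects the raw bytes without the terminator.

```ruby
# Convert raw UTF-16LE bytes (without terminator) to a UTF-8 string using only
# unpack/pack, which are available on Ruby 1.8. The helper name is hypothetical.
def utf16le_to_utf8(bytes)
  units = bytes.unpack("v*")   # 16-bit little-endian code units
  codepoints = []
  i = 0
  while i < units.length
    unit = units[i]
    low = units[i + 1]
    if unit >= 0xD800 && unit <= 0xDBFF && low && low >= 0xDC00 && low <= 0xDFFF
      # Combine a high/low surrogate pair into one code point above U+FFFF
      codepoints.push(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
      i += 2
    else
      codepoints.push(unit)
      i += 1
    end
  end
  codepoints.pack("U*")        # encode the code points as UTF-8
end

puts utf16le_to_utf8("1\000.\0002\000")
```

The surrogate handling only matters for characters outside the Basic Multilingual Plane. For a version string it is unlikely to be exercised, but leaving it out would silently corrupt such characters.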
