Ruby-FFI (Ruby 1.8): Reading UTF-16LE encoded strings

Posted 2025-01-06 08:52:23

I am working with Ruby-FFI on Ruby 1.8 to wrap a library that uses UTF-16LE strings. The library has a C function that returns such a string.

It makes no difference whether I wrap the function with

attach_function :getVersion, [], :pointer

and call read_string on the returned pointer, or wrap it with

attach_function :getVersion, [], :string

Either way, what I get back is only the first character, because the second byte is null (\000). FFI stops reading the string there, evidently because it assumes it is dealing with a normal string terminated by a single null byte.

Is there something I need to do, perhaps in initialization of my Ruby program or FFI or otherwise, to make it know that I expect strings to be UTF-16LE encoded? How else can I get around this?
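
To make the truncation concrete, here is a minimal pure-Ruby sketch that does not use FFI at all. The helper names `utf16le_bytes` and `c_style_read` and the version text "1.0" are assumptions made for illustration; the sketch only emulates what a reader that stops at the first single null byte would see.

```ruby
# Build the raw bytes of a UTF-16LE string plus its 16-bit null terminator.
# unpack/pack are used so the sketch also runs on Ruby 1.8
# (characters in the Basic Multilingual Plane only).
def utf16le_bytes(text)
  text.unpack('U*').pack('v*') + "\000\000"
end

# Emulate a C-style reader: stop at the first single null byte.
def c_style_read(bytes)
  bytes[0, bytes.index("\000")]
end

raw = utf16le_bytes("1.0")   # "1.0" is an assumed example value
puts raw.length              # 8 bytes: three 2-byte code units + terminator
puts c_style_read(raw)       # only "1" survives
```

Every ASCII character in UTF-16LE is followed by a zero byte, which is why a single-null reader returns just the first character.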

Comments (2)

拧巴小姐 2025-01-13 08:52:23

OK, this is the (inelegant) workaround I have so far. It adds a method to FFI::Pointer. It should be safe to call in the context of my library, because all of its strings are supposed to be UTF-16LE encoded. Elsewhere it may be unsafe: if the memory never contains a double null, the method will just carry on reading past the bounds of the string.

module FFI
  class Pointer

    # Read a UTF-16LE string until we encounter a double-null terminator.
    # The scan advances one 16-bit code unit (2 bytes) at a time, so the zero
    # high byte of one character followed by the zero low byte of the next
    # is not mistaken for the terminator.
    def read_string_dn
      offset = 0
      # Determine the offset in memory of the expected double-null
      until get_bytes(offset, 2) == "\000\000"
        offset += 2
      end
      # Return the raw bytes, including the 2-byte terminator
      get_bytes(0, offset + 2)
    end

  end

end
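
One way to limit the risk of reading past the end of the string is to cap the scan at a maximum number of bytes. The following is only a sketch under assumptions: `read_utf16le_bounded` and `FakePointer` are hypothetical names, the function is written as a standalone helper that accepts any object responding to `get_bytes(offset, length)` (as FFI::Pointer does), and the choice of limit is left to the caller.

```ruby
# Hypothetical helper: scan for the 16-bit null terminator, but give up
# after max_bytes instead of reading memory indefinitely.
def read_utf16le_bounded(ptr, max_bytes)
  offset = 0
  while offset + 2 <= max_bytes
    if ptr.get_bytes(offset, 2) == "\000\000"
      # Return the raw UTF-16LE bytes without the terminator
      return ptr.get_bytes(0, offset)
    end
    offset += 2
  end
  raise ArgumentError, "no UTF-16 null terminator within #{max_bytes} bytes"
end

# Hypothetical stand-in for FFI::Pointer so the sketch runs without the gem.
class FakePointer
  def initialize(bytes)
    @bytes = bytes
  end

  def get_bytes(offset, length)
    @bytes[offset, length]
  end
end

# "1.0" as UTF-16LE, followed by its terminator and unrelated memory
memory = "1.0".unpack('U*').pack('v*') + "\000\000" + "junk"
version_bytes = read_utf16le_bounded(FakePointer.new(memory), 64)
puts version_bytes.length
```

With a real pointer, the cap would need to come from something the library documents, such as a maximum version-string length.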
不必在意 2025-01-13 08:52:23

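A more elegant solution based on the same idea. It also handles the encoding.

module FFI
  class Pointer
    # Read a null-terminated UTF-16LE string and return it as UTF-8.
    # Note: String#force_encoding and String#encode require Ruby 1.9 or later.
    def read_wstring
      offset = 0
      # Advance one 16-bit code unit at a time until the null terminator
      while get_bytes(offset, 2) != "\x00\x00"
        offset += 2
      end
      get_bytes(0, offset).force_encoding('utf-16le').encode('utf-8')
    end
  end
end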
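
String#force_encoding and String#encode were introduced in Ruby 1.9, so the conversion step above is not available on Ruby 1.8 as asked in the question. As a rough sketch of one alternative, the bytes can be decoded with unpack and pack, which exist on 1.8 as well. The function name `utf16le_to_utf8` is an assumption for illustration, and the input is expected to be the raw UTF-16LE bytes without the terminator.

```ruby
# Decode raw UTF-16LE bytes (terminator already stripped) into UTF-8
# using only unpack/pack.
def utf16le_to_utf8(bytes)
  units = bytes.unpack('v*')   # 16-bit little-endian code units
  codepoints = []
  i = 0
  while i < units.length
    unit = units[i]
    nxt = units[i + 1]
    if unit >= 0xD800 && unit <= 0xDBFF && nxt && nxt >= 0xDC00 && nxt <= 0xDFFF
      # Combine a high/low surrogate pair into a single code point
      codepoints << 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
      i += 2
    else
      codepoints << unit
      i += 1
    end
  end
  codepoints.pack('U*')        # encode the code points as UTF-8
end

puts utf16le_to_utf8("1\000.\0000\000")
```

Unpaired surrogates are passed through unchanged in this sketch; a stricter version might reject them.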