How to deal with Python unicode strings with null bytes the 'right' way?
Question
It seems that PyWin32 is comfortable with giving null-terminated unicode strings as return values. I would like to deal with these strings the 'right' way.
Let's say I'm getting a string like: u'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'.
This appears to be a C-style null-terminated string hanging out in a Python unicode object. I want to trim this bad boy down to a regular ol' string of characters that I could, for example, display in a window title bar.
Is trimming the string off at the first null byte the right way to deal with it?
I didn't expect to get a return value like this, so I wonder if I'm missing something important about how Python, Win32, and unicode play together... or if this is just a PyWin32 bug.
Background
I'm using the Win32 file chooser function GetOpenFileNameW from the PyWin32 package. According to the documentation, this function returns a tuple containing the full filename path as a Python unicode object.
When I open the dialog with an existing path and filename set, I get a strange return value.
For example I had the default set to: C:\\Users\\Guest\\MyFileIsReallyReallyReallyAwesome.asy
In the dialog I changed the name to MyFile.asy and clicked save.
The full path part of the return value was: u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'
I expected it to be: u'C:\\Users\\Guest\\MyFile.asy'
The function is returning a recycled buffer without trimming off the terminating bytes. Needless to say, the rest of my code wasn't set up for handling a C-style null-terminated string.
Demo Code
The following code demonstrates the null-terminated string in the return value from GetSaveFileNameW.
Directions: In the dialog change the filename to 'MyFile.asy' then click Save. Observe what is printed to the console. The output I get is u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'.
    import win32gui, win32con
    if __name__ == "__main__":
        initial_dir = 'C:\\Users\\Guest'
        initial_file = 'MyFileIsReallyReallyReallyAwesome.asy'
        # Filter entries are NUL-separated pairs: description, then pattern.
        filter_string = 'All Files\0*.*\0'
        (filename, customfilter, flags) = \
            win32gui.GetSaveFileNameW(InitialDir=initial_dir,
                Flags=win32con.OFN_EXPLORER, File=initial_file,
                DefExt='txt', Title="Save As", Filter=filter_string,
                FilterIndex=0)
        # repr() makes any embedded \x00 visible in the console output.
        print(repr(filename))
Note: If you don't shorten the filename enough (for example, if you try MyFileIsReally.asy) the string will be complete without a null byte.
Environment
Windows 7 Professional 64-bit (no service pack), Python 2.7.1, PyWin32 Build 216
UPDATE: PyWin32 Tracker Artifact
Based on the comments and answers I have received so far, this is likely a pywin32 bug so I filed a tracker artifact.
UPDATE 2: Fixed!
Mark Hammond reported in the tracker artifact that this is indeed a bug. A fix was checked in to rev f3fdaae5e93d, so hopefully that will make the next release.
I think Aleksi Torhamo's answer below is the best solution for versions of PyWin32 before the fix.
Comments (3)
I'd say it's a bug. The right way to deal with it would probably be fixing pywin32, but in case you aren't feeling adventurous enough, just trim it.
You can get everything before the first '\x00' with: filename.split('\x00', 1)[0]
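The trimming suggested in this answer can be sketched as a small helper. This is only an illustration: the name trim_at_null is hypothetical (it is not part of PyWin32), and the sample value is the one reported in the question. It uses partition rather than split, which gives the same result here.

```python
def trim_at_null(text):
    # Return everything before the first NUL character.
    # If there is no NUL, partition() hands back the whole string as the head.
    head, _sep, _tail = text.partition(u'\x00')
    return head


# Sample value taken from the question; the trailing "wesome.asy" is buffer residue.
raw = u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'
print(repr(trim_at_null(raw)))
```

A string without any NUL passes through unchanged, so the helper should be safe to apply to every value the dialog returns, whether or not the installed PyWin32 build has the bug.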
This doesn't happen on the version of PyWin32/Windows/Python I tested; I don't get any nulls in the returned string even if it's very short. You might investigate whether a newer version of one of the above fixes the bug.
ISTR that I had this issue some years ago, then I discovered that such Win32 filename-dialog-related functions return a sequence of 'filename1\0filename2\0...filenameN\0\0', possibly including garbage characters depending on the buffer that Windows allocated.
Now, you might prefer a list instead of the raw return value, but that would be an RFE, not a bug.
PS When I had this issue, I quite understood why one would expect GetOpenFileName to possibly return a list of filenames, while I couldn't imagine why GetSaveFileName would. Perhaps this is considered API uniformity. Who am I to know, anyway?
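If the return value really is a double-NUL-terminated sequence as this answer describes, it could be turned into a list along these lines. This is a rough sketch: the helper name split_null_separated is hypothetical, and it assumes that an empty piece marks the end of the useful data.

```python
def split_null_separated(buf):
    # Split a 'name1<NUL>name2<NUL>...nameN<NUL><NUL>' buffer into a list of names.
    names = []
    for part in buf.split(u'\x00'):
        if not part:
            # An empty piece means a double NUL (or a trailing NUL): stop here
            # and ignore whatever garbage follows in the buffer.
            break
        names.append(part)
    return names


print(split_null_separated(u'a.txt\x00b.txt\x00\x00junk'))
```

Note that the buffer in the question has a single NUL followed by residue, so this helper would return the residue as a second "name". For a single-selection dialog, taking only the first element (or trimming at the first NUL) is the safer choice.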