Decoding a legacy binary format
I am trying to figure out how to decode a "legacy" binary file produced by a Windows application (circa 1990). Specifically, I am having trouble working out which encoding is used for the stored strings.
Example: the Unicode string "Düsseldorf" is represented as "Du\06sseldorf", or in hex "44 75 06 73 73 65 6C 64 6F 72 66". Every character is a single byte except "u" followed by the byte 0x06, and that pair mysteriously becomes a u-umlaut.
Is it completely proprietary? Any ideas?
Comments (1)
Since this app predates widespread DBCS and Unicode support, I suspect the encoding is proprietary. It looks like they may be using the control-character values below 32 (0x01 to 0x1F) to represent the various accent marks. \06 may mean "put an umlaut on the previous character".
Try replacing the string with "Du\05sseldorf" in the file and see whether the accent over the u changes when the application displays it. Then try the other escape values between 1 and 31; I suspect you will be able to build up a map of these escape characters. Once you have the map, you can write a routine that converts all of the strings to proper modern Unicode strings with the accents in place.
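As a rough sketch of that routine in Python: the only mapping the question's example supports is 0x06 to an umlaut, so the table below has one entry, and anything else would have to be found by experiment. The names `MARKS`, `decode_legacy` and `probe_variants` are made up for illustration. It also assumes every other byte is plain ASCII, which may not hold for the real file.

```python
import unicodedata

# Control byte -> Unicode combining mark that modifies the PREVIOUS character.
# Only 0x06 is backed by the "Du\06sseldorf" example; add further entries
# only after confirming them against the original application.
MARKS = {
    0x06: "\u0308",  # COMBINING DIAERESIS (umlaut)
}


def decode_legacy(raw: bytes, marks=MARKS, base_encoding="ascii") -> str:
    """Decode a legacy string, turning marker bytes into combining marks."""
    pieces = []
    for byte in raw:
        if byte in marks:
            pieces.append(marks[byte])
        else:
            # Assumption: everything else is a single-byte character.
            pieces.append(bytes([byte]).decode(base_encoding, errors="replace"))
    # NFC composes "u" + U+0308 into the single code point "ü".
    return unicodedata.normalize("NFC", "".join(pieces))


def probe_variants(raw: bytes, marker: int = 0x06):
    """Yield (code, patched bytes) with the marker swapped for 0x01..0x1F."""
    for code in range(1, 32):
        yield code, raw.replace(bytes([marker]), bytes([code]))


raw = bytes.fromhex("44 75 06 73 73 65 6C 64 6F 72 66")
print(decode_legacy(raw))  # Düsseldorf
```

`probe_variants` only produces the patched byte strings for the experiment described above. Each one still has to be written back into a copy of the file and viewed in the original application to see which accent it produces.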