How can I determine which encoding a file uses?

Posted 2024-10-17 01:31:43

I have a file that I believe is an XML file, but when I change the extension to .txt and open it in a text editor I get:

.�2�'��7cõ’¥¶_ä™πUUUN?¯ÖÀuóbåqW÷õxó_i}Ï08Y‚û¡d≈§•§è«/Óÿ`*∆cÅ·x…ëë«Öµ¶fi—

Is there any way to determine what kind of encoding is being used?

EDIT:

The file is a .ptx file used by legal deposition software. I am trying to create a reader for the Mac.

These are the contents of the file as shown in a hex editor.




Comments (1)

温柔一刀 2024-10-24 01:31:43

This doesn't look like a common charset. Almost every text encoding preserves ASCII characters in some form, so an XML file in an unexpected encoding would still show recognizable markup.

So I see a few possibilities:

  1. The file is a compressed text/XML file
  2. The file is an encrypted text/XML file
  3. It's a binary file format
  4. It's obfuscated
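One cheap way to test the first possibility is to compare the first bytes of the file against the signatures of common compressed containers. The following is only a sketch: the signature table covers a handful of widely documented formats, `guess_container` and `sniff_file` are names made up for this example, and a proprietary format such as .ptx may well match none of them.

```python
# Leading "magic" bytes of a few common compressed formats.
# This table is illustrative, not exhaustive.
MAGIC_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def guess_container(head: bytes) -> str:
    """Return a coarse label for the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return name
    # Skip a UTF-8 byte order mark and leading whitespace before
    # checking whether the data starts like XML markup.
    stripped = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if stripped.startswith(b"<"):
        return "xml-like text"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file (path is hypothetical) and label them."""
    with open(path, "rb") as handle:
        return guess_container(handle.read(16))
```

A result of "unknown" does not rule out compression, since a format can embed a compressed stream after its own header.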

If we look at the sample, sequences of repeated characters such as "UUU" and "ëë" occur quite often. Encrypted data looks completely random, so this makes the second option unlikely.
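The "does it look random?" argument can be made a little more concrete by measuring byte entropy and the longest run of a repeated byte. This is a rough sketch, not a definitive test: well-encrypted or well-compressed data tends to sit close to 8 bits per byte, but a short sample gives only a noisy estimate, and the function names here are invented for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_run(data: bytes) -> int:
    """Length of the longest run of one repeated byte value."""
    best = current = 0
    previous = None
    for byte in data:
        # Extend the current run, or start a new one of length 1.
        current = current + 1 if byte == previous else 1
        previous = byte
        best = max(best, current)
    return best
```

Frequent runs like "UUU" together with an entropy noticeably below 8 would support the reading that the data is structured rather than encrypted.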

What makes you think this should be an XML file? Since the file is binary, you might want to post the beginning of the file in hex instead of as text.
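If no hex editor is at hand, a dump of the first bytes can be produced with a few lines of Python. This is a minimal sketch; `hexdump` is a helper written for this example, and the layout (offset, hex bytes, printable ASCII) simply mimics what most hex editors show.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)
```

For example, `print(hexdump(open(path, "rb").read(256)))` would show the first 256 bytes, where `path` stands for wherever the file is stored.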

Looking at your hex dump, I'm pretty sure it's not encrypted. Sequences like "01 00" are typical of an uncompressed binary format (this is what a small little-endian 16-bit integer looks like), so that's my best guess.


Writing a parser from just a sample file is rather hard. The first thing I'd try is looking for a format specification on the net.

To figure out a file format, you can reverse engineer the application that creates the files. That's rather hard if the program is compiled to native code, and rather easy if it uses bytecode such as Java or .NET.

Alternatively, start with a simple file, make minimal changes to it in the program, and compare the resulting files. That's a lot of work, and only feasible for rather simple file formats.
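The compare step can be automated with a small byte-level diff. The sketch below is one possible approach under a strong assumption: it compares the two files position by position, so it is only informative when the change did not insert or delete bytes earlier in the file. `diff_bytes` is a name chosen for this example.

```python
def diff_bytes(before: bytes, after: bytes):
    """Return (offset, old_bytes, new_bytes) for each run of differing bytes.

    Bytes are compared position by position over the common length; any
    length difference is reported as one final entry.
    """
    changes = []
    start = None
    common = min(len(before), len(after))
    for i in range(common):
        if before[i] != after[i]:
            if start is None:
                start = i  # a new run of differences begins here
        elif start is not None:
            changes.append((start, before[start:i], after[start:i]))
            start = None
    if start is not None:
        changes.append((start, before[start:common], after[start:common]))
    if len(before) != len(after):
        changes.append((common, before[common:], after[common:]))
    return changes
```

Saving the same document twice with one small edit and diffing the two files shows which offsets react to that edit. If almost every byte after some offset changes, the edit probably shifted the data, or the content is compressed or checksummed.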


I couldn't find a specification after a brief search, and there seems to be only a single implementation. You could try contacting the company that created it, but I somehow doubt they'll help. So I guess you need to reverse engineer the format yourself. That's probably not easy and will take quite a bit of work. Good luck.
