Extracting serialized data from unknown files

Posted on 2024-12-10 04:27:58

My dearest stackoverflowers,

I want to access the serialized data contained in files whose extensions are unfamiliar to me. The bulk of the data seems to be in a .st and an .idt file.

The program is meant to be run on Windows, and the Unix file command gives me only false positives. Any ideas on what these extensions mean, or on how to investigate and extract their contents?

Below is the full list of extensions, in the hope that somebody recognizes them. Googling also gives me false positives. For example, .st is commonly used for ATARI emulation files.

Thanks in advance!

  • .cix
  • .cmp
  • .cnt
  • .dam
  • .das
  • .drf
  • .idt
  • .irc
  • .lxp
  • .mp
  • .mbr
  • .str
  • .vlf
  • .rpf
  • .st

Comments (1)

陌路黄昏 2024-12-17 04:27:58

Some general advice on how to approach this:

  1. One way to approach this is to use a site like http://filext.com/ to try to figure out where the files came from. This can be tough, because it's not like there's a file extension standard anywhere - anyone can use any extension, so you're going to have a lot of conflicts/disambiguation issues to solve.
  2. Sometimes you can get lucky: if you open the files in a plain text editor, you can occasionally see readable plain string data, which can help identify the general sort of data contained in a file, and therefore help cut down on the possible number of sources for a file. For example, I have often helped people who received an email attachment with no extension use this technique to figure out what file type it was, add the right file extension, and then open it in the appropriate program.
  3. There are also sites like http://www.oldversion.com/ that keep old versions of programs that you (typically) can download for free. This is especially helpful if the data you're working with was created 5+ years ago in a proprietary program, and that program is no longer available/purchasable from the vendor who created it.
  4. Once you have a good idea of which files belong to which programs, you're probably going to spend a lot of time looking for online resources on the structure of those files. If none are available and you can get a copy of the original program, but either the program won't open the files you're interested in or you still want raw access to the data, try generating some sample output files with data that you input yourself, and go Rosetta Stone on it, comparing your known file to the original file.
  5. From there, the additional knowledge you'll probably want is what language/compiler the software was written in, which can give you a lead on which code libraries were used to serialize the data in the first place. Once you know all that, it's a matter of reading through any available documentation on the serialization process, and then writing a deserializer.
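
As a rough illustration of step 2, here is a minimal Python sketch that prints the first bytes of a file and any runs of printable ASCII it contains, similar in spirit to the Unix strings tool. The function names, the 4-character minimum, and the file name data.st are assumptions made for this example, not anything known about the actual files.

```python
import re
from pathlib import Path


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def describe(path: str, header_len: int = 16, max_strings: int = 20) -> None:
    """Print the size, the leading bytes (possible magic number), and some strings."""
    data = Path(path).read_bytes()
    print(f"{path}: {len(data)} bytes")
    print("header:", data[:header_len].hex(" "))
    for s in printable_strings(data)[:max_strings]:
        print("  ", s)


# Hypothetical usage; "data.st" is a placeholder file name:
# describe("data.st")
```

Readable field names, product names, or version strings in the output are often enough to narrow down which program wrote the file. Text stored as UTF-16 (common in Windows programs) will not show up with this ASCII-only pattern.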

The one thing this technique won't solve is corrupt/truncated data files: it may be very difficult to tell whether a file is damaged or whether you simply have the file structure wrong. The "Rosetta Stone" technique might be helpful in that case.
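
The "Rosetta Stone" comparison can be sketched as a byte-level diff between two files saved by the original program with one small, known change between them. The Python below is only an illustration under the assumption that the changed fields keep the same length; if a change inserts or removes bytes, every later offset shifts and a plain positional diff like this stops being useful. The function name and the sample byte strings are made up for the example.

```python
def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset ranges where a and b differ.

    Only the first min(len(a), len(b)) bytes are compared; a difference
    in length should be checked separately.
    """
    ranges = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, n))  # the run extends to the end of the overlap
    return ranges


# Two made-up "files" that differ in one flag byte and part of one value:
before = b"HDR\x01\x00name=AAAA"
after = b"HDR\x02\x00name=ABBA"
for start, end in diff_ranges(before, after):
    print(f"offset {start}-{end - 1}: {before[start:end]!r} -> {after[start:end]!r}")
```

Repeating this with several controlled changes (one field at a time) gradually maps offsets to fields, which is the information a deserializer needs.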

Depending on how many different pieces of source software you're talking about, this sounds like a pretty big project. Good luck!
