How do I convert MathType equations to MathML?

Posted 2024-11-19 06:27:57 · 925 words · 11 views · 0 comments

I want to convert MathType equations saved in GIF format to MathML. First, I opened the GIF files and re-saved them in MathType 6.7, which appended the MathML text to the end of each GIF file. However, when I extracted the MathML text from these GIF files with a Perl script, I found garbled characters in it, for example:

<mn>xxx</mn>

In the line above, a garbled character is inserted before the 'mn' tag. Is this a MathType bug? How can I work around it? I have uploaded my test GIF files here: http://ubuntuone.com/p/1352/

Update:
I tried to paste the full MathML block here, but its formatting got mangled, so I posted the MathML on GitHub instead: https://gist.github.com/1068723

There is a garbled character on the seventh line of the MathML text: "  ?#x00A0;".

The original GIF file, which doesn't contain any MathML text: http://ubuntuone.com/p/13Ba/

The Perl script that extracts MathML from a GIF image generated by MathType: https://gist.github.com/1068749

Thanks,
thinkhy

Comments (1)

愁以何悠 2024-11-26 06:27:57

Thanks thinkhy. It could be that you are extracting the data incorrectly (we haven't looked at your script yet). Only one of your GIFs had MathML: the one whose file name starts with 106R. In that one, if you just grab all the bytes from the first bit that looks like MathML until the end, you do periodically get odd bytes in there, mostly 255s except the last one. (This, however, doesn't appear to be the junk character you're seeing.) The reason for the 255s is that the MathML is distributed over multiple data blocks, each of which starts with a count of the bytes in that block. From the MathType SDK (free download; link below):


GIF Image Files

MathML text is embedded into a GIF file as an Application Extension Record, which consists of a 14-byte header (Application Extension Descriptor), followed by the MTEF data. The header contains:

Byte Introducer = 0x21;
Byte ExtensionLabel = 0xFF;
Byte BlockSize = 0x0B;
Byte ApplicationId[8] = "MathType";
Byte AuthenticationCode[3] = "003";

The data follows this header and is written as a series of blocks each containing 255 bytes or less. Each block starts with a single byte count followed by the data. The end is marked as a block with length 0.

The header is unique enough that the easiest way to extract the data might be to scan the file for the 14-byte header, then expect the MathML data blocks to follow. Properly decoding the GIF records isn't that hard either, but obviously requires you read the GIF specification.
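As a rough illustration of the scan-for-the-header approach, here is a minimal Python sketch. It is not taken from the SDK or from your Perl script. It assumes the header bytes are exactly as listed in the excerpt above, and the names `MATHTYPE_HEADER`, `extract_mathtype_payload` and `pack_blocks` are made up for this example. It returns the raw payload bytes with the per-block count bytes stripped out; how to decode them is left to the caller.

```python
from typing import Optional

# 14-byte header as listed in the SDK excerpt:
# introducer, extension label, block size, application id, authentication code.
MATHTYPE_HEADER = b"\x21\xFF\x0B" + b"MathType" + b"003"


def extract_mathtype_payload(gif_bytes: bytes) -> Optional[bytes]:
    """Return the payload that follows the MathType header, or None if absent."""
    start = gif_bytes.find(MATHTYPE_HEADER)
    if start == -1:
        return None
    pos = start + len(MATHTYPE_HEADER)
    chunks = []
    while pos < len(gif_bytes):
        count = gif_bytes[pos]  # single count byte (0..255) in front of each block
        if count == 0:  # a zero-length block marks the end of the data
            break
        chunks.append(gif_bytes[pos + 1 : pos + 1 + count])
        pos += 1 + count
    return b"".join(chunks)


def pack_blocks(payload: bytes) -> bytes:
    """Hypothetical demo helper: write payload as count-prefixed blocks."""
    out = bytearray()
    for i in range(0, len(payload), 255):
        block = payload[i : i + 255]
        out.append(len(block))
        out += block
    out.append(0)  # terminating zero-length block
    return bytes(out)


# Synthetic stand-in for a real file: not a valid GIF, just enough to test the scan.
sample = b"<math><mn>" + b"1" * 300 + b"</mn></math>"
fake_gif = b"GIF89a" + MATHTYPE_HEADER + pack_blocks(sample) + b"\x3B"
print(extract_mathtype_payload(fake_gif) == sample)
```

For a real file you would read it in binary mode, e.g. `open(path, "rb").read()`, and pass the bytes in. Slicing from the first `<math` to the end of the file instead would leave the count bytes (mostly 255) inside the text, which is the effect described above.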


You may already be using the SDK, but you didn't say whether you were or not, so here's the link: http://www.dessci.com/en/reference/sdk/.
