Converting Hindi/Arabic character sets from PDF files to MOBI files
I am building an online service, and I have no idea where to start with multilingual PDF to MOBI conversion. I have already built an app for English, which was fairly easy. The problem with PDFs that use multibyte character sets (Hindi, Arabic) is that the text is interpreted as images, so it does not come through as words in the MOBI file format.
Is there a way, an online service, an API, or some code to do this? A Windows application that can do the conversion on a file-by-file basis would be fine as well.
Comments (1)
You will need to write your own plugin for this, as nothing currently on the market supports it.
However, you can do the conversion with a custom letter map: each glyph image is first read by OCR, and the OCR result is then used to look up its Unicode (UTF) equivalent.
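The lookup half of that idea could be sketched roughly as follows. This is only an illustration under assumptions: the glyph labels (`dev_ka`, `ar_seen`, and so on), the `GLYPH_MAP` table, and the `glyphs_to_text` helper are hypothetical names, and the OCR step that would produce the labels from page images is not shown. It also assumes the labels arrive in logical (reading) order, which matters for right-to-left scripts such as Arabic.

```python
import unicodedata

# Hypothetical table: label emitted by an OCR step -> Unicode character.
# A real table would need an entry for every glyph the OCR can recognise.
GLYPH_MAP = {
    "dev_ka": "\u0915",   # DEVANAGARI LETTER KA
    "dev_ma": "\u092E",   # DEVANAGARI LETTER MA
    "dev_la": "\u0932",   # DEVANAGARI LETTER LA
    "ar_seen": "\u0633",  # ARABIC LETTER SEEN
    "ar_lam": "\u0644",   # ARABIC LETTER LAM
    "ar_alef": "\u0627",  # ARABIC LETTER ALEF
    "ar_meem": "\u0645",  # ARABIC LETTER MEEM
    "space": " ",
}


def glyphs_to_text(labels, glyph_map=GLYPH_MAP, unknown="\uFFFD"):
    """Map OCR glyph labels (in logical order) to a Unicode string.

    Labels missing from the map become U+FFFD so gaps stay visible
    instead of being dropped silently.
    """
    chars = [glyph_map.get(label, unknown) for label in labels]
    # NFC normalisation gives a consistent form for any combining sequences.
    return unicodedata.normalize("NFC", "".join(chars))


if __name__ == "__main__":
    print(glyphs_to_text(["dev_ka", "dev_ma", "dev_la"]))
    print(glyphs_to_text(["ar_seen", "ar_lam", "ar_alef", "ar_meem"]))
```

The resulting string is ordinary Unicode text, so it can be written out as UTF-8 (for example into HTML) and passed to whatever MOBI conversion step the service already uses for English.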