Evernote export format (ENEX) to HTML, including pictures?

Posted 2024-08-09 02:30:08

@Solved

The two subquestions I have created have been solved (yay for splitting this one up!), so this one is solved. I'll award the check mark to samjudson, since his answer was the closest. For actual working solutions, though, see the subquestions below: both my implemented solutions and the accepted answers.

@Deprecated

I am splitting this question into two separate questions, since this is a fairly complicated problem. Answers are still welcome though.

The subquestions are:

  1. XSLT: Convert base64 data into image files
  2. XSLT: Obtaining or matching hashes for base64 encoded data

Hi, just wondering if anyone here has had any success converting Evernote's export format, which is XML, to HTML, including the pictures. I know that Evernote has an export-to-HTML function that does this, but I eventually want to do fancier things with it.

I have managed to extract only the text, using the following XSLT:

Sample code removed

See child questions for implemented solutions.

However, at the moment this simply ignores any pictures, and this is where I need help.

Stumbling block #1: Evernote stores its pictures as GIFs or PNGs, and when exported, it embeds these GIFs and PNGs directly in the XML using what appears to be base64 (I could be wrong). I need to be able to reconstitute the pictures. If you open the file in a text editor, look for the huge blocks of data at **//note/resource/data**. For example (indents added manually):

<resource>
<data encoding="base64">
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
</data>
<mime>image/gif</mime>
<resource-attributes>
    <file-name>clip_image001.gif</file-name>
</resource-attributes>
</resource>
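On stumbling block #1: the `encoding="base64"` attribute confirms the payload is base64, and its first characters `R0lGODlh` decode to the GIF signature `GIF89a`. Plain XSLT 1.0 has no base64 decoding function, so here is a minimal sketch of just the decoding step in Python instead (a swapped-in tool, not an XSLT solution). `decode_resource` and the `unnamed.bin` fallback are hypothetical names:

```python
import base64
import xml.etree.ElementTree as ET

# The <resource> fragment from above, with the base64 payload copied unchanged.
RESOURCE_XML = """<resource>
<data encoding="base64">
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
</data>
<mime>image/gif</mime>
<resource-attributes>
    <file-name>clip_image001.gif</file-name>
</resource-attributes>
</resource>"""


def decode_resource(resource: ET.Element) -> tuple[str, bytes]:
    """Return (file name, decoded bytes) for one ENEX <resource> element."""
    data = resource.find("data")
    if data is None or data.get("encoding") != "base64":
        raise ValueError('expected <data encoding="base64">')
    # With the default validate=False, b64decode discards characters outside
    # the base64 alphabet, so the line breaks inside <data> are harmless.
    raw = base64.b64decode(data.text or "")
    # "unnamed.bin" is a hypothetical fallback for resources without a file name.
    name = resource.findtext("resource-attributes/file-name") or "unnamed.bin"
    return name, raw


name, raw = decode_resource(ET.fromstring(RESOURCE_XML))
print(name, raw[:6])  # clip_image001.gif b'GIF89a'
with open(name, "wb") as f:  # writes clip_image001.gif to the working directory
    f.write(raw)
```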

Stumbling block #2: Evernote stores the file name of each picture under the resource node, at
**//note/resource/resource-attributes/file-name**.
However, the note that displays the picture references it not by file name but by its hash, for example:

<en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="Alt Text"/>
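On stumbling block #2: the Evernote API reference describes a resource's body hash as the MD5 checksum of its binary data, and the 32 hex digits in the `hash` attribute fit that. Assuming ENEX exports use the same hash, each resource can be decoded, hashed, and indexed by digest, which turns the `en-media` lookup into a dictionary lookup. A sketch in Python; `resource_index` and the `resource-<i>` fallback name are hypothetical:

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def resource_index(note: ET.Element) -> dict[str, str]:
    """Map the MD5 hex digest of each resource's decoded bytes to its file name.

    Assumption: <en-media hash="..."> holds the MD5 of the decoded bytes,
    not of the base64 text.
    """
    index: dict[str, str] = {}
    for i, res in enumerate(note.findall("resource")):
        raw = base64.b64decode(res.findtext("data") or "")
        digest = hashlib.md5(raw).hexdigest()  # 32 lowercase hex characters
        # "resource-<i>" is a hypothetical name for resources without <file-name>.
        index[digest] = res.findtext("resource-attributes/file-name") or f"resource-{i}"
    return index


# A tiny hand-made note whose only resource is the five bytes b"hello".
note = ET.fromstring(
    "<note><resource>"
    f'<data encoding="base64">{base64.b64encode(b"hello").decode("ascii")}</data>'
    "<resource-attributes><file-name>hello.txt</file-name></resource-attributes>"
    "</resource></note>"
)
print(resource_index(note))  # {'5d41402abc4b2a76b9719d911017c592': 'hello.txt'}
```

With such an index, each `<en-media hash="...">` maps to a file name by looking up its `hash` attribute.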

Can anyone shed some light on how to deal with (base64) encoded binary data inside XML?

Edit

I understand from the comments and answers that plain XSLT can't handle images. The XSLT processor I am using is Xalan; however, if it is not good enough for image processing or base64, then please suggest one that is!

Also, as requested, here is a sample Evernote export file. The code snippets above are merely selected parts of it. I have stripped it down to a single note, edited out most of the text, and added indentation for clarity.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export.dtd">
<en-export export-date="20091029T063411Z" application="Evernote/Windows" version="3.0">

<note>
    <title>A title here</title>
    <content><![CDATA[
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml.dtd">
        <en-note bgcolor="#FFFFFF">
            <p>Some text here (followed by the picture)
            <p><en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="A picture"/></p>
            <p>Some more text here (preceded by the picture)
        </en-note>
    ]]></content>
    <created>20090925T063154Z</created>
    <note-attributes>
        <author/>
    </note-attributes>
    <resource>
        <data encoding="base64">
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
        </data>
        <mime>image/gif</mime>
        <resource-attributes>
            <file-name>clip_image001.gif</file-name>
        </resource-attributes>
    </resource>
</note>

</en-export>

And this needs to be transformed into this:

<html>
    <body>
        <p>Some text here (followed by the picture)
        <p><img src="clip_image001.gif" border="0" width="16" height="16" alt="A picture"/></p>
        <p>Some more text here (preceded by the picture)
    </body>
</html>

The file clip_image001.gif should also be generated and saved.
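The whole transformation can be sketched outside XSLT, here in Python. It rests on two assumptions: each `en-media` hash is the MD5 of the decoded resource bytes, and the ENML inside `<content>` is well-formed. Because the note body arrives as CDATA, the outer parser sees it as plain text, so it has to be parsed as a second document. The hand-trimmed sample above leaves two `<p>` elements unclosed and would not parse as-is, so the demo uses its own well-formed note with a stand-in payload. `convert_note` and `convert_export` are hypothetical names:

```python
import base64
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET


def convert_note(note: ET.Element, out_dir: str) -> str:
    """Convert one ENEX <note> to an HTML string, saving its resources in out_dir."""
    # 1. Decode and save every resource; remember digest -> file name.
    by_hash: dict[str, str] = {}
    for res in note.findall("resource"):
        raw = base64.b64decode(res.findtext("data") or "")
        digest = hashlib.md5(raw).hexdigest()
        # basename() keeps a stored name from escaping out_dir; the digest is
        # a hypothetical fallback for resources without a <file-name>.
        name = os.path.basename(res.findtext("resource-attributes/file-name") or digest)
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(raw)
        by_hash[digest] = name

    # 2. <content> is CDATA, i.e. text to the outer parser, so parse it on its
    #    own. strip() matters: the XML declaration must be the very first thing.
    body = ET.fromstring((note.findtext("content") or "").strip())

    # 3. en-note -> bare <body>; en-media -> <img> pointing at the saved file.
    body.tag = "body"
    body.attrib.clear()
    for media in list(body.iter("en-media")):
        kept = {k: v for k, v in media.attrib.items() if k not in ("hash", "type")}
        src = by_hash.get(media.get("hash", ""), "")  # "" if no resource matches
        media.tag = "img"
        media.attrib.clear()
        media.attrib.update({"src": src, **kept})

    html = ET.Element("html")
    html.append(body)
    return ET.tostring(html, encoding="unicode")


def convert_export(path: str, out_dir: str) -> list[str]:
    """Convert every <note> in an .enex file; returns one HTML string per note."""
    os.makedirs(out_dir, exist_ok=True)
    root = ET.parse(path).getroot()
    return [convert_note(n, out_dir) for n in root.iter("note")]


# Demo with a hand-made, well-formed note; the payload is a stand-in, not a real GIF.
payload = b"GIF89a stand-in bytes"
digest = hashlib.md5(payload).hexdigest()
encoded = base64.b64encode(payload).decode("ascii")
note_xml = f"""<note><title>A title here</title>
<content><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml.dtd">
<en-note bgcolor="#FFFFFF"><p>Some text here</p><p><en-media hash="{digest}"
 type="image/gif" border="0" width="16" height="16" alt="A picture"/></p></en-note>
]]></content>
<resource><data encoding="base64">{encoded}</data><mime>image/gif</mime>
<resource-attributes><file-name>clip_image001.gif</file-name></resource-attributes>
</resource></note>"""

out_dir = tempfile.mkdtemp()
html = convert_note(ET.fromstring(note_xml), out_dir)
print(html)
```

ElementTree does not load the external ENML DTD, so notes containing named entities such as `&nbsp;` will fail to parse; those would need to be replaced (for example with `&#160;`) before the inner `ET.fromstring` call.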


Comments (3)

叫思念不要吵 2024-08-16 02:30:08

The data URI scheme (http://en.wikipedia.org/wiki/Data_URI_scheme) may help, provided you only intend to support modern browsers and your images are small (for example, IE8 only supports data URIs of up to 32 KB).

Other than that, the only option is to use an external script to export the image data to files and reference them. How you do that depends heavily on which XSLT processor you are using.
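The data URI approach can be sketched as follows: instead of writing clip_image001.gif to disk, the base64 text from `<data>` goes straight into the `src` attribute, so the hash-to-file step only has to find the right resource. Python is used for illustration; `resource_to_data_uri` and the `application/octet-stream` fallback are hypothetical, and the IE8 size limit mentioned above still applies:

```python
import xml.etree.ElementTree as ET


def resource_to_data_uri(resource: ET.Element) -> str:
    """Inline an ENEX <resource> as a data: URI instead of a separate file."""
    # Hypothetical fallback type for resources that lack <mime>.
    mime = resource.findtext("mime") or "application/octet-stream"
    # ENEX wraps the base64 text over several lines; a URI must not contain
    # whitespace, so join the pieces back together.
    payload = "".join((resource.findtext("data") or "").split())
    return f"data:{mime};base64,{payload}"


# A truncated payload (only the start of the GIF header, not a complete image),
# split over lines the way an export wraps it.
res = ET.fromstring(
    '<resource><data encoding="base64">\nR0lGODlh\nEAAQ\n</data>'
    "<mime>image/gif</mime></resource>"
)
uri = resource_to_data_uri(res)
print(f'<img src="{uri}" width="16" height="16" alt="A picture"/>')
# <img src="data:image/gif;base64,R0lGODlhEAAQ" width="16" height="16" alt="A picture"/>
```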

茶花眉 2024-08-16 02:30:08

I just made a new Python script to convert .enex files to .html including images/attachments, etc. This script is not perfect, but at least a good start.

https://github.com/eirikora/enex2html

Download, try, and contribute!

Best regards,
Eirik Y. Øra

亽野灬性zι浪 2024-08-16 02:30:08

There is a pure XSLT answer to this issue; look at this page.
