
Library/service for extracting information from Microsoft OneNote documents

Posted on 2024-12-17 21:22:39

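Is there a PHP/Ruby library or web service that can programmatically extract information from a Microsoft OneNote document?

The solution will be implemented in a web application backend.

I'm not looking for a Windows-specific solution. I'm also not looking for a solution that requires the user to download an application extension or installable software.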


ゞ记忆︶ㄣ 2024-12-24 21:22:39

Here is a cross-platform OneNote parser (.one -> .html). It is very primitive, but it is open source and may help you parse the file format:

https://github.com/dropbox/onenote-parser

Feel free to use it (Apache license).


羞稚 2024-12-24 21:22:39

Easy solution

You could easily write your own extractor utility in C# using the Microsoft.Office.Interop.OneNote API.

You can find a detailed walkthrough in this MSDN article. You can then access the content with code similar to this:

using System;
using System.Linq;
using System.Xml.Linq;
using Microsoft.Office.Interop.OneNote;

class Program
{
  static void Main(string[] args)
  {
    // COM interop: OneNote must be installed on the machine running this.
    var onenoteApp = new Application();

    // Fetch the hierarchy of all notebooks, down to page level, as XML.
    string notebookXml;
    onenoteApp.GetHierarchy(null, HierarchyScope.hsPages, out notebookXml);

    var doc = XDocument.Parse(notebookXml);
    var ns = doc.Root.Name.Namespace;

    // Find the first page named "Test page".
    // The string cast is null-safe if a node has no "name" attribute.
    var pageNode = doc.Descendants(ns + "Page")
      .FirstOrDefault(n => (string)n.Attribute("name") == "Test page");
    if (pageNode != null)
    {
      // Fetch that page's content XML by its ID and print it.
      string pageXml;
      onenoteApp.GetPageContent(pageNode.Attribute("ID").Value, out pageXml);
      Console.WriteLine(XDocument.Parse(pageXml));
    }
  }
}

You can read the API documentation here, which also contains a few examples.
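If the hierarchy XML has already been exported (for example, saved by a utility like the one above), the page lookup itself is plain XML handling and does not need Windows. Below is a minimal Python sketch of that one step, not of the Interop API. The sample XML, the IDs in it, and the `find_page_id` helper are made up for illustration; only the `Page` element and its `name` and `ID` attributes come from the C# code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of hierarchy XML. Real data would come from
# GetHierarchy; the notebook, section, and IDs here are invented.
SAMPLE_HIERARCHY = """\
<one:Notebooks xmlns:one="http://schemas.microsoft.com/office/onenote/2013/onenote">
  <one:Notebook name="Demo" ID="{nb-1}">
    <one:Section name="Notes" ID="{sec-1}">
      <one:Page name="Test page" ID="{page-1}" />
      <one:Page name="Other page" ID="{page-2}" />
    </one:Section>
  </one:Notebook>
</one:Notebooks>
"""


def find_page_id(hierarchy_xml, page_name):
    """Return the ID attribute of the first Page with the given name, or None."""
    root = ET.fromstring(hierarchy_xml)

    # ElementTree spells namespaced tags as "{namespace}LocalName".
    # Read the namespace from the root instead of hard-coding a schema
    # version, mirroring doc.Root.Name.Namespace in the C# code.
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        page_tag = "{" + namespace + "}Page"
    else:
        page_tag = "Page"

    # Walk all descendants in document order and match on the name attribute.
    for page in root.iter(page_tag):
        if page.get("name") == page_name:
            return page.get("ID")
    return None


if __name__ == "__main__":
    print(find_page_id(SAMPLE_HIERARCHY, "Test page"))
```

The same lookup could be written with any XML library in PHP or Ruby. The part that still needs OneNote is producing the XML in the first place.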

Low level approach

If your environment does not allow you to use this official library, I don't know of a Unix port. However, Office documents are generally stored in XML format, in which case you only need an XML parser to extract the information you need. Be aware, though, that a native .one file is a binary container rather than plain XML, so consult the specification before relying on an XML parser alone.
Here you have the OneNote format specification. (There is a PDF link to the latest update at the top.)
You can then use the parser of your choice to build your own small utility. For Ruby, my suggestion would be libxml.

I hope this suits your needs.
