Extracting hyperlinks from a .doc file
Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a document that I need to import into my database.
I have tried converting the .doc to HTML, but the hyperlinks are not carried over.
Regards,
Mladen
Comments (4)
We had a similar issue and ended up using a third-party component called Aspose.Words.
You can find it here: http://www.aspose.com
It is available for .NET and Java.
You could try importing the file into OpenOffice and check whether the hyperlinks are carried over. An OpenDocument file is just a ZIP archive with XML inside, which is very easy to parse once you have the hang of it.
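If the hyperlinks do survive the import and the document is saved as .odt, the ZIP-plus-XML approach could look roughly like the sketch below. It assumes the links end up as `text:a` elements with an `xlink:href` attribute in `content.xml`, which is how OpenDocument text files normally store them; the function name `extract_odt_links` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_odt_links(odt_file):
    """Return (url, link_text) pairs for every text:a element in content.xml.

    odt_file may be a filesystem path or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    links = []
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        url = anchor.get(f"{{{XLINK_NS}}}href")
        if url:
            # itertext() also picks up text nested in child spans.
            links.append((url, "".join(anchor.itertext()).strip()))
    return links
```

Calling `extract_odt_links("links.odt")` (a placeholder file name) would then give a list of pairs ready to insert into a database.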
Here is what I did: I opened the .doc file in Office XP, published it as a blog, and then saved that blog as a filtered web page. That gives you clean HTML that you can parse with ease.
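Once you have the filtered HTML, pulling out the links takes only a few lines. The following is a minimal sketch using Python's standard `html.parser` module; it assumes the hyperlinks appear as ordinary `<a href="...">` tags, and the names `LinkCollector` and `extract_html_links` are invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_html_links(html_text):
    """Return all href targets found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

Anchors without an `href` (for example bookmarks written as `<a name="...">`) are skipped, so only real link targets reach the result list.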
I realise this is some months after your initial question, but you can also extract the hyperlinks in a .doc file through Word Automation. The API exposes hyperlink objects that you can easily extract.
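As a rough illustration of the Automation route, here is a sketch driven from Python. It assumes Windows with Word installed and the third-party pywin32 package, and it relies on the `Hyperlinks` collection of the Word `Document` object and the `Address`, `SubAddress` and `TextToDisplay` properties of each hyperlink. The function names are hypothetical, and the same object model can be used from VBA or .NET.

```python
def collect_hyperlinks(document):
    """Return (address, sub_address, display_text) for each hyperlink.

    `document` is expected to be a Word Document COM object, or anything
    exposing a compatible Hyperlinks collection.
    """
    links = []
    for link in document.Hyperlinks:
        links.append((link.Address, link.SubAddress, link.TextToDisplay))
    return links


def extract_doc_links(path):
    """Open `path` (an absolute path to a .doc file) in Word and list its links."""
    # Imported lazily so collect_hyperlinks stays usable without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        document = word.Documents.Open(path, ReadOnly=True)
        try:
            return collect_hyperlinks(document)
        finally:
            document.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Keeping the collection loop separate from the COM start-up code makes it possible to exercise the logic with a stand-in object on machines that do not have Word.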