Putting hundreds of .doc pages into web pages

Posted 2024-09-09 03:31:20

I have hundreds of .doc files containing text that I need to put on web pages.

I realize I could convert every .doc file to .txt, then use a server-side include to embed the contents of each file into a webpage. This would save a lot of time because I would only need one .php?txt=... page, which displays a different .txt include depending on the link the user followed to get there. This works perfectly content-wise.
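The one-page idea above comes down to mapping the `txt` parameter to a file on disk. Here is a minimal sketch of that lookup, written in Python rather than PHP for illustration. The `docs` directory and the names `DOCS_DIR` and `resolve_doc` are assumptions made up for this example. It accepts only simple names, so a request cannot point outside that directory.

```python
import re
from pathlib import Path

# Hypothetical directory holding the converted .txt files.
DOCS_DIR = Path("docs")

# Accept only simple names such as "chapter-01": no slashes, no dots.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def resolve_doc(name, docs_dir=DOCS_DIR):
    """Map the value of the txt=... parameter to a .txt path, or None."""
    if not name or not _NAME_RE.fullmatch(name):
        return None
    path = docs_dir / f"{name}.txt"
    # Only return paths that really exist, so the page can show a 404 otherwise.
    return path if path.is_file() else None
```

A page handler would call `resolve_doc(query_value)` and show a "not found" page when it returns `None`.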

However, all formatting is lost in the conversion to .txt (titles should be in bold).
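If the only formatting that matters is a bold title, part of it can be rebuilt from the plain text itself. The sketch below assumes that the first non-empty line of each .txt file is the title and that paragraphs are separated by blank lines. Neither assumption comes from the original files, and `text_to_html` is a name invented for this example.

```python
import re
from html import escape


def text_to_html(text):
    """Render plain text as HTML: first line bold, the rest as paragraphs."""
    text = text.replace("\r\n", "\n").strip()
    if not text:
        return ""
    # Assumption: the first non-empty line is the document title.
    title, _, rest = text.partition("\n")
    parts = [f"<p><strong>{escape(title.strip())}</strong></p>"]
    # Assumption: blank lines separate paragraphs.
    for block in re.split(r"\n\s*\n", rest.strip()):
        if block.strip():
            lines = (escape(line.strip()) for line in block.strip().split("\n"))
            # Single line breaks inside a paragraph are kept as <br>.
            parts.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(parts)
```

Escaping the text before wrapping it in tags keeps characters such as `&` and `<` from breaking the page.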

When I convert these .doc files to .html using Microsoft Word, the ~20-line documents become bloated .htm files of more than 300 lines (probably because each paragraph is put into a text box).

Dreamweaver's "Clean up Word HTML" helped a bit but the code was still extremely bloated.

How would you suggest going about this?

Edit: I may have solved my own problem by trying to embed Google Docs in my page.

Comments (5)

苏大泽ㄣ 2024-09-16 03:31:20

There is a program suite called wv (formerly mswordview). It includes a program called wvWare, which can convert Word documents to HTML.

You can also take the HTML output from Word and run it through tidy. This corrects the markup and can usually handle the mistakes made by Word.
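For hundreds of files, both tools would normally be driven from a script. The sketch below only builds the command lines and runs them if the tool is installed. It assumes the `wvHtml` wrapper that ships with wv and the `word-2000`, `bare` and `clean` options of HTML Tidy. Check the man pages of your installed versions before relying on these flags. The function names are invented for this example.

```python
import shutil
import subprocess


def wv_command(doc_path, html_path):
    # wvHtml is the wrapper script around wvWare for HTML output
    # (assumed usage: wvHtml input.doc output.html).
    return ["wvHtml", doc_path, html_path]


def tidy_command(word_html_path, clean_html_path):
    # word-2000, bare and clean ask tidy to strip Word-specific markup
    # and replace presentational clutter (assumed to exist in your tidy build).
    return [
        "tidy", "-quiet",
        "--word-2000", "yes",
        "--bare", "yes",
        "--clean", "yes",
        "-output", clean_html_path,
        word_html_path,
    ]


def run_if_available(cmd):
    """Run cmd if its executable is on PATH; return the exit code or None."""
    if shutil.which(cmd[0]) is None:
        return None
    # tidy exits with 1 when it only has warnings, so do not raise on non-zero.
    return subprocess.run(cmd, check=False).returncode
```

Looping over the .doc files and calling `run_if_available(wv_command(src, dst))` for each one would cover the whole batch.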

暖伴 2024-09-16 03:31:20

MS Word is bloatware. Its own markup is bloated, and therefore any attempt to automatically convert it to HTML will inherit these problems. You end up with garbage like: <strong><strong></strong></strong> for no good reason.

Dreamweaver can clean it up a lot, but nothing short of strip/remarkup is going to get you clean results.
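A strip-and-remarkup pass of the kind described here can be sketched with Python's standard `html.parser`. It keeps a small whitelist of tags, drops every attribute, and throws away style and script content. It then removes empty pairs such as `<strong><strong></strong></strong>`. The whitelist and the names `Stripper` and `clean_word_html` are assumptions for illustration, not a complete Word cleaner.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tags to keep. Other tags are dropped but their text stays.
KEEP = {"p", "strong", "b", "em", "i", "h1", "h2", "h3", "ul", "ol", "li", "br"}
# Elements whose whole content is discarded.
SKIP_CONTENT = {"head", "style", "script"}
VOID = {"br"}
_EMPTY = re.compile(r"<(\w+)>\s*</\1>")


class Stripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag not in VOID and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_word_html(html):
    parser = Stripper()
    parser.feed(html)
    parser.close()
    result = "".join(parser.out)
    # Repeat until no empty pair is left, so nested empties collapse too.
    while True:
        cleaned = _EMPTY.sub("", result)
        if cleaned == result:
            return result.strip()
        result = cleaned
```

Because attributes and unknown tags are discarded outright, anything that depends on them (tables, images, links) would need to be added to the whitelist and handled explicitly.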

That's why most people use PDFs for this type of issue.

青衫儰鉨ミ守葔 2024-09-16 03:31:20

My immediate reaction would be to convert the docs to PDFs. That will normally preserve formatting quite well, and users typically have their browsers set up to view PDFs one way or another (and the few who don't are undoubtedly accustomed to being unable to view a lot of documents on a lot of sites).

不回头走下去 2024-09-16 03:31:20

All right, thanks everyone for your suggestions, but I wanted this page to be accessible to people without PDF viewers as well.

Google Docs lets you bulk-upload your text files (and converts them for you too).

You can then export them as an iframe to embed in any HTML document.
