Ugh. Word is notorious for its bloated, convoluted, non-standards-compliant, non-semantic HTML. Unfortunately, I have a professor who is requiring us to generate an outline to very exacting standards. I'd rather not hand-write it, so I decided to make something that would be useful for my classmates as well. I created the outline using a simple numbered list in NeoOffice on my Mac, exported it as HTML, and wrote quite a bit of CSS to style it. Then, I got someone to create an ordered list in Word for Windows, export it as HTML, and send it to me to check compatibility. After scrolling miles down the page, trying to repress a shudder, I saw a problem. Word did not use <ol> and <li>. It used mountains of nested <span>s with classes out the wazoo. I hate to see all my work go to waste, but this content is impossible to work with: I'd have to style on a document-to-document basis, rather than with a universal stylesheet.
Ideally, Word would generate HTML using standard tags so that I could style it just like any other list, but this doesn't seem to be the case. How can I make it generate lists that actually use <ul> and <li> rather than <span>, or at least modify something in my code to somehow work with the weird way it does create lists?
The guys who wrote Winword and its HTML generation are smart guys. If it was easy to use HTML features in a purist way they would have done so.
Word is about creating paper-optimised layouts. It supports concepts such as tab-stops and multi-level numbering that HTML doesn't support, or is only just starting to. As a result, the HTML version of a Word document is not 'nice' HTML, but an attempt to retain the features of the Word document accurately.
When Word re-opens an HTML file it has saved, it does some clever reverse-engineering on the document, so that it renders in Word looking pretty much like it started. Equally, if you insert the HTML as a snippet into a web page, retaining Word's CSS, the results are pretty faithful. In this case there is a culture clash between the underlying CSS of the web page and Word's CSS, and some effort is required to make the best of a bad job. The Word HTML doesn't use UTF-8 either, which needs some handling.
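The encoding handling mentioned above can be sketched in Python. This is only an illustration: it assumes the exported file declares its charset in a meta tag and that windows-1252 is a sensible fallback when it does not. The function names `decode_word_html` and `to_utf8` are made up for this example.

```python
import re

def decode_word_html(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode exported HTML bytes using the charset declared in the file."""
    # The charset declaration is plain ASCII, so peek at the start of the file.
    head = raw[:4096].decode("ascii", errors="ignore")
    match = re.search(r"charset\s*=\s*[\"']?([\w-]+)", head, flags=re.IGNORECASE)
    encoding = match.group(1) if match else fallback  # fallback is an assumption
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or stray bytes: decode leniently with the fallback.
        return raw.decode(fallback, errors="replace")

def to_utf8(raw: bytes) -> bytes:
    """Return the same document re-encoded as UTF-8, with the declaration updated."""
    text = decode_word_html(raw)
    # Rewrite only the first charset declaration so it matches the new encoding.
    text = re.sub(r"(charset\s*=\s*[\"']?)[\w-]+", r"\g<1>utf-8", text,
                  count=1, flags=re.IGNORECASE)
    return text.encode("utf-8")
```

Running `to_utf8` on the raw bytes before any other clean-up means later steps can work on ordinary text without worrying about curly quotes or accented characters turning into mojibake.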
HTMLTidy can be used to rip out Word mark-up, but some more massaging is required after this for good rendering within a webpage. I have worked on a product for 15 years which does this mixing of Word and web pages, and the results can be quite good if you fine tune the CSS.
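To give an idea of the kind of stripping involved, here is a rough standard-library Python sketch. It is not HTMLTidy and does far less than a real clean-up tool. It assumes the markup contains Office conditional comments, `o:`/`v:`/`w:` namespaced tags, and `class`/`style`/`lang` attributes, and the name `strip_word_markup` is invented for the example.

```python
import re

# Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
# Office-namespaced tags such as <o:p> and </o:p>
OFFICE_TAG = re.compile(r"</?[ovw]:[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
# class/style/lang attributes, whether double-quoted, single-quoted, or unquoted
WORD_ATTR = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)
BARE_SPAN = re.compile(r"<span>(.*?)</span>", re.DOTALL | re.IGNORECASE)

def strip_word_markup(html: str) -> str:
    """Remove the most common Word-specific markup from an HTML string."""
    html = CONDITIONAL.sub("", html)
    # Drops the namespaced tags themselves; any text between them is kept.
    html = OFFICE_TAG.sub("", html)
    # Strip presentational attributes, but only inside tags, not in body text.
    html = ANY_TAG.sub(lambda m: WORD_ATTR.sub("", m.group(0)), html)
    # Unwrap spans left without attributes; repeat until nothing changes.
    previous = None
    while previous != html:
        previous = html
        html = BARE_SPAN.sub(r"\1", html)
    return html
```

Removing every `class` and `style` attribute is deliberately blunt. If you want to keep Word's look when embedding a snippet, as described above, you would keep the attributes and reconcile the stylesheets instead.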
We used Word because we are creating paper-versions, and importing text from reports written in Word, not because we couldn't find a dedicated HTML editor.
I would not recommend using Word to create tidy purist HTML. You wouldn't use a can-opener to open a bottle of wine, would you?
Life would be much simpler if:
a) Microsoft re-engineered the myriad options on its highly confusing 'bullets and number' feature,
b) HTML provided native, and properly featured, multi-level numbering support, instead of the after-thought approaches currently available. The weakness of HTML in this area can be seen in the flimsy numbering options available in Google Docs.
So much has improved with HTML 5, maybe we can hope that HTML 6 will help bridge the word processor / HTML editor divide.
If you can get your hands on a Windows PC, use Notepad++ (http://notepad-plus-plus.org/) to paste the code, and then select the plugin to format the code.
Use a WYSIWYG editor as the list generator. This would remove the need for the users to deal with raw CSS, at the cost of taking them out of the comfort zone of Microsoft Word.
Creative use of Word's Find and Replace might also work. For example, open the HTML file with Notepad, then copy and paste the text back into a Word document. Open Find and Replace. If the HTML looks like this (for instance), with "This is the first line of text" being the first line item:
Then, with Wildcards turned on, find
\<p*line-height:115%'\
and replace with nothing. It may take a series of Finds/Replaces. The HTML markup is copious but, all else being equal, it is at least consistent.
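The same series of find-and-replace passes can be scripted outside Word. Below is a minimal Python sketch. It assumes the wildcard search above was meant to match a whole opening <p ...> tag whose style ends in line-height:115%', which is a guess about the intent. The sample paragraph in the usage note and the names `PASSES` and `run_passes` are hypothetical.

```python
import re

# Each pass is (pattern, replacement), applied in order like successive
# Find/Replace runs. Both patterns are assumptions for illustration.
PASSES = [
    # Opening <p ...> tags ending in line-height:115%'> (assumed intent).
    # [^>]* keeps the match inside a single tag, unlike Word's greedy "*".
    (re.compile(r"<p\b[^>]*line-height:115%'>", re.IGNORECASE), ""),
    # Closing </p> tags become line breaks so the items stay separated.
    (re.compile(r"</p>", re.IGNORECASE), "\n"),
]

def run_passes(html: str) -> str:
    """Apply every pass in order and return the cleaned text."""
    for pattern, replacement in PASSES:
        html = pattern.sub(replacement, html)
    return html
```

For example, `run_passes("<p class=MsoNormal style='line-height:115%'>This is the first line of text</p>")` leaves just the sentence followed by a newline. More passes can be appended to the list as further leftovers turn up.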
If you've got Dreamweaver handy, there is a magic "Clean Up Word HTML" button that does wonders in this scenario.
MSWord is only as smart as the author: an ordered list is converted into an HTML list only if it was created as a list in MSWord. That is, the list must be built with MSWord's list constructs, regardless of how it looks on the page. Many people create lists that "appear" to be ordered or unordered using tabs and other formatting instead of MSWord's list functions. Saving to HTML tries to preserve the document as it was written, not as it was displayed.
From doing some research, it appears that the approach of converting the document to HTML isn't practical. Word is simply too variable in its approach to file saving and HTML generation for a single document, not to mention differences among different versions of Word. Similar to Wyatt's suggestion, there may be ways to clean up the code, but none of them are perfect. Digging around the API may provide a way to parse this more easily, but it may turn out that this is in practice just as convoluted. It seems that using Word as a list-generation tool is simply unrealistic.
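For what it's worth, one of those imperfect clean-ups can be sketched in Python. It rests on an assumption about the exported markup that may not hold for every Word version: that each list item is a <p> whose style contains something like mso-list:l0 level1 lfo1, with the visible number or bullet wrapped in <![if !supportLists]> ... <![endif]>. The function names are made up for the example.

```python
import re

# Assumed shape of a Word list paragraph: <p ... mso-list:l0 level2 lfo1 ...>...</p>
LIST_P = re.compile(
    r"<p\b[^>]*mso-list:\s*l\d+\s+level(\d+)[^>]*>(.*?)</p>",
    re.DOTALL | re.IGNORECASE,
)
# Assumed wrapper around the literal marker text ("1.", a bullet, padding).
MARKER = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.DOTALL | re.IGNORECASE)
TAGS = re.compile(r"<[^>]+>")

def word_list_items(html):
    """Return (level, text) pairs for every paragraph that looks like a list item."""
    items = []
    for level, body in LIST_P.findall(html):
        body = MARKER.sub("", body)                    # drop the literal marker
        text = " ".join(TAGS.sub("", body).split())   # strip tags, tidy spaces
        items.append((int(level), text))
    return items

def to_nested_list(items, tag="ol"):
    """Build nested <ol>/<ul> markup from (level, text) pairs."""
    out, depth = [], 0
    for level, text in items:
        if level > depth:
            while depth < level:
                out.append(f"<{tag}>")
                depth += 1
                if depth < level:
                    out.append("<li>")    # wrapper item for a skipped level
        else:
            while depth > level:
                out.append(f"</li></{tag}>")
                depth -= 1
            out.append("</li>")           # close the previous sibling
        out.append(f"<li>{text}")
    while depth > 0:
        out.append(f"</li></{tag}>")
        depth -= 1
    return "".join(out)
```

Calling `to_nested_list(word_list_items(html), tag="ul")` produces plain nested lists that a universal stylesheet can target. Any inline formatting inside the items is lost, and paragraphs that only look like lists (tabs and typed numbers) are not detected at all.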
Use this resource http://word2cleanhtml.com/ to convert Word documents to clean HTML. Very useful, in my opinion.
You can link an external stylesheet to an HTML document in Word under the Developer tab -> Document Template -> Linked CSS. You can then use this to override almost any style generated by Word.
Credit: https://superuser.com/questions/65107/how-to-apply-external-css-stylesheet-to-document-in-microsoft-word/65144#65144
Note: I did this using Word 2013, but it is not a new feature.