How do you deal with the "special" characters MS Word adds?

Posted on 2024-07-19 05:47:22

I'm wondering how you clean up the special characters that MS Word adds, such as em and en dashes and curly quotes?

I often find myself copying clients' content from Word and pasting it into a static HTML page, but the content ends up with weird characters because the special characters are not converted to their correct ASCII codes and therefore show up as garbled text. (For these basic websites, I'm using Dreamweaver.)

I have seen a lot of similar problems when clients copy content from Word into text-only fields (mostly textareas). When I put this into a PDF (through PHP), or when it shows up on the page, it has garbled text too.

How do you deal with this? Is there a cleaning service or program you use?

Comments (6)

风铃鹿 2024-07-26 05:47:22

With regards to clients posting copy/pasted text from Word in textareas:

The most reliable way to ensure that the client sends you text in any particular encoding (thus hopefully doing any conversion from CP-1252 [or whatever Word uses] for you), is to add the accept-charset="..." attribute to all your <form>s. E.g.:

<form ... accept-charset="UTF-8">
   ...
</form>

Most browsers will obey that and make sure any "Word-specific" characters are converted to the appropriate character set before they get to your website.

Once invalid text gets to your website, there's very little you can do to fix it reliably, so it's best to simply check all input for being valid in whatever character set you use, and discard any requests that have invalid text. This is necessary even with accept-charset, because undoubtedly there are some clients out there that will ignore it.
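The "check and discard" step described above could look roughly like this. It is only a minimal sketch, assuming PHP with the mbstring extension enabled and UTF-8 as the site's character set; the function names `is_valid_utf8` and `invalid_utf8_fields` are hypothetical and not from the original answer.

```php
<?php
// Hypothetical helper: true only when $text is well-formed UTF-8.
// Assumes the mbstring extension is available.
function is_valid_utf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// Hypothetical helper: returns the names of submitted fields that are not
// valid UTF-8, so the caller can reject the whole request.
function invalid_utf8_fields(array $fields): array
{
    $bad = [];
    foreach ($fields as $name => $value) {
        if (is_string($value) && !is_valid_utf8($value)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// Possible use in a form handler:
// if (invalid_utf8_fields($_POST) !== []) { http_response_code(400); exit; }
```

Rejecting the request outright, rather than trying to repair it, follows the reasoning above: once the bytes are invalid there is no reliable way to know what the client meant.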

青春有你 2024-07-26 05:47:22

You can use a preg_replace call to strip every non-ASCII character, including the ones Word adds, from your string:

 // Remove every byte outside the 7-bit ASCII range and keep the result.
 $str = preg_replace('/[^\x00-\x7F]+/', '', $str);
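Note that the pattern above deletes the characters outright, so a curly apostrophe in "it’s" leaves you with "its". If you would rather keep readable punctuation, one alternative (a different technique, not this answer's method) is to map the most common Word characters to ASCII look-alikes before stripping anything. A sketch, assuming the input is already valid UTF-8 and PHP 7.0 or later for the `\u{...}` escapes; the function name `word_chars_to_ascii` is hypothetical and the map is deliberately incomplete:

```php
<?php
// Hypothetical helper: replaces common "smart" punctuation with ASCII.
// Assumes $utf8 is valid UTF-8; characters not in the map are left alone.
function word_chars_to_ascii(string $utf8): string
{
    $map = [
        "\u{2018}" => "'",    // left single quotation mark
        "\u{2019}" => "'",    // right single quotation mark / apostrophe
        "\u{201C}" => '"',    // left double quotation mark
        "\u{201D}" => '"',    // right double quotation mark
        "\u{2013}" => '-',    // en dash
        "\u{2014}" => '--',   // em dash
        "\u{2026}" => '...',  // horizontal ellipsis
    ];
    // strtr() with an array replaces each key with its value in one pass.
    return strtr($utf8, $map);
}
```

Running the preg_replace call afterwards would then only remove whatever the map did not cover.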
幸福不弃 2024-07-26 05:47:22

Take care to specify an encoding everywhere and use UTF-8; then those "special" characters should survive just fine. But once they've gone through an encoding that can't represent them, the information about which character each one originally was is lost, so it can't be repaired (except in some specific, though probably very common, cases such as a mix-up between CP1252 and ISO-8859-1).
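For the CP1252 mix-up mentioned above, the bytes are usually still intact and only labelled wrongly, so decoding them as Windows-1252 tends to recover the original characters. A rough sketch, assuming PHP 7.0+ with mbstring; the function name `cp1252_to_utf8` is hypothetical, and the "valid UTF-8 means leave it alone" check is a heuristic assumption, not a guarantee:

```php
<?php
// Hypothetical helper: returns UTF-8 text from input that is either
// already UTF-8 or (by assumption) Windows-1252 bytes pasted from Word.
function cp1252_to_utf8(string $bytes): string
{
    // Heuristic: well-formed UTF-8 (which includes plain ASCII) is kept as-is.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise treat the bytes as Windows-1252, where 0x93/0x94 are curly
    // double quotes and 0x96/0x97 are the en and em dashes.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}
```

This only helps while the original bytes are still present; text that has already been replaced with "?" or similar placeholders cannot be recovered this way.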

稀香 2024-07-26 05:47:22

Make sure Word is configured to use UTF-8 for "Save As.." HTML.

This is in Options > Word Options > Advanced > Web Options > Encoding

傻比既视感 2024-07-26 05:47:22

You might try the Demoroniser.

最美不过初阳 2024-07-26 05:47:22

If it's a Word file that's just text (i.e., no graphics, tables, etc.), you might try saving it as HTML from within Word, copy/pasting the resulting HTML into your document in Dreamweaver, and then using Dreamweaver's "Clean Up Word HTML" function (under the Commands menu).

As an alternative, you can try fix my HTML, though I've not personally tried it with Word text, so results may vary.
