Strip style comments from a string pasted from Microsoft Word using PHP
I have a text area that users typically paste content from Microsoft Word into. I am using TinyMCE for formatting. The problem is that the pasted string always contains style definitions that are commented out. I need a way to strip this commented stuff out of the string.
Here is an example of the comments that get added:
<!-- /* Font Definitions */ @font-face {font-family:"Courier New"; panose-1:2 7 3 9 2 2 5 2 4 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:Wingdings; panose-1:5 2 1 2 1 8 4 8 7 8; mso-font-charset:2; -->
This is just a very small chunk of it; it usually goes on for hundreds of lines.
Anyway, I'm using strip_tags to get rid of unwanted HTML tags, and I've tried the following preg_replace, but the style comments are always still there:
$e_description = preg_replace('/<!--(.|\s)*?-->/', '',$_POST['description']);
Any suggestions on how to get rid of this junk?
Comments (1)
Why not just add the ms modifiers (m is multi-line, s is "dot-all", where . matches all characters)? That MAY work for you (try it out)...
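
The suggestion above might look something like the following minimal sketch. Note that m only changes how ^ and $ behave, so for this pattern the s modifier is the one doing the work. The function name strip_html_comments and the sample string are hypothetical and not from the original post; the sketch also assumes the comment markers arrive as literal <!-- and --> characters.

```php
<?php
// Hypothetical helper (not from the original post): removes HTML comments,
// including ones that span many lines.
function strip_html_comments(string $html): string
{
    // s (dot-all) lets . match newlines; .*? is lazy, so each comment
    // is matched separately instead of swallowing the text between two comments.
    $clean = preg_replace('/<!--.*?-->/s', '', $html);

    // preg_replace returns null on error (for example, when a PCRE limit is hit);
    // fall back to the unmodified input in that case.
    return $clean ?? $html;
}

// Assumed sample input resembling a Word paste.
$pasted = "<p>Hello</p><!-- /* Font Definitions */\n@font-face {font-family:Wingdings;}\n--><p>World</p>";
echo strip_html_comments($pasted), "\n"; // prints "<p>Hello</p><p>World</p>"
```

If the comments still survive, one possibility (an assumption, not something the post confirms) is that the editor has entity-encoded the markers as &lt;!-- and --&gt;, in which case this pattern would not match them and the input would need decoding first.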