How to get rid of copied-and-pasted text styling in an ajax html editor

Posted on 2024-11-09 11:57:05

I am using an ajax html editor for a news description page. When I copy and paste content from Word or from the internet, the styling of that text (paragraphs, fonts, and so on) is copied along with it, and it overrides the default class style of the html editor textbox. What I want is to get rid of inline styles like the one below, but not the HTML itself;
I want to keep the text in paragraphs.

<span id="ContentPlaceHolder1_newsDetaildesc" class="newsDetails"><span style="font-family: arial, helvetica, sans; font-size: 11px; line-height: 14px; color: #000000; "><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.<BR /> It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</span></span></p>

#left_column .newsDetails span[style]
{
font-family: Arial !important;
font-size: small !important;
font-weight: normal !important;
color: #808080 !important;
}

痴情换悲伤 2024-11-16 11:57:05

First, be aware that the HTML you receive by pasting from Word (or any other HTML source) is going to vary wildly depending on the source. Even different versions of Word will give you radically different input. If you design some code that works perfectly on content from the version of MS Word that you have, it may not work at all for a different version of MS Word.

Also, some sources will paste content that looks like HTML, but is actually garbage. When you paste HTML content into a rich text area in your browser, your browser has nothing to do with how that HTML is generated. Do not expect it to be valid by any stretch of your imagination. In addition, your browser will further munge the HTML as it is inserted into the DOM of your rich text area.

Because the potential inputs vary so much, and because the acceptable outputs are difficult to define, it is hard to design a proper filter for this sort of thing. Further, you cannot control how future versions of MS Word will handle their HTML content, so your code will be difficult to future-proof.

However, take heart! If all the world's problems were easy ones, it would be a pretty boring place. There are some potential solutions. It is possible to keep the good parts of the HTML and discard the bad parts.

It looks like your HTML-based RTE works like most HTML editors out there. Specifically, it has an iframe, and on the document inside the iframe, it has set designMode to "on".

You'll want to trap the paste event when it occurs in the <body> element of the document inside that iframe. I was very specific here because I have to be: don't trap it on the iframe; don't trap it on the iframe's window; don't trap it on the iframe's document. Trap it on the <body> element of the document inside the iframe. Very important.

var iframe = your.rich.text.editor.getIframe(), // or whatever
    win = iframe.contentWindow,
    doc = win.document,
    body = doc.body;

// Use your favorite library to attach events. Don't actually do this
// yourself. But if you did do it yourself, this is how it would be done.
if (win.addEventListener) {
    body.addEventListener('paste', handlePaste, false);
} else {
    body.attachEvent("onpaste", handlePaste);
}

Notice my sample code has attached a function called handlePaste. We'll get to that next. The paste event is funny: some browsers fire it before the paste, some browsers fire it afterwards. You'll want to normalize that, so that you are always dealing with the pasted content after the paste. To do this, use a timeout method.

function handlePaste() {
    window.setTimeout(filterHTML, 50);
}

So, 50 milliseconds after a paste event, the filterHTML function will be called. This is the meat of the job: you need to filter the HTML and remove any undesirable styles or elements. You have a lot to worry about here!

I have personally seen MSWord paste in these elements:

  1. meta
  2. link
  3. style
  4. o:p (A paragraph in a different namespace)
  5. shapetype
  6. shape
  7. Comments, like <!-- comment -->.
  8. font
  9. And of course, the MsoNormal class.
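
One way to keep this list easy to extend is to hold the element names in a lookup table and test each node's name against it. The following is only a sketch: the names `WORD_JUNK` and `isWordJunk` are made up for illustration, and `font` is deliberately left out because its contents should be kept rather than dropped.

```javascript
// Hypothetical lookup table built from the list above. Extend as needed.
var WORD_JUNK = {
    'meta': true,
    'link': true,
    'style': true,
    'o:p': true,
    'shapetype': true,
    'shape': true,
    '#comment': true
};

// Returns true when a node with this nodeName should be dropped entirely.
// nodeName is upper case for HTML elements, so compare in lower case.
function isWordJunk(nodeName) {
    var key = String(nodeName).toLowerCase();
    return Object.prototype.hasOwnProperty.call(WORD_JUNK, key);
}
```

A filter could then call `isWordJunk(cursor.nodeName)` instead of listing every name as a separate `case`.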

The filterHTML function should remove these when appropriate. You may also wish to remove other items as you deem necessary. Here is an example filterHTML that removes the items I have listed above.

// Your favorite JavaScript library probably has these utility functions.
// Feel free to use them. I'm including them here so this example will
// be library-agnostic.
function collectionToArray(col) {
    var x, output = [];
    for (x = 0; x < col.length; x += 1) {
        output[x] = col[x];
    }
    return output;
}

// Another utility function probably covered by your favorite library.
function trimString(s) {
    return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

function filterHTML() {
    var iframe = your.rich.text.editor.getIframe(),
        win = iframe.contentWindow,
        doc = win.document,
        invalidClass = /(?:^| )msonormal(?:$| )/gi,
        cursor, nodes = [];

    // This is a depth-first, pre-order search of the document's body.
    // While searching, we want to remove invalid elements and comments.
    // We also want to remove invalid classNames.
    // We also want to remove font elements, but preserve their contents.

    nodes = collectionToArray(doc.body.childNodes);
    while (nodes.length) {
        cursor = nodes.shift();
        switch (cursor.nodeName.toLowerCase()) {

        // Remove these invalid elements.
        case 'meta':
        case 'link':
        case 'style':
        case 'o:p':
        case 'shapetype':
        case 'shape':
        case '#comment':
            cursor.parentNode.removeChild(cursor);
            break;

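        // Remove font elements but preserve their contents.
        case 'font':

            // Make sure we scan these child nodes too!
            nodes.unshift.apply(
                nodes,
                collectionToArray(cursor.childNodes)
            );

            // Move the children out, last child first, so that they end
            // up directly after the font element in their original order.
            while (cursor.lastChild) {
                if (cursor.nextSibling) {
                    cursor.parentNode.insertBefore(
                        cursor.lastChild,
                        cursor.nextSibling
                    );
                } else {
                    cursor.parentNode.appendChild(cursor.lastChild);
                }
            }

            // The font element is now empty; remove it.
            cursor.parentNode.removeChild(cursor);

            break;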

        default:
            if (cursor.nodeType === 1) {

                // Remove all inline styles
                cursor.removeAttribute('style');

                // OR: remove a specific inline style instead
                // cursor.style.fontFamily = '';

                // Remove invalid class names.
                invalidClass.lastIndex = 0;
                if (
                    cursor.className &&
                        invalidClass.test(cursor.className)
                ) {

                    // Replace with a space so neighboring names stay separate.
                    cursor.className = trimString(
                        cursor.className.replace(invalidClass, ' ')
                    );

                    if (cursor.className === '') {
                        cursor.removeAttribute('class');
                    }
                }

                // Also scan child nodes of this node.
                nodes.unshift.apply(
                    nodes,
                    collectionToArray(cursor.childNodes)
                );
            }
        }
    }
}
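
The class-name cleanup in filterHTML relies on a regular expression. As an alternative, here is a minimal sketch that splits the className on whitespace and drops the unwanted name, which avoids accidentally joining two neighboring class names. The helper name `removeClassName` is an assumption for illustration, not part of any library.

```javascript
// Hypothetical helper: removes one class name (compared case-insensitively)
// from a whitespace-separated className string and returns the rest.
function removeClassName(className, unwanted) {
    var names = String(className).split(/\s+/),
        target = String(unwanted).toLowerCase(),
        kept = [],
        i;
    for (i = 0; i < names.length; i += 1) {
        // Skip empty pieces (from leading or trailing whitespace) and the
        // unwanted name itself. Only whole names are compared.
        if (names[i] !== '' && names[i].toLowerCase() !== target) {
            kept.push(names[i]);
        }
    }
    return kept.join(' ');
}
```

Inside the filter this could be used as `cursor.className = removeClassName(cursor.className, 'MsoNormal');`, followed by the same empty-string check that removes the class attribute.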

You included some sample HTML that you wanted to filter, but you did not include a sample output that you would like to see. If you update your question to show what you want your sample to look like after filtering, I will try to adjust the filterHTML function to match. For the time being, please consider this function as a starting point for devising your own filters.

Note that this code makes no attempt to distinguish pasted content from content that existed prior to the paste. It does not need to do this; the things that it removes are considered invalid wherever they appear.

An alternative solution would be to filter these styles and contents using regular expressions against the innerHTML of the document's body. I have gone this route, and I advise against it in favor of the solution I present here. The HTML that you will receive by pasting will vary so much that regex-based parsing will quickly run into serious issues.


Edit:

I think I see now: you are trying to remove the inline style attributes themselves, right? If that is so, you can do this during the filterHTML function by including this line:

cursor.removeAttribute('style');

Or, you can target specific inline styles for removal like so:

cursor.style.fontFamily = '';

I've updated the filterHTML function to show where these lines would go.
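
If several specific properties need to go while others stay, one option is to work on the text of the style attribute directly. This is a rough sketch under simplifying assumptions: the helper name `stripStyleProperties` is hypothetical, and it assumes plain declarations separated by semicolons (a value that itself contains a semicolon, for example inside `url(...)`, is not handled).

```javascript
// Hypothetical helper: takes the text of a style attribute and returns it
// without the listed properties.
function stripStyleProperties(styleText, unwanted) {
    var declarations = String(styleText).split(';'),
        kept = [],
        i, j, colon, name, drop;
    for (i = 0; i < declarations.length; i += 1) {
        colon = declarations[i].indexOf(':');
        if (colon === -1) {
            continue; // empty or malformed piece, e.g. after a trailing ';'
        }
        // Property name, trimmed and lower-cased for comparison.
        name = declarations[i].slice(0, colon)
            .replace(/^\s+|\s+$/g, '').toLowerCase();
        drop = false;
        for (j = 0; j < unwanted.length; j += 1) {
            if (String(unwanted[j]).toLowerCase() === name) {
                drop = true;
            }
        }
        if (!drop) {
            kept.push(declarations[i].replace(/^\s+|\s+$/g, ''));
        }
    }
    return kept.join('; ');
}
```

With the style text from the question and `['font-family', 'font-size', 'color']` as the unwanted list, only the line-height declaration is left. In the filter, the result could be written back with `cursor.setAttribute('style', ...)`, or the attribute removed when the result is an empty string.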

Good luck and happy coding!

过去的过去 2024-11-16 11:57:05

Here is a potential solution that strips out the text from the HTML. It works by first copying the text as HTML into an element (which probably should be hidden but is shown for comparison in my example). Next, you get the innerText of that element. Then you can put that text into your editor wherever you like. You will have to capture the paste event on the editor, run this sequence to get the text, and then put that text wherever you like in your editor.

Here is a fiddle for an example of how to do this: Getting text from HTML
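
The core of that approach can be sketched in a few lines. This is not the code from the fiddle, only an illustration of the same idea; the function name `htmlToPlainText` is made up, and the document object is passed in as `doc` so the function does not rely on a global.

```javascript
// Sketch only: `doc` is the page's document object.
function htmlToPlainText(html, doc) {
    // A detached element; it is never added to the page.
    var holder = doc.createElement('div');

    // Let the browser parse the markup.
    holder.innerHTML = html;

    // textContent is the standard property; innerText is a fallback for
    // old versions of Internet Explorer.
    return holder.textContent || holder.innerText || '';
}
```

In a browser this would be called as `htmlToPlainText(pastedHtml, document)`, and the returned text placed into the editor. All tags and inline styles are lost this way, including paragraph and line-break markup.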

伊面 2024-11-16 11:57:05

If you are using Firefox, you can install this extension: https://addons.mozilla.org/en-US/firefox/addon/extended-copy-menu-fix-vers/. It allows you to copy the text from any website without the formatting.

‘画卷フ 2024-11-16 11:57:05

Generally when supporting HTML editing by end users I have opted for leveraging one of a number of solid client-side HTML editing controls that already have the requisite functionality built in to handle stuff like this. There are a number of commercial versions, such as from Component Art, as well as some great free/open source versions, such as CKEditor.

All of the good ones have solid paste-from-Word support to strip out/fix this excessive CSS. I would either just leverage one (the easy way) or see how they do it (the hard way).

相对绾红妆 2024-11-16 11:57:05

I run into this kind of problem all the time; it is an interesting one. The way I handle it is very simple: open Notepad in Windows, paste your text into Notepad, then copy it from there into your AJAX text editor. That strips all of the text styling.

:)

情感失落者 2024-11-16 11:57:05

From what I understand from your question, you are using a WYSIWYG editor, and when you copy and paste text from other web pages or Word documents you get some ugly HTML with inline styles and so on.

I would suggest that you don't bother fixing this yourself, because dealing with this issue cross-browser is a mess. If you really want to fix it, though, I would recommend using TinyMCE, which has exactly the behavior you want.

You can try it in action by visiting http://tinymce.moxiecode.com/tryit/full.php and just copy some text into the editor and then submit it all to see the generated html. It's clean.

TinyMCE is probably the best WYSIWYG editor that you'll find imo. So instead of building something on your own, just use it and customize it to your exact needs.
