URL parsing in JavaScript and the DOM

Posted on 2024-11-09 23:49:03

I am writing a support chat application in which I want to parse message text for URLs. I have found answers to similar questions, but nothing for the following.

What I have

function ReplaceUrlToAnchors(text) {
    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    return text.replace(exp,"<a href='$1' target='_blank'>$1</a>"); 
}

That pattern is a modified version of one I found on the internet. It includes www. in the first token, because not all URLs start with protocol://. However, www.google.com is then replaced with

<a href='www.google.com' target='_blank'>www.google.com</a>

which the browser treats as a relative link, so it pulls up MySite.com/webchat/www.google.com and I get a 404.

That is my first problem. My second is...

In my script for generating messages to the log, I am forced to do it in a hacky way:

var last = 0;
function UpdateChatWindow(msgArray) {

    var chat = $get("MessageLog");
    for (var i = 0; i < msgArray.length; i++) {
        var element = document.createElement("div");
        var linkified = ReplaceUrlToAnchors(msgArray[i]);
        element.setAttribute("id", last.toString());
        element.innerHTML = linkified;
        chat.appendChild(element);
        last = last + 1;
    }
}

To get the "linkified" string to render as HTML correctly, I have to use the non-standard .innerHTML property of the element. I would prefer a way where I could parse the string into tokens (text tokens and anchor tokens), call either createTextNode or createElement("a") for each one, and stitch them together with the DOM.

So question 1 is: how should I go about parsing www.site.com, or even site.com?
And question 2 is: how could I do this using only the DOM?
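
The token-based approach described in the question could look roughly like the sketch below. It is only one possible arrangement, not a definitive implementation. The names tokenizeUrls and appendLinkified are hypothetical, the pattern is the question's regex written on one line with the dot in "www." escaped, and the `doc` parameter is assumed to be the browser's `document`.

```javascript
// Hypothetical helper: split a message into text tokens and link tokens.
function tokenizeUrls(text) {
    // The question's pattern on one line, with the dot in "www." escaped.
    var exp = /\b(?:https?:\/\/|ftp:\/\/|file:\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ig;
    var tokens = [];
    var lastIndex = 0;
    var match;
    while ((match = exp.exec(text)) !== null) {
        // Plain text between the previous match and this one.
        if (match.index > lastIndex) {
            tokens.push({ type: "text", value: text.slice(lastIndex, match.index) });
        }
        var url = match[0];
        tokens.push({
            type: "link",
            value: url,
            // Give bare "www." matches a scheme so they are not resolved as relative paths.
            href: /^www\./i.test(url) ? "http://" + url : url
        });
        lastIndex = match.index + url.length;
    }
    // Trailing text after the last match.
    if (lastIndex < text.length) {
        tokens.push({ type: "text", value: text.slice(lastIndex) });
    }
    return tokens;
}

// Hypothetical helper: build DOM nodes from the tokens without using innerHTML.
// `doc` is assumed to be the browser's `document`.
function appendLinkified(doc, parentNode, text) {
    var tokens = tokenizeUrls(text);
    for (var i = 0; i < tokens.length; i++) {
        var token = tokens[i];
        if (token.type === "link") {
            var a = doc.createElement("a");
            a.setAttribute("href", token.href);
            a.setAttribute("target", "_blank");
            a.appendChild(doc.createTextNode(token.value));
            parentNode.appendChild(a);
        } else {
            parentNode.appendChild(doc.createTextNode(token.value));
        }
    }
}
```

In UpdateChatWindow, the line that assigns element.innerHTML could then be replaced with a call such as appendLinkified(document, element, msgArray[i]). Because the message text only ever reaches createTextNode, any markup typed by a chat user is displayed as text rather than interpreted as HTML.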

昔日梦未散 2024-11-16 23:49:03

Another thing you could do is this:

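function ReplaceUrlToAnchors(text) {
    // Match a scheme (http, https, ftp, file) or a leading "www.", then the rest of the URL.
    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    return text.replace(exp, function(_, url) {
      // Prefix bare "www." matches with a scheme so the link is not resolved as a relative path.
      return '<a href="' +
        (/^www\./i.test(url) ? "http://" + url : url) +
        '" target="_blank">' +
        url +
        '</a>';
    });
}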

That is much like your solution, but it checks for "www." URLs in the callback passed to .replace() and prepends "http://" to them, so the href is no longer treated as a relative path.

Note that you won't be picking up "stackoverflow.com" or "newegg.com" or anything like that, which I understand may be unavoidable (and even desirable, given the false positives you'd pick up).

乖乖哒 2024-11-16 23:49:03

Here is what I came up with. Perhaps someone has something better?

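function replaceUrlToAnchors(text) {
    // Step 1: prefix bare "www." hosts with "http://", unless a scheme ("//") already precedes them.
    // The dots are escaped so that they only match a literal ".".
    var naked = /(^|[^\/])\b(www\.[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|](?:\.com|\.net|\.org|\.co\.uk|\.ca))/ig;
    text = text.replace(naked, "$1http://$2");

    // Step 2: wrap every URL that has a scheme in an anchor; $3 is the URL without the scheme.
    var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/)([-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]))/ig;
    return text.replace(exp,"<a href='$1' target='_blank'>$3</a>");
}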

The first regex replaces www.google.com with http://www.google.com, which is good enough for what I am doing. However, I will hold off on marking this as the answer, because I would also like to make (www.) optional. When I change it to (www.)?, it replaces every word with http://word/.
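
One possible way to make the "www." prefix optional without matching every word is to require that the candidate ends in a known top-level domain and is not already preceded by a scheme or an "@". The sketch below is only an illustration of that idea: the name prefixBareDomains is hypothetical, and the short TLD list (taken from the regex above) is an assumption that would need extending for real use.

```javascript
// Hypothetical helper: prepend "http://" to bare domains such as "site.com" or "www.site.com".
// The TLD list is an illustrative assumption, not a complete list.
function prefixBareDomains(text) {
    // $1: start of string, or one character that cannot be part of a URL or e-mail address
    //     (so "http://site.com" and "bob@site.com" are left alone).
    // $2: optional "www.", one or more dot-terminated labels, a listed TLD, and an optional path.
    var bare = /(^|[^\/@.\w-])((?:www\.)?(?:[A-Z0-9-]+\.)+(?:com|net|org|co\.uk|ca)\b(?:\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])?)/ig;
    return text.replace(bare, "$1http://$2");
}

console.log(prefixBareDomains("visit site.com today"));
console.log(prefixBareDomains("hello world"));
```

The output of this step can then be passed to the second regex above, which wraps anything that starts with a scheme in an anchor. The trade-off noted in the other answer still applies: a fixed TLD list misses some real domains, while a looser pattern picks up false positives.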
