
What is the best way to extract URLs from web pages loaded via XMLHttpRequest?

Posted on 2024-12-08 10:12:59

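Problem overview

I have a dynamically generated web page X containing search results that link to web pages Y1, Y2, Y3, etc.
Y1 contains resource URL R1, Y2 contains resource URL R2, and so on.
I would like to dynamically enhance page X with links pointing to the resources R1, R2, etc.

Possible solution

I am currently thinking of using JavaScript and XMLHttpRequest to retrieve the HTML of web pages Y1, Y2, etc., and then using regular expressions to extract the URLs.

The HTML of pages Y1, Y2, etc. ranges from 30 to 100 KB each.

Does this sound like a good plan? Or would I be better off retrieving each web page in JSON format and extracting the resource URL from that? If HTML is the way to go, do you have any suggested optimizations or shortcuts for searching 30-100 KB of text?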


风铃鹿 2024-12-15 10:12:59

You don't want to use a regex to extract the URLs. I suggest using jQuery to perform the AJAX request, and then using jQuery to parse and filter the URLs out of the HTML returned from the server.

jQuery.ajax({
    url: "http://my.url.here",
    dataType: "html",
    // ...
    success: function(data) {
        // data is the response HTML as a string; visit every anchor found in it
        jQuery("a", data).each(function() {
            var $link = jQuery(this);
            // ...
            // ...
        });
    }
    // ...
});

If jQuery is not an option, you can do something like this when you get your response back:

var html = XHR.responseText;
var div = document.createElement("div");
div.innerHTML = html;

// You can now search for nodes inside your div.
// The following gives you all the anchor tags:
var anchors = div.getElementsByTagName("a");
// ...
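
One detail worth handling after either approach: anchors parsed inside page X resolve relative links against X, not against the page they came from. A minimal sketch of a workaround, assuming you collect the raw attribute values with getAttribute("href") and know the URL each page was fetched from. The helper name resolveLinks and the example.com URLs are made up for illustration and are not part of jQuery or the DOM API.

```javascript
// Hypothetical helper (not a jQuery or DOM API).
// rawHrefs: href attribute values exactly as written in the fetched page,
//           e.g. gathered with anchor.getAttribute("href").
// pageUrl:  the absolute URL the page (Y1, Y2, ...) was fetched from.
function resolveLinks(rawHrefs, pageUrl) {
    var seen = new Set();
    var result = [];
    rawHrefs.forEach(function (href) {
        if (!href) {
            return; // anchor without an href attribute
        }
        var absolute;
        try {
            // Resolve relative to the fetched page, not the current document
            absolute = new URL(href, pageUrl);
        } catch (e) {
            return; // value cannot be parsed as a URL
        }
        if (absolute.protocol !== "http:" && absolute.protocol !== "https:") {
            return; // skip mailto:, javascript:, and similar schemes
        }
        if (!seen.has(absolute.href)) {
            seen.add(absolute.href); // drop duplicates, keep first-seen order
            result.push(absolute.href);
        }
    });
    return result;
}

// Example with made-up values:
var links = resolveLinks(
    ["r1.pdf", "/files/r2.pdf", "mailto:someone@example.com", "r1.pdf", null],
    "http://example.com/results/y1.html"
);
console.log(links);
```

From there you can filter the returned list down to the one resource URL you need (for example by file extension or path prefix) before adding it to page X.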
