HTML parser in JavaScript
Hi, at the moment I am trying to parse some HTML news for our new fan page, because the company does not offer an RSS feed.
I have a new JS file that contains this:
function getNews() {
    // Index into the iframe's tables; only every second table is read.
    var y = 0;
    var news = new Array(7);          // one entry per news item
    var news_content = new Array(5);  // fields of the current item
    for (var i = 0; i < news.length; i++)
    {
        var table = document.getElementById('news').contentWindow.getElementsByTagName('table')[y];
        news_content[0] = table.rows[0].cells[0].getElementsByTagName('img')[0].src;        // image URL
        news_content[1] = table.rows[0].cells[1].getElementsByTagName('span')[0].innerHTML; // first span
        news_content[2] = table.rows[0].cells[2].getElementsByTagName('span')[0].innerHTML; // second span
        news_content[3] = table.rows[1].cells[0].getElementsByTagName('p')[0].innerHTML;    // paragraph text
        news_content[4] = table.rows[0].cells[0].getElementsByTagName('a')[0].href;         // link target
        //alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
        news[i] = news_content[0] + "\n" + news_content[1] + "\n" + news_content[2] + "\n" + news_content[3] + "\n" + news_content[4] + "\n";
        y = y + 2;
    }
    alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
}
and this HTML:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Unbenanntes Dokument</title>
<script src="test.js"></script>
</head>
<body>
<a href="page.html" onclick="getNews()">Hier klicken</a>
<iframe id="news" src="http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de"></iframe>
</body>
</html>
Finally, if I paste the source code into the HTML file it works, but is there no way to parse it from an external page?
Comments (1)
If you debug your code with a tool like Firebug, an error message like this is returned:
Permission denied to access property 'getElementsByTagName'
It is indeed not possible in JavaScript to access an IFrame that points to a different domain,
not even a subdomain of your own domain (although, according to the comment on this answer, that case is possible). The question here is whether the site owner wants you to crawl his site, or has at least given you an okay for it, because being crawled by other sources is generally not welcome (traffic and maybe copyright problems).
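As a rough illustration of why the browser refuses access here, the sketch below mirrors the comparison behind that error: two pages may script each other only when scheme, host, and port all match. The helper name `isSameOrigin` and the fan page address `http://fanpage.example/index.html` are made up for the example; only the news URL comes from the question. It relies on the standard `URL` class and is a sketch of the rule, not the browser's actual implementation.

```javascript
// Hypothetical helper: returns true only when both URLs share
// scheme, host, and port (the three parts that make up an origin).
function isSameOrigin(urlA, urlB) {
    var a = new URL(urlA);
    var b = new URL(urlB);
    return a.protocol === b.protocol &&
           a.hostname === b.hostname &&
           a.port === b.port; // default ports are normalized to ''
}

// News URL from the question; the fan page URL is an assumed placeholder.
var newsUrl = 'http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de';
var fanPageUrl = 'http://fanpage.example/index.html';

// Different hosts, so a script on the fan page cannot read the iframe.
console.log(isSameOrigin(fanPageUrl, newsUrl));

// A subdomain is a different host as well, so it also counts as another origin.
console.log(isSameOrigin('http://example.com/', 'http://shop.example.com/'));
```

Under this rule, the only case in which `getNews()` can read the tables is when the news markup is served from the same origin as the page running the script, which matches the observation that pasting the source into the HTML file works.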