How to get the rendered HTML (processed by JavaScript) in a WebBrowser control?

Posted 2024-12-03 06:23:36

I have an ASP.NET page and a custom class that fetches a specified web page and returns its body:

protected String GetHtml()
{
    Thread thread = new Thread(new ThreadStart(GetHtmlWorker));
    thread.SetApartmentState(ApartmentState.STA);
    thread.Start();
    thread.Join();
    return docHtml;
}

protected void GetHtmlWorker()
{
    using (WebBrowser browser = new WebBrowser())
    {
        browser.ScriptErrorsSuppressed = true;
        browser.Navigate(_url);
        // Wait for control to load page
        while (browser.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
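        // DocumentText returns the page source as downloaded,
        // not the DOM after scripts have modified it.
        docHtml = browser.DocumentText;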
    }
}

But what I need is the DOM HTML rather than the page source, because I perform some extra operations on the DOM with jQuery.
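For reference, here is a minimal sketch of the same STA-thread approach as the worker above, changed to serialize the live DOM instead of reading DocumentText. It assumes that "rendered" means the markup after the scripts that run during page load have executed. RenderedHtmlFetcher and GetRenderedHtml are hypothetical names that do not come from the question.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the question): same STA-thread approach,
// but returns the live DOM instead of DocumentText.
public static class RenderedHtmlFetcher
{
    public static string GetRenderedHtml(string url)
    {
        string result = null;
        Exception error = null;

        var thread = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser())
                {
                    browser.ScriptErrorsSuppressed = true;
                    browser.Navigate(url);

                    // Pump messages until the page (and its frames) finish loading.
                    while (browser.ReadyState != WebBrowserReadyState.Complete)
                        Application.DoEvents();

                    // Serialize the <html> element as it exists in the DOM now,
                    // i.e. after scripts that ran during load have modified it.
                    HtmlElementCollection roots = browser.Document.GetElementsByTagName("HTML");
                    result = roots.Count > 0 ? roots[0].OuterHtml : null;
                }
            }
            catch (Exception ex)
            {
                // Remember the failure so it can be rethrown on the calling thread.
                error = ex;
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // WebBrowser requires STA
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Could not load " + url, error);
        return result;
    }
}
```

With this helper, GetHtml() in the question could simply return RenderedHtmlFetcher.GetRenderedHtml(_url). Content that scripts add after loading has completed (for example, through AJAX calls) may still be missing at that point.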

Comments (5)

风月客 2024-12-10 06:23:36

Here is one solution I found for getting the rendered HTML (DOM) after JavaScript has run:

Place a WebBrowser control named webBrowser1 on a form of class Form1.

[Screenshot: Form1.cs [Design]]

Then use the following code:

[Form1.cs]

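using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            // Expose MyScript to page scripts as window.external.
            this.webBrowser1.ObjectForScripting = new MyScript(this);
        }

        // Wired to the form's Load event in the designer.
        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Navigate("http://localhost:6489/Default.aspx");
        }

        // Wired to webBrowser1's DocumentCompleted event in the designer.
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Run script in the page that calls back into MyScript.CallServerSideCode().
            webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
        }

        // Must be COM-visible to be used as ObjectForScripting.
        [ComVisible(true)]
        public class MyScript
        {
            private readonly Form1 form;

            // Keep a reference to the owning form instead of relying on
            // Application.OpenForms[0] being this form.
            public MyScript(Form1 form)
            {
                this.form = form;
            }

            public void CallServerSideCode()
            {
                // The live DOM, including changes made by the page's JavaScript.
                var doc = form.webBrowser1.Document;
            }
        }
    }
}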

In Form1_Load, change the argument of webBrowser1.Navigate("http://localhost:6489/Default.aspx") to the page whose JavaScript-processed DOM you want to obtain.

You can access the modified DOM in the CallServerSideCode() method, for example:

doc.GetElementById("myDataTable");

Or you can access the rendered HTML like this:

var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;

满栀 2024-12-10 06:23:36

As George said in one of the comments, in theory you can get the DOM in webBrowser1_DocumentCompleted simply by using:

webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
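Here is a sketch of that idea as a small class you attach to an existing control, such as the designer's webBrowser1. The class name DomCapture is hypothetical. DocumentCompleted also fires once for each frame or iframe, so the handler compares e.Url with the control's Url, a common way to act only on the top-level document. The captured markup reflects whatever the page's scripts had done at that moment.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: records the live DOM each time the top-level
// document of the attached WebBrowser finishes loading.
public sealed class DomCapture
{
    private readonly WebBrowser browser;

    public string RenderedHtml { get; private set; }

    public DomCapture(WebBrowser browser)
    {
        this.browser = browser;
        browser.DocumentCompleted += OnDocumentCompleted;
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Skip completions raised for frames; only the top-level document's
        // URL matches the control's Url.
        if (e.Url != browser.Url)
            return;

        // Serialize the live DOM (not the downloaded source) at this moment.
        RenderedHtml = browser.Document.GetElementsByTagName("HTML")[0].OuterHtml;
    }
}
```

In Form1, you could create it after InitializeComponent() and keep it in a field, for example capture = new DomCapture(webBrowser1), then read capture.RenderedHtml once the page has loaded.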

戴着白色围巾的女孩 2024-12-10 06:23:36

First, a little background. I have been trying to scrape information from a web page whose content is dynamic: the page loads more information as you scroll to the bottom, so its HTML changes as you scroll. Unfortunately, the WebBrowser object does not update this information automatically; it still holds the original document it first loaded through the webbrowser.Navigate method. The updated information is, however, available through the HtmlElementCollection.

The following code did not work for me.

webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml

I broke the above statement up as follows:

    Dim eCollections As HtmlElementCollection
    Dim strDoc As String
    eCollections = WB.Document.GetElementsByTagName("HTML")
    strDoc = eCollections(0).OuterHtml

Worked like a charm. Hope this helps someone too.
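For pages that load more content on scroll, one possible approach (not part of the original answer) is to scroll the page programmatically and then re-read the DOM after giving the page's scripts time to append items. This is a hedged sketch: LazyPageHelper, ScrollAndCapture, rounds and waitMilliseconds are hypothetical names, and the right number of rounds and the delay depend entirely on the page.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper for "infinite scroll" pages: scroll to the bottom,
// let the page's scripts append content, repeat, then serialize the DOM.
public static class LazyPageHelper
{
    public static string ScrollAndCapture(WebBrowser browser, int rounds, int waitMilliseconds)
    {
        for (int i = 0; i < rounds; i++)
        {
            HtmlDocument doc = browser.Document;
            if (doc == null || doc.Body == null)
                break;

            // Scroll to (or past) the current end of the content; the browser
            // clamps the position to the real bottom of the page.
            doc.Window.ScrollTo(0, doc.Body.ScrollRectangle.Height);

            // Keep pumping messages so scripts, timers and network callbacks
            // can run while waiting for new items to be appended.
            DateTime until = DateTime.Now.AddMilliseconds(waitMilliseconds);
            while (DateTime.Now < until)
            {
                Application.DoEvents();
                Thread.Sleep(10);
            }
        }

        // Read the live DOM, which now includes whatever the scrolling loaded.
        HtmlDocument current = browser.Document;
        if (current == null)
            return string.Empty;
        HtmlElementCollection roots = current.GetElementsByTagName("HTML");
        return roots.Count > 0 ? roots[0].OuterHtml : string.Empty;
    }
}
```

Call it on the UI thread after the initial page has finished loading, for example ScrollAndCapture(webBrowser1, 5, 1000) to scroll five times with a one-second pause between scrolls.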

甜宝宝 2024-12-10 06:23:36

Another way is to put a timer on the form and parse the page when the timer fires; if the interval is long enough, the page will have re-rendered by then.
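Here is a minimal sketch of the timer idea. It assumes a System.Windows.Forms.Timer, which raises Tick on the UI thread and can therefore touch the control safely, and a fixed delay that you choose. DelayedDomReader and DomCaptured are hypothetical names. A fixed delay is a heuristic: if it is too short, the scripts may not have finished; if it is too long, you wait unnecessarily.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: waits a fixed delay after the page finishes loading,
// giving asynchronous scripts time to update the DOM, then captures it.
public sealed class DelayedDomReader
{
    private readonly WebBrowser browser;
    private readonly System.Windows.Forms.Timer timer;

    // Raised on the UI thread with the serialized DOM once the delay elapses.
    public event Action<string> DomCaptured;

    public DelayedDomReader(WebBrowser browser, int delayMilliseconds)
    {
        this.browser = browser;
        timer = new System.Windows.Forms.Timer { Interval = delayMilliseconds };
        timer.Tick += OnTick;
        browser.DocumentCompleted += OnDocumentCompleted;
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Restart the countdown on every completed document (page or frame).
        timer.Stop();
        timer.Start();
    }

    private void OnTick(object sender, EventArgs e)
    {
        timer.Stop(); // one-shot: capture once per countdown

        HtmlDocument doc = browser.Document;
        if (doc == null)
            return;

        HtmlElementCollection roots = doc.GetElementsByTagName("HTML");
        if (roots.Count > 0)
            DomCaptured?.Invoke(roots[0].OuterHtml);
    }
}
```

To use it, create the reader once, for example new DelayedDomReader(webBrowser1, 2000), subscribe to DomCaptured, and then call Navigate as usual.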

蓝眸 2024-12-10 06:23:36

You can get the rendered HTML of the body with:

webBrowser1.Document.Body.OuterHtml
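This differs from the GetElementsByTagName("HTML")[0].OuterHtml approach in the other answers. Body.OuterHtml covers only the <body> element, so anything in <head> (title, meta tags, inline scripts and styles) is left out. Both reflect the live DOM. The small helper below illustrates the two options side by side; DomSnapshot and its members are hypothetical names.

```csharp
using System.Windows.Forms;

// Illustrative helper contrasting two ways of serializing the live DOM.
public static class DomSnapshot
{
    // Only <body> and its descendants; everything in <head> is omitted.
    public static string BodyOnly(WebBrowser browser)
    {
        return browser.Document.Body.OuterHtml;
    }

    // The root <html> element, i.e. <head> plus <body>.
    public static string WholeDocument(WebBrowser browser)
    {
        return browser.Document.GetElementsByTagName("HTML")[0].OuterHtml;
    }
}
```

If you only need the content the page's scripts produce inside the body, BodyOnly is enough. Use WholeDocument when you also need what is in the head.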
