C# CefSharp: how to perform a correct sequence of JavaScript actions on a website

Posted on 2025-01-22 04:49:31

These sequences of actions work with Thread.Sleep: about 1 second in one place, about 2 seconds in another. I don't think using Thread.Sleep/Task.Delay is a good idea, because the timing can differ from one computer to another. How do I execute these sequences without Thread.Sleep?
Or is it OK to use Thread.Sleep/Task.Delay?

        private async void ButtonFind_Click(object sender, EventArgs e)
        {

            //Action1
            string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'';
            await chrome.EvaluateScriptAsync(jsScript1);

            //Action2
            string jsScript2 = "document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click();";
            await chrome.EvaluateScriptAsync(jsScript2);

            //Action3
            Thread.Sleep(1000); // exactly 1 second is needed here
            string jsScript3 = "document.getElementsByTagName('a')[2].click();";
            await chrome.EvaluateScriptAsync(jsScript3);

            //Action4
            Thread.Sleep(2000); // exactly 2 seconds are needed here
            string jsScript4 = "document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();";
            await chrome.EvaluateScriptAsync(jsScript4);
        }

I tried waiting on the task, but that didn't help:

...
var task4 = chrome.EvaluateScriptAsync(jsScript4);
task4.Wait();

I also tried waiting for the DOM to finish loading, which didn't help either:

            string jsScript4 = @"
                  if( document.readyState !== 'loading' ) {
                      myInitCode();
                  } else {
                      document.addEventListener('DOMContentLoaded', function () {
                          myInitCode();
                      });
                  }

                  function myInitCode() {
                   var a = document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();
                  return a;
                  }
              ";
            
            chrome.EvaluateScriptAsync(jsScript4);

My addition (21.04.2022)

In the third action, instead of Thread.Sleep, I'm using a while loop.
The algorithm here is correct, but for some reason the application hangs after I press the application button:

                bool test = false;
                while(test == false)
                {
                    string myScript = @"
                        (function(){
                            var x = document.getElementsByTagName('a')[1].outerText;
                            return x;
                        })();
                        ";
                    var task = chrome.EvaluateScriptAsync(myScript);
                    task.ContinueWith(x =>
                    {
                        if (!x.IsFaulted)
                        {
                            var response = x.Result;
                            if (response.Success == true)
                            {
                                var final = response.Result;
                                if (final.ToString() == textFind.Text)
                                {
                                    MessageBox.Show("You found the link");
                                    test = true;
                                }
                                else
                                {
                                    MessageBox.Show("You do not found the link");
                                }
                            }
                        }
                    }, TaskScheduler.FromCurrentSynchronizationContext());
                }

My addition (23.04.2022)

string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'' + ";"
                + @"
    Promise.resolve()
  .then(() => document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click())
  .then(() =>  { var target = document.body;
            const config = { 
                childList: true, 
                attributes: true, 
                characterData: true, 
                subtree: true, 
                attributeFilter: ['id'], 
                attributeOldValue: true, 
                characterDataOldValue: true 
            }
            const callback = function(mutations) 
            {
                document.addEventListener('DOMContentLoaded', function(){                    
                if(document.getElementsByTagName('a')[1].innerText=='Troy')
                    {
                        alert('I got that link');
                    }
                }, true);
            };
            const observer = new MutationObserver(callback);
            observer.observe(target, config)});
            ";

            var task1 = chrome.EvaluateScriptAsPromiseAsync(jsScript1);
            task1.Wait();

I wrapped a MutationObserver in a promise and evaluated it with EvaluateScriptAsPromiseAsync. That didn't help either.
I came to the conclusion that JavaScript does not keep the code once I click the search button or go to another page. How do I keep the JavaScript code/request and continue it after clicking the search button or navigating to another page?


情何以堪。 2025-01-29 04:49:31

As your JavaScript causes a navigation you need to wait for the new page to load.

You can use something like the following to wait for the page load.

// create a static class for the extension method 
public static Task<LoadUrlAsyncResponse> WaitForLoadAsync(this IWebBrowser browser)
{
    var tcs = new TaskCompletionSource<LoadUrlAsyncResponse>(TaskCreationOptions.RunContinuationsAsynchronously);

    EventHandler<LoadErrorEventArgs> loadErrorHandler = null;
    EventHandler<LoadingStateChangedEventArgs> loadingStateChangeHandler = null;

    loadErrorHandler = (sender, args) =>
    {
        //Actions that trigger a download will raise an aborted error.
        //Generally speaking Aborted is safe to ignore
        if (args.ErrorCode == CefErrorCode.Aborted)
        {
            return;
        }

        //If LoadError was called then we'll remove both our handlers
        //as we won't need to capture LoadingStateChanged, we know there
        //was an error
        browser.LoadError -= loadErrorHandler;
        browser.LoadingStateChanged -= loadingStateChangeHandler;

        tcs.TrySetResult(new LoadUrlAsyncResponse(args.ErrorCode, -1));
    };

    loadingStateChangeHandler = (sender, args) =>
    {
        //Wait for the whole page to finish loading, not just the first frame
        if (!args.IsLoading)
        {
            browser.LoadError -= loadErrorHandler;
            browser.LoadingStateChanged -= loadingStateChangeHandler;
            var host = args.Browser.GetHost();

            var navEntry = host?.GetVisibleNavigationEntry();

            int statusCode = navEntry?.HttpStatusCode ?? -1;

            //By default 0 is some sort of error, we map that to -1
            //so that it's clearer that something failed.
            if (statusCode == 0)
            {
                statusCode = -1;
            }

            tcs.TrySetResult(new LoadUrlAsyncResponse(statusCode == -1 ? CefErrorCode.Failed : CefErrorCode.None, statusCode));
        }
    };

    browser.LoadingStateChanged += loadingStateChangeHandler;
    browser.LoadError += loadErrorHandler;

    return tcs.Task;
}

// usage example 
private async void ButtonFind_Click(object sender, EventArgs e)
{

    //Action1
    string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'';
    await chrome.EvaluateScriptAsync(jsScript1);

    //Action2
    string jsScript2 = "document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click();";
   
    await Task.WhenAll(chrome.WaitForLoadAsync(), 
      chrome.EvaluateScriptAsync(jsScript2));

    //Action3
    string jsScript3 = "document.getElementsByTagName('a')[2].click();";
    await Task.WhenAll(chrome.WaitForLoadAsync(), 
      chrome.EvaluateScriptAsync(jsScript3));


    //Action4
    string jsScript4 = "document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();";
    await chrome.EvaluateScriptAsync(jsScript4);
}


鹿! 2025-01-29 04:49:31

You should never rely on sleep, because timing varies between computers and, even on the same computer, a web page may take a different amount of time to load.

I do a lot of scraping and, IMO, the best approach is to work from the JavaScript side. You inject/run your JavaScript to fill controls, click buttons...

With this approach, the problem is that navigation makes you lose state. When you navigate to another page, your JavaScript starts from scratch. I solve this by sharing data that persists between JavaScript and C# through a Bound Object and by injecting JavaScript.

For example, you can run actions 1, 2 and 3 with one piece of JavaScript code. Before clicking the button, you can use your Bound Object to tell your C# code that you are going to the second page.

When the second page has loaded, you run the JavaScript for that page (you know the step and can inject the JavaScript code for page 2).

In all cases, your JavaScript code must have some mechanism to wait. For example, set a timer to wait until your controls appear. This way, you can run your JavaScript without waiting for the page to be fully loaded (sometimes these events are hard to manage).
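The timer-based waiting described above could look roughly like the sketch below. The names pollUntil and waitForElement, the 100 ms interval and the 10 s timeout are assumptions made for illustration; they are not part of CefSharp or of the answer's library.

```javascript
// Poll `predicate` every `interval` ms until it returns a truthy value or
// `timeout` ms have elapsed. `schedule` and `now` are injectable (hypothetical
// options) so the helper can be exercised without real timers.
function pollUntil(predicate, onReady, onTimeout, options) {
  const opts = options || {};
  const interval = opts.interval !== undefined ? opts.interval : 100;
  const timeout = opts.timeout !== undefined ? opts.timeout : 10000;
  const schedule = opts.schedule || setTimeout;
  const now = opts.now || Date.now;
  const deadline = now() + timeout;

  function attempt() {
    let value;
    try {
      value = predicate();
    } catch (err) {
      value = null; // treat an exception as "not ready yet"
    }
    if (value) {
      onReady(value);
    } else if (now() >= deadline) {
      onTimeout();
    } else {
      schedule(attempt, interval);
    }
  }
  attempt();
}

// Promise wrapper for use inside a page: resolves with the element once
// document.querySelector(selector) finds it, rejects on timeout.
function waitForElement(selector, options) {
  return new Promise(function (resolve, reject) {
    pollUntil(
      function () { return document.querySelector(selector); },
      resolve,
      function () { reject(new Error('Timed out waiting for ' + selector)); },
      options
    );
  });
}
```

In the page from the question this might be used as waitForElement('#dle-content > div.section > ul > li:nth-child(3)').then(function (el) { el.click(); }), which replaces a fixed sleep with a wait that ends as soon as the control exists.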

UPDATE

My scraping library is huge. I'm going to show the pieces you need to do the work, but you'll need to assemble them yourself.

We create a BoundObject class:

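public class BoundObject
{
    public BoundObject(IWebBrowser browser)
    {
        this.Browser = browser;
    }

    // The custom browser wrapper that receives messages from JavaScript
    public IWebBrowser Browser { get; }

    // Called from JavaScript as window.bound.onJavaScriptMessage(json)
    public void OnJavaScriptMessage(string message)
    {
        this.Browser.OnJavaScriptMessage(message);
    }
}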

IWebBrowser is the interface of my custom browser, a wrapper that manages everything I need (it is my own interface, not CefSharp's IWebBrowser). Create a browser class, CustomBrowser for example, that implements this interface.

Create a method that makes sure your Bound Object is registered:

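public void SetBoundObject()
{
    // To get events in C# from JavaScript
    try
    {
        // Assumes this method lives in the CustomBrowser class that implements the custom IWebBrowser
        var boundObject = new BoundObject(this);
        this._browserInternal.JavascriptObjectRepository.Register(
            "bound", boundObject, false, BindingOptions.DefaultBinder);

        this.BoundObject = boundObject;
    }
    catch (ArgumentException ex)
    {
        // Rethrow unless the error concerns the "bound" registration itself.
        // Identical is not a BCL method; presumably a string-comparison extension from the author's library.
        if (!ex.ParamName.Identical("bound"))
        {
            throw;
        }
    }
}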

_browserInternal is the CefSharp browser. You must run that method on each page load, when you navigate. That gives you a window.bound object on the JavaScript side with an onJavaScriptMessage function. Then you can define a function in JavaScript like this:

function sendMessage(msg) {
    var json = JSON.stringify(msg);
    window.bound.onJavaScriptMessage(json);
    return this;
};

You can now send any object to your C# application and handle it in your CustomBrowser, in the OnJavaScriptMessage method. In that method I handle my custom message protocol, like a typical one in a sockets environment or the Windows message system, and raise an OnMessage that I implement in classes inheriting from CustomBrowser.

Sending information to JavaScript is trivial using the CefSharp browser's ExecuteScriptAsync.
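The answer does not show its message protocol, so the following is only a guess at what the JavaScript side of such a protocol might look like: every message is wrapped in an envelope with a type field that the C# side could dispatch on. createMessenger, type, payload and the 'navigating' message are hypothetical names chosen for this sketch.

```javascript
// Build a messenger around the bound object. In a CefSharp page `bound`
// would be window.bound; it is passed in so the sketch has no hidden globals.
function createMessenger(bound) {
  return {
    // Wrap the payload in an envelope so the host can dispatch on `type`.
    send: function (type, payload) {
      const envelope = {
        type: type,
        payload: payload === undefined ? null : payload
      };
      // The bound C# method receives a single JSON string.
      bound.onJavaScriptMessage(JSON.stringify(envelope));
      return envelope;
    }
  };
}
```

Before clicking the search button, a page script could call createMessenger(window.bound).send('navigating', { step: 2 }) so that the C# side knows which script to inject when the next page loads. Depending on how binding is configured in CefSharp, the page may first have to call CefSharp.BindObjectAsync('bound') before window.bound exists.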

Going further

When I work on an intensive scraping job, I create scripts with classes to manage the entire website being scraped. For example, I create classes to log in, navigate to different sections, fill in forms... as if I were the owner of the website. Then, when a page loads, I inject my scripts and can use my own classes in the remote website, which makes scraping a piece of cake.

My scripts are embedded resources, so they are included in my final executable. In debug, I read them from disk to allow edit + reload + test until the scripts work properly. With DevTools you can experiment in the console until you get the code you want. Then you add it to your JavaScript classes and reload.

You can add simple JavaScript with ExecuteScriptAsync, but with large files you run into problems escaping quotes...

So you need to insert an entire script file. To do that, implement ISchemeHandlerFactory to create and return an IResourceHandler. That resource handler must have a ProcessRequestAsync in which you receive a request.Url that you can use to locate your scripts:

  this.ResponseLength = stream.Length;
  this.MimeType = GetMimeType(fileExtension);
  this.StatusCode = (int)HttpStatusCode.OK;
  this.Stream = stream;

  callback.Continue();
  return true;

stream may be a MemoryStream into which you write the content of your script file.
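The answer does not say how the page requests the script from the scheme handler. One plausible way, shown here purely as an assumption, is a small loader (short enough to pass through ExecuteScriptAsync without escaping trouble) that appends a script element whose src points at the custom scheme. injectScript and the scraper:// URL are hypothetical names.

```javascript
// Append a <script> element pointing at `url` and resolve once it has loaded.
// `doc` defaults to the page's document; it is a parameter so the function
// can be tried outside a browser.
function injectScript(url, doc) {
  const d = doc || document;
  return new Promise(function (resolve, reject) {
    const script = d.createElement('script');
    script.src = url;
    script.onload = function () { resolve(script); };
    script.onerror = function () { reject(new Error('Failed to load ' + url)); };
    (d.head || d.documentElement).appendChild(script);
  });
}
```

A call such as injectScript('scraper://scripts/site.js') would then reach the resource handler, provided the scheme has been registered with the browser and the target page's content security policy allows it.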
