WebRequest timeouts with F#-style asynchronous workflows

Posted on 2024-12-13 07:51:11

For a broader context, here is my code, which downloads a list of URLs.

It seems to me that there is no good way to handle timeouts in F# when using use! response = request.AsyncGetResponse() style URL fetching. I have pretty much everything working as I'd like it to (error handling, and asynchronous request and response downloading), except for the problem that occurs when a website takes a long time to respond. My current code just hangs indefinitely. I've tried it on a PHP script I wrote that waits 300 seconds. It waited the whole time.

I have found "solutions" of two sorts, both of which are undesirable.

AwaitIAsyncResult + BeginGetResponse

This is the approach in the answer by ildjarn on this other Stack Overflow question. The problem with it is that if you have queued many asynchronous requests, some are artificially blocked on AwaitIAsyncResult. In other words, the call to make the request has been made, but something behind the scenes is holding the request back. This causes the timeout on AwaitIAsyncResult to be triggered prematurely when many concurrent requests are made. My guess is that there is a limit on the number of requests to a single domain, or just a limit on total requests.

To support my suspicion, I wrote a little WPF application to draw a timeline of when the requests seem to be starting and ending. In my code linked above, notice that the timer starts and stops on lines 49 and 54 (calling line 10). Here is the resulting timeline image.

When I move the timer start to after the initial response (so I am only timing the downloading of the contents), the timeline looks a lot more realistic. Note that these are two separate runs, with no code change other than where the timer is started. Instead of measuring startTime directly before use! response = request.AsyncGetResponse(), I measure it directly afterwards.

To further support my claim, I made a timeline with Fiddler2. Here is the resulting timeline. Clearly the requests aren't starting exactly when I tell them to.
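If the guess about a per-host limit is right, one setting worth checking (assuming the code runs on .NET Framework) is ServicePointManager.DefaultConnectionLimit, which defaults to 2 concurrent connections per host for client applications. A minimal sketch of raising it; the value 20 is an arbitrary choice for illustration, not a recommendation:

```fsharp
open System.Net

// Assumption: .NET Framework, where WebRequest goes through ServicePointManager.
// Raise the cap on concurrent connections per host before creating any requests.
// The value 20 is arbitrary and only for illustration.
ServicePointManager.DefaultConnectionLimit <- 20

printfn "Per-host connection limit: %d" ServicePointManager.DefaultConnectionLimit
```

Requests beyond the cap are queued rather than sent, which would be consistent with timers that start before the request actually goes on the wire.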

GetResponseStream in a new thread

In other words, synchronous requests and download calls are made in a secondary thread. This does work, since GetResponseStream respects the Timeout property on the WebRequest object. But in the process, we lose all of the waiting time as the request is on the wire and the response hasn't come back yet. We might as well write it in C#... ;)

Questions

  • Is this a known problem?
  • Is there any good solution that takes advantage of F# asynchronous workflows and still allows timeouts and error handling?
  • If the problem is really that I am making too many requests at once, would the best way to limit the number of requests be to use a Semaphore(5, 5) or something like that?
  • Side Question: if you've looked at my code, can you see any stupid things I've done and could fix?
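For the semaphore idea in the third question, here is a minimal sketch of what throttling inside an async workflow could look like. It swaps in SemaphoreSlim (rather than Semaphore) because its WaitAsync can be awaited without blocking a thread; the helper name throttled and the stand-in job are hypothetical, and Async.Sleep takes the place of a real download.

```fsharp
open System
open System.Threading

// Hypothetical helper: runs `work` only while holding one of the gate's slots.
let throttled (gate: SemaphoreSlim) (work: Async<'T>) : Async<'T> =
    async {
        // Wait for a free slot without blocking a thread.
        do! gate.WaitAsync() |> Async.AwaitTask
        try
            return! work
        finally
            // Always give the slot back, even if `work` throws.
            gate.Release() |> ignore
    }

// At most 5 jobs may run at once (mirrors Semaphore(5, 5) from the question).
let gate = new SemaphoreSlim(5, 5)

// Bookkeeping to observe how many jobs overlap.
let sync = obj ()
let active = ref 0
let peak = ref 0

// Stand-in for a download: sleeps instead of touching the network.
let job (i: int) =
    async {
        lock sync (fun () ->
            active.Value <- active.Value + 1
            peak.Value <- max peak.Value active.Value)
        do! Async.Sleep 50
        lock sync (fun () -> active.Value <- active.Value - 1)
        return i
    }

let results =
    [ for i in 1 .. 20 -> throttled gate (job i) ]
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "Finished %d jobs, peak concurrency %d" results.Length peak.Value
```

All 20 workflows are started together, but the observed peak never exceeds the number of slots in the gate.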

If there is anything you are confused about, please let me know.

Comments (2)

嘿哥们儿 2024-12-20 07:51:11

AsyncGetResponse simply ignores any timeout value set on the request. Here's a solution we just cooked up:

open System
open System.IO
open System.Net

// A request paired with the channel on which the agent sends back the response.
type Request = Request of WebRequest * AsyncReplyChannel<WebResponse>

// Agent that starts each request as its own workflow, so a slow request
// never blocks the message loop.
let requestAgent =
    MailboxProcessor.Start <| fun inbox -> async {
            while true do
                let! (Request (req, port)) = inbox.Receive ()

                async {
                    try
                        let! resp = req.AsyncGetResponse ()
                        port.Reply resp
                    with
                    // No reply is sent on failure, so the caller sees a timeout instead.
                    | ex -> sprintf "Exception in child %s\n%s" (ex.GetType().Name) ex.Message |> Console.WriteLine
                } |> Async.Start
        }

// Returns Some html on success, or None on timeout or any other error.
let getHTML url =
    async {
        try
            let req = "http://" + url |> WebRequest.Create
            try
                // PostAndAsyncReply raises TimeoutException if the agent has not replied within 1000 ms.
                use! resp = requestAgent.PostAndAsyncReply ((fun chan -> Request (req, chan)), 1000)
                use str = resp.GetResponseStream ()
                use rdr = new StreamReader (str)
                return Some <| rdr.ReadToEnd ()
            with
            | :? System.TimeoutException ->
                // Give up on the pending request and move on.
                req.Abort()
                Console.WriteLine "RequestAgent call timed out"
                return None
        with
        | ex ->
            sprintf "Exception in request %s\n\n%s" (ex.GetType().Name) ex.Message |> Console.WriteLine
            return None
    } |> Async.RunSynchronously;;

getHTML "www.grogogle.com"

In other words, we delegate the request to another agent and call that agent with an async timeout. If we do not get a reply from the agent within the specified amount of time, we abort the request and move on.
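The same "give up after N milliseconds" idea can also be sketched without an agent, using the optional timeout argument of Async.StartChild, which raises TimeoutException when the child does not finish in time. This is a different technique from the agent above, not a drop-in replacement; the helper name withTimeout and the stand-in workflows are hypothetical, and Async.Sleep takes the place of a real request.

```fsharp
open System

// Hypothetical helper: Some result if `work` finishes within timeoutMs, otherwise None.
let withTimeout (timeoutMs: int) (work: Async<'T>) : Async<'T option> =
    async {
        try
            // Start the work as a child with a deadline; awaiting the child
            // raises TimeoutException if the deadline passes first.
            let! child = Async.StartChild (work, timeoutMs)
            let! result = child
            return Some result
        with
        | :? TimeoutException -> return None
    }

// Stand-ins for a fast and a slow request.
let fast =
    async {
        do! Async.Sleep 10
        return "done"
    }

let slow =
    async {
        do! Async.Sleep 5000
        return "too late"
    }

let fastResult = withTimeout 1000 fast |> Async.RunSynchronously
let slowResult = withTimeout 50 slow |> Async.RunSynchronously

printfn "fast: %A, slow: %A" fastResult slowResult
```

On timeout the child workflow is cancelled; whether an underlying web request is actually torn down depends on how that operation reacts to cancellation, so an explicit Abort, as in the answer above, may still be needed.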

硬不硬你别怂 2024-12-20 07:51:11

I see that my other answer may not address your particular question. Here's another implementation of a task limiter, one that doesn't require a semaphore.

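open System

// Hands out tokens; disposing a token returns its slot to the limiter.
type IParallelLimiter =
    abstract GetToken : unit -> Async<IDisposable>

type Message =
    | GetToken of AsyncReplyChannel<IDisposable>
    | Release

// Creates a limiter that allows at most `count` tokens to be held at once.
let start count =
    let agent =
        MailboxProcessor.Start(fun inbox ->
            // A token posts Release back to the agent when it is disposed.
            let newToken () =
                { new IDisposable with
                    member x.Dispose () = inbox.Post Release }

            // n is the number of free slots.
            let rec loop n = async {
                    // While no slots are free, leave GetToken messages in the
                    // queue and only pick up Release messages.
                    let! msg = inbox.Scan <| function
                        | GetToken _ when n = 0 -> None
                        | msg -> async.Return msg |> Some

                    return!
                        match msg with
                        | Release ->
                            loop (n + 1)
                        | GetToken port ->
                            port.Reply <| newToken ()
                            loop (n - 1)
                }
            loop count)

    { new IParallelLimiter with
        member x.GetToken () =
            agent.PostAndAsyncReply GetToken}

let limiter = start 100;;

// Start many workflows; at most 100 of them hold a token at any time.
for _ in 0..1000 do
    async {
        // The token is released automatically when it goes out of scope.
        use! token = limiter.GetToken ()
        Console.WriteLine "Sleeping..."
        do! Async.Sleep 3000
        Console.WriteLine "Releasing..."
    } |> Async.Start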