F# MailboxProcessor question

Posted on 2024-11-18 15:59:16

I created a console program using the code from http://fssnip.net/3K, and I found that:

  1. I had to add "System.Console.ReadLine() |> ignore" at the end to wait for the threads to finish. Is there a way to tell when all the MailboxProcessors are done, so that the program can exit by itself?

  2. I changed the test URL "www.google.com" to an invalid URL and got the following output. Is it possible to avoid this output race?

     http://www.google.co1m crawled by agent 1.  
     AgAAAent gent 3 is done.  
     gent 2 is done.  
     5 is done.  
     gent 4 is done.  
     Agent USupervisor RL collector is done.  
     is done.  
     1 is done.

[Edit]

The last output/crawl is still cut off after switching to Tomas's update, http://fssnip.net/65. Below is the program's output after I changed the "limit" to 5 and added some debugging messages. The last line shows a truncated URL. Is there a way to detect whether all the crawlers have finished executing?

[Main] before crawl
[Crawl] before return result
http://news.google.com crawled by agent 1.
[supervisor] reached limit
http://www.gstatic.com/news/img/favicon.ico crawled by agent 5.
Agent 2 is done.
[supervisor] reached limit
Agent 5 is done.
http://www.google.com/imghp?hl=en&tab=ni crawled by agent 3.
[supervisor] reached limit
Agent 3 is done.
http://www.google.com/webhp?hl=en&tab=nw crawled by agent 4.
[supervisor] reached limit
Agent 4 is done.
http://news.google.com/n

I changed the main code to

printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"

However, the last printfn "[Main] after crawl" is never executed unless I add a Console.ReadLine() at the end.

[Edit 2]

The code runs fine under fsi. However, it has the same problem when run with
fsi --use:Program.fs --exec --quiet

Comments (1)

╰◇生如夏花灿烂 2024-11-25 15:59:16

I created a snippet that extends the previous one with the two features you asked about: http://fssnip.net/65.

  1. To solve this, I added a Start message that carries an AsyncReplyChannel<unit>. When the supervisor agent starts, it waits for this message and saves the reply channel for later use. When it completes, it sends a reply over this channel.

    The function that starts the agent returns an asynchronous workflow that waits for the reply. You can then call crawl using Async.RunSynchronously, which completes when the supervisor agent completes.

  2. To avoid the race when printing, you need to synchronize all printing. The easiest way to do this is to write a new agent :-). The agent receives strings and prints them to the output one at a time (so that they cannot be interleaved). The snippet shadows the standard printfn function with a new implementation that sends strings to the agent.
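The two ideas above can be sketched roughly as follows. This is not the code from http://fssnip.net/65, only a minimal illustration of the same pattern with the crawling replaced by a short sleep. The names SupervisorMsg, WorkerDone, PrintMsg, Flush and runWorkers are hypothetical, and the Flush message is an extra assumption added here so that the main thread can wait until every queued line has actually been written before the process exits.

```fsharp
open System

// Hypothetical message types for this sketch.
type PrintMsg =
    | Line of string
    | Flush of AsyncReplyChannel<unit>   // assumption: lets the caller wait for pending output

type SupervisorMsg =
    | Start of AsyncReplyChannel<unit>   // carries the channel used to signal completion
    | WorkerDone of int

// Printing agent: every line passes through one mailbox, so lines cannot interleave.
let printer =
    MailboxProcessor<PrintMsg>.Start(fun inbox ->
        let rec loop () = async {
            let! msg = inbox.Receive()
            match msg with
            | Line s -> Console.WriteLine s
            | Flush reply -> reply.Reply()
            return! loop () }
        loop ())

// Shadow the standard printfn so existing call sites post to the printing agent.
let printfn fmt = Printf.kprintf (fun s -> printer.Post(Line s)) fmt

// Starts a supervisor and returns a workflow that completes when all workers are done.
let runWorkers (workerCount: int) : Async<unit> =
    let supervisor =
        MailboxProcessor<SupervisorMsg>.Start(fun inbox ->
            let rec waitForWorkers remaining (reply: AsyncReplyChannel<unit>) = async {
                if remaining = 0 then
                    printfn "Supervisor is done."
                    reply.Reply()                      // unblocks the caller
                else
                    let! msg = inbox.Receive()
                    match msg with
                    | WorkerDone n ->
                        printfn "Agent %d is done." n
                        return! waitForWorkers (remaining - 1) reply
                    | Start _ ->
                        return! waitForWorkers remaining reply }
            async {
                // Wait for Start and keep its reply channel for later.
                let! msg = inbox.Receive()
                match msg with
                | Start reply ->
                    for n in 1 .. workerCount do
                        Async.Start(async {
                            do! Async.Sleep 10         // stand-in for real work
                            inbox.Post(WorkerDone n) })
                    return! waitForWorkers workerCount reply
                | WorkerDone _ -> return () })
    supervisor.PostAndAsyncReply(fun reply -> Start reply)

printfn "[Main] before run"
runWorkers 3 |> Async.RunSynchronously
printfn "[Main] after run"
// Messages are handled in order, so this returns only after all earlier lines are printed.
printer.PostAndReply(fun reply -> Flush reply)
```

Because the printing agent runs in the background, a program that exits right after Async.RunSynchronously returns may lose lines that are still queued in its mailbox. Waiting on something like the Flush message before exiting is one way to avoid relying on Console.ReadLine().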
