F# MailboxProcessor question
I've created a console program using the code from http://fssnip.net/3K, and I ran into two issues:
I had to add "System.Console.ReadLine() |> ignore" at the end to wait for the threads to finish. Is it possible to tell when all the MailboxProcessors are done so that the program can exit by itself?
I changed the test URL "www.google.com" to an invalid URL and got the following output. Is it possible to avoid this "output race"?
http://www.google.co1m crawled by agent 1. AgAAAent gent 3 is done. gent 2 is done. 5 is done. gent 4 is done. Agent USupervisor RL collector is done. is done. 1 is done.
[Edit]
The last output/crawl is still cut off after using Tomas's update http://fssnip.net/65. The following is the output of the program after I changed the "limit" to 5 and added some debugging messages. The last line shows a truncated URL. Is there a way to detect whether all the crawlers have finished executing?
[Main] before crawl
[Crawl] before return result
http://news.google.com crawled by agent 1.
[supervisor] reached limit
http://www.gstatic.com/news/img/favicon.ico crawled by agent 5.
Agent 2 is done.
[supervisor] reached limit
Agent 5 is done.
http://www.google.com/imghp?hl=en&tab=ni crawled by agent 3.
[supervisor] reached limit
Agent 3 is done.
http://www.google.com/webhp?hl=en&tab=nw crawled by agent 4.
[supervisor] reached limit
Agent 4 is done.
http://news.google.com/n
I changed the main code to
printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"
However, the last printfn "[Main] after crawl" is never executed unless I add a Console.ReadLine() at the end.
[Edit 2]
The code runs fine under fsi. However, it has the same problem if it is run using
fsi --use:Program.fs --exec --quiet
I created a snippet that extends the previous one with the two features you asked about: http://fssnip.net/65.
To solve the first problem, I added a Start message that carries an AsyncReplyChannel<unit>. When the supervisor agent starts, it waits for this message and saves the reply channel for later use. When it completes, it sends a reply using this channel. The function that starts the agent returns an asynchronous workflow that waits for the reply. You can then call crawl using Async.RunSynchronously, which will complete when the supervisor agent completes.
To avoid the race when printing, you need to synchronize all prints. The easiest way to do this is to write a new agent :-). The agent receives strings and prints them to the output one by one (so that they cannot be interleaved). The snippet hides the standard printfn function with a new implementation that sends strings to the agent.
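Both ideas can be sketched roughly as follows. This is a minimal illustration, not the code from the snippet: the names PrinterMsg, SupervisorMsg, WorkerDone, crawlAll and the Flush message are assumptions made up for this example, and the workers only sleep instead of downloading pages.

```fsharp
open System

// Hypothetical message types for this sketch (not taken from the snippet).
type PrinterMsg =
    | Line of string
    | Flush of AsyncReplyChannel<unit>

type SupervisorMsg =
    | Start of AsyncReplyChannel<unit>
    | WorkerDone of int

// A single agent owns the console, so lines are written one at a time.
let printer =
    MailboxProcessor<PrinterMsg>.Start(fun inbox ->
        let rec loop () = async {
            let! msg = inbox.Receive()
            match msg with
            | Line text -> Console.WriteLine(text)
            | Flush reply -> reply.Reply()
            return! loop () }
        loop ())

// Shadows the standard printfn: format the string, then post it to the printer.
let printfn fmt = Printf.ksprintf (fun line -> printer.Post(Line line)) fmt

// Returns a workflow that completes only when the supervisor has seen
// every worker finish. Workers here are stand-ins that just sleep.
let crawlAll (workerCount: int) : Async<unit> =
    let supervisor =
        MailboxProcessor<SupervisorMsg>.Start(fun inbox ->
            let rec waitForWorkers remaining (reply: AsyncReplyChannel<unit>) = async {
                if remaining = 0 then
                    printfn "Supervisor is done."
                    reply.Reply()   // unblocks whoever is waiting on crawlAll
                else
                    let! msg = inbox.Receive()
                    match msg with
                    | WorkerDone id ->
                        printfn "Agent %d is done." id
                        return! waitForWorkers (remaining - 1) reply
                    | Start _ ->
                        // A second Start is ignored in this sketch.
                        return! waitForWorkers remaining reply }
            async {
                // The first message is expected to be Start; keep its reply channel.
                let! msg = inbox.Receive()
                match msg with
                | Start reply ->
                    for id in 1 .. workerCount do
                        Async.Start(async {
                            do! Async.Sleep 50
                            inbox.Post(WorkerDone id) })
                    return! waitForWorkers workerCount reply
                | WorkerDone _ -> return () })
    supervisor.PostAndAsyncReply(fun reply -> Start reply)

printfn "[Main] before crawl"
crawlAll 3 |> Async.RunSynchronously
printfn "[Main] after crawl"
// Wait until the printer has written everything queued so far.
printer.PostAndReply(fun reply -> Flush reply)
```

The Flush message is an addition of this sketch: because printing is now asynchronous, the main thread could otherwise exit while lines are still queued in the printer's mailbox. Since the mailbox is processed in order, the reply to Flush arrives only after all earlier lines have been written.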