GridGain / Scala - Generating jobs within an existing job
As a proof of concept, I'm building this extremely simple Twitter Friends crawler. Here's what it will do:
- Execute CrawlJob for Twitter account "twitter-user-1"
- Find all friends of "twitter-user-1"
- Execute CrawlJob for all friends of "twitter-user-1"
Here's what my code looks like so far:
import java.util.{ArrayList, Collection}
// GridGain and Scalar imports are omitted here, as in the original post.

def main(args: Array[String]): Unit = {
  scalar {
    // Run the task on the grid and block until it completes.
    grid.execute(classOf[CrawlTask], "twitter-user-1").get
  }
}

class CrawlTask extends GridTaskNoReduceSplitAdapter[String] {
  // Creates a single CrawlJob for the account passed in as the task argument.
  def split(gridSize: Int, arg: String): Collection[GridJob] = {
    val jobs: Collection[GridJob] = new ArrayList[GridJob]()
    val initialCrawlJob = new CrawlJob()
    initialCrawlJob.twitterId = arg
    jobs.add(initialCrawlJob)
    jobs
  }
}

class CrawlJob extends GridJob {
  var twitterId: String = ""

  def cancel(): Unit = {
    println("cancel - " + twitterId)
  }

  def execute(): Object = {
    println("fetch friends for - " + twitterId)
    // Fetch and execute CrawlJobs for all friends
    null
  }
}
I have Java services prepared for all Twitter interaction. I need some examples to figure out how to create new jobs within an existing job and associate them with the original task.
Thanks | Srirangan
Comments (1)
How did I get around this?
Conceptually unite GridTasks and GridJobs: each MySpecialGridTask has exactly one MySpecialGridJob.
Then, it is easy to execute new GridTasks in the Task or the Job.
In the example above:
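A minimal sketch of that one-task-one-job idea follows. It is not the original poster's code and it does not use GridGain: `MiniGrid`, `FriendService` and `Crawled` are hypothetical stand-ins (for `grid.execute`, the Java Twitter services, and some shared record of visited accounts), and the friend graph is made-up sample data. The point is only the shape: a job that finds friends submits a fresh task per friend, and a visited check keeps cycles in the friend graph from looping forever.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the Java Twitter services; sample data only.
object FriendService {
  private val graph = Map(
    "twitter-user-1" -> List("alice", "bob"),
    "alice"          -> List("bob", "twitter-user-1"),
    "bob"            -> List("carol")
  )
  def friendsOf(id: String): List[String] = graph.getOrElse(id, Nil)
}

// Hypothetical in-process "grid": runs a task's single job synchronously.
object MiniGrid {
  def execute(task: CrawlTask): Unit = task.job.execute()
}

// Remembers which accounts have been scheduled, in insertion order.
object Crawled {
  private val seen = mutable.LinkedHashSet[String]()
  // Returns true only the first time an id is recorded.
  def markIfNew(id: String): Boolean = seen.synchronized(seen.add(id))
  def all: List[String] = seen.synchronized(seen.toList)
}

// One task wraps exactly one job.
class CrawlTask(val twitterId: String) {
  val job = new CrawlJob(twitterId)
}

class CrawlJob(twitterId: String) {
  def execute(): Unit = {
    println("fetch friends for - " + twitterId)
    // Submit a new task for every friend that has not been scheduled yet.
    for (friend <- FriendService.friendsOf(twitterId) if Crawled.markIfNew(friend))
      MiniGrid.execute(new CrawlTask(friend))
  }
}

object CrawlDemo {
  def main(args: Array[String]): Unit = {
    Crawled.markIfNew("twitter-user-1")
    MiniGrid.execute(new CrawlTask("twitter-user-1"))
    println(Crawled.all.mkString(", "))
  }
}
```

On a real grid the visited set would have to live somewhere all nodes can reach, since jobs may run on different machines; the in-memory object here only works because everything runs in one process.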