GridGain / Scala - Spawning jobs from within an existing job

Posted on 2024-10-21 04:18:19

As a proof of concept, I'm building this extremely simple Twitter Friends crawler. Here's what it will do:

  1. Execute CrawlJob for Twitter account "twitter-user-1"
  2. Find all friends of "twitter-user-1"
  3. Execute CrawlJob for all friends of "twitter-user-1"

Here's what my code looks like so far:

def main( args:Array[String] ) {

  scalar {
    grid.execute(classOf[CrawlTask], "twitter-user-1").get
  }

}

import java.util.{ArrayList, Collection}

// Task that wraps a single CrawlJob for the given Twitter id.
class CrawlTask extends GridTaskNoReduceSplitAdapter[String] {

  def split(gridSize: Int, arg: String): Collection[GridJob] = {
    val jobs: Collection[GridJob] = new ArrayList[GridJob]()
    val initialCrawlJob = new CrawlJob()
    initialCrawlJob.twitterId = arg
    jobs.add(initialCrawlJob)
    jobs
  }

}

// Job that fetches the friends of a single Twitter account.
class CrawlJob extends GridJob {

  var twitterId: String = ""

  def cancel(): Unit = {
    println("cancel - " + twitterId)
  }

  def execute(): Object = {
    println("fetch friends for - " + twitterId)
    // TODO: fetch and execute CrawlJobs for all friends
    null
  }

}

I have Java services prepared for all Twitter interaction. I need some examples showing how to create new jobs from within an existing job and associate them with the original task.

Thanks | Srirangan

Comments (1)

战皆罪 2024-10-28 04:18:19

How did I get around this?

I treated GridTasks and GridJobs as a single concept: each MySpecialGridTask has exactly one MySpecialGridJob.

With that in place, it is easy to execute new GridTasks from within either the task or the job.

Applied to the example above:

class CrawlJob extends GridJob {

  var twitterId: String = ""

  def cancel(): Unit = {
    println("cancel - " + twitterId)
  }

  def execute(): Object = {
    println("fetch friends for - " + twitterId)
    // Fetch the friends, then run one nested CrawlTask per friend.
    // Calling .get waits for each nested task before continuing.
    grid.execute(classOf[CrawlTask], "twitter-user-2").get
    grid.execute(classOf[CrawlTask], "twitter-user-3").get
    null
  }

}
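
The snippet above hardcodes two friend ids. As a rough sketch of how the same pattern might loop over real friends, here is a version that runs locally without GridGain. `FriendService`, `LocalCrawler`, `maxDepth` and the in-memory graph are all hypothetical names made up for illustration; on a grid, the recursive `run` call would presumably be replaced by the nested `grid.execute(classOf[CrawlTask], friend).get` shown above. The visited set and depth limit are my own additions, since friend graphs contain cycles and an unbounded crawl would never finish.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the Java Twitter service mentioned in the question.
trait FriendService {
  def friendsOf(twitterId: String): Seq[String]
}

// Local stand-in for "one task wrapping one job". On a real grid the body of
// run() would live in CrawlJob.execute() and the recursive call would be a
// nested grid.execute(...), as in the answer above (assumption, not tested).
class LocalCrawler(service: FriendService, maxDepth: Int) {

  // Ids already crawled, so cycles in the friend graph do not recurse forever.
  private val visited = mutable.LinkedHashSet.empty[String]

  def run(twitterId: String, depth: Int = 0): Unit = {
    // add() returns false when the id was already present.
    if (depth > maxDepth || !visited.add(twitterId)) return
    println("fetch friends for - " + twitterId)
    // Spawn one nested crawl per friend instead of hardcoding ids.
    service.friendsOf(twitterId).foreach(friend => run(friend, depth + 1))
  }

  // Ids crawled so far, in the order they were first visited.
  def crawled: Seq[String] = visited.toSeq
}

object CrawlSketch {
  def main(args: Array[String]): Unit = {
    // Tiny in-memory friend graph with a cycle back to the starting user.
    val graph = Map(
      "twitter-user-1" -> Seq("twitter-user-2", "twitter-user-3"),
      "twitter-user-2" -> Seq("twitter-user-1"),
      "twitter-user-3" -> Seq.empty[String]
    )
    val service = new FriendService {
      def friendsOf(twitterId: String): Seq[String] =
        graph.getOrElse(twitterId, Seq.empty)
    }
    val crawler = new LocalCrawler(service, maxDepth = 1)
    crawler.run("twitter-user-1")
    println(crawler.crawled.mkString(", "))
  }
}
```

Note that the visited set here lives in a single JVM. On a grid, jobs may run on different nodes, so deduplication would need some shared state, which is outside what this answer covers.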