Mahout/Hadoop: SQL to SequenceFile

Posted on 2024-11-04 08:16:53

I am starting to use Mahout for clustering, but I am having a hard time converting a SQL (MySQL) dump into a Mahout-compatible SequenceFile. I am using the code below.

SQL Sample

(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),
(2, 318138, '[fonts, textile, valentines day]', '2011-04-27 21:47:16'),
...

Java

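    // imports for the org.apache.hadoop.mapreduce API used by this snippet
    // (assumes Mapper is the stock Hadoop base class, which passes records through unchanged)
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    // create the Hadoop configuration
    Configuration conf = new Configuration();

    // create and name the job
    Job job = new Job(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(Mapper.class);

    // map-only job: with zero reduce tasks no reducer runs,
    // so the mapper output goes straight to the output format
    job.setMapperClass(Mapper.class);
    job.setNumReduceTasks(0);

    // output records are (LongWritable, Text) pairs
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // read plain text lines, write a SequenceFile
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    TextInputFormat.addInputPath(job, new Path(inputFile));
    SequenceFileOutputFormat.setOutputPath(job, new Path(SequenceFileCreator.SEQUENCE_FOLDER_PATH));

    // submit and wait for completion
    job.waitForCompletion(true);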

Thanks!
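
One possible direction, offered as a sketch rather than a confirmed fix: if `Mapper` here is Hadoop's base class, the job writes each raw dump line keyed by its byte offset (`LongWritable`), while Mahout's text vectorization step (`seq2sparse`) generally expects `Text` keys and `Text` values. A custom mapper could parse each row and emit the id as the key and the tag list as the value. The parsing step might look like the following; the class name `SqlRowParser`, the method `parse`, and the assumption that every row has exactly the four-column shape shown in the sample are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlRowParser {
    // Matches one dump row shaped like the sample:
    // (1, 318145, '[tag a, tag b]', '2011-04-27 21:47:16'),
    private static final Pattern ROW = Pattern.compile(
        "^\\s*\\((\\d+),\\s*(\\d+),\\s*'\\[(.*?)\\]',\\s*'([^']*)'\\)[,;]?\\s*$");

    /**
     * Returns {first column, second column, tag list} for a row line,
     * or null for any line that is not a row (INSERT headers, "...", blanks).
     */
    public static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] row = parse(
            "(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),");
        // Prints the id and the bracket-free tag list
        System.out.println(row[0] + " -> " + row[2]);
    }
}
```

Inside a mapper's `map` method, the parsed fields would be wrapped as `new Text(row[0])` and `new Text(row[2])`, lines that return `null` would be skipped, and the driver would then declare `Text.class` as the output key class instead of `LongWritable.class`. The regular expression is deliberately strict, so rows with a different column layout are skipped rather than parsed incorrectly.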
