Mahout/Hadoop: SQL to SequenceFile
I am starting to use Mahout for clustering, but I am having a hard time converting a SQL (MySQL) dump into a Mahout-compatible SequenceFile. I am using the code below.
SQL Sample
(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),
(2, 318138, '[fonts, textile, valentines day]', '2011-04-27 21:47:16'),
...
Java
File url = new File(inputFile);
// create the Hadoop configuration
Configuration conf = new Configuration();
// create the MapReduce job that will write the SequenceFile
Job job = new Job(conf);
job.setJobName("Convert Text");
job.setJarByClass(Mapper.class);
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setNumReduceTasks(0);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(job, new Path(inputFile));
SequenceFileOutputFormat.setOutputPath(job, new Path(SequenceFileCreator.SEQUENCE_FOLDER_PATH));
// submit and wait for completion
job.waitForCompletion(true);
Thanks!
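One thing that may be worth checking: as far as I know, Mahout's text vectorization step (seq2sparse) reads SequenceFiles whose keys and values are both Text, whereas the job above writes LongWritable byte offsets as keys and the raw dump line as the value. A minimal sketch of the per-row parsing such a conversion would need is shown here in plain Java. The class name SqlRowParser, the regular expression, and the choice of the first column as the document key are all assumptions made for illustration; they are not taken from the original code or from any Mahout API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one row of the SQL sample into a (key, text) pair.
public class SqlRowParser {

    // Matches rows shaped like: (1, 318145, '[tag a, tag b]', '2011-04-27 21:47:16'),
    // Assumption: four columns, with the bracketed tag list in the third one.
    private static final Pattern ROW = Pattern.compile(
        "^\\s*\\((\\d+),\\s*(\\d+),\\s*'\\[(.*?)\\]',\\s*'([^']*)'\\)[,;]?\\s*$");

    // Holds the two fields a Text/Text SequenceFile record would need.
    public static final class Row {
        public final String key;   // first column, assumed to be the row id
        public final String text;  // tag list without the surrounding brackets

        Row(String key, String text) {
            this.key = key;
            this.text = text;
        }
    }

    // Returns the parsed row, or an empty Optional for lines that are not data rows.
    public static Optional<Row> parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Row(m.group(1), m.group(3)));
    }

    public static void main(String[] args) {
        String sample = "(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),";
        // Print key and text separated by a tab when the line is a data row.
        parse(sample).ifPresent(r -> System.out.println(r.key + "\t" + r.text));
    }
}
```

In a real job, the key and text could each be wrapped in a Text object and emitted from a custom Mapper in place of the identity Mapper, with the output key class changed to Text to match. Rows that do not match the pattern (INSERT headers, the "..." line) would simply be skipped.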