How should I design my mapper?

Posted on 2024-11-27 02:34:26


I have to write a MapReduce job, but I don't know how to go about it.

I have a jar, MARD.jar, through which I can instantiate MARD objects.
On such an object I call the normaliseFile method, i.e. mard.normaliseFile(bunch of arguments).

This in turn creates certain output files.

For the normaliseFile method to run, it needs a folder called myMard in the working directory.
So I thought I would give the myMard folder as the input path to the Hadoop job, but I'm not sure that would help: mard.normaliseFile(bunch of arguments) will search for the myMard folder in the working directory and will not find it, because (this is what I think) the Mapper can only access the contents of files through the "values" obtained from the FileSplit; it has no direct access to the files in the myMard folder.
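One way to check the assumption above would be to log, from wherever the code runs, which directory a relative name like myMard actually resolves to and whether the folder is visible there. The following is only a minimal sketch using the JDK; the class and helper names are hypothetical and nothing here touches MARD or Hadoop.

```java
import java.io.File;

public class WorkingDirCheck {

    // A relative File is resolved against the JVM's working directory
    // (the "user.dir" system property), which is where MARD would look.
    static File resolveInWorkingDir(String relativeName) {
        return new File(relativeName).getAbsoluteFile();
    }

    public static void main(String[] args) {
        File myMard = resolveInWorkingDir("myMard");
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("myMard resolves to: " + myMard.getPath());
        System.out.println("myMard is a directory here: " + myMard.isDirectory());
    }
}
```

Calling the same helper from inside a task would show whether the folder is reachable from the task's working directory.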

In short, I have to execute the following code through MapReduce:

// Folder that the MARD object is initialised with
File setupFolder = new File(setupFolderName);
setupFolder.mkdirs();

MARD mard = new MARD(setupFolder);

Text valuz = new Text();
IntWritable intval = new IntWritable();

// Input file and the two output files passed to normaliseFile
File original = new File("Vca1652.txt");
File mardedxml = new File("Vca1652-mardedxml.txt");
File marded = new File("Vca1652-marded.txt");

mardedxml.createNewFile();
marded.createNewFile();

NormalisationStats stats;
try {
    // This method requires access to the myMard folder
    stats = mard.normaliseFile(original, mardedxml, marded, 50.0);
    System.out.println(stats);
} catch (MARDException e) {
    e.printStackTrace();
}
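The snippet above hard-codes a single input, Vca1652.txt. If a mapper were to process many files, the two output names could be derived from each input name instead. This is a rough sketch that assumes the -mardedxml / -marded suffix pattern seen above applies to every input; the class and method names are hypothetical.

```java
public class MardOutputNames {

    // Inserts a suffix before the file extension,
    // e.g. ("Vca1652.txt", "-marded") gives "Vca1652-marded.txt".
    // Names without an extension simply get the suffix appended.
    static String derivedName(String originalName, String suffix) {
        int dot = originalName.lastIndexOf('.');
        if (dot <= 0) {
            return originalName + suffix;
        }
        return originalName.substring(0, dot) + suffix + originalName.substring(dot);
    }

    public static void main(String[] args) {
        String original = "Vca1652.txt";
        System.out.println(derivedName(original, "-mardedxml"));
        System.out.println(derivedName(original, "-marded"));
    }
}
```

The resulting strings can be passed to new File(...) in place of the literal names used in the snippet.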

Please help
