Hadoop Java mapper job executing on a slave node: directory issue
As part of my Java mapper I have a command that executes some standalone code on a local slave node. When I run the code it executes fine, unless it tries to access some local files, in which case I get an error that it cannot locate those files.
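For reference, the call is along these lines (a simplified sketch, not my actual code; the mapper class name, the command path, and the key/value types are just placeholders):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StandaloneCommandMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Launch the standalone executable on the slave node. The child
        // process inherits the task's current working directory, which is
        // the jobcache .../work directory, not the user's home directory.
        ProcessBuilder pb = new ProcessBuilder("/usr/local/bin/standalone-tool", value.toString());
        pb.redirectErrorStream(true);
        Process p = pb.start();
        int exitCode = p.waitFor();
        context.write(value, new Text("exit=" + exitCode));
    }
}
```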
Digging a little deeper, it seems to be executing from the following directory:
/data/hadoop/mapred/local/taskTracker/{user}/jobcache/job_201109261253_0023/attempt_201109261253_0023_m_000001_0/work
But I intended it to execute from the local directory where the relevant files are located:
/home/users/{user}/input/jobname
Is there a way in Java/Hadoop to force execution from the local directory instead of the jobcache directory that Hadoop creates automatically?
Is there perhaps a better way to go about this?
Any help on this would be greatly appreciated!
A workaround I'm using right now is to copy all the relevant files over to the jobcache working directory. The results can then be copied back to the user directory if necessary.
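Roughly, the copy step looks like this (a minimal sketch using plain JDK file copying; the class name, paths, and the idea of taking the work directory from `user.dir` are my own assumptions, not something Hadoop prescribes):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies the job's input files into the task's working
// directory before the standalone command runs, and copies results back after.
public final class WorkDirStager {

    private WorkDirStager() {}

    // Copy every file from the user's input directory (e.g.
    // /home/users/{user}/input/jobname) into the jobcache .../work directory.
    public static void stageInputs(Path userInputDir, Path taskWorkDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(userInputDir)) {
            for (Path file : files) {
                Files.copy(file, taskWorkDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Copy files matching a glob (e.g. "*.out") from the work directory back
    // to a directory under the user's home.
    public static void copyResultsBack(Path taskWorkDir, Path userOutputDir, String glob)
            throws IOException {
        Files.createDirectories(userOutputDir);
        try (DirectoryStream<Path> results = Files.newDirectoryStream(taskWorkDir, glob)) {
            for (Path result : results) {
                Files.copy(result, userOutputDir.resolve(result.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

Inside the mapper, I call `stageInputs(...)` with the task's working directory taken from `System.getProperty("user.dir")` before launching the command, and `copyResultsBack(...)` afterwards.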
Unfortunately this doesn't fully answer the question, but hopefully provides a useful workaround for others.
Cheers,
Joris