How do I read the contents of a Hive TEXTFILE table's files in Java?
As shown in the screenshot, when reading the file by byte count with FileSystem, a line sometimes comes back incomplete. Code:

// stored as TEXTFILE
Path filePath = new Path(location);
FileStatus[] status = null;
FileSystem fs = null;
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.20.181:8020");
conf.set("fs.permissions.umask-mode", "002");
System.getProperties().setProperty("HADOOP_USER_NAME", "hdfs");
try {
    fs = FileSystem.get(conf);
} catch (Exception e) {
    e.printStackTrace();
    logger.error("Failed to get the HDFS directory listing!");
}
try {
    status = fs.listStatus(filePath);
} catch (Exception e) {
    e.printStackTrace();
    throw new Exception("HdfsListDirectory:" + e.getMessage());
}
long start = (new Date()).getTime();
List<Future> taskList = new ArrayList<Future>();
for (FileStatus st : status) {
    MyCallable myCallable = new MyCallable(st, fs, lineDelim, fieldDelim, cloumnList);
    FutureTask task = new FutureTask(myCallable);
    pool.submit(task);
    taskList.add(task);
}
int count = 0;
for (Future f : taskList) {
    count = count + (Integer) f.get();
}
So my question: what is a proper way in Java to read a Hive table stored as TEXTFILE? Reading ORC files already works.
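For context, the incomplete-line symptom is exactly what fixed-size byte reads produce: a chunk boundary can land in the middle of a line (or a multi-byte character), so the last line of each chunk is cut off. A minimal stdlib-only sketch of the effect (no Hadoop involved; the class name and sample data are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Simulate reading a text file in fixed-size byte chunks, the way a raw
    // byte-count read does: each chunk becomes one "record", regardless of
    // where the newlines actually are.
    static List<String> readByChunks(byte[] data, int chunkSize) {
        List<String> out = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            out.add(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "alpha,1\nbeta,2\n".getBytes(StandardCharsets.UTF_8);
        // A 6-byte chunk boundary lands mid-line: the first chunk is
        // "alpha," - an incomplete row.
        System.out.println(readByChunks(data, 6));
    }
}
```

This is why record-oriented readers (TextInputFormat below, or a BufferedReader on a single file) are the safer choice: they only ever hand back whole lines.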
Answering my own question. You can use LazySimpleSerDe to read the file; the approach is borrowed from how I read ORC files. Code:
// stored as TEXTFILE
Path filePath = new Path(location);
Properties p = new Properties();
LazySimpleSerDe serde = new LazySimpleSerDe();
JobConf conf = new JobConf();
conf.set("fs.default.name", "hdfs://192.168.20.181:8020");
p.setProperty("columns", cloNameSB.toString());
p.setProperty("columns.types", cloTypeSB.toString());
serde.initialize(conf, p);
StructObjectInspector inspector = (StructObjectInspector) serde.getObjectInspector();
TextInputFormat in = new TextInputFormat();
in.configure(conf);
TextInputFormat.setInputPaths(conf, filePath);
InputSplit[] splits = in.getSplits(conf, 1);
long start = (new Date()).getTime();
int count = 0;
for (InputSplit split : splits) {
    RecordReader reader = in.getRecordReader(split, conf, Reporter.NULL);
    Object key = reader.createKey();
    Object value = reader.createValue();
    List<? extends StructField> fields = inspector.getAllStructFieldRefs();
    while (reader.next(key, value)) {
        Map<String, Object> map = new HashMap<String, Object>();
        // split once per row; limit -1 keeps trailing empty columns
        String[] vs = ((Text) value).toString().split(fieldDelim, -1);
        for (int i = 0; i < fields.size(); i++) {
            map.put(fields.get(i).getFieldName(), vs[i]);
        }
        count++;
    }
}
System.out.println((new Date()).getTime() - start);
System.out.println(count);

This is a bit slower than reading the file directly by bytes, but it is accurate.
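One pitfall when splitting rows like this: without a limit argument, Java's String.split drops trailing empty strings, so a row whose last columns are empty yields a short array and vs[i] throws ArrayIndexOutOfBoundsException. Passing -1 as the limit keeps every column. A quick stdlib-only demonstration (class name made up):

```java
public class SplitDemo {
    public static void main(String[] args) {
        String row = "a,b,,"; // 4 columns, the last two empty

        // Default split: trailing empty fields are discarded.
        System.out.println(row.split(",").length);

        // limit -1: every field is kept, empty or not.
        System.out.println(row.split(",", -1).length);
    }
}
```

This prints 2 and then 4, which is why the loop above uses split(fieldDelim, -1). Note also that split takes a regex, so a field delimiter like "|" would need quoting; Hive's default \u0001 delimiter is safe as-is.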