Dockerised Hive can't find org.apache.hadoop.fs.s3a.S3AFileSystem even though I added the hadoop-aws jar

Posted 2025-02-06 08:25:27


I am trying to PoC a docker-compose stack composed of HDFS, Hue and Hive, connected to my AWS S3 buckets.
As of right now I have it running, and I can browse my S3 buckets using the Hue file browser.
When I try to create a Hive external table on my parquet file, I get the following error:

org.apache.hadoop.fs.s3a.S3AFileSystem not found
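
For reference, the failing statement has roughly this shape (table, columns and bucket path are placeholders, not my real names):

CREATE EXTERNAL TABLE my_table (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION 's3a://my-bucket/path/to/parquet/';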

I understand this is caused by a missing hadoop-aws jar.
As a quick win, I've uploaded hadoop-aws-2.7.4.jar to the container's HDFS and executed my create table query after running:

add jar hdfs:/hadoop-aws-2.7.4.jar;
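
For completeness, this is roughly how the jar got onto HDFS in the first place (the container name is a placeholder, since compose derives it from the project name):

docker cp hadoop-aws-2.7.4.jar <project>_namenode_1:/tmp/
docker exec <project>_namenode_1 hdfs dfs -put /tmp/hadoop-aws-2.7.4.jar /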

I don't get any errors, so it seems to find the jar properly (if I change the name, I get a "file not found" error). Yet the create external table still fails with the same error.

What am I doing wrong? I thought it could be related to an incompatible version, but hadoop-aws-2.7.4 seems to match my docker-compose image versions.
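
A quick way to double-check what the hive-server image actually bundles (assuming HADOOP_HOME is set inside the bde2020 images and that hadoop-aws lives in the usual tools/lib folder):

docker exec <project>_hive-server_1 bash -c 'ls $HADOOP_HOME/share/hadoop/tools/lib | grep -i aws'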

Here is the docker-compose file used:

version: "3"

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50070:50070"
    
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    ports:
      - "50075:50075"

  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop2.7.4-java8
    environment:
      SERVICE_PRECONDITION: "namenode:50070 datanode:50075"
    env_file:
      - ./hadoop-hive.env

  hive-server:
    image: bde2020/hive:2.3.2-postgresql-metastore
    env_file:
      - ./hadoop-hive.env
    environment:
      HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
      SERVICE_PRECONDITION: "hive-metastore:9083"
    ports:
      - "10000:10000"

  hive-metastore:
    image: bde2020/hive:2.3.2-postgresql-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    environment:
      SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432 resourcemanager:8088"
    ports:
      - "9083:9083"

  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.3.0
    ports:
      - "5432:5432"

  huedb:
    image: postgres:12.1-alpine
    volumes:
      - pg_data:/var/lib/postgresql/data/
    ports:
      - "5432"
    env_file:
      - ./hadoop-hive.env
    environment:
        SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432 resourcemanager:8088 hive-metastore:9083"
  
  hue:
    image: gethue/hue:4.6.0
    environment:
        SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432 resourcemanager:8088 hive-metastore:9083 huedb:5000"
    ports:
      - "8888:8888"
    volumes:
      - ./hue-overrides.ini:/usr/share/hue/desktop/conf/hue-overrides.ini
    links:
      - huedb

volumes:
  namenode:
  datanode:
  pg_data:
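
For context, the S3 credentials and s3a settings come from ./hadoop-hive.env. I won't paste the real file, but with the bde2020 images the relevant entries follow the CORE_CONF_* convention (single underscores become dots in the generated core-site.xml), along these lines:

CORE_CONF_fs_s3a_impl=org.apache.hadoop.fs.s3a.S3AFileSystem
CORE_CONF_fs_s3a_access_key=<my-access-key>
CORE_CONF_fs_s3a_secret_key=<my-secret-key>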

Would love any help!!

Thanks,
