How do I get HBase to play nicely with sbt's dependency management?

Posted 2024-11-14 07:30:49

I'm trying to get an sbt project going that uses CDH3's Hadoop and HBase. I'm trying to use a project/build/Project.scala file to declare dependencies on HBase and Hadoop. (I'll admit my grasp of sbt, Maven, and Ivy is a little weak. Please pardon me if I'm saying or doing something dumb.)

Everything went swimmingly with the Hadoop dependency. Adding the HBase dependency resulted in a dependency on Thrift 0.2.0, for which there doesn't appear to be a repo, or so it sounds from this SO post.

So, really, I have two questions:
1. Honestly, I don't want a dependency on Thrift because I don't want to use HBase's Thrift interface. Is there a way to tell sbt to skip it?
2. Is there some better way to set this up? Should I just dump the HBase jar in the lib directory and move on?

Update: This is the sbt 0.10 build.sbt file that accomplished what I wanted:

scalaVersion := "2.9.0-1"

resolvers += "ClouderaRepo" at "https://repository.cloudera.com/content/repositories/releases"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0",
  "org.apache.hbase" % "hbase" % "0.90.1-cdh3u0"
)

ivyXML :=
  <dependencies>
    <exclude module="thrift"/>
  </dependencies>

Comments (2)

祁梦 2024-11-21 07:30:50

According to the HBase POM file, Thrift is in the repo at http://people.apache.org/~rawson/repo. You can add that repo to your project, and sbt should then find Thrift. I thought that sbt would have figured that out on its own, but this is an intersection of sbt, Ivy, and Maven, so who can really say what should happen.
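As a rough sketch of adding that repo, here is how it might look in the sbt 0.10 build.sbt style the question's update uses. The resolver label "RawsonRepo" is an arbitrary name picked for illustration; only the URL comes from the HBase POM.

```scala
// build.sbt (sbt 0.10 style)
// "RawsonRepo" is a made-up label; sbt only needs the URL to resolve artifacts.
resolvers += "RawsonRepo" at "http://people.apache.org/~rawson/repo"
```

In an sbt 0.7 project definition the equivalent would be a `val` of the form `val rawsonRepo = "RawsonRepo" at "..."` inside the project class.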

If you really don't need Thrift, you can exclude dependencies using inline Ivy XML, as documented on the SBT wiki.

override def ivyXML = 
  <dependencies>
    <exclude module="thrift"/>
  </dependencies>
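An alternative to the inline Ivy XML above, sketched for a build.sbt, is to exclude Thrift on the HBase dependency itself. This assumes an sbt version whose dependency syntax supports `exclude` (newer releases do), and it assumes Thrift is published under the organization `org.apache.thrift`; check the HBase POM for the actual coordinates before relying on it.

```scala
// build.sbt
// Assumption: the transitive Thrift artifact is org.apache.thrift:thrift.
libraryDependencies +=
  ("org.apache.hbase" % "hbase" % "0.90.1-cdh3u0")
    .exclude("org.apache.thrift", "thrift")
```

Scoping the exclusion to one dependency keeps it from silently dropping Thrift for any other library that might need it.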

Re: dumping the jar in the lib directory, that would be a short-term gain and a long-term loss. It's certainly more expedient, and if this is some proof of concept you're throwing away next week, sure, just drop in the jar and forget about it. But for any project that has a lifespan greater than a couple of months, it's worth spending the time to get dependency management right.

While all of these tools have their challenges, the benefits are:

  1. Dependency analysis can tell you when your direct dependencies have conflicting transitive dependencies. Before these tools, such conflicts usually resulted in weird runtime behavior or method-not-found exceptions.
  2. Upgrades are super-simple. Just change the version number, update, and you're done.
  3. It avoids having to commit binaries to version control. They can be problematic when it comes time to merge branches.
  4. Unless you have an explicit policy of how you version the binaries in your lib directory, it's easy to lose track of what versions you have.

天冷不及心凉 2024-11-21 07:30:50

I have a very simple example of an sbt project with Hadoop on GitHub: https://github.com/deanwampler/scala-hadoop.

Look in project/build/WordCountProject.scala, where I define a variable named ClouderaMavenRepo, which defines the Cloudera repository location, and the variable named hadoopCore, which defines the specific information for the Hadoop jar.
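For readers who don't want to open the repository, a minimal sketch of what such an sbt 0.7 project definition might look like follows. It is not copied from the linked project: the class name and repository label are assumptions, and the repository URL and Hadoop version are borrowed from the question.

```scala
// project/build/WordCountProject.scala (sbt 0.7.x style; illustrative only)
import sbt._

class WordCountProject(info: ProjectInfo) extends DefaultProject(info) {
  // In sbt 0.7, a val holding a repository is picked up as a resolver.
  val ClouderaMavenRepo =
    "Cloudera Maven Repo" at "https://repository.cloudera.com/content/repositories/releases"

  // A val holding a module ID is picked up as a managed dependency.
  val hadoopCore = "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0"
}
```

An HBase dependency would be declared the same way, as another `val` with the coordinates found in the Cloudera repo.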

If you go to the Cloudera repo in a browser, you should be able to navigate to the corresponding information for Hive.
