How can I get HBase to play nicely with sbt's dependency management?
I'm trying to get an sbt project going that uses CDH3's Hadoop and HBase. I'm trying to use a project/build/Project.scala file to declare dependencies on HBase and Hadoop. (I'll admit my grasp of sbt, maven, and ivy is a little weak. Please pardon me if I'm saying or doing something dumb.)
Everything went swimmingly with the Hadoop dependency. Adding the HBase dependency resulted in a dependency on Thrift 0.2.0, for which there doesn't appear to be a repo, or so it sounds from this SO post.
So, really, I have two questions:
1. Honestly, I don't want a dependency on Thrift because I don't want to use HBase's Thrift interface. Is there a way to tell sbt to skip it?
2. Is there some better way to set this up? Should I just dump the HBase jar in the lib directory and move on?
Update: This is the sbt 0.10 build.sbt file that accomplished what I wanted:

    scalaVersion := "2.9.0-1"

    resolvers += "ClouderaRepo" at "https://repository.cloudera.com/content/repositories/releases"

    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0",
      "org.apache.hbase" % "hbase" % "0.90.1-cdh3u0"
    )

    ivyXML :=
      <dependencies>
        <exclude module="thrift"/>
      </dependencies>
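For comparison, more recent sbt versions also let you attach the exclusion to a single dependency instead of using inline Ivy XML. The fragment below is only a sketch: it assumes Thrift is published under the organization `org.apache.thrift` with the module name `thrift`, which should be checked against the HBase POM, and it assumes an sbt release that provides the `exclude` method on a dependency.

```scala
// build.sbt fragment: exclude Thrift from the HBase dependency only.
// ASSUMPTION: the organization "org.apache.thrift" is a guess at Thrift's
// group ID; verify it in the HBase POM before relying on this.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0",
  ("org.apache.hbase" % "hbase" % "0.90.1-cdh3u0")
    .exclude("org.apache.thrift", "thrift")
)
```

Unlike the project-wide `<exclude module="thrift"/>`, this only drops Thrift where it is pulled in through HBase.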
Comments (2)
Looking at the HBase POM file, Thrift is in the repo at http://people.apache.org/~rawson/repo. You can add that to your project, and it should find Thrift. I thought that SBT would have figured that out, but this is an intersection of SBT, Ivy and Maven, so who can really say what really should happen.
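As a rough sketch, adding that repository in an sbt 0.10-style build.sbt could look like the line below. The resolver name "RawsonRepo" is made up for illustration, and this assumes the repository is still reachable at that URL.

```scala
// build.sbt fragment: extra resolver so Ivy can locate Thrift 0.2.0.
// "RawsonRepo" is a hypothetical label; only the URL comes from the HBase POM.
resolvers += "RawsonRepo" at "http://people.apache.org/~rawson/repo"
```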
If you really don't need Thrift, you can exclude dependencies using inline Ivy XML, as documented on the SBT wiki.
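For the project/build/Project.scala style mentioned in the question (sbt 0.7.x), the inline Ivy XML approach might look roughly like this. The class name `HBaseProject` and the val names are hypothetical; the repository URL and versions are copied from the question.

```scala
// project/build/Project.scala (sbt 0.7.x style project definition)
import sbt._

// "HBaseProject" is a hypothetical class name chosen for this sketch.
class HBaseProject(info: ProjectInfo) extends DefaultProject(info) {
  // Repository and versions taken from the build.sbt in the question.
  val clouderaRepo = "ClouderaRepo" at "https://repository.cloudera.com/content/repositories/releases"

  val hadoopCore = "org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u0"
  val hbase      = "org.apache.hbase" % "hbase" % "0.90.1-cdh3u0"

  // Inline Ivy XML: exclude any module named "thrift" from resolution.
  override def ivyXML =
    <dependencies>
      <exclude module="thrift"/>
    </dependencies>
}
```

This excludes Thrift for the whole project, so it only makes sense if nothing else in the build needs HBase's Thrift interface.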
Re: dumping the jar in the lib directory, that would be a short term gain, long term loss. It's certainly more expedient, and if this is some proof of concept you're throwing away next week, sure just drop in the jar and forget about it. But for any project that has a lifespan greater than a couple of months, it's worth it to spend the time to get dependency management right.
While all of these tools have their challenges, the benefits are:
I have a very simple example of an sbt project w/ Hadoop on github: https://github.com/deanwampler/scala-hadoop.
Look in project/build/WordCountProject.scala, where I define a variable named ClouderaMavenRepo, which defines the Cloudera repository location, and the variable named hadoopCore, which defines the specific information for the Hadoop jar. If you go to the Cloudera repo in a browser, you should be able to navigate to the corresponding information for Hive.