How to connect to Hadoop/Hive from .NET

Posted on 2024-09-14 12:21:27

I am working on a solution where I will have a Hadoop cluster with Hive running, and I want to send jobs and Hive queries from a .NET application to be processed and get notified when they are done. I can't find any solutions for interfacing with Hadoop other than directly from a Java app. Is there an API I can access that I am just not finding?

Answers (7)

幻想少年梦 2024-09-21 12:21:27

Apparently it is possible to connect to Hadoop from non-Java solutions; see "Do I have to write my application in Java?"

嗳卜坏 2024-09-21 12:21:27

With Hadoop: there is no direct way to connect from C#, because Hadoop's communication tier works only with Java and is not cross-platform. It is probably possible, but only in very non-trivial ways.
I know there is a patch to add Protocol Buffers support to Hadoop, but at the time of writing (Aug 2011) it has not been released yet.

With Hive, the situation is better, because Hive has a Thrift interface, which supports C#. You can download the Hive Thrift interface definitions and generate a C# client on your own, but beware that it requires some hacking of the generated code. Instead, I would recommend downloading the DLL from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or using the NuGet package manager and searching for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib
Disclaimer: I'm the author.

怂人 2024-09-21 12:21:27

  1. There is the Hortonworks ODBC driver. I haven't used it personally, but it should let you work with Hive as with any other ODBC data source. Once the ODBC driver is installed, you can use the OdbcConnection class to connect to Hive.

  2. As noted in other answers, you can use the Thrift API. For that, you need to generate C# classes from the interface definition files, which you can download from the Hive source repository. This approach works for me.

  3. You can use IKVM to convert the Hadoop client Java libraries into .NET assemblies that you can use from C#. I haven't used IKVM with the Hive client, but I have converted some other Hadoop client libraries with IKVM and, surprisingly, it worked.

EDIT:

  4. There's also Apache Templeton, which allows submitting Hive jobs (and also Pig and MapReduce jobs) through a REST interface. The problem with it is that it spawns an additional map task to submit the Hive job, which makes it slower.
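
Since the question also asks to be notified when a job finishes, Templeton's callback parameter is worth a look. Templeton was later renamed WebHCat and ships with Hive; its "hive" resource accepts a form-encoded POST with the query in "execute", an HDFS "statusdir" for the job's status output, and an optional "callback" URL in which "$jobId" is replaced by the job ID when the job completes. The sketch below is only an illustration: it assumes simple (pseudo) authentication through "user.name" and the default WebHCat port 50111, and the host, user, paths and helper names are hypothetical. Check the parameters against the WebHCat reference for your Hive version.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Form fields for POST {baseUrl}/hive. "execute", "statusdir", "callback" and
// "user.name" are WebHCat request parameters; the values are yours to supply.
static Dictionary<string, string> BuildHiveJobForm(
    string hql, string statusDir, string userName, string? callbackUrl)
{
    var fields = new Dictionary<string, string>
    {
        ["execute"] = hql,          // HiveQL text to run
        ["statusdir"] = statusDir,  // HDFS directory where WebHCat writes the job's status output
        ["user.name"] = userName,   // caller identity under simple (pseudo) authentication
    };
    if (callbackUrl != null)
    {
        // WebHCat substitutes $jobId in this URL and calls it when the job completes.
        fields["callback"] = callbackUrl;
    }
    return fields;
}

// Submits the job and returns the job ID from the JSON reply, e.g. {"id": "job_..."}.
static async Task<string> SubmitHiveJobAsync(
    HttpClient http, string baseUrl, Dictionary<string, string> fields)
{
    using var content = new FormUrlEncodedContent(fields);
    using var response = await http.PostAsync($"{baseUrl}/hive", content);
    response.EnsureSuccessStatusCode();
    using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    return doc.RootElement.GetProperty("id").GetString()!;
}

// Usage (hypothetical host, user, paths and callback endpoint):
var jobForm = BuildHiveJobForm(
    "SELECT COUNT(*) FROM web_logs;", "/tmp/hive-status", "hive",
    "http://my-app.example.com/hive-done?job=$jobId");
// var jobId = await SubmitHiveJobAsync(new HttpClient(),
//     "http://webhcat-host:50111/templeton/v1", jobForm);
```

If your application cannot expose an HTTP endpoint for the callback, polling WebHCat's jobs resource with the returned job ID is the alternative.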
爱本泡沫多脆弱 2024-09-21 12:21:27

You can access Hive from C# using Microsoft's ODBC connector. Download the NuGet package "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx

The trick lies in building the connection string. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, and then use Server Explorer inside Visual Studio to add a new connection, letting it build the connection string for me. To do this, I used the following steps:

  • Change the data source to "Microsoft ODBC Data Source" and ensure you're using the ".NET Framework Data Provider for ODBC" as the data provider.

[Screenshot: Change Data Source dialog window]

  • Under the "Data source specification" portion, check the "Use connection string" then click the "Build" button.

[Screenshot: Add Connection dialog window]

  • Under the "Machine Data Source" tab, select the "Sample Microsoft Hive DSN" data source name, then click the "OK" button.

[Screenshot: Select Data Source dialog window]

  • A window titled "Microsoft Hive ODBC Driver Connection Dialog" will open. Enter an optional description, then type in the path to your Hive server, the port you will be using, and what database it should connect to. Indicate the Hive Server Type, and specify an authentication mechanism to use, then fill out the appropriate fields.

[Screenshot: Microsoft Hive ODBC Driver Connection dialog window]

  • Finally, click the "Test" button at the bottom to make sure you can connect successfully. If the test succeeds, click the "OK" button; you'll be back in the "Modify Connection" window. Enter the login information for your Hive service here.

Either use this data source directly or copy the connection string it has built for you and use it within your application.
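
If you would rather not depend on a machine DSN, the connection string can also be assembled in code. The sketch below builds a DSN-less string for the Microsoft Hive ODBC driver; the key names and the numeric codes for HiveServerType and AuthMech are assumptions modeled on the driver's (Simba-based) settings, so compare the output with the string Server Explorer generated for you. The helper name is hypothetical, and it does no escaping, so values containing ';' or braces would need quoting (OdbcConnectionStringBuilder in System.Data.Odbc handles that for you).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds a DSN-less connection string. Key names and numeric codes are
// assumptions modeled on the Microsoft (Simba-based) Hive ODBC driver settings.
static string BuildHiveOdbcConnectionString(
    string host, int port, string database, int hiveServerType, int authMech,
    string? user = null, string? password = null)
{
    var pairs = new List<(string Key, string Value)>
    {
        ("Driver", "{Microsoft Hive ODBC Driver}"),
        ("Host", host),
        ("Port", port.ToString()),
        ("Schema", database),                          // database to connect to
        ("HiveServerType", hiveServerType.ToString()),
        ("AuthMech", authMech.ToString()),             // authentication mechanism code
    };
    if (user != null) pairs.Add(("UID", user));
    if (password != null) pairs.Add(("PWD", password));
    // ODBC connection strings are semicolon-separated Key=Value pairs.
    // No escaping here: values containing ';' or braces would need quoting.
    return string.Join(";", pairs.Select(p => $"{p.Key}={p.Value}"));
}

// Usage (hypothetical host; 10000 is the usual HiveServer2 port;
// HiveServerType 2 and AuthMech 3 are assumed to mean HiveServer2 and
// user name + password).
var connStr = BuildHiveOdbcConnectionString(
    "hive.example.com", 10000, "default", 2, 3, "hiveuser", "secret");
Console.WriteLine(connStr);

// With System.Data.Odbc (built into .NET Framework, a NuGet package on .NET Core):
// using var conn = new OdbcConnection(connStr);
// conn.Open();
// using var cmd = new OdbcCommand("SELECT COUNT(*) FROM my_table", conn);
// Console.WriteLine(cmd.ExecuteScalar());
```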

没︽人懂的悲伤 2024-09-21 12:21:27

The Thrift API is another way for other languages to access HDFS and Hive.

如歌彻婉言 2024-09-21 12:21:27

See if this helps. I have tried connecting to Hadoop via C#:

How to communicate to Hadoop via Hive using .NET/C#

深空失忆 2024-09-21 12:21:27

Use the Hbase.Net library from https://hbasenet.codeplex.com/

Then you can connect to HBase/Hive as shown below:

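        // 10.20.14.179 is the host; 9090 is the default HBase Thrift server port
        Client c = new Client("10.20.14.179", 9090, 1000000);
        var cli = c.TotalClients;
        var tableList = c.GetTableNames(); // names of the tables in HBase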

FYI, we are using the Hortonworks sandbox and it connects fine.

In the above example, 10.20.14.179 is the host and 9090 is the port.

Also, the following, from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html, might help:

There is no native C# HBase client. However, there are several options for interacting with HBase from C#.

  1. C# HBase Thrift client - Thrift allows for defining service endpoints
    and data models in a common format and using code generators to
    create language-specific bindings. HBase provides a Thrift server and
    definitions. There are many examples online for creating a C# HBase
    Thrift Client.

  2. Marlin - Marlin is a C# client for interacting with Stargate (HBase
    REST API) that ultimately became hbase-sdk-for-net. I have not
    personally tested this against HBase 1.x+, but considering it uses
    Stargate, I expect it should work. If you are planning to use
    Stargate and implement your own client, which I would recommend over
    Thrift, make sure to use protobufs to avoid the JSON serialization
    overhead. Using an HTTP-based approach also makes it much easier to
    load balance requests over multiple gateways.

  3. Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query
    Server is a REST API for submitting SQL queries to Phoenix. Here is
    some example code; however, I have not yet tested it.

  4. Simba HBase ODBC Driver - Using ODBC to connect to HBase. I've heard
    positive feedback on this approach, especially from tools like
    Tableau. This is not open source and requires purchasing a license.
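
To illustrate the Stargate route (option 2), here is a minimal C# sketch that reads one row over the HBase REST API and decodes the JSON response, in which row keys, column names and cell values are base64-encoded. It assumes a REST server at a hypothetical base URL such as http://hbase-rest-host:8080 and UTF-8 text data, and the helper names are made up for illustration. It uses JSON only for readability; as noted above, protobuf (Accept: application/x-protobuf) avoids the JSON serialization overhead.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Fetches one row as JSON: GET {baseUrl}/{table}/{row} with Accept: application/json.
static async Task<string> GetRowJsonAsync(HttpClient http, string baseUrl, string table, string row)
{
    var url = $"{baseUrl}/{Uri.EscapeDataString(table)}/{Uri.EscapeDataString(row)}";
    using var request = new HttpRequestMessage(HttpMethod.Get, url);
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    using var response = await http.SendAsync(request);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

// Decodes {"Row":[{"key":..., "Cell":[{"column":..., "timestamp":..., "$":...}]}]}.
// Row keys, column names ("family:qualifier") and values ("$") are base64-encoded.
static List<(string Row, string Column, string Value)> DecodeHBaseRows(string json)
{
    var cells = new List<(string Row, string Column, string Value)>();
    using var doc = JsonDocument.Parse(json);
    foreach (var row in doc.RootElement.GetProperty("Row").EnumerateArray())
    {
        string rowKey = FromBase64(row.GetProperty("key").GetString()!);
        foreach (var cell in row.GetProperty("Cell").EnumerateArray())
        {
            cells.Add((rowKey,
                       FromBase64(cell.GetProperty("column").GetString()!),
                       FromBase64(cell.GetProperty("$").GetString()!)));
        }
    }
    return cells;
}

// Assumes UTF-8 text; binary keys or values would need to stay as byte[].
static string FromBase64(string s) => Encoding.UTF8.GetString(Convert.FromBase64String(s));

// Usage (hypothetical host and table):
// var json = await GetRowJsonAsync(new HttpClient(), "http://hbase-rest-host:8080", "mytable", "row1");
// foreach (var (r, c, v) in DecodeHBaseRows(json)) Console.WriteLine($"{r} {c} = {v}");
```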
