How can I call Hive asynchronously in Java?

Posted 2024-08-20 10:14:42

I would like to execute a Hive query on the server in an asynchronous manner. The Hive query will likely take a long time to complete, so I would prefer not to block on the call. I am currently using Thrift to make a blocking call (blocks on client.execute()), but I have not seen an example of how to make a non-blocking call. Here is the blocking code:

        // Assumed imports (old HiveServer Thrift client):
        //   org.apache.thrift.transport.TSocket, org.apache.thrift.protocol.TBinaryProtocol,
        //   org.apache.hadoop.hive.service.ThriftHive, org.apache.hadoop.hive.service.ThriftHive.Client
        TSocket transport = new TSocket("hive.example.com", 10000);
        transport.setTimeout(999999999);
        TBinaryProtocol protocol = new TBinaryProtocol(transport);
        Client client = new ThriftHive.Client(protocol);
        transport.open();
        client.execute(hql);  // Omitted HQL

        List<String> rows;
        while ((rows = client.fetchN(1000)) != null) {
            for (String row : rows) {
                // Do stuff with row
            }
        }

        transport.close();

The code above is missing try/catch blocks to keep it short.

Does anyone have any ideas how to do an async call? Can Hive/Thrift support it? Is there a better way?

Thanks!


Comments (6)

笛声青案梦长安 2024-08-27 10:14:42

AFAIK, at the time of writing Thrift does not generate asynchronous clients. The reason, as explained in this link (search the text for "asynchronous"), is that Thrift was designed for the data centre, where latency is assumed to be low.

Unfortunately as you know the latency experienced between call and result is not always caused by the network, but by the logic being performed! We have this problem calling into the Cassandra database from a Java application server where we want to limit total threads.

Summary: for now all you can do is make sure you have sufficient resources to handle the required numbers of blocked concurrent threads and wait for a more efficient implementation.
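
In practice that means bounding how many threads can be parked in blocking Hive calls at once, for example with a fixed-size pool. A minimal sketch (uses java.util.concurrent; the pool size and the `queries` collection are placeholders):

    // Cap how many threads can be blocked inside Hive/Thrift calls at any one time.
    ExecutorService hivePool = Executors.newFixedThreadPool(8);  // assumed limit; tune for your server

    for (final String hql : queries) {
        hivePool.submit(new Runnable() {
            public void run() {
                // run the blocking execute()/fetchN() sequence here;
                // at most 8 of these are ever in flight (and blocked) together
            }
        });
    }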

守望孤独 2024-08-27 10:14:42

It is now possible to make an asynchronous call in a Java thrift client after this patch was put in:
https://issues.apache.org/jira/browse/THRIFT-768

Generate the async Java client using the new Thrift compiler and initialize your client as follows:

    // Assumed Thrift runtime imports: org.apache.thrift.async.TAsyncClientManager,
    // org.apache.thrift.protocol.TProtocolFactory / TBinaryProtocol,
    // org.apache.thrift.transport.TNonblockingTransport / TNonblockingSocket
    TNonblockingTransport transport = new TNonblockingSocket("127.0.0.1", 9160);
    TAsyncClientManager clientManager = new TAsyncClientManager();
    TProtocolFactory protocolFactory = new TBinaryProtocol.Factory();
    Hive.AsyncClient client = new Hive.AsyncClient(protocolFactory, clientManager, transport);

Now you can execute methods on this client as you would on a synchronous interface. The only change is that every method takes an additional callback parameter.
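
For example, a call with a callback might look roughly like the sketch below. `Hive.AsyncClient`, its `execute` method and the generated `execute_call` type are assumptions about the generated code; `AsyncMethodCallback` is Thrift's own `org.apache.thrift.async` interface (older async-client pattern):

    // Sketch only: the generated type names below are assumptions.
    client.execute(hql, new AsyncMethodCallback<Hive.AsyncClient.execute_call>() {
        public void onComplete(Hive.AsyncClient.execute_call response) {
            // invoked on the TAsyncClientManager's selector thread when the call finishes;
            // fetch rows here or hand the work off to another executor
        }

        public void onError(Exception e) {
            // invoked if the call fails
        }
    });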

枫林﹌晚霞¤ 2024-08-27 10:14:42

I know nothing about Hive, but as a last resort, you can use Java's concurrency library:

    // Assumed imports: java.util.concurrent.Callable / Future / ExecutorService / Executors
    ExecutorService executorService = Executors.newFixedThreadPool(4);  // size to taste

    Callable<SomeResult> c = new Callable<SomeResult>() {
        public SomeResult call() throws Exception {
            // your Hive code here (the blocking Thrift execute()/fetchN() sequence)
            return hiveResult;  // whatever "SomeResult" is for you
        }
    };

    Future<SomeResult> result = executorService.submit(c);

    // when you need the result, this will block
    // (get() throws ExecutionException if the Hive code failed)
    result.get();

Or, if you do not need to wait for the result, use Runnable instead of Callable.
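
For the fire-and-forget case, that could simply be:

    // No result needed, so submit a Runnable to the same executor.
    executorService.submit(new Runnable() {
        public void run() {
            // your Hive code here
        }
    });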

凌乱心跳 2024-08-27 10:14:42

After asking on the Hive mailing list: Hive does not support async calls using Thrift.

瞳孔里扚悲伤 2024-08-27 10:14:42

I don't know about Hive in particular, but any blocking call can be turned into an async call by spawning a new thread and using a callback. You could look at java.util.concurrent.FutureTask, which has been designed to allow easy handling of such asynchronous operations.
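
A minimal sketch of that idea, using java.util.concurrent.FutureTask and Callable, and assuming a hypothetical `runHiveQuery(hql)` method that wraps the blocking Thrift code and returns the rows:

    // Wrap the blocking Hive call in a FutureTask and run it on its own thread.
    FutureTask<List<String>> task = new FutureTask<List<String>>(new Callable<List<String>>() {
        public List<String> call() throws Exception {
            return runHiveQuery(hql);  // hypothetical: the blocking execute()/fetchN() logic
        }
    });
    new Thread(task).start();  // or submit the task to an ExecutorService

    // ... do other work while the query runs ...

    List<String> rows = task.get();  // blocks only when the result is actually needed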

木緿 2024-08-27 10:14:42

We fire off asynchronous calls to AWS Elastic MapReduce. AWS Elastic MapReduce can run Hadoop/Hive jobs on Amazon's cloud with a call to the AWS Elastic MapReduce web services.

You can also monitor the status of your jobs and grab the results off S3 once the job is completed.

Since the calls to the web services are asynchronous in nature, we never block our other operations. We continue to monitor the status of our jobs in a separate thread and grab the results when the job is complete.
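
A rough sketch of that status-polling thread, assuming the AWS SDK for Java 1.x EMR client; the credentials, `jobFlowId`, terminal state names and polling interval are all placeholders:

    // Poll the EMR job flow state from a background thread (AWS SDK for Java 1.x assumed).
    final AmazonElasticMapReduceClient emr =
            new AmazonElasticMapReduceClient(new BasicAWSCredentials(accessKey, secretKey));

    new Thread(new Runnable() {
        public void run() {
            while (true) {
                DescribeJobFlowsResult status = emr.describeJobFlows(
                        new DescribeJobFlowsRequest().withJobFlowIds(jobFlowId));
                String state = status.getJobFlows().get(0).getExecutionStatusDetail().getState();
                if ("COMPLETED".equals(state) || "FAILED".equals(state)) {
                    // pull the job's output from S3 here, then stop polling
                    break;
                }
                try { Thread.sleep(30000); } catch (InterruptedException e) { return; }
            }
        }
    }).start();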
