How to use Sqoop in a Java program?



I know how to use Sqoop through the command line, but I don't know how to invoke Sqoop commands from a Java program. Can anyone share some sample code?


Comments (5)

笔落惊风雨 2025-01-11 17:44:41


You can run sqoop from inside your java code by including the sqoop jar in your classpath and calling the Sqoop.runTool() method. You would have to create the required parameters to sqoop programmatically as if it were the command line (e.g. --connect etc.).

Please pay attention to the following:

  • Make sure that the sqoop tool name (e.g. import/export etc.) is the first parameter.
  • Pay attention to classpath ordering - the execution might fail because sqoop requires version X of a library and you use a different version. Ensure that the libraries sqoop requires are not overshadowed by your own dependencies. I encountered such a problem with commons-io (sqoop requires v1.4) and got a NoSuchMethodError since I was using commons-io v1.2.
  • Each argument needs to be on a separate array element. For example, "--connect jdbc:mysql:..." should be passed as two separate elements in the array, not one.
  • The sqoop parser knows how to accept double-quoted parameters, so use double quotes if you need to (I suggest always). The only exception is the --fields-terminated-by parameter, which expects a single character, so don't double-quote it.
  • I'd suggest splitting the command-line-arguments creation logic and the actual execution so your logic can be tested properly without actually running the tool.
  • It would be better to use the --hadoop-home parameter, in order to prevent dependency on the environment.
  • The advantage of Sqoop.runTool() as opposed to Sqoop.main() is the fact that runTool() returns the error code of the execution.

Hope that helps.

final int ret = Sqoop.runTool(new String[] { ... });
if (ret != 0) {
  throw new RuntimeException("Sqoop failed - return code " + Integer.toString(ret));
}
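
Filled in, a minimal import invocation might look like this sketch (the connection string, credentials, and target directory are placeholder assumptions, not values from a real setup):

final int ret = Sqoop.runTool(new String[] {
    "import",                                                // tool name must be the first argument
    "--connect", "jdbc:mysql://HOSTNAME:3306/DATABASE_NAME", // placeholder connection string
    "--username", "USERNAME",
    "--password", "PASSWORD",
    "--table", "TABLE_NAME",
    "--target-dir", "/user/hadoop/TABLE_NAME"                // flag and value are separate array elements
});
if (ret != 0) {
  throw new RuntimeException("Sqoop failed - return code " + ret);
}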

RL

泡沫很甜 2025-01-11 17:44:41


Find below sample code for using sqoop in a Java program to import data from MySQL to HDFS/HBase. Make sure you have the sqoop jar in your classpath:

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME");
//options.setTableName("TABLE_NAME");
//options.setWhereClause("id>10");   // this where clause works when importing a whole table, i.e. when setTableName() is used
options.setUsername("USERNAME");
options.setPassword("PASSWORD");
//options.setDirectMode(true);       // make sure direct mode is off when importing data to HBase
options.setNumMappers(8);            // default value is 4
options.setSqlQuery("SELECT * FROM user_logs WHERE $CONDITIONS limit 10");
options.setSplitByCol("log_id");

// HBase options
options.setHBaseTable("HBASE_TABLE_NAME");
options.setHBaseColFamily("colFamily");
options.setCreateHBaseTable(true);   // create the HBase table if it does not exist
options.setHBaseRowKeyColumn("log_id");

int ret = new ImportTool().run(options);

As suggested by Harel, we can use the output of the run() method for error handling. Hoping this helps.

咋地 2025-01-11 17:44:41


There is a trick which worked out for me pretty well. Via SSH, you can execute the Sqoop command directly; all you need is an SSH Java library.

This approach is independent of your local Sqoop setup. You just need an SSH library on the client side, and Sqoop installed on the remote system where you want to perform the import. Connect to that system via SSH and execute the command that imports data from MySQL into Hive.

Follow these steps:

Download the sshxcute Java library (https://code.google.com/p/sshxcute/) and add it to the build path of your Java project, which contains the following Java code:

import net.neoremind.sshxcute.core.SSHExec;
import net.neoremind.sshxcute.core.ConnBean;
import net.neoremind.sshxcute.task.CustomTask;
import net.neoremind.sshxcute.task.impl.ExecCommand;

public class TestSSH {

    public static void main(String args[]) throws Exception {

        // Initialize a ConnBean object; the parameter list is IP, username, password
        ConnBean cb = new ConnBean("192.168.56.102", "root", "hadoop");

        // Pass the ConnBean instance to the static method getInstance(ConnBean)
        // to retrieve a singleton SSHExec instance
        SSHExec ssh = SSHExec.getInstance(cb);
        // Connect to the server
        ssh.connect();
        // Print the client IP from which you connected to the SSH server on the Horton Sandbox
        CustomTask sampleTask1 = new ExecCommand("echo $SSH_CLIENT");
        System.out.println(ssh.exec(sampleTask1));
        CustomTask sampleTask2 = new ExecCommand("sqoop import --connect jdbc:mysql://192.168.56.101:3316/mysql_db_name --username=mysql_user --password=mysql_pwd --table mysql_table_name --hive-import -m 1 -- --schema default");
        ssh.exec(sampleTask2);
        ssh.disconnect();
    }
}

一片旧的回忆 2025-01-11 17:44:41


If you know the location of the executable and the command-line arguments, you can use a ProcessBuilder. This can then run a separate Process that Java can monitor for completion and return code.
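
A minimal sketch of this approach (the sqoop binary path and connection details below are placeholder assumptions):

import java.io.IOException;

public class SqoopProcessRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Each flag and its value is a separate argument, just like the args array for runTool()
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/bin/sqoop", "import",                              // placeholder path to the sqoop binary
                "--connect", "jdbc:mysql://HOSTNAME:3306/DATABASE_NAME", // placeholder connection string
                "--username", "USERNAME",
                "--password", "PASSWORD",
                "--table", "TABLE_NAME");
        pb.inheritIO();                    // forward sqoop's stdout/stderr to this process
        Process process = pb.start();
        int exitCode = process.waitFor(); // block until sqoop finishes
        if (exitCode != 0) {
            throw new RuntimeException("Sqoop failed - exit code " + exitCode);
        }
    }
}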

不喜欢何必死缠烂打 2025-01-11 17:44:41


Please follow the code given by vikas; it worked for me. Include these jar files in the classpath and import these packages:

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

Referenced libraries

  1. sqoop-1.4.4.jar /sqoop
  2. ojdbc6.jar /sqoop/lib (for Oracle)
  3. commons-logging-1.1.1.jar hadoop/lib
  4. hadoop-core-1.2.1.jar /hadoop
  5. commons-cli-1.2.jar hadoop/lib
  6. commons-io-2.1.jar hadoop/lib
  7. commons-configuration-1.6.jar hadoop/lib
  8. commons-lang-2.4.jar hadoop/lib
  9. jackson-core-asl-1.8.8.jar hadoop/lib
  10. jackson-mapper-asl-1.8.8.jar hadoop/lib
  11. commons-httpclient-3.0.1.jar hadoop/lib

JRE system library

1. resources.jar jdk/jre/lib
2. rt.jar jdk/jre/lib
3. jsse.jar jdk/jre/lib
4. jce.jar jdk/jre/lib
5. charsets.jar jdk/jre/lib
6. jfr.jar jdk/jre/lib
7. dnsns.jar jdk/jre/lib/ext
8. sunec.jar jdk/jre/lib/ext
9. zipfs.jar jdk/jre/lib/ext
10. sunpkcs11.jar jdk/jre/lib/ext
11. localedata.jar jdk/jre/lib/ext
12. sunjce_provider.jar jdk/jre/lib/ext

Sometimes you get errors if your Eclipse project uses JDK 1.6 and the libraries you add were built with JDK 1.7; in that case, configure the JRE when creating the project in Eclipse.

Vikas, if I want to put the imported files into Hive, should I use options.parameter("--hive-import")?
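
For reference, a minimal sketch of a Hive import with this API, assuming SqoopOptions exposes the setHiveImport(boolean) setter as in Sqoop 1.4.x (connection details are placeholders):

SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME");
options.setTableName("TABLE_NAME");
options.setUsername("USERNAME");
options.setPassword("PASSWORD");
options.setHiveImport(true);   // programmatic equivalent of the --hive-import flag (assumed API)
int ret = new ImportTool().run(options);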
