Java ETL: hard to find a suitable one

Posted 2024-10-03 20:18:51, 923 words, 7 views, 0 comments

I'm looking for an embeddable Java ETL, i.e., an Extract Transform Load engine that can be called from Java code.

I'm finding it surprisingly hard to find a suitable one.

I'm mainly looking at loading delimited text files into database tables, with some minor transforms along the way.

I'd like the following features:

  • the ability to specify simple mappings externally, e.g., text column 5 to database column foo, specified in some XML mapping file
  • the ability to give the database node a javax.sql.DataSource

CloverETL allows mappings to be specified in XML, but database connections must be either JNDI names or a properties file specifying driverClass, url, dbusername, password, etc. Since I already have javax.sql.DataSource instances set up by my dependency injection framework, properties files seem painful and fragile, especially if I want this to work in several environments (dev, test, prod).

KETL tells me that "We are currently in the process of completely overhauling our documentation for KETL™. Because of this, only the installation guide has been updated." Honest, but not helpful.

Octopus is now "http://www.together.at/prod/database/tdt", which is "under construction".

Pentaho seems to use the same "specify driverClass" style that CloverETL does, rather than using a DataSource, and Pentaho's documentation for calling their engine from Java code is difficult to find.

Basically I'd really like to be able to do this pseudo-code:

extractTransformLoad(
        getInputFile( "input.csv" ),
        getXMLMapping( "myMappingFile.xml" ),
        new DatabaseWriter( getDatasource() ) );
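
To make the intent concrete, here is a minimal plain-JDBC sketch of the shape described above. It is not taken from any of the ETL libraries mentioned. The class MappedCsvLoader and its methods are hypothetical names made up for illustration. The mapping is built in code rather than read from an XML file, the delimiter handling is a naive split that ignores quoted fields, and table and column names are assumed to come from a trusted mapping because they are concatenated into the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import javax.sql.DataSource;

/** Hypothetical sketch: maps 1-based text columns to database columns and loads rows via a DataSource. */
public class MappedCsvLoader {
    private final String table;
    // Database column name -> 1-based index of the text column, kept in insertion order.
    private final Map<String, Integer> mapping = new LinkedHashMap<>();

    public MappedCsvLoader(String table) {
        this.table = table;
    }

    /** Registers "text column N goes to database column X". */
    public MappedCsvLoader map(int textColumn, String dbColumn) {
        mapping.put(dbColumn, textColumn);
        return this;
    }

    /** Builds a parameterized INSERT covering exactly the mapped columns. */
    public String insertSql() {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String dbColumn : mapping.keySet()) {
            columns.add(dbColumn);
            marks.add("?");
        }
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")";
    }

    /** Picks the mapped fields out of one delimited line (naive split, no quoting support). */
    public String[] select(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        String[] values = new String[mapping.size()];
        int i = 0;
        for (int textColumn : mapping.values()) {
            values[i++] = fields[textColumn - 1].trim();
        }
        return values;
    }

    /** Inserts all lines in one batch using a connection borrowed from the injected DataSource. */
    public int load(List<String> lines, String delimiter, DataSource dataSource) throws SQLException {
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(insertSql())) {
            for (String line : lines) {
                String[] values = select(line, delimiter);
                for (int i = 0; i < values.length; i++) {
                    statement.setString(i + 1, values[i]);
                }
                statement.addBatch();
            }
            return statement.executeBatch().length;
        }
    }
}
```

With this sketch, new MappedCsvLoader("my_table").map(5, "foo") followed by load(lines, ",", getDatasource()) would cover the "text column 5 to database column foo" case. The point is that the loader only ever sees a DataSource, so the same code would run unchanged in dev, test and prod.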

Any suggestions?

Comments (4)

时光是把杀猪刀 2024-10-10 20:18:51

Disclosure: I'm the author of Scriptella ETL, but I believe this tool might be useful for your case.

It's a lightweight open source ETL with a one-liner integration with Java. It also supports Spring Framework and comes with built-in drivers for CSV, text, XML, Excel and other data-sources.

Example of importing a CSV file into a table:

<!DOCTYPE etl SYSTEM "http://scriptella.org/dtd/etl.dtd">
<etl>
  <connection id="in" driver="csv" url="data.csv" />
  <connection id="out" driver="oracle" url="jdbc:oracle:thin:@localhost:1521:ORCL" 
      classpath="ojdbc14.jar" user="scott" password="tiger" />
  <!-- Copy all CSV rows to a database table -->
  <query connection-id="in">
      <!-- Empty query means select all columns -->
      <script connection-id="out">
          INSERT INTO Table_Name VALUES (?id,?priority, ?summary, ?status)
      </script>
  </query>
</etl>

Running from Java:

// Execute etl.xml file
EtlExecutor.newExecutor(new File("etl.xml")).execute();

Running from command-line:

scriptella [file_name]

Integration with Spring:

  1. Use the "spring" driver and the name of the bean to reference data sources. Example:

    <connection id="spring" driver="spring" url="datasourceBeanName" />
    
  2. Add EtlExecutorBean to the application context in order to execute the job:

    <bean id="createDb" class="scriptella.driver.spring.EtlExecutorBean">
        <property name="configLocation" value="create-db.etl.xml" />
        <property name="progressIndicator"><ref local="progress" /></property>
        <property name="autostart" value="true" /> <!-- Etl will be run during app context initialization -->
    </bean>
    

For additional details see the Spring example.

开始看清了 2024-10-10 20:18:51

Do you know Talend?

It's a tool based on Eclipse (Talend Open Studio), but you can use it directly in Java by writing your own code or by exporting jobs to Java classes.

裸钻 2024-10-10 20:18:51

Here is a list of all the Java-based open source ETL libraries. I see you have already evaluated a few of them, but there are more. Also, this seems to be a duplicate of https://stackoverflow.com/questions/272517/please-recommend-a-powerful-java-based-etl-framework

彼岸花ソ最美的依靠 2024-10-10 20:18:51

CloverETL Engine is easily embeddable as well as extensible, so you can write your own connection and plug it into CloverETL. The DBConnection object will be changed slightly in CloverETL 3.1 to make it more extensible, and implementing a descendant that uses a DataSource to connect to the database will then be child's play.
