How do I handle denormalized data with Kettle?

Posted on 2024-12-11 11:55:01 · 402 characters · 0 views · 0 comments

Kettle has "row normalizer" and "row denormalizer" steps, e.g.

http://wiki.pentaho.com/display/EAI/Row+Normalizer

but they require that you manually configure the fields in the denormalized table. I don't understand how this can be used practically, since the number of fields in the denormalized table depends on the number of rows in the normalized table, which is dynamic. E.g. in their example, there are three columns for three products in the denormalized input table, and the user must manually tell the transform how to handle each one. But in a real application the number of products will change dynamically. So this transform will only work with one table, at one moment in time. Anything with a different column count will fail.

I have dozens or hundreds of denormalized input files that look very much like their example, all with different column counts.

Comments (1)

橘虞初梦 2024-12-18 11:55:01

I had a similar problem with denormalization. I had an /etc/group file with a structure like group:gid:member1,member2,...., and I denormalized it with a User Defined Java Class component, so finally I have fields group,gid,member. I know you need the other direction, but it may be a good starting point for you. Here is the source:

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // boilerplate: fetch the next input row; stop when the input is exhausted
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    if (first)
        first = false;

    // "members" holds the comma-separated list, e.g. member1,member2,...
    String tmp = get(Fields.In, "members").getString(r);
    if (tmp == null)
        return true; // no members: emit nothing for this input row
    String[] accounts = tmp.split(",");
    // emit one output row per member
    for (int i = 0; i < accounts.length; ++i) {
        Object[] out_row = RowDataUtil.allocateRowData(data.outputRowMeta.size());
        // copy every input field, then set the new "account" field
        for (int j = 0; j < r.length; ++j)
            out_row[j] = r[j];
        get(Fields.Out, "account").setValue(out_row, accounts[i]);
        putRow(data.outputRowMeta, out_row);
    }

    return true;
}
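
For the case described in the question (wide input files whose column count differs from file to file), the same "one output row per value" idea can be sketched in plain Java. This is only an illustration under assumptions, not Kettle API and not a tested transformation: the class name WideToLong, the method normalize, the ";" delimiter, and the sample header and values are all hypothetical. The point it shows is that the column list is read from each file's header line at run time, so nothing has to be configured per column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kettle: unpivots a delimited table whose
// column count is only known once the header line has been read.
public class WideToLong {

    /**
     * The first keyColumns columns are copied to every output row; each
     * remaining column becomes its own row of (keys..., column name, value).
     */
    public static List<String[]> normalize(List<String> lines, String delimiter, int keyColumns) {
        List<String[]> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        String splitter = Pattern.quote(delimiter);
        // limit -1 keeps trailing empty cells
        String[] header = lines.get(0).split(splitter, -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(splitter, -1);
            for (int col = keyColumns; col < header.length; col++) {
                String[] row = new String[keyColumns + 2];
                // copy the key cells that are present; missing ones stay null
                System.arraycopy(cells, 0, row, 0, Math.min(keyColumns, cells.length));
                row[keyColumns] = header[col];
                row[keyColumns + 1] = col < cells.length ? cells[col] : null;
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: one key column followed by any number of product columns.
        List<String> lines = Arrays.asList(
            "date;productA;productB",
            "2024-01-01;10;20");
        for (String[] row : normalize(lines, ";", 1)) {
            System.out.println(String.join(";", row));
        }
        // prints:
        // 2024-01-01;productA;10
        // 2024-01-01;productB;20
    }
}
```

A file with five product columns would go through the same code unchanged and produce five output rows per input line. How to wire this into a transformation (for example as a pre-processing step before the files reach Kettle) is left open here.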