How do I handle denormalized data with Kettle?

Posted 2024-12-11 11:55:01

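Kettle has "Row Normalizer" and "Row Denormalizer" steps, for example

http://wiki.pentaho.com/display/EAI/Row+Normalizer

but they require you to configure the fields of the denormalized table by hand. I don't see how this can be used in practice, because the number of fields in the denormalized table depends on the number of rows in the normalized table, which is dynamic. In their example, the denormalized input table has three columns for three products, and the user has to tell the transformation manually what to do with each column. In a real application, though, the number of products changes over time, so the transformation only works for one table at one point in time. Any input with a different number of columns will fail.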

I have tens or hundreds of denormalized input files that look very much like their example, each with a different number of columns.
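To make the problem concrete, here is a minimal sketch in plain Java (no Kettle API involved) of the kind of normalization being asked for: the column list is read from the file's header at run time instead of being configured by hand. The class name `WideToLong`, the method `unpivot`, the `keyCount` parameter, and the sample column names are all hypothetical, chosen only for illustration. The sketch assumes that the first `keyCount` columns identify the row and that every remaining column is one product.

```java
import java.util.ArrayList;
import java.util.List;

final class WideToLong {
    // Unpivots one wide row. The first keyCount columns are copied unchanged;
    // every remaining column becomes its own (keys..., columnName, value) row.
    static List<String[]> unpivot(String[] header, String[] row, int keyCount) {
        List<String[]> out = new ArrayList<>();
        // Only visit columns that exist in both the header and the data row.
        int width = Math.min(header.length, row.length);
        for (int col = keyCount; col < width; col++) {
            String[] longRow = new String[keyCount + 2];
            System.arraycopy(row, 0, longRow, 0, keyCount);
            longRow[keyCount] = header[col];   // column name taken from the header
            longRow[keyCount + 1] = row[col];  // value found in that column
            out.add(longRow);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one key column followed by three product columns.
        String[] header = {"date", "product1", "product2", "product3"};
        String[] row = {"2024-01-01", "10", "20", "30"};
        for (String[] r : unpivot(header, row, 1)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

Because the loop bounds come from the header, the same code handles a file with three product columns or three hundred. That is the behavior the manually configured step does not provide.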


橘虞初梦 2024-12-18 11:55:01

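I had a similar problem with denormalization. I had an /etc/group file structured like group:gid:member1,member2,..., and I denormalized it with a User Defined Java Class step, so in the end I had the fields group, gid, member. I know you need the other direction, but this may be a good starting point for you. Here is the source: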

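public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Fetch the next input row; null means the input is exhausted.
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // No per-run initialization is needed here, so just clear the flag.
    if (first) {
        first = false;
    }

    // Comma-separated member list from the input field "members".
    String tmp = get(Fields.In, "members").getString(r);
    if (tmp == null) {
        return true; // nothing to split: skip this row and continue with the next one
    }
    String[] accounts = tmp.split(",");
    for (int i = 0; i < accounts.length; ++i) {
        // Emit one output row per member: copy the input fields, then set "account".
        Object[] out_row = RowDataUtil.allocateRowData(data.outputRowMeta.size());
        for (int j = 0; j < r.length; ++j) {
            out_row[j] = r[j];
        }
        get(Fields.Out, "account").setValue(out_row, accounts[i]);
        putRow(data.outputRowMeta, out_row);
    }

    return true;
}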
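For trying out the splitting logic outside Kettle, the following is a rough standalone sketch of the same idea in plain Java. It is not the step code itself. The class name `GroupLineSplitter` and the method `split` are hypothetical, and the input is assumed to follow the group:gid:member1,member2 layout described above. Unlike the step, which copies all input fields into each output row, this sketch emits only the three fields group, gid, and member.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupLineSplitter {
    // Turns "group:gid:member1,member2" into one {group, gid, member} row per member.
    static List<String[]> split(String line) {
        List<String[]> rows = new ArrayList<>();
        // Limit -1 keeps a trailing empty field, so "group:gid:" still has three parts.
        String[] parts = line.split(":", -1);
        if (parts.length < 3 || parts[2].isEmpty()) {
            return rows; // no member list: emit nothing for this line
        }
        for (String member : parts[2].split(",")) {
            rows.add(new String[] {parts[0], parts[1], member.trim()});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the assumed layout.
        for (String[] row : split("wheel:10:root,alice")) {
            System.out.println(String.join(",", row));
        }
    }
}
```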

