How do I use Kettle to process denormalized data?
Kettle has "Row Normalizer" and "Row Denormalizer" steps, e.g. http://wiki.pentaho.com/display/EAI/Row+Normalizer, but they require you to configure the fields of the denormalized table by hand. I don't understand how this can be used in practice, since the number of fields in the denormalized table depends on the number of rows in the normalized table, which is dynamic. In their example, the denormalized input table has three columns for three products, and the user must manually tell the transformation how to handle each one. In a real application the number of products changes over time, so such a transformation only works for one table at one moment in time. Any input with a different column count will fail.
I have dozens or hundreds of denormalized input files that look very much like their example, all with different column counts.
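To make the requirement concrete, here is a minimal plain-Java sketch of the behavior being asked for: the value columns are discovered from each file's header row instead of being configured by hand. It does not use the Kettle API; the class name `DynamicUnpivot`, the `keyColumns` parameter, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicUnpivot {

    // Turns one wide row into one output row per value column.
    // The first keyColumns fields are copied to every output row;
    // each remaining column yields (keys..., columnName, value).
    static List<String[]> unpivot(String[] header, String[] row, int keyColumns) {
        List<String[]> out = new ArrayList<>();
        // The loop bound comes from the header, so any column count works.
        for (int i = keyColumns; i < header.length && i < row.length; i++) {
            String[] r = new String[keyColumns + 2];
            System.arraycopy(row, 0, r, 0, keyColumns);
            r[keyColumns] = header[i];   // name of the source column
            r[keyColumns + 1] = row[i];  // value found in that column
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        String[] header = {"date", "product1", "product2", "product3"};
        String[] row = {"2003-01-01", "10", "5", "17"};
        for (String[] r : unpivot(header, row, 1)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

A file with five product columns would go through the same method unchanged, which is exactly what a manually configured field list cannot do.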
Comments (1)
I had a similar problem with denormalization. I had an /etc/group file with a structure like group:gid:member1,member2,.... and I denormalized it with a User Defined Java Class component, so in the end I have the fields group, gid, member. I know you need the other direction, but it may be a good starting point for you. Here is the source:
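The code from the original answer did not survive on this page. The following is not that code, only a rough plain-Java sketch of the same idea under the line layout described above (group:gid:member1,member2,...). It does not use the Kettle API; the class name `GroupLineSplitter` and the choice to skip empty member entries are assumptions. Inside a User Defined Java Class step, the same loop would emit one output row per member.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupLineSplitter {

    // Expands "group:gid:member1,member2" into one {group, gid, member}
    // row per member. Lines with fewer than three fields yield no rows.
    static List<String[]> split(String line) {
        List<String[]> rows = new ArrayList<>();
        String[] parts = line.split(":", 3); // at most three fields
        if (parts.length < 3) {
            return rows;
        }
        for (String member : parts[2].split(",")) {
            String m = member.trim();
            if (!m.isEmpty()) { // assumption: ignore empty member entries
                rows.add(new String[] {parts[0], parts[1], m});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        for (String[] r : split("wheel:10:alice,bob")) {
            System.out.println(String.join(",", r));
        }
    }
}
```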