SSIS事务数据(不同记录类型,一个文件)

发布于 2024-07-24 14:02:08 字数 398 浏览 4 评论 0原文

有趣的是,我们正在评估用于预处理报表数据(例如水电费账单、银行报表)以进行打印的 ETL 工具。

一些数据以不同的记录类型存储在单个平面文件中。

例如,以“01”作为第一个字段的记录类型将是地址数据。 这将包含名称和地址字段。 带有“02”的记录类型将是汇总数据,包含余额和总计。 记录类型“03”将是报表上的行项目。

每条语句将有一条 01 和 02 记录,以及多条 03 记录。 我可以预先解析该文件并拆分为 3 个文件以加载到表中,但这不太理想。

我们获取该文件并对其进行一些操作(例如,在地址记录中添加更多字段,并且可能进行一些总计/验证),然后以几乎相同的格式发送该文件(但带有额外的字段添加)到我们的印刷排版程序中。

您将如何在 SSIS 中执行此操作?

An interesting one, we're evaluating ETL tools for pre-processing statement data (e.g. utility bills, bank statements) for printing.

Some of the data comes through in a single flat file, with different record types.

e.g. a record type with "01" as the first field will be address data. This will have name and address fields. A record type with "02" will be summary data, with balances and totals. Record type "03" will be a line item on the statement.

Each statement will have one 01 and 02 records, and multiple 03 records. I could pre-parse the file and split into 3 files for loading into a table, but this is less than ideal.

We take the file and do a few manipulations on it (e.g. add in a couple more fields to the address record, and maybe do some totalling / validation), and then send the file in pretty much the same format (But with the extra fields added) to our print composition program.

How would you do this in SSIS?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

离鸿 2024-07-31 14:02:08

SSIS 中变体记录的一个大问题是,您无法获得连接管理器帮助布局的任何好处,因为连接管理器只能处理单个布局。

因此,通常情况下,您最终会得到一个以 CRLF 结尾的平面文件,其中包含两列:记录类型和记录数据。 然后,您放入条件分割并解析不同路径上的每种类型的行。 解析必须拆分剩余的记录数据,并将其放入列中,并使用派生列转换或脚本转换以及潜在的转换转换正常进行转换。

如果您有很多包要做,我会认真考虑编写一个自定义组件,该组件生成的 3 个输出已转换为您的目标类型。

The big problem with variant records in SSIS is that you don't get any of the benefits of the connection manager helping with the layout, since the connection manager can only handle a single layout.

So typically, you end up with a CRLF terminated flat file with two columns: recordtype and recorddata. Then you put the conditional split in and parse each type of row on different paths. The parsing will have to split up the remaining record data and put it in columns and convert as normal, either with a derived column transform or a script transform and potentially conversion transforms.

If you had a lot of packages to do, I would seriously consider writing a custom component which produced 3 outputs already converted to your destination types.

半窗疏影 2024-07-31 14:02:08

回答了我自己的问题 - 请参阅下面的脚本。 AcctNum 来自平面文件源的派生列,并将正确填充 02 记录类型,将其保存在本地静态变量中,并将其放回到不包含帐户号的其他记录类型的行上。

/* Microsoft SQL Server 集成服务脚本组件
* 使用 Microsoft Visual C# 2008 编写脚本。
* ScriptMain是脚本的入口点类。*/

using System;
使用系统数据;
使用 Microsoft.SqlServer.Dts.Pipeline.Wrapper;
使用 Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
公共类 ScriptMain : UserComponent
{
静态字符串帐号= null;

public override void PreExecute()
{
    base.PreExecute();
    /*
      Add your code here for preprocessing or remove if not needed
    */
}

public override void PostExecute()
{
    base.PostExecute();
    /*
      Add your code here for postprocessing or remove if not needed
      You can set read/write variables here, for example:
      Variables.MyIntVar = 100
    */
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.RecordType == "02")
        AccountNumber = Row.AcctNum; // Store incomming Account Number into local script variable
    else if (Row.RecordType == "06" || Row.RecordType == "07" || Row.RecordType == "08" ||
             Row.RecordType == "09" || Row.RecordType == "10")
        Row.AcctNum = AccountNumber; // Put Stored Account Number on this row.
}

}

answered my own question - see below script. AcctNum come in from a derived column from the flat file source and will be correctly populated for 02 record types, save it in local static varialbe and put it back on the row for other record types that do not contain the acct number.

/* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
static String AccountNumber = null;

public override void PreExecute()
{
    base.PreExecute();
    /*
      Add your code here for preprocessing or remove if not needed
    */
}

public override void PostExecute()
{
    base.PostExecute();
    /*
      Add your code here for postprocessing or remove if not needed
      You can set read/write variables here, for example:
      Variables.MyIntVar = 100
    */
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.RecordType == "02")
        AccountNumber = Row.AcctNum; // Store incomming Account Number into local script variable
    else if (Row.RecordType == "06" || Row.RecordType == "07" || Row.RecordType == "08" ||
             Row.RecordType == "09" || Row.RecordType == "10")
        Row.AcctNum = AccountNumber; // Put Stored Account Number on this row.
}

}

丶视觉 2024-07-31 14:02:08

这是可能的,但您必须编写自定义逻辑。 我用 DTS 做过一次。
如果文件被分隔,SSIS 将正确导入字段。 您可以编写一个脚本来检查记录类型字段,然后根据记录类型分支到不同的插入。 如果文件包含未分隔的记录,但每种类型都有自己的固定宽度,则情况会变得更加复杂,因为您必须解析和拆分每个导入的行,并将记录类型及其宽度硬编码在脚本中。

This is possible, bu you will have to write custom logic. I did this once with DTS.
If the file is delimited, SSIS will import the fields correctly. You can write a script that examines the record type field, then branches into different inserts depending on the record type. If the file has records that are not delimited, but each type has its own fixed widths, this becomes a lot more complicated, since you'd have to parse and split each imported line, with the record types and their width hardcoded in the script.

﹏雨一样淡蓝的深情 2024-07-31 14:02:08

有几种方法可以做到这一点,但我认为最容易理解的方法是在源任务之后添加条件分割,然后将其推送到一堆数据转换任务以获得正确的数据格式。

确保您的源设置了正确的数据类型,这样就不会出现任何问题(例如所有字符串)。 然后只需检查该条件拆分中的“记录类型”字段即可将其发送到正确的分支。

There are a few ways to do it, but I think the easiest one to understand would be to add a conditional split after the source task, and then push it through a bunch of data conversion tasks to get the right format of data.

Make sure that your source is set up with the correct data types, so nothing falls through (e.g.-all strings). Then just check the "Record Type" field in that conditional split to send it to the right branch.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文