这是我的第一个SSIS项目,所以我正在努力处理一些核心概念。
在生产中,我的表格以及每种形式的文件都提交了表格。在转到下一个SSIS软件包之前,我需要验证磁盘上的每个文件是否存在。
在我的第一个软件包中,我想实现以下内容:
- 将生产中的样本数据从临时db - 中提取
- 加载样品数据中的记录集,rsformsubsibsions并获得要处理的记录计数 - 工作
- 显示一个消息框,其中包含要处理的记录数量 - 工作
- 循环通过每个记录,具体取决于表单的类型,请从其中一列中进行json数据,并添加结果在另一个记录集中的列表中的一组文件名,rsfiles- 这是我遇到问题的地方
- 通过rsfiles循环,并确认每个文件都存在,如果不存在,请将一个简单的记录写入表格的表格错误的记录 - 尚未完成此操作,但应该简单
这是我的控制流,说明了上述粗略的解释:
上面步骤4中的问题
我尝试找到一个简单的if/then或switch/case任务,但缺少我选择了我在堆栈中的另一篇文章中发现的黑客,该秘密是使用空序列的容器和从那里使用优先限制来确定我正在使用表达式处理的形式类型:
@[FormSubmissions::FormTypeId] == 1
以上是图表中“拆分通用表单提交文件”的约束 - 循环中的左流量最多。
使用数据流,我首先使用脚本任务来估算JSON数据,该数据被循环容器推入变量,然后将结果列表添加到RecordSet中。
注意:如果我禁用表格2a的控制流部分,则表格1G的过程有效,反之亦然。
每种表单类型都有其自己的JSON结构,因此,如果我尝试使用Form 1 GI的数据流/脚本任务来处理表单2a,则会遇到异常。我现在是我现在的地方。我得到了表格1G的数据流程的表格2a记录。
我的想法是,我正在使用我的普通编码理解来尝试解决该工具不同工作的问题?即,我假设该循环通过行通过其数据行运行,并且foreach行检查约束,仅执行该约束的数据流。也许我没有得到并行性影响SSI的多么?
This is my first SSIS project so I'm struggling with some of the core concepts.
In production I have forms submitted along with files for each form. I need to verify that each file exists on disk before moving on to the next SSIS package.
In my first package I want to achieve the following:
- Extract sample data from production into temp DB - working
- Load sample data into a Recordset, rsFormSubmissions and get a count of records to be processed - working
- Show a message box with the number of records to be processed - working
- Loop through each record and, depending on the type of form, deserialize JSON data from one of the columns and add the resulting set of filenames to a list in another recordset, rsFiles - This is where I'm having a problem
- Loop through rsFiles and confirm that each file exists, if not, write a simple record to a table indicating the faulty record - haven't done this yet, but should be simple
Here is my Control Flow illustrating the rough explanation above:
data:image/s3,"s3://crabby-images/9dee3/9dee3bafaf918af2ad19bd1d10ec55e7313ff292" alt="enter image description here"
The Problem
In step 4 above I tried finding a simple if/then or switch/case task but lacking that I opted for a hack I found on another post here in Stack which is to use an empty sequence container and from there use precedence constraints to determine what form type I'm dealing with using an expression:
@[FormSubmissions::FormTypeId] == 1
The above is the constraint for "Split out Generic form submission files" in the diagram - the left most flow in the loop.
Using a Data Flow I first use a Script Task to deserialize JSON data that was pushed into a variable by the loop container and then add the resulting list of files to a Recordset.
Note: If I disable the Control Flow section for Form 2A then Form 1G's process works and vice versa.
Each form type has its own JSON structure so if I were to attempt to process Form 2A using the Data Flow/ Script Task for Form 1 G I would run into exceptions. Which is where I'm at now. I'm getting Form 2A records being processed by the Data Flow for Form 1G.
My thinking is that I'm using my normal coding understanding to try and solve a problem where the tool works differently? I.e. I'm assuming that the loop is running through its data row by row AND foreach row checks the constraint and only executes the Dataflow for that constraint. Perhaps I'm not getting how parallelism impacts SSIS?
发布评论
评论(1)
不错的文章。是的,SSIS是NO .NET,并且可以有点“思维”。 ;-)
我看到您要检查的两件事:
foreach循环编辑器,确保
@[form ubmissions :: formTypeid]
变量正在设置通过RSRECORDSETYOUAROPINGOVER“ Collection”通过变量映射选项卡。的值。
(可选)虽然没有必要,但它是范围您的变量,使它们在需要访问它们的内部最大控制流容器之外看不到它们(在OOP中考虑封装原理)。因此理想情况下,
@[formubmissions :: formTypeid]
变量被“范围”到foreach循环控制流容器,因为那是应该写入的地方” “和“从中读”。如果在该循环之外进行范围(默认变量范围是软件包级别),则包装的其他部分可能会无意间干扰其值。(点击放大)
(可选)更改变量范围仅通过“移动变量”按钮
Nice write-up. Yes, SSIS is no .NET, and can be a little bit of a "mind-bend". ;-)
I see two things you'll want to check:
Inside the Foreach Loop Editor, ensure that the
@[FormSubmissions::FormTypeId]
variable is being set by a value from rsRecordSetYouAreLoopingOver "Collection" via the Variable Mappings tab.(Optional) While it's not necessary, it's a "preferred practice" to Scope your variables such that they are not visible outside of the inner-most control flow container that needs access to them (think Encapsulation Principle in OOP). So ideally, the
@[FormSubmissions::FormTypeId]
variable is "Scoped" to the Foreach loop Control Flow Container, since that is the place where it should be "written to" and "read from". If it's scoped outside of that loop (default Variable scope is Package-level) other parts of the package may inadvertently interfere with its values.(Click to enlarge)
data:image/s3,"s3://crabby-images/b6be9/b6be903e1c498c9906099977877b56b2886f5190" alt="SSIS Scoped Variables Get/Set Inside a Foreach Loop"
(Optional) Changing a Variable's Scope is only available via the "Move Variable" button