Uploading multi-line records to SQL Server

Posted on 2024-12-05 18:40:19

We receive fixed length datasets from a client that look something like this:

1 SOMEFILE   20110922
2 20110101ABC999  
3 JOHN         SMITH     19800201
4 5000000       1000
2 20060101DEF999  
3 JANE         KOTZE     19811001
4 200000        800
5 5200000       1800

where the number in the first position on each line indicates the type of information in the line. The types are:

1  Header record (only appears once, in the first line)  
2  Contract record  
3  Person record  
4  Amounts record  
5  Trailer record (only appears once, in the last line)

The information in lines of type 2, 3 and 4 all relates to one record, and I need to find a way at the upload stage to combine them into one. There are no identifiers that explicitly specify which 2, 3 and 4 lines belong together, but in all cases they are ordered in the raw data so that they appear directly below one another.

What I need is a preprocessing step that takes the original data, combines the matching 2, 3 and 4 lines into one record, and outputs the result as a txt file again, like this:

20110101ABC999JOHN         SMITH     198002015000000       1000
20060101DEF999JANE         KOTZE     19811001200000        800

I have thought of bcp'ing the file into SQL Server (or even just using Access) and assigning an auto-incremented integer as the PK, i.e.:

  PK Type  Record 
  1  1     SOMEFILE   20110922
  2  2     20110101ABC999  
  3  3     JOHN         SMITH     19800201
  4  4     5000000       1000
  5  2     20060101DEF999  
  6  3     JANE         KOTZE     19811001
  7  4     200000        800
  8  5     5200000       1800

and then doing something like:

select
    type2.[record] + type3.[record] + type4.[record]
from
    (select PK, [record] from uploaded where [type] = 2) as type2
join
    (select PK, [record] from uploaded where [type] = 3) as type3
    on type2.PK + 1 = type3.PK
join
    (select PK, [record] from uploaded where [type] = 4) as type4
    on type2.PK + 2 = type4.PK

But what worries me is that this depends entirely on SQL Server assigning the PKs in the order in which the data appears in the input file; I am not sure that this would necessarily be the case.

Does anyone know? Or know of a better way to do this?

Thanks
Karl

￡烟消云散 2024-12-12 18:40:19

Edit: added second solution

Solution 1:

You cannot be sure of the order in which SQL Server inserts the rows.
You have to do some text file processing before importing the data into SQL Server. For example, you can use PowerShell to add a PK to each line of the file:

# Read every line of the input file into an array
$rows = Get-Content -Path D:\BD\Samples\MyData.txt

for ($i = 0; $i -lt $rows.Length; $i++)
{
    # Prefix each line with its 1-based line number, zero-padded to 5 digits
    $rows[$i] = ('{0:D5}' -f ($i + 1)) + " " + $rows[$i]
}

Set-Content -Path D:\BD\Samples\MyDataResults.txt -Value $rows

Before (MyData.txt content):

1 SOMEFILE   20110922
2 20110101ABC999
3 JOHN         SMITH     19800201
4 5000000       1000
2 20060101DEF999
3 JANE         KOTZE     19811001
4 200000        800
5 5200000       1800

After PowerShell processing (MyDataResults.txt content):

00001 1 SOMEFILE   20110922
00002 2 20110101ABC999
00003 3 JOHN         SMITH     19800201
00004 4 5000000       1000
00005 2 20060101DEF999
00006 3 JANE         KOTZE     19811001
00007 4 200000        800
00008 5 5200000       1800

Both PowerShell scripts assume the file has at most 99999 rows, because the prefix is five digits wide.
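Once every line carries its own line number, the join from the question no longer depends on insert order. As a rough illustration of that lookup logic outside the database, here is a minimal Python sketch. The function name `combine_by_line_number` is made up for this example, and it assumes the layout shown above (five-digit number, space, type digit, space, record) and that trailing whitespace can be dropped from each record.

```python
def combine_by_line_number(numbered_lines):
    """Join each type-2 row with the rows numbered +1 and +2, like the SQL join."""
    rows = {}
    for line in numbered_lines:
        # Assumed layout: 5-digit line number, space, type digit, space, record
        number = int(line[:5])
        rec_type = line[6]
        record = line[8:].rstrip()  # assumption: trailing padding is not needed
        rows[number] = (rec_type, record)

    combined = []
    for number in sorted(rows):
        rec_type, record = rows[number]
        if rec_type != "2":
            continue
        person = rows.get(number + 1)
        amounts = rows.get(number + 2)
        # Fail loudly if the 2 -> 3 -> 4 sequence is broken
        if person is None or person[0] != "3" or amounts is None or amounts[0] != "4":
            raise ValueError(f"line {number}: expected types 3 and 4 to follow")
        combined.append(record + person[1] + amounts[1])
    return combined


sample = [
    "00001 1 SOMEFILE   20110922",
    "00002 2 20110101ABC999",
    "00003 3 JOHN         SMITH     19800201",
    "00004 4 5000000       1000",
    "00005 2 20060101DEF999",
    "00006 3 JANE         KOTZE     19811001",
    "00007 4 200000        800",
    "00008 5 5200000       1800",
]

for row in combine_by_line_number(sample):
    print(row)
```

The explicit check on the following two lines is a design choice: a file with a missing person or amounts line raises an error instead of silently pairing the wrong records.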

Solution 2:

Here each contract/person/amounts triple gets a shared group id instead of a line number; the header is tagged 00000 and the trailer 99999. This assumes every group consists of exactly three lines (types 2, 3 and 4, in that order).

$rows = Get-Content -Path D:\BD\Samples\MyData.txt

# Tag the header (first line) and the trailer (last line)
$rows[0] = "00000 " + $rows[0]
$rows[$rows.Length - 1] = "99999 " + $rows[$rows.Length - 1]

$groupid = 0

# Walk the body three lines at a time; each 2/3/4 triple shares one group id
for ($i = 1; $i -lt $rows.Length - 1; $i = $i + 3)
{
    $groupid++
    $prefix = '{0:D5}' -f $groupid

    $rows[$i]     = $prefix + " " + $rows[$i]
    $rows[$i + 1] = $prefix + " " + $rows[$i + 1]
    $rows[$i + 2] = $prefix + " " + $rows[$i + 2]
}

Set-Content -Path D:\BD\Samples\MyDataResults2.txt -Value $rows

Results:

00000 1 SOMEFILE   20110922
00001 2 20110101ABC999
00001 3 JOHN         SMITH     19800201
00001 4 5000000       1000
00002 2 20060101DEF999
00002 3 JANE         KOTZE     19811001
00002 4 200000        800
99999 5 5200000       1800
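With a group id on every line, producing the combined records asked for in the question comes down to grouping on that id. The following is only a sketch in Python, not part of the scripts above: `combine_by_group` is a hypothetical name, and it assumes the tagged layout (five-digit group id, space, type digit, space, record), with 00000 and 99999 reserved for the header and trailer.

```python
def combine_by_group(tagged_lines):
    """Concatenate the type 2, 3 and 4 records that share a group id."""
    groups = {}
    for line in tagged_lines:
        group_id = line[:5]
        if group_id in ("00000", "99999"):
            continue  # header and trailer are not part of any record
        rec_type = line[6]
        # assumption: trailing padding on each record can be dropped
        groups.setdefault(group_id, {})[rec_type] = line[8:].rstrip()

    combined = []
    for group_id in sorted(groups):  # ids are zero-padded, so string order works
        parts = groups[group_id]
        if set(parts) != {"2", "3", "4"}:
            raise ValueError(f"group {group_id}: expected exactly types 2, 3 and 4")
        combined.append(parts["2"] + parts["3"] + parts["4"])
    return combined


tagged = [
    "00000 1 SOMEFILE   20110922",
    "00001 2 20110101ABC999",
    "00001 3 JOHN         SMITH     19800201",
    "00001 4 5000000       1000",
    "00002 2 20060101DEF999",
    "00002 3 JANE         KOTZE     19811001",
    "00002 4 200000        800",
    "99999 5 5200000       1800",
]

for row in combine_by_group(tagged):
    print(row)
```

In SQL Server the equivalent would be a join on the group id column rather than on PK + 1 and PK + 2, which removes the dependence on row order entirely.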
