如何处理嵌入列中的列表和范围?

发布于 2024-10-16 14:26:31 字数 811 浏览 0 评论 0原文

我无法控制我要处理的数据的格式。当然,我可以使用脚本语言来处理数据库之外的以下问题,但我想避免这种情况,因为我正在处理的数据量很大,而且我想消除手动的必要性步骤。

简而言之,我有一张清单。列表可能包含单个 3 位数字符串、多个 3 位数字符串、一系列 3 位数字符串(例如 012-018)或多个 3 位数字符串和范围3 位数字的字符串。例如:

drop table list;
drop table lists;

create table lists (id varchar, vals varchar);

insert into lists values('A', '001,003-005');
insert into lists values('B', '008-007');
insert into lists values('C', '010, 011, 012');
insert into lists values('D', '011-013, 016-018, 020');

我知道我知道

我想把它变成下表:

create table list (id varchar, val varchar);
A   001
A   003
A   004
A   005
B   008
B   007
C   010
C   011
C   012
D   011
D   012
D   013
D   016
D   017
D   018
D   020

有没有办法在 SQL 中做到这一点?

I have no control over the format of the data I am trying to process. I could, of course, use a scripting language to deal with the following problem outside of the database, but I would like to avoid that because of the amount of data I am dealing with and because I'd like to eliminate the necessity of manual steps.

In short, I have a table of lists. A list might consist of a single 3-digit string, more than one 3-digit strings, a range of 3-digit strings, e.g. 012-018, or a number of 3-digit strings and ranges of 3-digit strings. For example:

drop table list;
drop table lists;

create table lists (id varchar, vals varchar);

insert into lists values('A', '001,003-005');
insert into lists values('B', '008-007');
insert into lists values('C', '010, 011, 012');
insert into lists values('D', '011-013, 016-018, 020');

I know, I know.

I would like to turn this into the following table:

create table list (id varchar, val varchar);
A   001
A   003
A   004
A   005
B   008
B   007
C   010
C   011
C   012
D   011
D   012
D   013
D   016
D   017
D   018
D   020

Is there any way to do this in SQL?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

荒岛晴空 2024-10-23 14:26:31

由于您没有使用特定的 RDBMS 标记您的问题,因此我必须笼统地回答。

SQL 本身不提供您正在寻找的基本操作,这基本上是字符串拆分。这意味着您必须自己编写,或者使用已在线发布的许多之一。

不过,您的数据范围使问题变得有点复杂。这意味着您的过程将如下所示:

  1. 将数据插入临时/内存表并按程序迭代它(或者,使用游标执行相同操作)
  2. 对于集合中的每个记录,提取非-标准化字符串数据并用','分割。
  3. 对于分割数据中的每个元素,您必须用 '-' 分割(对于非范围元素,应该返回一个结果) 。
  4. 如果您的第二次拆分(在 '-' 上)产生一个结果,则它是一条记录,您可以将其插入到最终目标中。如果它产生两个结果,那么它是一个范围,您必须从开始到结束进行迭代(使用该拆分的元素 1 和 2)并将记录插入到最终目标

评论后编辑

不幸的是,我对 PROC SQL 或 SAS 不熟悉,所以我无法为此提供具体的解决方案。我可以在下面发布一些关于 SQL Server T-SQL 的内容,希望能够帮助您入门。

declare @results table (idx int identity(1, 1), id varchar(5), data varchar(max))
declare @elements table (idx int identity(1, 1), element varchar(25))
declare @range table (idx int identity(1, 1), element varchar(25))

insert into @results (id, data)
select
    your_id,
    your_data

from your_source

declare @i int
declare @cnt int

declare @j int
declare @cnt2 int

declare @element varchar(25)

declare @first int
declare @second int

declare @start int
declare @end int

declare @id varchar(5)
declare @data varchar(max)

select @i = min(idx) - 1, @cnt = max(idx) from @results

while @i < @cnt
begin
    select @i = @i + 1

    select @id = id, @data = data from @results where idx = @i

    delete @elements

    insert into @elements (element) 
    select
        element

    from split(@data, ',')

    select @j = min(idx) - 1, @cnt2 = max(idx) from @elements

    while @j < @cnt2 
    begin
        select @j = @j + 1

        select @element = element from @elements where idx = @j

        delete @range

        insert into @range (element)
        select
            element

        from split(@element, '-')

        select @first = min(idx), @second = max(idx) from @range

        if @first = @second --single element
            insert into final_destination (id, value)
            select
                @id,
                element

            from @range
        else if @second - @first = 1 -- two elements, as desired
        begin
            select @start = convert(int, element) - 1 from @range where idx = @first
            select @end = convert(int, element) from @range where idx = @second

            while @start < @end
            begin
                select @start = @start + 1

                insert into final_destination (id, value)
                values (@id, @start)
            end
        end
        else -- error condition, bad input
    end
end

Since you haven't tagged your question with a specific RDBMS, I'll have to answer generally.

SQL itself doesn't provide the basic operation that you're looking for, which is basically a string split. This means that you'll have to write your own, or use one of the many that have been published online.

You've complicated matters a bit, though, with the ranges that you have in your data. This means that your procedure is going to look something like this:

  1. Insert your data into a temp/memory table and iterate over it procedurally (or, alternatively, use a cursor to do the same)
  2. For each record in your set, extract the non-normalized string data and split it by ','.
  3. For each element within the split data, you'll have to then split that by '-' (which, for non-range elements, should return you a single result).
  4. If your second split (on '-') yields one result, it's a single record that you can insert into your final destination. If it yields two results, then it's a range and you'll have to iterate from the start to the finish (using elements 1 and 2 of that split) and insert records into your final destination

Edit after comment

Unfortunately, I don't have any familiarity with PROC SQL or SAS, so I can't provide a specific solution for that. I can post something below in SQL Server T-SQL, which should hopefully get you started.

declare @results table (idx int identity(1, 1), id varchar(5), data varchar(max))
declare @elements table (idx int identity(1, 1), element varchar(25))
declare @range table (idx int identity(1, 1), element varchar(25))

insert into @results (id, data)
select
    your_id,
    your_data

from your_source

declare @i int
declare @cnt int

declare @j int
declare @cnt2 int

declare @element varchar(25)

declare @first int
declare @second int

declare @start int
declare @end int

declare @id varchar(5)
declare @data varchar(max)

select @i = min(idx) - 1, @cnt = max(idx) from @results

while @i < @cnt
begin
    select @i = @i + 1

    select @id = id, @data = data from @results where idx = @i

    delete @elements

    insert into @elements (element) 
    select
        element

    from split(@data, ',')

    select @j = min(idx) - 1, @cnt2 = max(idx) from @elements

    while @j < @cnt2 
    begin
        select @j = @j + 1

        select @element = element from @elements where idx = @j

        delete @range

        insert into @range (element)
        select
            element

        from split(@element, '-')

        select @first = min(idx), @second = max(idx) from @range

        if @first = @second --single element
            insert into final_destination (id, value)
            select
                @id,
                element

            from @range
        else if @second - @first = 1 -- two elements, as desired
        begin
            select @start = convert(int, element) - 1 from @range where idx = @first
            select @end = convert(int, element) from @range where idx = @second

            while @start < @end
            begin
                select @start = @start + 1

                insert into final_destination (id, value)
                values (@id, @start)
            end
        end
        else -- error condition, bad input
    end
end
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文