删除重复的重复字符

发布于 2024-11-03 18:06:20 字数 235 浏览 1 评论 0原文

我的存储过程中有一个字符串，例如 ',,,sam,,bob,' 或 ',,,' 从上面的字符串中我必须从中删除多个逗号，它必须看起来像 'sam,bob,' 或仅当 ',,,' 时 '' 。我必须仅使用 Sql Server 函数。我使用 Sql Server 2008 和 .Net 3.5

提前致谢。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

筱果果 2024-11-10 18:06:20

这适用于仅包含逗号或最多包含 398 个连续逗号的字符串。

 SELECT 
     CASE 
         WHEN TargetString NOT LIKE '%[^,]%' 
             THEN '' /*The string is exclusively commas*/
         ELSE 
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/
            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/
            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/
            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/
         END
 FROM T

如果您需要更多，请在顶部添加额外的 2 的幂；如果您需要较少，请从顶部删除。每个阶段的注释表示该阶段无法成功处理的最小数量。

所有注释行都采用这种格式

/*  L/D    =  Q remainder R */

D:    Corresponds to the length of the string generated by `REPLICATE`
R:    Is always D-1
Q+R:  Form L for the next step

因此，要使用另一个 REPLICATE(',',32),',') 阶段向上扩展该系列

D = 32 
R = 31
Q = 368 (399-31)
L = (368 * 32) + 31 = 11807

，这样就可以处理最多 11,806 个字符的逗号部分。

This works for strings that are exclusively commas or have up to 398 contiguous commas.

 SELECT 
     CASE 
         WHEN TargetString NOT LIKE '%[^,]%' 
             THEN '' /*The string is exclusively commas*/
         ELSE 
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/
            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/
            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/
            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/
         END
 FROM T

Add extra powers of 2 at the top if you need more or remove from the top if you need less. The comments by each stage indicate the smallest number that this stage will not deal with successfully.

All the comment lines are in this format

/*  L/D    =  Q remainder R */

D:    Corresponds to the length of the string generated by `REPLICATE`
R:    Is always D-1
Q+R:  Form L for the next step

So to extend the series upwards with another REPLICATE(',',32),',') stage

D = 32 
R = 31
Q = 368 (399-31)
L = (368 * 32) + 31 = 11807

So that would deal with sections of commas up to 11,806 characters.

回复收藏 0 原文

顾忌 2024-11-10 18:06:20

我建议使用 UDF 来完成此操作。由于我要建议的 UDF 不涉及任何表，因此性能应该相当不错。

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
Returns VarChar(8000)
WITH SCHEMABINDING
AS
Begin

    Set @Data = @DuplicateChar + @Data

    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0
        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)

    Return Right(@Data, Len(@Data)-1)

End

您可以像这样测试该功能：

Select dbo.CleanDuplicates(',,,', ',')
Select dbo.CleanDuplicates(',,,sam,,bob,', ',')

I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
Returns VarChar(8000)
WITH SCHEMABINDING
AS
Begin

    Set @Data = @DuplicateChar + @Data

    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0
        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)

    Return Right(@Data, Len(@Data)-1)

End

You can test the function like this:

Select dbo.CleanDuplicates(',,,', ',')
Select dbo.CleanDuplicates(',,,sam,,bob,', ',')

回复收藏 0 原文

萌辣 2024-11-10 18:06:20

试试这个

SELECT @Parameter AS 'BEFORE'
BEGIN
WHILE CHARINDEX(',,', @Parameter) > 0
    BEGIN
        SELECT @Parameter = REPLACE(@Parameter, ',,',',') 
    END
SELECT @Parameter AS 'AFTER'
END

try this

SELECT @Parameter AS 'BEFORE'
BEGIN
WHILE CHARINDEX(',,', @Parameter) > 0
    BEGIN
        SELECT @Parameter = REPLACE(@Parameter, ',,',',') 
    END
SELECT @Parameter AS 'AFTER'
END

回复收藏 0 原文

む无字情书 2024-11-10 18:06:20

乔治·马斯特罗斯写道：
<小时>
我建议使用 UDF 来执行此操作。自从 UDF 我要建议
不接触任何表，性能应该不错。

我同意“仅内存”标量 UDF 相当快。事实上，我实际上使用了 George 的标量 UDF 之一，它解决了“初始大写”问题，以证明有时“基于集合”的代码不是总是最好的方法。

然而，马丁·史密斯（Martin Smith）（该帖子的另一位发帖者）绝对走在正确的道路上。在这种情况下，“Set Based”仍然是出路。当然，任何人都可以对性能提出未经证实的主张，因此让我们通过性能演示来加热这一点。

为了进行演示，我们首先需要一些测试数据。有很多测试数据，因为我们要测试的两个函数都运行得非常快。这是构建一百万行测试表的代码。

--===== Conditionally drop the test table 
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
GO
--===== Create and populate the test table on-the-fly.
     -- This builds a bunch of GUIDs and removes the dashes from them to 
     -- increase the chances of duplicating adjacent characters.
     -- Not to worry.  This takes less than 7 seconds to run because of
     -- the "Pseudo Cursor" created by the CROSS JOIN.
 SELECT TOP 1000000
        RowNum     = IDENTITY(INT,1,1),
        SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
GO

无需在这里重新发布乔治的出色功能，但我确实需要发布我的功能。以下函数产生与 George 相同的结果。它看起来像一个“iTVF”（内联表值函数），但它只返回一个值。这就是为什么微软称它们为“内联标量函数”（我简称为“iSF”）。

 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
 SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO

首先，让我们测试一下 George 的标量 UDF。请阅读有关我们为何不使用 SET STATISTICS TIME ON 的评论。

/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function 
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

以下是“ Fiver”运行的返回结果...

Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.
(Average is 15,559 on my 10 year old, single 1.8Ghz CPU)

现在，我们将运行“iSF”版本...

/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because 
 we're trying to measure the performance of the function rather than how long 
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.ITEM
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

以下是该运行的结果。

Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.
(Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8Ghz CPU)

我的观点并不是说乔治的代码很糟糕。一点也不。事实上，当没有“单一查询”解决方案时，我使用标量 UDF。我还将声明并支持乔治，并不是所有的“单一查询”解决方案总是最好的。

当涉及到 UDF 时，请不要停止寻找它们。 ;-)

George Mastros wrote:
I would suggest a UDF to do this. Since the UDF I am about to suggest
doesn't touch any tables, the performance should be pretty good.

I agree that "memory only" Scalar UDF's are quite fast. In fact, I actually used one of George's Scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that sometimes "Set Based" code ISN'T always the best way to go.

However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unsubstantiated claim as to performance so let's heat this up with a performance demonstration.

To demonstrate, we first need some test data. A LOT of test data because both of the functions we're going to test run nasty fast. Here's the code to build a million row test table.

--===== Conditionally drop the test table 
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
GO
--===== Create and populate the test table on-the-fly.
     -- This builds a bunch of GUIDs and removes the dashes from them to 
     -- increase the chances of duplicating adjacent characters.
     -- Not to worry.  This takes less than 7 seconds to run because of
     -- the "Pseudo Cursor" created by the CROSS JOIN.
 SELECT TOP 1000000
        RowNum     = IDENTITY(INT,1,1),
        SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
GO

No need to repost George's fine function here but I do need to post mine. The following function produces the same result as George's does. It looks like an "iTVF" (Inline Table Valued Function) and it is but it only returns one value. That's why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).

 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
 SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO

First, let's test George's Scalar UDF. Please read the comments about why we're not using SET STATISTICS TIME ON here.

/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function 
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the returns from that "fiver" run...

Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.
(Average is 15,559 on my 10 year old, single 1.8Ghz CPU)

Now, we'll run the "iSF" version...

/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because 
 we're trying to measure the performance of the function rather than how long 
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.ITEM
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the results from that run.

Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.
(Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8Ghz CPU)

My point ISN'T that George's code is bad. Not at all. In fact, I use Scalar UDFs when there is no "single query" solution. I'll also state and back George up by saying that not all "single query" solutions are always the best.

Just don't stop looking for them when it comes to UDFs. ;-)

回复收藏 0 原文

烟沫凡尘 2024-11-10 18:06:20

你的解决方案很好，但

它只是逗号，
我讨厌基于循环的TSQL代码;-)

所以我基于Marcin解决方案集的通用代码编写了用于替换每种声明类型的重复项：

DECLARE @Duplicate NVARCHAR(100)= '#
您可以在重复字符串中声明每种字符以及@Replacement 中的每个替换字符串。

恕我直言，额外的好处是我仅在输入字符串的最大长度范围内搜索替换

DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$g'
DECLARE @Replacement NVARCHAR(MAX)= ''
DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
WITH    numbers
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        chars
          AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                        CAST(1 AS INT) [LEVEL]
               UNION ALL
               SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                        CAST(numbers.Number AS INT) [LEVEL]
               FROM     numbers
                        JOIN chars ON chars.Level + 1 = numbers.Number
               WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
             ),
        Replicated
          AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                        numbers.Number
               FROM     chars
                        CROSS JOIN numbers
             )
    SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM    replicated
    WHERE   number <= LEN(@TestString)

SELECT  @OutputString

您可以在重复字符串中声明每种字符以及@Replacement 中的每个替换字符串。
恕我直言，额外的好处是我仅在输入字符串的最大长度范围内搜索替换

Your solutions are good but

it is comma only
i hate loop-based TSQL code ;-)

so i wrote based on Marcin solution set-based universal code for replacement of every declared kind of duplicates:

DECLARE @Duplicate NVARCHAR(100)= '#
You can declare every kind of char in Duplicate string and  every  replacement string in @Replacement.

Additional gain IMHO is that i search for replacement only in range of maximum length of input string

DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$g'
DECLARE @Replacement NVARCHAR(MAX)= ''
DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
WITH    numbers
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        chars
          AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                        CAST(1 AS INT) [LEVEL]
               UNION ALL
               SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                        CAST(numbers.Number AS INT) [LEVEL]
               FROM     numbers
                        JOIN chars ON chars.Level + 1 = numbers.Number
               WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
             ),
        Replicated
          AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                        numbers.Number
               FROM     chars
                        CROSS JOIN numbers
             )
    SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM    replicated
    WHERE   number <= LEN(@TestString)

SELECT  @OutputString

You can declare every kind of char in Duplicate string and every replacement string in @Replacement.
Additional gain IMHO is that i search for replacement only in range of maximum length of input string

回复收藏 0 原文

深居我梦 2024-11-10 18:06:20

你可以尝试

SELECT REPLACE(LTRIM(REPLACE(',,,sam,,bob,', ',', ' ')),' ', ',')

You could try

SELECT REPLACE(LTRIM(REPLACE(',,,sam,,bob,', ',', ' ')),' ', ',')

回复收藏 0 原文

~没有更多了~

关于作者

帅气称霸

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

删除重复的重复字符

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

删除重复的重复字符

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。