如何递归解析SQL语句中的数据

发布于 2024-10-11 08:51:52 字数 1678 浏览 2 评论 0原文

我正在尝试清理数据库中“名称”字段中的数据,并将该数据拆分为名字、中间名和名称。姓。目前,我正在使用 Case 语句来查找文本内的各种触发器,以帮助我以某种方式格式化输出。

然而,我开始注意到我将测试嵌套在其他测试中,并且需要弄清楚如何递归地处理数据。请参阅此示例,了解我如何提取名字。

Case 
  When CharIndex(' ',LTrim(RTrim(Name))) in (0,1) Then '' --'empty or LName'
  When Left(Name,3) IN ('MR ','MS ', 'DR ','MRS') Then    --'Prefix Titles'
    Case --'If we found a prefix, run the same "tests" with the prefix removed'
      When CharIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))
        in (0,1) Then '' 
      When SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),3,1) 
        = '&' Then SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',
        Name)))),1,5)
      Else Left(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),
        CHarIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))-1) 
    End
  When SubString(LTrim(RTrim(Name)),3,1) = '&' Then 
    SubString(LTrim(RTrim(Name)),1,5) --'Look for initials e.g. J & A Smith'
  Else Left(LTrim(RTrim(Name)),CHarIndex(' ',LTrim(RTrim(Name)))-1) 
  End

因此,为了让它在更复杂的情况下工作(例如MR JOHN A SMITH JR),我需要递归测试。在命令式编程中,如果我有一个名为 GetFirstName 的函数,我会执行类似的操作:

GetFirstName('MR JOHN A SMITH JR')
//GetFirstName identfies 'MR' and within the function it calls:
||
\\
   ==> GetFirstName('JOHN A SMITH JR')
       //GetFirstName identifies 'JR' and within the function it calls:
       ||
       \\
          ==> GetFirstName('JOHN A SMITH')
              //Finally, it returns 'JOHN'

理想情况下,用直接 SQL 执行此操作会很棒,但我不确定这是否可行。如果我不使用直接 SQL,我还有哪些替代方案? (我使用的是 SQL Server 2005)

I'm trying to clean up data in a "Name" field in our database and split that data into FirstName, MiddleName, & LastName. Currently, I'm using a Case statement to look for various triggers inside the text to help me format the output in a certain way.

However, I'm starting to notice that I'm nesting tests inside other tests and need to figure out how to process data recursively. See this example of how I'm extracting the FirstName.

Case 
  When CharIndex(' ',LTrim(RTrim(Name))) in (0,1) Then '' --'empty or LName'
  When Left(Name,3) IN ('MR ','MS ', 'DR ','MRS') Then    --'Prefix Titles'
    Case --'If we found a prefix, run the same "tests" with the prefix removed'
      When CharIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))
        in (0,1) Then '' 
      When SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),3,1) 
        = '&' Then SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',
        Name)))),1,5)
      Else Left(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),
        CHarIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))-1) 
    End
  When SubString(LTrim(RTrim(Name)),3,1) = '&' Then 
    SubString(LTrim(RTrim(Name)),1,5) --'Look for initials e.g. J & A Smith'
  Else Left(LTrim(RTrim(Name)),CHarIndex(' ',LTrim(RTrim(Name)))-1) 
  End

So, in order to get this to work in more complicated situations (e.g. MR JOHN A SMITH JR), I would need to test recursively. In imperative programming, I would do something like this if I had a function called GetFirstName:

GetFirstName('MR JOHN A SMITH JR')
//GetFirstName identfies 'MR' and within the function it calls:
||
\\
   ==> GetFirstName('JOHN A SMITH JR')
       //GetFirstName identifies 'JR' and within the function it calls:
       ||
       \\
          ==> GetFirstName('JOHN A SMITH')
              //Finally, it returns 'JOHN'

Ideally, it would be great to do this in straight SQL, but I'm not sure that is possible. What alternatives would I have if I'm not using straight SQL? (I'm using SQL Server 2005)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

会发光的星星闪亮亮i 2024-10-18 08:51:52

我认为用直接的SQL 来完成这件事并不容易/干净。您可以使用正则表达式,但是您必须编写自己的 CLR 函数来提供正则表达式功能。

I don't think it is something easy/clean to accomplish in straight SQL. You can use regular expressions though however you have to write your own CLR function to provide regex functionality.

原野 2024-10-18 08:51:52

这听起来像是一项一次性活动。您确定不能在使用临时表的多个语句中执行此操作吗?如果它是一次性活动,那么您可能对代码调试的准确性和简单性的需求比对性能的需求更高。

考虑这样的事情:

CREATE TABLE #MyNames (
  PersonID INT PRIMARY KEY,
  OriginalName VARCHAR(50),
  WorkingName VARCHAR(50),
  CandidateTitle VARCHAR(10),
  CandidateLastName VARCHAR(50),
  CandidateFirstName VARCHAR(50),
  CandidateMiddleName VARCHAR(50)
  --Your other candidate fields.....
)

INSERT INTO #MyNames (PersonID, OriginalName)
 SELECT TOP 100 ID, LTRIM(RTRIM(Name)) from SourcePersonTable

--Possibly add some indexes here for original name

UPDATE #MyNames
 SET CandidateTitle = LEFT(OriginalName,3),
     WorkingName = SUBSTRING(OriginalName,4,9999)
Where LEFT(OriginalName,3) IN
    ('MR ','MRS','MS ','DR ')

-- etc...

只需继续添加步骤并编辑 [WorkingName] 字段即可。完成一次扫描后,只需执行...

UPDATE #MyNames SET WorkingName = OriginalName

...即可准备进行另一次扫描。

This sounds like a one-time activity. Are you sure you can't do this in several statements using temp tables? If it is a one-time activity, then likely you have a higher need for accuracy and simplicity of code debugging than performance.

Consider something like this:

CREATE TABLE #MyNames (
  PersonID INT PRIMARY KEY,
  OriginalName VARCHAR(50),
  WorkingName VARCHAR(50),
  CandidateTitle VARCHAR(10),
  CandidateLastName VARCHAR(50),
  CandidateFirstName VARCHAR(50),
  CandidateMiddleName VARCHAR(50)
  --Your other candidate fields.....
)

INSERT INTO #MyNames (PersonID, OriginalName)
 SELECT TOP 100 ID, LTRIM(RTRIM(Name)) from SourcePersonTable

--Possibly add some indexes here for original name

UPDATE #MyNames
 SET CandidateTitle = LEFT(OriginalName,3),
     WorkingName = SUBSTRING(OriginalName,4,9999)
Where LEFT(OriginalName,3) IN
    ('MR ','MRS','MS ','DR ')

-- etc...

Just keep adding steps and editing the [WorkingName] field. When you're done with one pass, just do...

UPDATE #MyNames SET WorkingName = OriginalName

...and you're ready to do another sweep.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文