Oracle Regex substr忽略可选组

发布于 2025-02-13 21:15:32 字数 1308 浏览 0 评论 0原文

我正在研究REGEX_SUBSTR以在Oracle

Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN

Test 2.Input: JOHN  STUDENT     Desired Output: JOHN

Test 3.Input: JOHN 10BC STU     Desired Output: JOHN

Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL

Test 5.Input: JOHN  TEACHER     Desired Output:NULL

Test 6. Input: MR JOHN 08BC STU Desired Output: MR JOHN

Test 7. Input: MR JOHN STUDENT Desired Output: MR JOHN

Test 8. Input: MR JOHN 07BC TEACHER Desired Output: Null

Test 9. Input: MR STUART 06BC STDUENT Desired Output: MR STUART 

Test 10. Input: MR STUART LEE 05BC STDUENT Desired Output: MR STUART LEE

查询中获取以下结果：

测试1：

select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

输出：失败。返回约翰10BC而不是约翰

测试2：

select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

输出：通过。返回的约翰，

我通过删除？修改了以下查询？在第二个块

测试1中：选择Regexp_substr（'John 10BC Student'，'（。*）（\ s+。*bc）\ sstu（dent）？双重的; 输出：通过。返回的John

测试2：选择Regexp_substr（'John Student'，'（。*）（\ s+。*bc）\ sstu（dent）？'，1,1，'i'，1）来自Dual; 输出：失败。返回null而不是约翰。

如何使用regexp_substr忽略中间的可选单词BC，并始终将单词返回到BC作为学生的子弦。提前致谢！！！

原文

I am working on regex_substr to get the following result in oracle

Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN

Test 2.Input: JOHN  STUDENT     Desired Output: JOHN

Test 3.Input: JOHN 10BC STU     Desired Output: JOHN

Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL

Test 5.Input: JOHN  TEACHER     Desired Output:NULL

Test 6. Input: MR JOHN 08BC STU Desired Output: MR JOHN

Test 7. Input: MR JOHN STUDENT Desired Output: MR JOHN

Test 8. Input: MR JOHN 07BC TEACHER Desired Output: Null

Test 9. Input: MR STUART 06BC STDUENT Desired Output: MR STUART 

Test 10. Input: MR STUART LEE 05BC STDUENT Desired Output: MR STUART LEE

Query:

Test 1:

select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

Output: Failed . Returned JOHN 10BC instead of JOHN

Test 2:

select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;

Output: Passed. Returned JOHN

I modified the query as below by removing the ? in the second block

Test 1:select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN

Test 2:select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed. Returned Null instead of JOHN.

How to ignore the middle optional word BC and always return the words till BC as a substring for STUDENT using REGEXP_SUBSTR. Thanks in Advance!!!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

听风念你 2025-02-20 21:15:32

通过在其之后添加问号（*？*？）来防止其抓取可选零件（<。代码>（\ s+。*bc）？）：

 带有测试（ID，col）为（
  从双联盟中选择1，“约翰10BC学生”
  从双联盟中选择2，“ John Student”
  从双联盟中选择3，“约翰10BC Stu”
  从双联盟中选择4，“约翰10BC老师”
  从双联盟中选择4，“ JSTUOHN 10BC老师”
  从dual中选择5，“约翰老师”
）
选择
  ID，
  上校
  REGEXP_SUBSTR（col，'（。
从测试

 id | Col | QWE  
 - ：| ：----------------------- | ：------
 1 |约翰10BC学生|约翰 
 2 |约翰学生|约翰 
 3 |约翰10BC Stu |约翰 
 4 |约翰10BC老师|  null  
 4 | JSTUOHN 10BC老师|  null  
 5 |约翰老师|  null

db＆lt;＆gt; fiddle

3CF734918BC764B24B24B24B24B24B53D9EFA17FEA17FEB 。要允许多个单词作为第一部分，您可以使用以下以下等级：

(.+?)(\s+\S*BC)?\s+STU(DENT)?$

说明：

（。+？） - 与最短匹配的任何一个或多次匹配（允许匹配以下可选组））。
（\ s+\ s*bc）？ - 零或多个非Whitespace组的可选匹配（\ s）字符，然后是bc。该组应通过一个或多个空格字符与以前的非Whitespace角色分开。 \ s*而不是。*将阻止将多个单词纳入该组。
\ s+Stu（dent）？$ - 最后一部分应该是Word student或缩写Stu。它应该是行中的最后一个单词（$之后）。

扩展测试用例的结果是：

 带有测试（ID，col）为（
  从双联盟中选择1，“约翰10BC学生”
  从双联盟中选择2，“ John Student”
  从双联盟中选择3，“约翰10BC Stu”
  从双联盟中选择4，“约翰10BC老师”
  从双联盟中选择4，“ JSTUOHN 10BC老师”
  从双联盟中选择5，“约翰老师”
  从双联盟中选择6，“约翰10BC Stu”
  从双联盟中选择7，“ Stuart ABC 10BC学生”
  选择8，“ Stuart ABC 10BC学生”来自Dual Union All
  选择9，“ Stuart Stu先生10BC老师”
）
选择
  ID，
  上校
  REGEXP_SUBSTR（col，'（。
从测试

 id | Col | QWE          
 - ：| ：----------------------------------- | ：-----------------
 1 |约翰10BC学生|约翰         
 2 |约翰学生|约翰         
 3 |约翰10BC Stu |约翰         
 4 |约翰10BC老师|  null          
 4 | JSTUOHN 10BC老师|  null          
 5 |约翰老师|  null          
 6 |约翰先生10BC Stu |约翰先生      
 7 | Stuart ABC 10BC学生| Stuart ABC先生
 8 | Stuart ABC 10BC学生|  null          
 9 | Stuart Stu先生10BC老师|  null

db＆lt;＆gt; fiddle

Use non-greedy pattern for any-character match (.*) by adding a question mark after it (.*?) to prevent it from grabbing an optional part ((\s+.*BC)?):

with test (id, col) as (
  select 1, 'JOHN 10BC STUDENT'  from dual union all
  select 2, 'JOHN  STUDENT'      from dual union all
  select 3, 'JOHN 10BC STU'      from dual union all
  select 4, 'JOHN 10BC TEACHER'  from dual union all
  select 4, 'JSTUOHN 10BC TEACHER'  from dual union all
  select 5, 'JOHN  TEACHER'      from dual
)
select
  id,
  col,
  regexp_substr(col, '(.*?)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) as qwe
from test

ID | COL                  | QWE  
-: | :------------------- | :----
 1 | JOHN 10BC STUDENT    | JOHN 
 2 | JOHN  STUDENT        | JOHN 
 3 | JOHN 10BC STU        | JOHN 
 4 | JOHN 10BC TEACHER    | null 
 4 | JSTUOHN 10BC TEACHER | null 
 5 | JOHN  TEACHER        | null

db<>fiddle here

UPD. To allow multiple words as the first part, you may use the following regex:

(.+?)(\s+\S*BC)?\s+STU(DENT)?$

Explanation:

(.+?) - match any character one or more times with the shortest match (allowing following optional groups to be matched).
(\s+\S*BC)? - optional match for the group of zero or more non-whitespace (\S) characters followed by BC. The group should be separated from the previous non-whitespace character by one or more whitespace characters. \S* instead of .* will prevent from inclusion of multiple words into this group.
\s+STU(DENT)?$ - the last part should be a word STUDENT or it's abbreviation STU. It should be the last word in the line ($ right after it).

And the result of extended test case is:

with test (id, col) as (
  select 1, 'JOHN 10BC STUDENT'  from dual union all
  select 2, 'JOHN  STUDENT'      from dual union all
  select 3, 'JOHN 10BC STU'      from dual union all
  select 4, 'JOHN 10BC TEACHER'  from dual union all
  select 4, 'JSTUOHN 10BC TEACHER'  from dual union all
  select 5, 'JOHN  TEACHER'      from dual union all
  select 6, 'MR JOHN 10BC STU' from dual union all
  select 7, 'MR STUART ABC 10BC STUDENT' from dual union all
  select 8, 'MR STUART ABC 10BC STUDENTS' from dual union all
  select 9, 'MR STUART STU 10BC TEACHER' from dual
)
select
  id,
  col,
  regexp_substr(col, '(.*?)(\s+\S*BC)?\s+STU(DENT)?

ID | COL                         | QWE          
-: | :-------------------------- | :------------
 1 | JOHN 10BC STUDENT           | JOHN         
 2 | JOHN  STUDENT               | JOHN         
 3 | JOHN 10BC STU               | JOHN         
 4 | JOHN 10BC TEACHER           | null         
 4 | JSTUOHN 10BC TEACHER        | null         
 5 | JOHN  TEACHER               | null         
 6 | MR JOHN 10BC STU            | MR JOHN      
 7 | MR STUART ABC 10BC STUDENT  | MR STUART ABC
 8 | MR STUART ABC 10BC STUDENTS | null         
 9 | MR STUART STU 10BC TEACHER  | null         
db<>fiddle here
,1,1,'i',1) as qwe from test

db<>fiddle here

回复收藏 0 原文

╰◇生如夏花灿烂 2025-02-20 21:15:32

您真的需要检查中间的内容吗？也许您可以忽略这一点，并与其余的要求一起工作（检查Stu或Student（如您的示例数据所建议）并根据此获取数据）？

这是两个选项（result_1和result_2），请参阅其中任何一个有所帮助。

示例数据：

SQL> with test (id, col) as
  2    (select 1, 'JOHN 10BC STUDENT'  from dual union all
  3     select 2, 'JOHN  STUDENT'      from dual union all
  4     select 3, 'JOHN 10BC STU'      from dual union all
  5     select 4, 'JOHN 10BC TEACHER'  from dual union all
  6     select 5, 'JOHN  TEACHER'      from dual
  7    )

查询从这里开始：

  8  select id,
  9    col,
 10    case when instr(col, 'STU') > 0 then substr(col, 1, instr(col, ' ') - 1)
 11         else null
 12    end result_1,
 13    --
 14    case when regexp_substr(col, '\w+

  result_1 搜索 col   stu> stu  string;如果找到它，它将返回第一个单词（从 col的开头到第一个空格字符）
  result_2 检查是否末尾（但是这次使用Regexp_substr，到 col> col 的结尾 $ ）是“ stu”或“ student”，然后返回第一个单词（固定在开始^的 col 

) in ('STU', 'STUDENT') then
 15              regexp_substr(col, '^\w+')
 16         else null
 17    end result_2
 18  from test;

        ID COL               RESULT_1             RESULT_2
---------- ----------------- -------------------- --------------------
         1 JOHN 10BC STUDENT JOHN                 JOHN
         2 JOHN  STUDENT     JOHN                 JOHN
         3 JOHN 10BC STU     JOHN                 JOHN
         4 JOHN 10BC TEACHER
         5 JOHN  TEACHER

SQL>

result_1搜索col stu> stu string;如果找到它，它将返回第一个单词（从col的开头到第一个空格字符）
result_2检查是否末尾（但是这次使用Regexp_substr，到col> col的结尾$）是“ stu”或“ student”，然后返回第一个单词（固定在开始^的col

Do you really need to check what's in the middle? Perhaps you could ignore that and work with the rest of the requirement (check for STU or STUDENT (as your sample data suggest) and fetch data according to that)?

Here are two options (result_1 and result_2), see if any of these help.

Sample data:

SQL> with test (id, col) as
  2    (select 1, 'JOHN 10BC STUDENT'  from dual union all
  3     select 2, 'JOHN  STUDENT'      from dual union all
  4     select 3, 'JOHN 10BC STU'      from dual union all
  5     select 4, 'JOHN 10BC TEACHER'  from dual union all
  6     select 5, 'JOHN  TEACHER'      from dual
  7    )

Query begins here:

  8  select id,
  9    col,
 10    case when instr(col, 'STU') > 0 then substr(col, 1, instr(col, ' ') - 1)
 11         else null
 12    end result_1,
 13    --
 14    case when regexp_substr(col, '\w+

result_1 searches col for the STU string; if it is found, it returns the first word (from beginning of col up to the first space character)
result_2 checks whether the last word (but this time using regexp_substr, anchoring to the end $ of col) is "STU" or "STUDENT" and returns the first word (anchored to the beginning ^ of col

) in ('STU', 'STUDENT') then
 15              regexp_substr(col, '^\w+')
 16         else null
 17    end result_2
 18  from test;

        ID COL               RESULT_1             RESULT_2
---------- ----------------- -------------------- --------------------
         1 JOHN 10BC STUDENT JOHN                 JOHN
         2 JOHN  STUDENT     JOHN                 JOHN
         3 JOHN 10BC STU     JOHN                 JOHN
         4 JOHN 10BC TEACHER
         5 JOHN  TEACHER

SQL>

result_1 searches col for the STU string; if it is found, it returns the first word (from beginning of col up to the first space character)
result_2 checks whether the last word (but this time using regexp_substr, anchoring to the end $ of col) is "STU" or "STUDENT" and returns the first word (anchored to the beginning ^ of col

回复收藏 0 原文

红尘作伴 2025-02-20 21:15:32

如何忽略中间可选单词10BC，并始终将第一个单词作为student的子字符串？

您不需要使用（慢）正则表达式，并且可以使用（更快）简单的字符串函数：

WITH test (id, col) as (
  select 1, 'JOHN 10BC STUDENT'    from dual union all
  select 2, 'JOHN  STUDENT'        from dual union all
  select 3, 'JOHN 10BC STU'        from dual union all
  select 4, 'JOHN 10BC TEACHER'    from dual union all
  select 5, 'JSTUOHN 10BC TEACHER' from dual union all
  select 6, 'JOHN  TEACHER'        from dual
)
SELECT id,
       col,
       CASE
       WHEN col LIKE '% STUDENT'
       OR   col LIKE '% STU'
       THEN SUBSTR(col, 1, INSTR(col, ' ')  - 1)
       END AS first_word
FROM   test

（使用 astentx的数据）

输出为：

id col first_word
1 约翰10BC学生 john
2 john Student john
3 john 10bc stu john
4 约翰10BC老师 null
5 jstuohn 10BC老师 null
6 约翰老师 null

id	col	first_word
1	约翰10BC学生	john
2	john Student	john
3	john 10bc stu	john
4	约翰10BC老师	null
5	jstuohn 10BC老师	null
6	约翰老师	null

db＆lt;＆gt; fiddle “ nofollow noreferrer”>在这里

How to ignore the middle optional word 10BC and always return the first word as a substring for STUDENT?

You do not need to use (slow) regular expressions and can use (much faster) simple string functions:

WITH test (id, col) as (
  select 1, 'JOHN 10BC STUDENT'    from dual union all
  select 2, 'JOHN  STUDENT'        from dual union all
  select 3, 'JOHN 10BC STU'        from dual union all
  select 4, 'JOHN 10BC TEACHER'    from dual union all
  select 5, 'JSTUOHN 10BC TEACHER' from dual union all
  select 6, 'JOHN  TEACHER'        from dual
)
SELECT id,
       col,
       CASE
       WHEN col LIKE '% STUDENT'
       OR   col LIKE '% STU'
       THEN SUBSTR(col, 1, INSTR(col, ' ')  - 1)
       END AS first_word
FROM   test

(using astentx's data)

The output is:

ID COL FIRST_WORD
1 JOHN 10BC STUDENT JOHN
2 JOHN STUDENT JOHN
3 JOHN 10BC STU JOHN
4 JOHN 10BC TEACHER null
5 JSTUOHN 10BC TEACHER null
6 JOHN TEACHER null