Regular expression in CFML to match a whole uppercase word followed by a line feed

Posted on 2024-11-06 13:23:40

I've been struggling with this all day, as regular expressions aren't my favourite topic.

I’m trying to find when the following happens:

Complete word that is in uppercase
Followed by a space
Followed by a line feed
Followed by another space
Followed by another word that starts with an uppercase letter

While testing I found that if I defined what the capital letter should be (in this case S):

[A-Z][A-Z]+ \n S

It would match, however if I change it to something like

[A-Z][A-Z]+ \n [A-Z]

It now picks up any text that contains a line feed, regardless of whether it is preceded by an uppercase word.

Am I missing something obvious?

Below is some sample text I'm using (hopefully it pastes OK without losing its line feeds). I'm trying to find the headings (in uppercase) so that I can make some changes to them.

 People who have a disability that would prevent them from performing required 
 basic life support skills are advised that they will not be able to achieve the 
 unit of competency. 
 ENROLLING IN FIRST AID UNITS OF COMPETENCY 
 If you are seeking to enrol in a First Aid unit of competency e.g. HLTFA301B 
 Apply first aid, you are advised that to complete the unit you must be able to 
 perform basic life support skills, for example control bleeding and perform 
 cardiopulmonary resuscitation (CPR). If you have a disability that would prevent 
 you from performing required basic life support skills you are advised that you 
 will not be able to achieve the unit of competency. 
 REQUIREMENTS AND ADVICE FOR STUDENTS PARTICIPATING IN WORK PLACEMENT 
 Some or all of the following advice will apply to you, depending on your course 
 and the type of organisation where you will be undertaking work placement. 

Cheers
Mark

Comments (2)

夢归不見 2024-11-13 13:23:40

There are two primary problems. First, the heading lines contain spaces (and possibly other characters), so [A-Z] alone isn't enough: you need at least a space in the set, i.e. [A-Z ]. If the headings can contain other characters, such as digits or punctuation, add those to the set as well. Second, as karora mentioned, you need to allow for variations in the line breaks.

Here is an example that also uses a positive lookahead, so the line break and the first letter of the next line are checked but not returned as part of the match. That way you can probably use the array of matches directly in the next step of your code.

<cfset matches = reMatch(" [A-Z ]+(?= \r?\n [A-Z])", teststring) />
<cfdump var="#matches#" />
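
As a quick sanity check of what that pattern picks out, here is a rough sketch in Python rather than CFML (chosen only because it is easy to run standalone). The pattern string is the same one passed to reMatch above; `sample` is an abbreviated, hypothetical copy of the question's text with mixed line endings. CFML's regex engine may differ in details, so this checks the pattern, not reMatch itself.

```python
import re

# Abbreviated copy of the sample text. Every line starts and ends with a
# space, and the line endings are deliberately a mix of \r\n and \n.
sample = (
    " unit of competency. \r\n"
    " ENROLLING IN FIRST AID UNITS OF COMPETENCY \r\n"
    " If you are seeking to enrol in a First Aid unit of competency e.g. HLTFA301B \n"
    " Apply first aid, you are advised that you must be able to \n"
    " REQUIREMENTS AND ADVICE FOR STUDENTS PARTICIPATING IN WORK PLACEMENT \n"
    " Some or all of the following advice will apply to you. \n"
)

# Space, then a run of capitals and spaces; the lookahead requires
# "space, optional \r, \n, space, capital" next without consuming it.
pattern = r" [A-Z ]+(?= \r?\n [A-Z])"

headings = re.findall(pattern, sample)

for heading in headings:
    print(repr(heading.strip()))
```

Each match keeps its leading space, so trim it before use if that matters. Mixed-case lines and codes such as HLTFA301B are skipped because the run of capitals and spaces has to reach the end of the line.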
栀梦 2024-11-13 13:23:40

When you are matching a line break, make sure you consider that line breaks may (or may not) have a carriage return preceding them, especially in text files from Windows.

So you might want something like:

"[ ][A-Z]+\r?\n[A-Z]"

Make sure you don't leave random spaces in your regex, because these will very likely be treated as literal spaces. I've enclosed the (only) space in the expression above in [ ] to make it clearer that it's part of the regex, and I've enclosed the whole regex in " characters because you probably want that. The [ ] around that space should not be needed, though.

The ? following a match means "0 or 1 of the preceding", so in this case we want a \n optionally preceded by a \r.
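
To see the optional carriage return in action, here is a small sketch in Python (used only as a convenient way to run the pattern; the test strings are made up for illustration). In `\r?\n` the `?` makes the `\r` optional, meaning zero or one occurrence, so the same pattern covers both Unix and Windows line endings.

```python
import re

# \r? = zero or one carriage return, followed by a mandatory line feed.
line_break = re.compile(r"\r?\n")

unix_ok = line_break.fullmatch("\n") is not None        # no \r at all
windows_ok = line_break.fullmatch("\r\n") is not None   # exactly one \r
double_cr = line_break.fullmatch("\r\r\n") is not None  # two \r is one too many

# The same pattern splits text with mixed line endings cleanly.
parts = line_break.split(" HEADING \r\n Body text \n More text")

print(unix_ok, windows_ok, double_cr)
print(parts)
```

The split result has no stray \r left on any piece, which is what goes wrong when the pattern only looks for \n in Windows text.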
