How to replace items between two delimiters in TextWrangler

Posted on 2024-12-14 05:29:01

I want to replace a phonetic symbol between phonetic transcription slashes like this:

/anycharacter*ou*anycharacter/

into

/anycharacter*au*anycharacter/

I mean I want to replace "ou" by "au" between any two phonetic slashes in all cases. For example:

<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>
    =  suj vour ver / suwj dduaf

into

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vour ver / suwj dduaf

The text file contains HTML code and some ordinary forward slashes in the text (such as A/B, meaning "A or B").
The string "anycharacter" can be any characters: one or more, or none at all. For example: /folou/, /houl/, /sou/, /dʒə'kousnis/...

So far, I have been using:

Find: \/(.*?)\bou*\b(.*?)\/\s
Replace: /\1au\2\3\4/

but it finds all the strings between any /.../, including the normal forward slashes and HTML slashes, and when replacing it skips items such as /gou/, /tou/, etc. With the above example, the output is:

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vaur ver / suwj dduaf

Note: replacing the "vour" before the normal slash with "vaur" is not what I want.

Could you please show me how to solve the above problem? Thanks a lot.


记忆消瘦 2024-12-21 05:29:01

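The simplest match expression that might satisfy your needs is shown below. Note that it is not strictly POSIX ERE compliant, since the reluctant quantifier *? and the \t escape inside a character class are Perl-style extensions: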

(/[^ \t/<>]*?)ou([^ \t/<>]*?/)

broken down, this means:

(             # Capture the following into back-reference #1
  /           #   match a literal '/'
  [^ \t/<>]   #   match any character that is not a space, tab, slash, or angle bracket...
    *?        #     ...any number of times (even zero times), being reluctant
)             # end capture
ou            # match the letters 'ou'
(             # Capture the following into back-reference #2
  [^ \t/<>]   #   match any character that is not a space, tab, slash, or angle bracket...
    *?        #     ...any number of times (even zero times), being reluctant
  /           #   match a literal '/'
)             # end capture

Then use the replace expression \1au\2

This will ignore text between / characters if there is a space, a tab, an angle bracket (< or >), or another forward slash (/) between them. If there are other characters that you know will never occur inside one of these expressions, add them to the character classes (the [] groups).

In my emulator, it turns this text:

<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>
    =  suj vour ver / suwj dduaf. 
Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur

...into this text:

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vour ver / suwj dduaf. 
Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur
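
The transformation above can be reproduced outside the editor. The following is a minimal Python sketch, assuming Python's built-in re module as a stand-in for the editor's search dialog; the names PATTERN, replace_once and sample are made up for this illustration.

```python
import re

# Same expression as above: group 1 is the opening slash plus whatever
# precedes "ou", group 2 is whatever follows "ou" plus the closing slash.
PATTERN = re.compile(r"(/[^ \t/<>]*?)ou([^ \t/<>]*?/)")


def replace_once(text):
    """Run one search-and-replace pass with the replacement \\1au\\2."""
    return PATTERN.sub(r"\1au\2", text)


sample = (
    "<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>\n"
    "    =  suj vour ver / suwj dduaf.\n"
    "Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur"
)

print(replace_once(sample))
```

Only the transcription changes here: "vour" is left alone because no slash-delimited run without spaces surrounds it.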

Just ask if there is something that you don't understand! If you would like, I can also explain a few problems with the one you were trying to use before.

EDIT:

The above expression matches the entire phonetic transcription set, and replaces it entirely, using certain parts of the match and replacing others. The next attempt at a match will begin after the current match.

For this reason, if ou might occur more than once in a /-delimited phonetic expression, the above regex will need to be run multiple times, because each pass replaces only one ou per expression. For a once-through execution, a language or tool needs to support variable-length look-behind in addition to look-ahead (collectively, look-around).
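
The "run it multiple times" idea can be sketched as a loop that repeats the replacement until a pass changes nothing. This is only an illustration in Python, not something the editor does for you; replace_until_stable and the input /sousou/ are hypothetical.

```python
import re

PATTERN = re.compile(r"(/[^ \t/<>]*?)ou([^ \t/<>]*?/)")


def replace_until_stable(text):
    """Repeat the \\1au\\2 replacement until no match is left.

    Returns the final text and the number of passes that changed something.
    """
    passes = 0
    while True:
        # subn returns the new text and how many replacements were made.
        text, count = PATTERN.subn(r"\1au\2", text)
        if count == 0:
            return text, passes
        passes += 1


result, passes = replace_until_stable("/sousou/")
print(result, passes)
```

The loop terminates because each replacement removes one "ou" and cannot create a new one.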

As far as I know, only Microsoft's .NET regex and the JGSoft "flavor" of regex (in tools such as EditPad Pro and RegexBuddy) support this. POSIX (which UNIX grep requires) does not support any kind of look-around, and neither Python's re module nor PCRE (which I THINK TextWrangler's grep is based on) supports variable-length look-behind. I believe a single-pass replacement with a regex alone would not be possible without it.

An expression that requires variable-length look-around and does what you need could be like this:

(?<=/[^ \t/<>]*?)ou(?=[^ \t/<>]*?/)

...and the replacement expression will need to be modified as well, since you are matching (and thus replacing) only the characters which are to be replaced:

au

It works much the same except that it only matches the ou, then runs a check (called a zero-width assertion) to make sure that it is immediately preceded by a / and any number of certain characters, and immediately followed by any number of certain characters then a /.
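
If a scripting language is available, a single pass is also possible without look-behind by swapping in a different technique: match each whole slash-delimited span and let a replacement function rewrite every ou inside it. The sketch below assumes Python's re module; SPAN and replace_in_spans are illustrative names, and this is not the look-around expression shown above.

```python
import re

# One slash-delimited run with no space, tab, slash, or angle bracket inside.
SPAN = re.compile(r"/[^ \t/<>]+/")


def replace_in_spans(text, old="ou", new="au"):
    """Replace every `old` inside each matched /.../ span in a single pass."""
    return SPAN.sub(lambda m: m.group(0).replace(old, new), text)


print(replace_in_spans("/sousou/ and <b>vour</b> A/B"))
```

As with the capture-group version, a match consumes its closing slash, so that slash cannot also open the next span.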
