C# file gettext-string regex parser

Posted 2024-12-19 03:39:16

SHORT QUESTION

Take a regex that reads a string inside double quotes. The string is valid only if it has NO double quotes inside.

("([^"]+)")

How would one write a regex that does the same thing but also accepts a string containing double quotes, as long as each one is preceded by a backslash?

"Valid string"      //VALID
"Valid \"string\""  //VALID
"Invalid " + "string"  //INVALID
"Invalid " + "\"string\""  //INVALID

LONG QUESTION

I'm building my own gettext implementation, because I found that the official gettext tools ( http://www.gnu.org/s/gettext/ ) are not sufficient for my needs.

That means I need to find all strings inside each C# code file myself, but only those which are passed to a particular function as the only parameter.

I built a regex which gets most of the strings. The Translate function is public and static, and it lives in the Localization class in the GetTextLocalization namespace.

(GetTextLocalization\.)?(Localization\.Translate)\("([^"]+)"\)

Of course, this will ONLY find calls whose argument is a single plain string literal. If the argument is an expression ("string a" + "string b") or a verbatim string (@"Verbatim string"), it will not match, but that is not the problem.
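
A minimal sketch of how that pattern behaves, assuming it is applied with .NET's System.Text.RegularExpressions.Regex. The names NaiveTranslateScanner and FindArguments are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any gettext tooling.
public static class NaiveTranslateScanner
{
    // The pattern from the question, unchanged ("" is a quote inside a verbatim string).
    private static readonly Regex Pattern = new Regex(
        @"(GetTextLocalization\.)?(Localization\.Translate)\(""([^""]+)""\)");

    // Returns the captured string argument (group 3) of every call the pattern finds.
    public static List<string> FindArguments(string source)
    {
        var found = new List<string>();
        foreach (Match m in Pattern.Matches(source))
        {
            found.Add(m.Groups[3].Value);
        }
        return found;
    }
}
```

Given source text containing Localization.Translate("Hello"), FindArguments returns Hello. A call with "a" + "b" or @"..." as its argument yields nothing, and so does a call whose literal contains an escaped \" inside it.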

The regex definition:

([^"]+)

says that there must be no double quotes inside the string, and I know that no one in the company concatenates strings while passing them as the parameter. Still, I need to keep this restriction as a safety "what if" measure.

But that also causes the problem. The double quotes actually can be there.

Localization.Translate("Perfectly valid String with \"double quotes\"")

I need to change the regex so that it accepts strings containing a double quote, but only quotes that are preceded by a backslash (so I still skip anything like Translate("a" + "b"), which would mess up the translation catalog).

I thought I might need to use the (?!) negative lookahead construct somehow, but I have no idea where to place it.

Comments (1)

情绪少女 2024-12-26 03:39:16

Since you probably want to allow doubled backslashes before a quote, I suggest

"(?:\\.|[^"\\])*"

Explanation:

"        # Match "
(?:      # Either match
 \\.     # an escaped character
|        # or
 [^"\\]  # any character except " or \
)*       # any number of times.
"        # Match "

This matches "hello", "hello\"there" or "hello\\" but fails on "hello" there" or "hello\\" there".
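
As a rough sketch of how this could be used from C#, assuming .NET's Regex class: the class below checks whether a piece of text is exactly one quoted literal, and also drops the same string body into the Translate pattern from the question. EscapedStringScanner, IsSingleLiteral and FindArguments are hypothetical names, and combining the two patterns this way is an assumption about what the asker wants, not something tested against a real code base.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class EscapedStringScanner
{
    // Body of a regular (non-verbatim) string literal:
    // a backslash followed by any character, or any character except " and \.
    private const string Body = @"(?:\\.|[^""\\])*";

    // Anchored form: the whole input must be exactly one quoted literal.
    private static readonly Regex WholeLiteral = new Regex("^\"" + Body + "\"$");

    // Assumption: the same body placed inside the Translate pattern from the question.
    private static readonly Regex TranslateCall = new Regex(
        @"(?:GetTextLocalization\.)?Localization\.Translate\(""(" + Body + @")""\)");

    public static bool IsSingleLiteral(string text)
    {
        return WholeLiteral.IsMatch(text);
    }

    // Returns the raw literal contents; escape sequences such as \" are NOT decoded.
    public static List<string> FindArguments(string source)
    {
        var found = new List<string>();
        foreach (Match m in TranslateCall.Matches(source))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```

With this sketch, IsSingleLiteral accepts "Valid string" and "Valid \"string\"" and rejects both "Invalid " + ... examples from the question. FindArguments returns the text between the outer quotes with the backslashes still in place, so a real extractor would still need to unescape it before writing it to the catalog.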
