Regex for Vietnamese characters
I have a string and want to remove every character that is not one of the following:
a character in this list: ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂ
ưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ
a character in [a-z 0-9 A-Z]
_ or whitespace.
Can anyone help me with this regex in PHP?
You can try the following regex, which also accepts "ê, ế, Ê, Ế":
^[a-zA-Z_ÀÁÂÃÈÉÊẾÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêếìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ\ ]+$
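The pattern above validates a whole string, while the question asks to strip disallowed characters. A minimal sketch of the removal variant, assuming the same character list is moved into a negated class and passed to `preg_replace` with the `u` modifier. The function name `strip_disallowed` is hypothetical, and `0-9` and `\s` are added here because the question asks for digits and whitespace:

```php
<?php
// Hypothetical helper, not taken from the answers: removes every character that
// is not an ASCII letter, digit, underscore, whitespace, or a listed Vietnamese letter.
function strip_disallowed(string $input): string
{
    // Character list copied from the regex above (duplicates are harmless in a class).
    $vietnamese = 'ÀÁÂÃÈÉÊẾÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêếìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ';

    // [^...] negates the class; the u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/[^a-zA-Z0-9_\s' . $vietnamese . ']/u';

    // preg_replace returns null on error (for example, invalid UTF-8 input).
    return preg_replace($pattern, '', $input) ?? '';
}

echo strip_disallowed("Ti\u{1EBF}ng Vi\u{1EC7}t_123!@#"), "\n"; // Tiếng Việt_123
```

This sketch assumes the input uses precomposed (NFC) characters; decomposed input would lose its combining accents.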
With JS you can add this:
You can use Unicode characters; see https://vietunicode.sourceforge.net/charset/
I re-ordered the special Vietnamese characters by A-Z sequence, the six tones, uppercase, and lowercase:
And the regex:
Try this regular expression:
The u modifier makes PHP interpret the pattern string as UTF-8.
If that doesn't work, try using Unicode character properties like \p{L} for letters, or the escape sequence \x{1234} for describing single Unicode characters or custom character ranges:
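The two alternatives mentioned here can be sketched roughly as follows. The function names and the exact code point ranges are assumptions of this example, not part of the answer, and both variants are deliberately looser than an explicit character list:

```php
<?php
// Option 1 (sketch): keep any Unicode letter via \p{L}, plus digits, underscore, whitespace.
// \p{L} is broader than Vietnamese: Cyrillic, Greek and other letters also pass.
function keep_letters(string $input): string
{
    return preg_replace('/[^\p{L}0-9_\s]/u', '', $input) ?? '';
}

// Option 2 (sketch): custom ranges written as \x{...} code points.
// These ranges are an assumption and slightly over-inclusive:
// U+00C0-U+00FF also contains non-Vietnamese letters plus the signs × and ÷.
function keep_ranges(string $input): string
{
    $ranges = '\x{00C0}-\x{00FF}'  // Latin-1 accented letters
            . '\x{0102}\x{0103}'   // Ă ă
            . '\x{0110}\x{0111}'   // Đ đ
            . '\x{0128}\x{0129}'   // Ĩ ĩ
            . '\x{0168}\x{0169}'   // Ũ ũ
            . '\x{01A0}\x{01A1}'   // Ơ ơ
            . '\x{01AF}\x{01B0}'   // Ư ư
            . '\x{1EA0}-\x{1EF9}'; // Ạ through ỹ (Latin Extended Additional)

    // Without the u modifier, \x{...} values above 0xFF are rejected by PCRE.
    return preg_replace('/[^a-zA-Z0-9_\s' . $ranges . ']/u', '', $input) ?? '';
}
```

For example, `keep_letters("Việt\u{0416}!")` keeps the Cyrillic Ж, while `keep_ranges` removes it along with the exclamation mark.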
The regexes above lack ế, and ă and ề are duplicated. The correct list of Vietnamese characters:
àáãạảăắằẳẵặâấầẩẫậèéẹẻẽêềếểễệđìíĩỉịòóõọỏôốồổỗộơớờởỡợùúũụủưứừửữựỳỵỷỹýÀÁÃẠẢĂẮẰẲẴẶÂẤẦẨẪẬÈÉẸẺẼÊỀẾỂỄỆĐÌÍĨỈỊÒÓÕỌỎÔỐỒỔỖỘƠỚỜỞỠỢÙÚŨỤỦƯỨỪỬỮỰỲỴỶỸÝ
Also, remember to normalize the string to NFC form (string.normalize('NFC')) before testing it against the regex. Read more here.
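Since the question is about PHP, here is a hedged PHP counterpart of normalizing before matching. It assumes the intl extension is loaded (it provides the `Normalizer` class); the constant and function names are hypothetical:

```php
<?php
// Assumption: ext-intl is available; Normalizer is not part of the PHP core.
// Character list copied from the corrected list above.
const VIETNAMESE_LETTERS = 'àáãạảăắằẳẵặâấầẩẫậèéẹẻẽêềếểễệđìíĩỉịòóõọỏôốồổỗộơớờởỡợùúũụủưứừửữựỳỵỷỹýÀÁÃẠẢĂẮẰẲẴẶÂẤẦẨẪẬÈÉẸẺẼÊỀẾỂỄỆĐÌÍĨỈỊÒÓÕỌỎÔỐỒỔỖỘƠỚỜỞỠỢÙÚŨỤỦƯỨỪỬỮỰỲỴỶỸÝ';

// Hypothetical helper: true if the string contains only allowed characters.
function is_vietnamese_text(string $input): bool
{
    // Compose base letters and combining marks into single code points (NFC).
    $nfc = Normalizer::normalize($input, Normalizer::FORM_C);
    if ($nfc === false) {
        return false; // normalization failed, e.g. on invalid UTF-8
    }

    $pattern = '/^[a-zA-Z0-9_\s' . VIETNAMESE_LETTERS . ']+$/u';
    return preg_match($pattern, $nfc) === 1;
}
```

With this sketch, a decomposed input such as `"e\u{0302}\u{0301}"` is composed into ế before the match, so it is accepted just like the precomposed form.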
Be careful. Vietnamese Unicode characters may be "decomposed" into "combining characters", with one code point for the base character and one or more code points for additional diacritics, or they may be "precomposed" into single Unicode code points. Combining diacritics won't work as expected inside a regular expression character class [] since you will match them no matter what base character they combine with.
Older versions of Unicode did not contain the full set of Vietnamese precomposed characters, so expect to find Vietnamese with combining characters in the wild. You can convert combining characters into precomposed characters using Unicode normalization form C (NFC).
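The pitfall can be illustrated with a small PHP sketch that uses only PCRE functions. The helper names are hypothetical, and `\p{Mn}` (non-spacing marks) is used here as a rough signal that a string still contains combining diacritics:

```php
<?php
$precomposed = "\u{1EBF}";          // ế as one code point
$decomposed  = "e\u{0302}\u{0301}"; // e + combining circumflex + combining acute

// With the u modifier, "." matches one code point, so this counts code points.
function count_code_points(string $s): int
{
    return (int) preg_match_all('/./su', $s);
}

// Hypothetical check: true if the string contains combining marks (\p{Mn}),
// a hint that it should be normalized to NFC before matching.
function has_combining_marks(string $s): bool
{
    return preg_match('/\p{Mn}/u', $s) === 1;
}

// A class written with decomposed text is really [e, U+0302, U+0301]:
// the accents match on their own, whatever base letter precedes them.
$class = '/^[' . $decomposed . ']+$/u';

var_dump(count_code_points($precomposed));         // int(1)
var_dump(count_code_points($decomposed));          // int(3)
var_dump(preg_match($class, "\u{0301}\u{0302}"));  // int(1): bare accents are accepted
var_dump(preg_match($class, $precomposed));        // int(0): the precomposed letter is not
```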