Character classes - JavaScript
Character classes distinguish kinds of characters, for example distinguishing between letters and digits.
Types
Characters | Meaning |
---|---|
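. | Has one of the following meanings: (1) Matches any single character except line terminators (\n, \r, \u2028, or \u2029). For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day". (2) Inside a character set, the dot loses its special meaning and matches a literal dot. Note that ES2018 added the s ("dotAll") flag, which allows the dot to also match line terminators. |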
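\d | Matches any digit (Arabic numeral). Equivalent to [0-9]. |
\D | Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. |
\w | Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. |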
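\s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\S | Matches a single character other than white space. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |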
\t | Matches a horizontal tab. |
\r | Matches a carriage return. |
\n | Matches a linefeed. |
\v | Matches a vertical tab. |
\f | Matches a form-feed. |
[\b] | Matches a backspace. If you're looking for the word-boundary character (\b), see Boundaries. |
\0 | Matches a NUL character. Do not follow this with another digit. |
\cX | Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to codepoints U+0001–U+001A). For example, /\cM/ matches "\r" in "\r\n". |
\xhh | Matches the character with the code hh (two hexadecimal digits). |
\uhhhh | Matches a UTF-16 code-unit with the value hhhh (four hexadecimal digits). |
\u{hhhh} or \u{hhhhh} | (Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits). |
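\ | Indicates that the following character should be treated specially, or "escaped". It behaves in one of two ways. For characters that are usually treated literally, it indicates that the next character is special: /b/ matches the character "b", whereas /\b/ matches a word boundary. For characters that are usually treated specially, it indicates that the next character should be interpreted literally: /a\*/ matches "a*". To match the backslash itself literally, escape it with itself. In other words, to search for \ use /\\/. |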
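The entries in the table can be checked directly in a console. The following is a minimal sketch rather than an exhaustive test; the sample strings and variable names are made up for illustration.

```javascript
// Sample strings are made up for illustration.
const sampleLine = "Order_42 shipped\ton 2024-01-15";

// \d selects the same characters as [0-9].
const digitRuns = sampleLine.match(/\d+/g);
const digitRunsExplicit = sampleLine.match(/[0-9]+/g);
console.log(digitRuns); // ['42', '2024', '01', '15']

// \w covers A-Z, a-z, 0-9, and the underscore.
const wordRuns = sampleLine.match(/\w+/g);
console.log(wordRuns); // ['Order_42', 'shipped', 'on', '2024', '01', '15']

// \s matches the space and the tab alike.
const fields = sampleLine.split(/\s+/);
console.log(fields); // ['Order_42', 'shipped', 'on', '2024-01-15']

// Without the s flag, the dot does not match a line feed.
const dotPlain = /first.second/.test("first\nsecond");
const dotAll = /first.second/s.test("first\nsecond");
console.log(dotPlain, dotAll); // false true

// \x41, \u0041, and (with the u flag) \u{41} all match "A".
const escapesAgree = /\x41/.test("A") && /\u0041/.test("A") && /\u{41}/u.test("A");
console.log(escapesAgree); // true

// [\b] is a backspace; \cM is a carriage return.
const controlChars = /[\b]/.test("a\bb") && /\cM/.test("\r\n");
console.log(controlChars); // true
```

Note that \u{41} only works as a code point escape when the u flag is set; without it, the pattern is read as the letter "u" repeated 41 times.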
Examples
Looking for a series of digits
var randomData = "015 354 8787 687351 3512 8735";
var regexpFourDigits = /\b\d{4}\b/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// \d{4} indicates a digit, four times
// \b indicates another boundary (i.e. do not end matching in the middle of a word)
console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']
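To see what the two \b assertions contribute, the same search can be run without them. This is a small sketch using the same data; the variable names are hypothetical.

```javascript
const randomDigits = "015 354 8787 687351 3512 8735";

// With boundaries, only standalone four-digit groups match.
const withBoundaries = randomDigits.match(/\b\d{4}\b/g);
console.log(withBoundaries); // ['8787', '3512', '8735']

// Without boundaries, the first four digits of "687351" match as well.
const withoutBoundaries = randomDigits.match(/\d{4}/g);
console.log(withoutBoundaries); // ['8787', '6873', '3512', '8735']
```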
Looking for a word (from the Latin alphabet) starting with A
var aliceExcerpt = "I’m sure I’m not Ada,’ she said, ‘for her hair goes in such long ringlets, and mine doesn’t go in ringlets at all.";
var regexpWordStartingWithA = /\b[aA]\w+/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// [aA] indicates the letter a or A
// \w+ indicates one or more word characters (basic Latin letters, digits, or underscore)
console.table(aliceExcerpt.match(regexpWordStartingWithA));
// ['Ada', 'and', 'at', 'all']
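Because \w and \b only know about basic Latin letters, digits, and the underscore, the same pattern skips words that begin with an accented letter. A brief sketch, using a made-up German sentence as the assumption:

```javascript
// Hypothetical sample text; "Äpfel" and "Ärger" start with a non-ASCII letter.
const germanText = "Anna mag Äpfel, aber Ärger nicht";

// Only words starting with an unaccented a or A are found.
const asciiWordsStartingWithA = germanText.match(/\b[aA]\w+/g);
console.log(asciiWordsStartingWithA); // ['Anna', 'aber']

// An accented letter also splits a \w+ run in two (\u00EF is "ï").
const splitWord = "na\u00EFve".match(/\w+/g);
console.log(splitWord); // ['na', 've']
```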
Looking for a word (from Unicode characters)
Instead of the Latin alphabet, we can use a range of Unicode characters to identify a word (thus being able to deal with text in other languages like Russian or Arabic). The "Basic Multilingual Plane" of Unicode contains most of the characters used around the world and we can use character classes and ranges to match words written with those characters.
var nonEnglishText = "Приключения Алисы в Стране чудес";
var regexpBMPWord = /([\u0000-\u001F\u0021-\uFFFF])+/gu;
// BMP goes through U+0000 to U+FFFF; the class leaves out only the space, U+0020
console.table(nonEnglishText.match(regexpBMPWord));
// ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']
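A class that excludes only the space treats punctuation as part of a word. One possible refinement, sketched here as an assumption rather than as the approach of the example above, is to narrow the range to a single script. The Cyrillic block (U+0400 to U+04FF) is chosen because it fits this sample text; the punctuated sentence and the variable names are hypothetical.

```javascript
// Hypothetical variant of the sample text with punctuation added.
const punctuatedText = "Приключения Алисы, в Стране чудес!";

// A broad class that excludes only the space keeps the comma and the exclamation mark.
const broadWord = /[\u0000-\u001F\u0021-\uFFFF]+/gu;
const broadMatches = punctuatedText.match(broadWord);
console.log(broadMatches);
// ['Приключения', 'Алисы,', 'в', 'Стране', 'чудес!']

// Restricting the range to the Cyrillic block (U+0400 to U+04FF) drops punctuation.
const cyrillicWord = /[\u0400-\u04FF]+/gu;
const cyrillicMatches = punctuatedText.match(cyrillicWord);
console.log(cyrillicMatches);
// ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']
```

The narrower range only suits text in that one script; mixed-language text would need additional ranges.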
Specifications
Specification |
---|
ECMAScript (ECMA-262) RegExp: Character classes |
Browser compatibility
For browser compatibility information, check out the main Regular Expressions compatibility table.