JavaScript regex whitespace characters

Posted on 2024-11-08 12:48:44

I have done some searching, but I couldn't find a definitive list of whitespace characters included in the \s in JavaScript's regex.

I know that I can rely on space, line feed, carriage return, and tab as being whitespace, but I thought that since JavaScript was traditionally only for the browser, maybe URL-encoded whitespace and things like &nbsp; and %20 would be supported as well.

What exactly does JavaScript's regex compiler consider to be whitespace? If there are differences between browsers, I only really care about WebKit browsers, but it would be nice to know of any differences. Also, what about Node.js?

Comments (5)

指尖凝香 2024-11-15 12:48:44

A simple test:

// Log every code point whose string is emptied by removing a run of \s matches.
for (var i = 0; i <= 0x10FFFF; i++) {
    if (String.fromCodePoint(i).replace(/\s+/, "") == "") console.log(i);
}

The char codes (Chrome):

9
10
11
12
13
32
160
5760
8192
8193
8194
8195
8196
8197
8198
8199
8200
8201
8202
8232
8233
8239
8287
12288
65279
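
As a rough sketch of how the codes above could be used, the snippet below turns the decimal list into a compact character class with \uXXXX escapes and ranges. The helper names (`esc`, `toRanges`, `explicitWhitespace`) are made up for illustration, and the list is simply copied from the Chrome output above, so it may not hold for other engines or versions.

```javascript
// Codes copied from the Chrome output above (an assumption for other engines).
const codes = [
  9, 10, 11, 12, 13, 32, 160, 5760,
  8192, 8193, 8194, 8195, 8196, 8197, 8198, 8199, 8200, 8201, 8202,
  8232, 8233, 8239, 8287, 12288, 65279,
];

// Format a code as a \uXXXX regex escape (hypothetical helper).
const esc = (c) => "\\u" + c.toString(16).toUpperCase().padStart(4, "0");

// Collapse a sorted list of codes into [start, end] runs of consecutive values.
function toRanges(sorted) {
  const ranges = [];
  for (const c of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && c === last[1] + 1) last[1] = c;
    else ranges.push([c, c]);
  }
  return ranges;
}

// Build the body of a character class, using a range wherever a run has more than one code.
const classSource = toRanges(codes)
  .map(([lo, hi]) => (lo === hi ? esc(lo) : `${esc(lo)}-${esc(hi)}`))
  .join("");

const explicitWhitespace = new RegExp(`[${classSource}]`);

console.log(classSource);
console.log(explicitWhitespace.test("\u3000"));
```

An explicit class like this behaves the same everywhere, which may be preferable to \s if engine differences matter.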
¢蛋碎的人ぎ生 2024-11-15 12:48:44

For Mozilla it's:

 [ \f\n\r\t\v\u00A0\u2028\u2029]

(Ref)

For IE (JScript) it's:

[ \f\n\r\t\v] 

(Ref)
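
Since the two documented classes differ only in \u00A0, \u2028, and \u2029, one way to see what a given engine actually does is to probe those characters directly. This is only a sketch: `candidates` and `probeWhitespace` are illustrative names, and the result depends on the engine it runs in.

```javascript
// Characters present in the Mozilla class above but not in the JScript one.
const candidates = {
  "U+00A0": "\u00A0",
  "U+2028": "\u2028",
  "U+2029": "\u2029",
};

// Report, for each labelled character, whether \s matches it in this engine.
function probeWhitespace(chars) {
  const result = {};
  for (const [label, ch] of Object.entries(chars)) {
    result[label] = /^\s$/.test(ch);
  }
  return result;
}

console.log(probeWhitespace(candidates));
```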

追我者格杀勿论 2024-11-15 12:48:44

HTML != JavaScript. JavaScript is completely literal: %20 is %20, and &nbsp; is just the string of characters &, n, b, s, p, and ;. For character classes, I consider nearly everything that is regex in Perl to be applicable in JS (you can't do named groups, etc.).

http://www.regular-expressions.info/javascript.html is the reference I use.
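
A small sketch of the point about literal text: the regex engine sees %20 and &nbsp; as ordinary characters, and only the decoded characters count as whitespace. The variable names are illustrative only.

```javascript
// URL-encoded and HTML-entity text is just ordinary characters to the regex engine.
const urlEncoded = "a%20b";
const htmlEntity = "a&nbsp;b";

const urlEncodedMatches = /\s/.test(urlEncoded);
const htmlEntityMatches = /\s/.test(htmlEntity);

// Decoding first produces a real space (U+0020), which \s does match.
const decoded = decodeURIComponent(urlEncoded);
const decodedMatches = /\s/.test(decoded);

// The character that &nbsp; stands for in HTML is U+00A0, written here as an escape.
const nbspMatches = /\s/.test("a\u00A0b");

console.log({ urlEncodedMatches, htmlEntityMatches, decoded, decodedMatches, nbspMatches });
```

If the input may arrive URL-encoded, decoding it before applying the regex is the step that makes \s relevant.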

不交电费瞎发啥光 2024-11-15 12:48:44

Here's an expansion of primvdb's answer, covering the entire 16-bit space, including Unicode code point values and a comparison with str.trim(). I tried to edit the answer to improve it, but my edit was rejected, so I had to post this new one.

Identify all characters in the 16-bit range (code points below 2 ** 16) that are matched as whitespace by the regex \s or stripped by String.prototype.trim():

const regexList = [];
const trimList = [];

for (let codePoint = 0; codePoint < 2 ** 16; codePoint += 1) {
  const str = String.fromCodePoint(codePoint);
  const unicode = codePoint.toString(16).padStart(4, '0');

  if (str.replace(/\s/, '') === '') regexList.push([codePoint, unicode]);
  if (str.trim() === '') trimList.push([codePoint, unicode]);
}

const identical = JSON.stringify(regexList) === JSON.stringify(trimList);
const list = regexList.reduce((str, [codePoint, unicode]) => `${str}${unicode} ${codePoint}\n`, '');

console.log({identical});
console.log(list);

The list (in V8):

0009 9
000a 10
000b 11
000c 12
000d 13
0020 32
00a0 160
1680 5760
2000 8192
2001 8193
2002 8194
2003 8195
2004 8196
2005 8197
2006 8198
2007 8199
2008 8200
2009 8201
200a 8202
2028 8232
2029 8233
202f 8239
205f 8287
3000 12288
feff 65279
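
The loop above stops at 2 ** 16. As a hedged sketch, the same check can be run over every code point up to 0x10FFFF by using the `u` flag, so that a code point above 0xFFFF is treated as one character instead of two code units. The names `bmp` and `astral` are illustrative, and the counts depend on the engine's Unicode version.

```javascript
// With the u flag, the regex works on whole code points rather than UTF-16 code units.
const wholeCodePointWhitespace = /^\s$/u;

const bmp = [];     // matches at or below 0xFFFF
const astral = [];  // matches above 0xFFFF, if any

for (let cp = 0; cp <= 0x10FFFF; cp += 1) {
  if (wholeCodePointWhitespace.test(String.fromCodePoint(cp))) {
    (cp <= 0xFFFF ? bmp : astral).push(cp);
  }
}

console.log({ bmpCount: bmp.length, astralCount: astral.length });
```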
明月夜 2024-11-15 12:48:44

In Firefox, \s matches a single whitespace character, including space, tab, form feed, and line feed. It is equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029].

For example, /\s\w*/ matches ' bar' in "foo bar."

https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
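
The example in this answer can be checked directly. A minimal sketch (the variable name `match` is illustrative):

```javascript
// \s consumes the space, then \w* consumes the word characters that follow it.
const match = "foo bar".match(/\s\w*/);

console.log(JSON.stringify(match[0])); // " bar"
console.log(match.index);              // 3
```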
