Why can't variable names start with numbers?

Posted on 2024-07-09 23:54:28

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"

I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.

Was that the right answer? Are there any more reasons?

string 2BeOrNot2Be = "that is the question"; // Why won't this compile?

鹤舞 2024-07-16 23:54:28

Because then a string of digits would be a valid identifier as well as a valid number.

int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";

谁的新欢旧爱 2024-07-16 23:54:28

Well think about this:

int 2d = 42;
double a = 2d;

What is a? 2.0? or 42?

Hint, if you don't get it: in languages such as Java and C#, a d after a number marks it as a double literal.

金橙橙 2024-07-16 23:54:28

It's a convention now, but it started out as a technical requirement.

In the old days, parsers of languages such as FORTRAN or BASIC did not require the use of spaces. So, basically, the following are identical:

10 V1=100
20 PRINT V1

and

10V1=100
20PRINTV1

Now suppose that numeral prefixes were allowed. How would you interpret this?

101V=100

as

10 1V = 100

or as

101 V = 100

or as

1 01V = 100

So, this was made illegal.

迎风吟唱 2024-07-16 23:54:28

Because backtracking is avoided in lexical analysis while compiling. Take a variable like:

Apple;

The compiler will know it's an identifier right away when it meets the letter 'A'.

However, with a variable like:

123apple;

the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it needs backtracking as a result.
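The first-character argument can be illustrated with a minimal hand-written tokenizer. This is only a sketch under simplifying assumptions, not how any particular compiler is implemented: the names `TokenKind`, `Token` and `tokenize` are hypothetical, and numbers are plain runs of decimal digits with no suffixes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class TokenKind { Identifier, Number, Other };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits `src` into tokens, choosing each token's kind from its first character alone.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // A letter or underscore can only start an identifier.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // A digit can only start a number, so the scan stops at the first non-digit.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Anything else is treated as a one-character token.
            ++i;
            tokens.push_back({TokenKind::Other, src.substr(start, 1)});
        }
    }
    return tokens;
}
```

With this scheme, `tokenize("Apple 123apple;")` yields the identifier `Apple`, the number `123`, the identifier `apple`, and `;`. The lexer never has to revisit `123` once it sees the `a`, but it also never produces a single token `123apple`.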

北方的巷 2024-07-16 23:54:28

Compilers/parsers/lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.

Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.

This goes way back - before special notations to denote storage or numeric base.

神魇的王 2024-07-16 23:54:28

I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.

I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy-to-remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z0-9_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
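To make the 0xdeadpork versus 0xdeadbeef oddity concrete, here is a rough sketch of that relaxed rule. It assumes the pattern is meant to cover letters, digits and underscores, ignores keywords and literal suffixes such as U or L, and uses hypothetical function names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if `tok` looks like a C-style hexadecimal literal such as 0xdeadbeef.
// Suffixes like U or L are ignored in this sketch.
bool is_hex_literal(const std::string& tok) {
    if (tok.size() < 3 || tok[0] != '0' || (tok[1] != 'x' && tok[1] != 'X')) {
        return false;
    }
    for (std::size_t i = 2; i < tok.size(); ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(tok[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical relaxed rule: any word made of letters, digits and underscores
// is an identifier unless it reads as a decimal or hexadecimal literal.
bool is_identifier_under_relaxed_rule(const std::string& tok) {
    if (tok.empty()) {
        return false;
    }
    bool all_digits = true;
    for (char ch : tok) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalnum(c) && c != '_') {
            return false;  // not a word character at all
        }
        if (!std::isdigit(c)) {
            all_digits = false;
        }
    }
    return !all_digits && !is_hex_literal(tok);
}
```

Under this rule `is_identifier_under_relaxed_rule("0xdeadpork")` is true while `is_identifier_under_relaxed_rule("0xdeadbeef")` is false, and the whole token has to be scanned before either answer is known.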

When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned a lot more, it doesn't seem so bad, and I finally got around to learning it right.

街角迷惘 2024-07-16 23:54:28

Variable names cannot start with a digit, because it can cause some problems like below:

int a = 2;
int 2 = 5;
int c = 2 * a; 

What is the value of c? Is it 4, or is it 10?

Another example:

float 5 = 25;
float b = 5.5;

Is the first 5 a number, or an object (followed by the . operator)?
There is a similar problem with the second 5.

Maybe there are some other reasons too. So, we shouldn't use a digit at the beginning of a variable name.

猫弦 2024-07-16 23:54:28

It's likely a decision that came about for a few reasons. When you're parsing a token, you only have to look at the first character to determine whether it's an identifier or a literal, and then send it to the correct function for processing. So that's a performance optimization.

The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.

There are also stylistic implications: identifiers are supposed to be mnemonics, so words are much easier to remember than numbers. When a lot of the original languages were being written, setting the styles for the next few decades, they weren't thinking about substituting "2" for "to".

污味仙女 2024-07-16 23:54:28

The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.

水溶 2024-07-16 23:54:28

COBOL allows variables to begin with a digit.

伴我心暖 2024-07-16 23:54:28

Use of a digit to begin a variable name makes error checking during compilation or interpretation a lot more complicated.

Allowing use of variable names that began like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable, or an error. The complexity added to the language parser may not justify this feature.

As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.

抱着落日 2024-07-16 23:54:28

As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.

That said, pretty much all of the time, the reason a language doesn't allow variable names to begin with numbers is that those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you try to define it as permissible, they will give you parsing conflicts.

OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is unlikely to resemble any popular conventional language very much, if at all.

For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
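The "herald character" idea can be sketched as follows. The `$` sigil and the function `read_sigil_variable` are assumptions made for illustration only; they are not taken from Qompose or any other real language.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Tries to read a '$'-prefixed variable name starting at src[pos].
// On success, stores the name (without the sigil) in `name` and returns true.
// The '$' sigil and this function are assumptions made for illustration.
bool read_sigil_variable(const std::string& src, std::size_t pos, std::string& name) {
    if (pos >= src.size() || src[pos] != '$') {
        return false;  // no sigil, so this is not a variable reference
    }
    std::size_t i = pos + 1;
    while (i < src.size() &&
           (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
        ++i;
    }
    if (i == pos + 1) {
        return false;  // a lone '$' with no name after it
    }
    name = src.substr(pos + 1, i - pos - 1);
    return true;
}
```

Because the sigil alone decides that a variable follows, a name such as `$2BeOrNot2Be` or `$2d` can no longer be confused with a numeric literal, whatever its first character is.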

云之铃。 2024-07-16 23:54:28

Because if you allowed keywords and identifiers to begin with numeric characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).

傲世九天 2024-07-16 23:54:28

C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:

0x, 2d, 5555

只涨不跌 2024-07-16 23:54:28

One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.

Wasn't it Dijkstra who said that the "most important aspect of any tool is its effect on its user"?

一向肩并 2024-07-16 23:54:28

The compiler has 7 phases, as follows:

  1. Lexical analysis
  2. Syntax Analysis
  3. Semantic Analysis
  4. Intermediate Code Generation
  5. Code Optimization
  6. Code Generation
  7. Symbol Table

Backtracking is avoided in the lexical analysis phase while compiling a piece of code. With a variable like Apple, the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, with a variable like 123apple, the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need to backtrack in the lexical analysis phase to identify it as a variable. But that is not supported in the compiler.

When you’re parsing the token you only have to look at the first character to determine if it’s an identifier or literal and then send it to the correct function for processing. So that’s a performance optimization.

指尖上的星空 2024-07-16 23:54:28

Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.

Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
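The lookup order described in the parenthetical (dictionary first, number second, error otherwise) might be sketched like this. It is a simplification, not real Forth: the dictionary here maps a word to the value it would push rather than to a routine, and `Resolution`, `Resolved` and `resolve` are hypothetical names.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// How a whitespace-delimited token was resolved in this sketch.
enum class Resolution { Word, Number, Error };

struct Resolved {
    Resolution kind;
    long value;  // value pushed by the word, or the parsed number; 0 on error
};

// Dictionary lookup first, numeric conversion second, error otherwise.
Resolved resolve(const std::map<std::string, long>& dictionary, const std::string& token) {
    auto it = dictionary.find(token);
    if (it != dictionary.end()) {
        return {Resolution::Word, it->second};  // a defined word wins, even if it looks numeric
    }
    if (!token.empty()) {
        char* end = nullptr;
        long n = std::strtol(token.c_str(), &end, 10);
        if (*end == '\0') {
            return {Resolution::Number, n};  // the whole token converted cleanly
        }
    }
    return {Resolution::Error, 0};
}
```

With a dictionary containing `"2"` and `"dozen"`, the token `2` resolves as a word, `17` falls through to the numeric conversion, and `17x` is flagged as an error. Since the dictionary is consulted first, a word's name is free to start with a digit.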

物价感观 2024-07-16 23:54:28

Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.

There's a much better explanation of this here.

[浮城] 2024-07-16 23:54:28

It is easier for a compiler to identify a variable by the ASCII characters at a memory location than by a number.

梦里兽 2024-07-16 23:54:28

I think the simple answer is that it can; the restriction is language-based. In C++ and many other languages it can't, because the language doesn't support it. It's not built into the rules to allow that.

The question is akin to asking why the King can't move four spaces at a time in Chess. It's because in Chess that is an illegal move. Could it in another game? Sure. It just depends on the rules being played by.

海风掠过北极光 2024-07-16 23:54:28

Originally it was simply because variable names are easier to remember (you can give them more meaning) as strings rather than numbers. Numbers can still be included within the string to enhance its meaning, or to reuse the same variable name while marking it as having a separate but closely related meaning or context. For example, loop1, loop2, etc. would always let you know that you were in a loop and/or that loop2 was a loop within loop1.
Which would you prefer (which has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it is not just text or numbers (such as the $ in $address), it really shouldn't make a difference, as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case, it comes down to what the language designers want to use as the rules for their language.

悲凉≈ 2024-07-16 23:54:28

The compiler may also treat the variable as a value at compile time,
so the value may refer to itself again and again, recursively.

后来的我们 2024-07-16 23:54:28

Backtracking is avoided in the lexical analysis phase while compiling a piece of code. With a variable like Apple; the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, with a variable like 123apple; the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need to backtrack in the lexical analysis phase to identify it as a variable. But that is not supported in the compiler.

Reference

追星践月 2024-07-16 23:54:28

There might be nothing wrong with it when declaring the variable, but there is some ambiguity when the variable is used somewhere else, like this:

let 1 = "Hello world!"
print(1)
print(1)

print is a generic method that accepts all types of variables, so in this situation the compiler does not know which 1 the programmer is referring to: the integer value 1, or the variable 1 that stores a string value.
It might be better for the compiler to allow such a definition, but to raise an error whenever the ambiguous name is used, with a suggestion for how to fix the error and clear up the ambiguity.
