Should primitive data types be capitalized?

Posted 2024-08-13 16:29:31

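If you were to invent a new language, do you think primitive data types should be capitalized, like Int, Float, Double, String, to be consistent with standard class naming conventions? Why or why not?

By "primitive" I don't mean that they can't be (or behave like) objects. I guess I should have said "basic" data types.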

墨离汐 2024-08-20 16:29:31

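If I were to invent a new language, it wouldn't have primitive data types, just wrapper objects. I've done enough wrapper-to-primitive-to-wrapper conversions in Java to last me the rest of my life.

As for capitalization? I'd go with case-sensitive names with the first letter capitalized, partly because it's a convention that's ingrained in my brain and partly to convey the fact that hey, these are objects too.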

南烟 2024-08-20 16:29:31

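Case insensitivity leads to some crazy internationalization problems; think umlauts, tildes, etc. It makes the compiler harder to write and gives the programmer freedoms that don't result in better code. Seriously, if you think there are enough arguments over where to put braces in C... just watch.

As for primitives looking like classes... only if you can subclass primitives. Don't assume everyone capitalizes class names; the C++ standard library doesn't.

Personally, I'd like a language with two integer types:

int: whatever integer type is fastest on the platform, and
int(bits): an integer with the given number of bits.

You can typedef whatever you need from those. Then maybe I could get a fixed(w,f) type (number of bits to the left and right of the decimal point, respectively) and a float(m,e) type. And uint and ufixed for unsigned. (Anyone who wants an unsigned float can beg.) And standardize how bit fields are packed into structures. If the compiler can't handle a particular number of bits, it should say so and abort.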
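
To make the int(bits) idea concrete, here is a minimal Python sketch rather than anything from an existing language. The names `sized_int`, `supported`, `int16` and `uint8` are made up for illustration, and the set of supported widths is an assumption standing in for whatever a real target could handle.

```python
def sized_int(bits, signed=True, supported=(8, 16, 32, 64)):
    """Return a function that wraps values into a `bits`-wide integer.

    `supported` is a hypothetical stand-in for the widths a target can handle.
    """
    if bits not in supported:
        # "It should say so and abort."
        raise ValueError(f"int({bits}) is not supported on this target")
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)

    def wrap(value):
        value &= mask  # keep only the low `bits` bits
        if signed and value & sign_bit:
            value -= 1 << bits  # reinterpret as two's complement
        return value

    return wrap


int16 = sized_int(16)                # the "typedef" step
uint8 = sized_int(8, signed=False)   # unsigned variant

print(int16(32767 + 1))  # -32768
print(uint8(256 + 5))    # 5
```

Asking for an unsupported width, such as `sized_int(12)`, raises an error up front instead of silently picking a different size.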

Why, yes, I program embedded systems and got sick of int and long changing size every couple of years, how could you tell? ^_-

烟凡古楼 2024-08-20 16:29:31

(Warning: MASSIVE post. If you want my final answer to this question, skip to the bottom section, where I answer it. If you do, and you think I'm spouting a load of bull, please read the rest before trying to argue with my "bull.")

If I were to make a programming language, here are a few caveats:

The type system would be more or less Perl 6 (but I totally came up with the idea first :P) - dynamically and weakly typed, with a stronger (I'm thinking Haskellian) type system that can be imposed on top of it.
There would be a minimal number of language keywords. Everything else would be reassignable first-class objects (types, functions, so on).
It will be a very high level language, like Perl / Python / Ruby / Haskell / Lisp / whatever is fashionable today. It will probably be interpreted, but I won't rule out compilation.

If any of those (rather important) design decisions don't apply to your ideal language (and they may very well not), then my following (apparently controversial) decision won't work for you. If you're not me, it may not work for you either. I think it fits my language, because it's my language. You should think about your language and how you want your language to be so that you, like Dennis Ritchie or Guido van Rossum or Larry Wall, can grow up to make bad design decisions and defend them in retrospect with good arguments.

Now then, I would still maintain that, in my language, identifiers would be case insensitive, and this would include variables, functions (which would be variables), types (which would also be variables, both built-in/primitive (which would be subclass-able) and user-defined), you name it.
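
As a rough illustration of what case-insensitive identifiers could look like, here is a small Python sketch of a symbol table that folds case on every lookup. The class name `CaseInsensitiveNamespace` is invented for this example, and a real interpreter would likely do the folding in its lexer instead.

```python
class CaseInsensitiveNamespace(dict):
    """A dict whose string keys are compared without regard to case."""

    @staticmethod
    def _fold(name):
        return name.casefold()

    def __setitem__(self, name, value):
        super().__setitem__(self._fold(name), value)

    def __getitem__(self, name):
        return super().__getitem__(self._fold(name))

    def __contains__(self, name):
        return super().__contains__(self._fold(name))


scope = CaseInsensitiveNamespace()
scope["Int"] = int        # a type is just another variable
scope["PRINT"] = print    # so is a function

print(scope["int"] is int)  # True
print("print" in scope)     # True
```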

To address issues as they come:

Naming consistency is the best argument I've seen, but I disagree. First off, allowing two different types called int and Int is ridiculous. The fact that Java has int and Integer is almost as ridiculous as the fact that neither of them allows arbitrary precision. (Disclaimer: I've become a big fan of the word "ridiculous" lately.)

Normally I would be a fan of allowing people to shoot themselves in the foot with things like two different objects called int and Int if they want to, but here it's an issue of laziness, and of the old multiple-word-variable-name argument.

My personal take on the issue of underscore_case vs. MixedCase vs. camelCase is that they're all ugly and less readable, and if at all possible you should only use a single word. In an ideal world, all code would be stored in source control in an agreed-upon format (the style that most of the team uses), and the team's dissenters would have hooks in their VCS to convert all checked-out code from that style to their own and back again on check-in, but we don't live in that world.

For some reason, having to continually write MixedCaseVariableOrClassNames bothers me a lot more than writing underscore_separated_variable_or_class_names. Even TimeOfDay and time_of_day might be the same identifier because they're conceptually the same thing, but I'm a bit hesitant to make that leap, if only because it's an unusual rule (internal underscores are removed in variable names). On one hand, it could end the debate between the two styles, but on the other hand it could just annoy people.
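
The "internal underscores are removed" rule could be sketched in Python like this. The function name `normalize` and the choice to leave leading and trailing underscores alone are assumptions made for the example, not part of the proposal above.

```python
def normalize(identifier):
    """Fold case and drop internal underscores; leading/trailing ones are kept."""
    stripped = identifier.strip("_")
    start = len(identifier) - len(identifier.lstrip("_"))
    end = start + len(stripped)
    core = stripped.replace("_", "").casefold()
    return identifier[:start] + core + identifier[end:]


print(normalize("TimeOfDay") == normalize("time_of_day"))  # True
print(normalize("_private_name"))                          # _privatename
```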

So my final decision is based on two parts, which are both highly subjective:

If I make a name others must use that's likely to be exported to another namespace, I'll probably name it as simply and clearly as I can. I usually won't use many words, and I'll use as much lowercase as I can get away with. sizedint doesn't strike me as much better or worse than sized_int or SizedInt (which, as far as examples of camelCase go, looks particularly bad because of the dI IMHO), so I'd go with that. If you like camelCase (and many people do), you can use it. If you like underscores, you're out of luck, but if you really need to you can write sized_int = sizedint and go on with life.
If someone else wrote it, and wanted to use sized_int, I can live with that. If they wrote it and used SizedInt, I don't have to stick with their annoying-to-type camelCase and, in my code, can freely write it as sizedint.

Saying that consistency helps us remember what things mean is silly. Do you speak english or English? Both, because they're the same word, and you recognize them as the same word. I think e.e. cummings was on to something, and we probably shouldn't have different cases at all, but I can't exactly rewrite most human and computer languages out there on a whim. All I can do is say, "Why are you making such a fuss about case when it says the same thing either way?" and implement this attitude in my own language.

Throwaway variables in functions (e.g. Person person = /* something */) are a pretty good argument, but I disagree that people would do Person thePerson (or Person aPerson). I personally tend to just do Person p anyway.

I'm not very fond of capitalizing type names (or much of anything) in the first place, and if it's enough of a throwaway variable to declare it undescriptively as Person person, then you won't lose much information with Person p. And anyone who says "non-descriptive one-letter variable names are bad" shouldn't be using non-descriptive many-letter variable names either, like Person person.

Variables should follow sane scoping rules (like C and Perl, unlike Python - flame war starts here guys!), so conflicts in simple names used locally (like p) should never arise.
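
The Python side of that comparison, presumably the behavior being objected to, can be seen in a few lines: a name bound inside a loop belongs to the whole function, not to the loop body. The function `describe` is just an illustrative name.

```python
def describe(people):
    for p in people:
        pass
    # Python has no block scope: `p` is still bound here,
    # holding the last element the loop visited.
    return p


print(describe(["Ada", "Grace"]))  # Grace
```

In C, or in Perl with `my`, a variable declared inside the loop body would not be visible after it.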

As to making the implementation barf if you use two variables with the same names differing only in case, that's a good idea, but no. If someone makes library X that defines the type XMLparser and someone else makes library Y that defines the type XMLParser, and I want to write an abstraction layer that provides the same interface for many XML parsers including the two types, I'm pretty boned. Even with namespaces, this still becomes prohibitively annoying to pull off.
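
To see how quickly such a check would fire, here is a hedged Python sketch that looks for names differing only in case across libraries. The `exports` table and the function `case_collisions` are hypothetical; they mirror the library X / library Y scenario above.

```python
from collections import defaultdict


def case_collisions(exports):
    """Map each folded name to the (library, name) pairs that spell it differently."""
    by_folded = defaultdict(set)
    for library, names in exports.items():
        for name in names:
            by_folded[name.casefold()].add((library, name))
    return {
        folded: sorted(entries)
        for folded, entries in by_folded.items()
        if len({name for _, name in entries}) > 1  # more than one spelling
    }


exports = {
    "X": ["XMLparser", "parse"],
    "Y": ["XMLParser", "parse"],
}
print(case_collisions(exports))
# {'xmlparser': [('X', 'XMLparser'), ('Y', 'XMLParser')]}
```

Neither library author did anything wrong on their own, yet a strict implementation would reject any program that imports both.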

Internationalization issues have been brought up. Distinguishing between capital and lowercase umlauted U's will be no easier in my interpreter/compiler (probably the former) than in my source code.

If a language has a string type (i.e. the language isn't C) and the string type supports Unicode (i.e. the language isn't Ruby - it's only a joke, don't crucify me), then the language already provides a way to convert Unicode strings to and from lowercase, like Perl's lc() function (sometimes) and Python's unicode.lower() method (str.lower() in Python 3). This function must be built into the language somewhere and can handle Unicode.

Calling this function during an interpreter's compile-time rather than its runtime is simple. For a compiler it's only marginally harder, because you'll still have to implement this kind of functionality anyway, so including it in the compiler is no harder than including it in the runtime library. If you're writing the compiler in the language itself (and you should be), and the functionality is built into the language, you'll have no problems.
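
As a sketch of reusing the language's own string support at "compile time", the Python below folds identifiers with the standard library. The function `fold_identifier` is a made-up name, and normalizing to NFC before `str.casefold()` is a simplification; full Unicode caseless matching has more steps.

```python
import unicodedata


def fold_identifier(name):
    """Normalize, then case-fold, an identifier before it enters the symbol table."""
    return unicodedata.normalize("NFC", name).casefold()


# Capital and lowercase umlauted U fold to the same identifier.
print(fold_identifier("Über") == fold_identifier("über"))          # True
# A decomposed U + combining diaeresis is treated the same way.
print(fold_identifier("U\u0308ber") == fold_identifier("über"))    # True
# casefold() also handles cases that lower() leaves alone.
print(fold_identifier("STRASSE") == fold_identifier("straße"))     # True
print("straße".lower() == "strasse")                               # False
```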

To answer your question, no. I don't think we should be capitalizing anything, period. It's annoying to type (to me) and allowing case differences creates (or allows) unnecessary confusion between capitalized and lowercased things, or camelCased and under_scored things, or other sets of semantically-distinct-but-conceptually-identical things. If the distinction is entirely semantic, let's not bother with it at all.
