How to avoid creating an ad hoc type system in dynamically typed languages?

Posted 2024-10-07 12:24:41

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.

I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.

Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.

So, my questions are:

  • Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
  • Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamic, strongly typed language). This is a common datatype I work with:

{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}

... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.

This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since all terms compare. From the Erlang reference manual:

number < atom < reference < fun < port < pid < tuple < list < bit string
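
To make the ordering above concrete, here is a minimal sketch. The module name term_order_demo is made up for illustration. It shows that a mistyped field does not raise an error when compared; it just sorts in an unexpected place.

```erlang
-module(term_order_demo).
-export([demo/0]).

%% Every Erlang term compares with every other term, so comparing
%% values of different types succeeds instead of failing.
demo() ->
    true = 1 < a,                 % number < atom
    true = a < {1, 2},            % atom < tuple
    true = {1, 2} < [1, 2],       % tuple < list
    true = [1, 2] < <<"12">>,     % list < bit string
    %% An hour left as the string "12" still compares, and sorts
    %% after every integer hour because a list is greater than a number.
    true = {time, 23, 0, 0} < {time, "12", 0, 0},
    ok.
```

Calling term_order_demo:demo() returns ok, which is exactly why unvalidated values can slip through a comparison unnoticed.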

Comments (5)

我的影子我的梦 2024-10-14 12:24:41

Aside from the confusion of static vs. dynamic and strong vs. weak typing:

What you want to implement in your example isn't really solved by most existing static type systems. Range checks, complications like February 31st, and especially parsed input are usually checked at runtime no matter what type system you have.

Since your example is in Erlang, I have a few recommendations:

  • Use records. Besides being useful and helpful for a whole bunch of reasons, they give you easy runtime type checking without a lot of effort, e.g.:

    is_same_day(#datetime{year=Y1, month=M1, day=D1}, 
                #datetime{year=Y2, month=M2, day=D2}) -> ...
    

    This effortlessly matches only two datetime records. You could even add guards to check for ranges if the source is untrusted. And it conforms to Erlang's "let it crash" approach to error handling: if no clause matches you get a function_clause error (a failed = match would give a badmatch), and you can handle this at the level where it is appropriate (usually the supervisor level).

  • Generally, write your code so that it crashes when its assumptions are not valid.

  • If this doesn't feel statically checked enough: use typer and dialyzer to find the kinds of errors that can be found statically; whatever remains will be checked at runtime.

  • Don't be too restrictive about which "types" your functions accept. Sometimes the added functionality of just doing something useful even for different inputs is worth more than checking the types and ranges in every function. If you check where it matters, you will usually catch the error early enough for it to be easy to fix. This is especially true for a functional language, where you always know where every value comes from.
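
As a rough sketch of the record-plus-guards idea from the first bullet: the module name datetime_rec, the constructor new/3, and the record definition are assumptions made for illustration; only the field names come from the example above. calendar:valid_date/3 is the stdlib function used to reject impossible dates.

```erlang
-module(datetime_rec).
-export([new/3, is_same_day/2]).

%% Hypothetical record; the field names follow the answer's example.
-record(datetime, {year, month, day}).

%% Constructor guarded by type and coarse range checks.
%% calendar:valid_date/3 rejects dates such as February 31st;
%% the failed match then crashes with {badmatch, false}.
new(Y, M, D) when is_integer(Y), is_integer(M), is_integer(D),
                  M >= 1, M =< 12, D >= 1, D =< 31 ->
    true = calendar:valid_date(Y, M, D),
    #datetime{year = Y, month = M, day = D}.

%% Matches only when both arguments are #datetime{} records;
%% anything else fails with a function_clause error.
is_same_day(#datetime{year = Y, month = M, day = D},
            #datetime{year = Y, month = M, day = D}) -> true;
is_same_day(#datetime{}, #datetime{}) -> false.
```

There is deliberately no catch-all clause: passing a bare {Y,M,D} tuple to is_same_day/2 crashes instead of returning a misleading answer.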

入画浅相思 2024-10-14 12:24:41

A lot of good answers, let me add:

Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?

The most important paradigm, especially in Erlang, is this: assume the type is right, otherwise let it crash. Don't write excessively paranoid checking code; assume that the input you get is of the right type or matches the right pattern. Don't write (there are exceptions to this rule, but in general):

foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise)  ->
    report_error(Otherwise),
    try to fix problem here...

Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (you can use monitors, see erlang:monitor/2, so that janitorial processes know when a crash has occurred).
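
A minimal sketch of what that can look like, assuming a made-up module crash_demo and a concrete stand-in for foo: the worker has no catch-all clause, and a monitoring process learns about the crash from the 'DOWN' message.

```erlang
-module(crash_demo).
-export([foo/1, run/1]).

%% Only the known tags are handled; there is no catch-all clause.
foo({tag, X})  -> {did_something, X};
foo({tag2, X}) -> {did_something_else, X}.

%% Runs foo/1 in a separate, monitored process and reports how it ended.
run(Input) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, foo(Input)}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, {function_clause, _Stack}} ->
            {crashed, function_clause};
        {'DOWN', Ref, process, Pid, Other} ->
            {crashed, Other}
    end.
```

In a real system a supervisor would usually take the place of run/1. The crashing worker also produces an error report in the log, which is the point: the failure is visible and is handled in one place.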

Do be precise, however. Write

bar(N) when is_integer(N) -> ...

baz([]) -> ...
baz(L) when is_list(L) -> ...

if the function is known to work only with integers or lists, respectively. Yes, it is a runtime check, but the goal is to convey information to the programmer. Also, HiPE tends to use the hint for optimization and eliminates the type check if possible. Hence, the price may be less than you think.

You chose an untyped/dynamically-typed language, so the price you have to pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well: the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is validated at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: from now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and to how good the programmer is at juggling the static types of the language.
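
The same "validate at the border, then trust" idea can be sketched in Erlang for the datatype from the question. The module name tz_border, the function from_untrusted/1, the raw input shape {{Y,M,D},{HH,MM,SS},Flag}, and the error atoms are all assumptions made for illustration. This is one of the places where a catch-all clause is reasonable, because the input is untrusted by definition.

```erlang
-module(tz_border).
-export([from_untrusted/1]).

%% Validates a raw {{Y,M,D},{HH,MM,SS},Flag} term (a hypothetical input
%% shape) and converts it to the internal format from the question.
%% Code past this point may assume the value is well-formed.
from_untrusted({{Y, M, D}, {HH, MM, SS}, Flag})
  when is_integer(Y), is_integer(M), is_integer(D),
       is_integer(HH), HH >= 0, HH =< 23,
       is_integer(MM), MM >= 0, MM =< 59,
       is_integer(SS), SS >= 0, SS =< 59 ->
    case calendar:valid_date(Y, M, D) andalso valid_flag(Flag) of
        true  -> {ok, {{Y, M, D}, {tztime, {time, HH, MM, SS}, Flag}}};
        false -> {error, invalid_date_or_flag}
    end;
from_untrusted(_Other) ->
    {error, invalid_input}.

%% The flag must be one of the atoms listed in the question.
valid_flag(Flag) -> lists:member(Flag, [u, d, z, s, w]).
```

Only the border function returns {error, ...}; internal functions can pattern match on the tztime tuple and crash on anything else.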

Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Erlang has the dialyzer, which can be used to statically analyze your programs and infer their types. It will not come up with as many type errors as a type checker in, e.g., OCaml, but it won't "cry wolf" either: an error from the dialyzer is provably an error in the program. And it won't reject a program which may be working OK. A simple example is:

%% and is a reserved word in Erlang, so the function name is quoted.
'and'(true, true) -> true;
'and'(true, _)    -> false;
'and'(false, _)   -> false.

The invocation 'and'(true, greatmistake) will return false, yet a static type system will reject a program containing it, because it will infer from the first clause that the function takes a boolean() value as the 2nd parameter. The dialyzer, in contrast, will accept this function and give it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to protect a priori against an error. If there is a mistake, the runtime system has a type check that will capture it.

猫性小仙女 2024-10-14 12:24:41

In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot, perhaps infinitely many, features.

In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, terminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.

What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).

Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".

One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.

The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.

爱已欠费 2024-10-14 12:24:41

Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.

The good thing is that you can custom-tailor your 'type system' to your needs, make it flexible and smart, while type systems of OO languages typically have fixed features. When the data structures you use are immutable, once you've validated such a structure, you're safe to assume it conforms to your restrictions, just like with static typing.

There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.

只涨不跌 2024-10-14 12:24:41

A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.

But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is you wander into lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.

Erlang has a very nice tool for specifying and statically verifying lots of types: dialyzer. See the Erlang type system documentation for reference.

So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).
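
For the datatype in the question, that could look roughly like the sketch below. The module name tz_types, the type names flag(), time() and tztime(), and the accessor flag/1 are made up for illustration; calendar:date() is the stdlib type for {Year, Month, Day}.

```erlang
-module(tz_types).
-export([flag/1]).
-export_type([tztime/0]).

%% Type declarations document the shape and give dialyzer something
%% to check callers against; they are not enforced at runtime.
-type flag()   :: u | d | z | s | w.
-type time()   :: {time, 0..23, 0..59, 0..59}.
-type tztime() :: {calendar:date(), {tztime, time(), flag()}}.

%% Returns the flag of a tztime() value; crashes on any other shape.
-spec flag(tztime()) -> flag().
flag({{_Y, _M, _D}, {tztime, {time, _HH, _MM, _SS}, Flag}}) -> Flag.
```

With specs like this in place, dialyzer can report call sites that can never succeed, while the pattern in the function head still does the runtime check.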

And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce you really have to enforce this on your own by convention (and smart constructors, etc. to help), or fall back to runtime checks, or both.
