结构化数据验证的依赖类型

发布于 2024-12-08 21:47:42 字数 586 浏览 12 评论 0原文

首先，我真的不知道依赖类型有什么问题，也不知道为什么我们没有看到它们在现有语言中实现用于实际编程，而不是发明各种技巧（模式！）来绕过依赖类型的限制当前的类型系统充其量具有非常简单和有限的泛化能力。

但我的问题是关于数据的依赖类型而不是程序，我们如何或可以使用它们进行结构化数据验证？这意味着，像 json 或 xml 或任何类型的结构化数据一样，是否可以使用某些依赖类型系统有效地验证它们？

编辑：

我所说的依赖类型是指最广泛的定义“依赖于值的类型”，而不需要那些定理证明者和 CoC 工作人员。我不认识他们，我不想走那条路，我不相信这些是获得体面的依赖类型的唯一或“最终”方法。在 FP 中，程序员每天都以非常优雅、有建设性的方式编写最复杂的逻辑，而且非常简单，没有任何问题。我相信他们将拥有最终的“优雅”依赖打字。

然而，我的问题是关于纯 Data ，与代码不同，在代码中，大量检查可能是不必要的，并且可以隐藏在程序流和逻辑中，甚至动态类型也可以这样正常工作。在数据中，当您想要检查某些文档的正确性并给出明确的错误消息时，情况并非如此。另一方面，当您必须在非常极端的依赖类型系统（CoC 系列）中处理“函数”时，数据不存在复杂性问题。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

蓦然回首 2024-12-15 21:47:42

您可能会对这篇论文感兴趣：
未来 700 种数据描述语言 (PDF)，Kathleen Fisher、Yitzhak Mandelbaum 和 David Walker， 2006年。

本文的主要目标是开始了解这个家庭
临时数据处理语言。我们这样做，就像兰丁所做的那样，
开发一个语义框架来定义、比较和
我们领域内的对比语言。这个语义框架围绕着
围绕数据描述微积分 (DDC^α) 的定义。这
微积分使用依赖类型理论中的类型来描述各种
即席数据的形式：描述原子数据片段的基本类型和
类型构造函数来描述更丰富的结构。我们展示如何给予
通过将类型解释为解析来实现 DDC^α 的指称语义
将外部表示（位）映射到数据结构的函数
在类型化 lambda 演算中。更准确地说，这些解析器产生
外部数据和解析描述符的内部表示
找出原始来源中的错误。

简而言之：是的，如果您想对数据的细粒度不变量进行静态编码，则依赖类型是必要的。它们比代数数据类型和 GADT 更具表现力，还允许表达它们和相关结构（例如未标记联合和标记乘积的组合），在某种意义上能够成为数据描述的汇编语言，甚至如果面向用户的规范不直接公开术语依赖关系。

然而要注意的是，这种正式的方法是以更陡峭的学习曲线和更高的前期复杂性为代价的，即使理论上它会以更简单、更安全、更好的规范、操作工具等来回报。该领域的从业者常常会忽视所有类型系统的美感，而求助于不明确的替代方案。 XML 正在输给 JSON，还有其他原因，因为指定模式很无聊，而且人们看不到它们带来的优势。是的，您可以稍后指定所采用的 JSON API 的静态结构（并且您很可能需要依赖类型来执行此操作，因为复杂性很容易渗透到这种演进而不是设计的格式中），但这在以下情况下没有什么用处：没有人关心它、使用它、理解它，更重要的是，维护它。

（关于你的恶搞介绍的第二个方面：请继续玩 ATS，Guru 或 Agda，它们是为了相对实用的编程而设计的，如果你想走弗兰肯斯坦路线，还有SHE; Coq 不是设计为“对软件开发实用”，但是众所周知，这种方式会被滥用——我不建议将其用于依赖类型编程，但它对于不太依赖的编程以及附带的正确性证明很有好处——如果你想出售你的灵魂还有F*即将推出。）

You would probably be interested in this paper :
The Next 700 Data Description Languages (PDF), Kathleen Fisher, Yitzhak Mandelbaum and David Walker, 2006.

The primary goal of this paper is to begin to understand the family of
ad hoc data processing languages. We do so, as Landin did, by
developing a semantic framework for defining, comparing, and
contrasting languages in our domain. This semantic framework revolves
around the definition of a data description calculus (DDC^α). This
calculus uses types from a dependent type theory to describe various
forms of ad hoc data: base types to describe atomic pieces of data and
type constructors to describe richer structures. We show how to give
a denotational semantics to DDC^α by interpreting types as parsing
functions that map external representations (bits) to data structures
in a typed lambda calculus. More precisely, these parsers produce both
internal representations of the external data and parse descriptors
that pinpoint errors in the original source.

In short : yes, dependent types are necessary if you want to statically encode fine-grained invariants about your data. Being more expressive than algebraic data types and GADTs, they also allow to express them and related constructs (such as the combination of untagged union and tagged product), having the ability to be, in some kind, the assembly language of data description, even if the user-facing specification doesn't expose term dependencies directly.

Beware, however, that such a formal approach comes at the cost of a steeper learning curve and higher upfront complexity, even if in theory it pays back with easier, safer, better specifications, manipulations tools and such. The practitioners in the field will more often than not neglect all that type system beauty, and fall back on ill-specified alternatives. XML is losing to JSON, in other reasons because specifying schemas is boring and people don't see what advantages they bring. Yes, you can later specify the static structure of the adopted JSON API (and you may well need dependent types to do that, as complexity easily crept into such evolution-rather-than-design formats), but this is only of little use if nobody cares about it, use it, understand it and, more importantly, maintain it.

(On a secondary front regarding your trollish introduction : please go forward and play with ATS, Guru or Agda, they are meant for relatively practical programming. If you want to go the frankenstein route, there is also SHE; Coq is not designed to be "practical for software development" but has been known to be abused this way -- I would not advise it for dependently typed programming, but it's good for not-very-dependent programming plus accompanying correctness proofs -- and if you want to sell your soul there is also F* coming soon.)

回复收藏 0 原文

百变从容 2024-12-15 21:47:42

我认为语言中依赖类型的主要问题是它使类型检查（更不用说类型推断）成为一个无法判定的问题。这意味着对于某些程序，编译器将被迫进入无限循环，仅在进行编译之前评估类型。

如果类型检查器的行为可以变得简单直观，那么这不一定会破坏交易。在编译时避免无限循环可能成为这种语言编程的另一部分，就像在运行时避免无限循环一样。

然而，这将有效地为您的编程添加一个“编译时层”，以与正常程序完全不同的模式进行完整的图灵完备计算。我相信这已经可以在某些语言中做到了；我相信 C++ 模板元编程是图灵完备的，并且根据我读过的一篇博客，Scala 的类型系统可以对 SKI 演算进行编码。

但实际上对这样的系统进行编程是非常可怕的。在具有良好类型系统的编程语言中，类型系统的存在是为了让编程变得更容易；它强制执行不变量，当程序的一个部分的更改影响另一部分时提醒我们，并提供有关如何使用库代码的文档（人们通常可以从名称和类型中准确地猜测 Haskell 函数的作用）。但是，当您在类型系统中进行编程时，您将得不到这种帮助，除非有人添加元类型系统。同样，这些实际上存在于某些语言中（我相信 Haskell 和 Scala 都有类型种类的概念），但它们的设计目的并不像常规类型那样与类型系统元编程具有相同的关系系统必须定期编程。

就我个人而言，我正在等待是否会开发出一种语言来尝试以可用的方式解决所有这些问题。不过我一点也不奇怪为什么这样的东西还不存在；这是一系列很难解决的问题！

回复收藏 0 原文

~没有更多了~