
What distinguishes a "private language" WTF? Is it just bad library/API/DSL design?

Posted 2024-08-20 15:03:40

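Some of the most entertaining Daily WTF stories feature private languages run amok. However, domain-specific languages can be quite powerful and seem to be increasingly popular. And of course we couldn't program at all without good libraries, but as the saying goes, "library design is language design".

Nobody likes a bad API, but is the difference only one of degree, or is a WTF an entirely different species? Obviously this is subjective, so I've made it a community wiki post. (The Stack Overflow co-founders are known to have differing opinions on whether one particular in-house language is even a WTF or not.)

My own gut feeling is that it's the attempt at generality that leads to WTFs, but I'd like to see what others think.

(This question was triggered by reading the comments on JaredPar's answer to this question: https://stackoverflow.com/questions/901320/anti-joel-test/901361#901361)

(To clarify a bit: the term "private language" usually carries a negative connotation, while "DSL" or "library" is neutral. Apart from the usual factors that can make any tool a bad one, what lines does an "in-house" tool cross to be ridiculed as a dreaded "private language"? It doesn't have to be a language; it could be a library or framework.)

Final edit: I've accepted Roger Pate's answer, "Essentially? Nothing.", because I think it's actually correct for the question as I asked it. However, I'd like to highlight Aaronaught's answer about DSLs, because I think it's particularly good. Thanks.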


左耳近心 2024-08-27 15:03:41


I don't build a lot of DSLs, but I've had a bit of experience with them and I believe that there is a general answer for this, although the truth is that every situation is different.

Remember that the acronym DSL stands for Domain-Specific Language.

A DSL stops being useful when it is no longer specific. I believe that the majority of DSL horror stories (AKA "private languages") revolve around DSLs that simply try to do too many things. In some cases they may even try to be Turing-complete, at which point they're not much more than dysfunctional programming languages.

I'm including some real-life examples below the fold; skip to the end for the tl;dr version.

One example from my own experience is that of a messaging system between devices, or between a PC and an external device. If you imagine an object-oriented API, you might end up with code that looks like this:

using System.IO;

public abstract class Message
{
    public byte[] GetBytes()
    {
        // Serialize the payload first so that its length is known.
        byte[] data;
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(ms))
        {
            WriteMessageData(writer);
            writer.Flush();
            data = ms.ToArray();
        }

        // Frame: [0xFF][payload length][payload...][checksum]
        byte[] result = new byte[data.Length + 3];
        result[0] = 0xFF;
        result[1] = (byte)data.Length;
        data.CopyTo(result, 2);
        result[result.Length - 1] = GetChecksum(result, 0, result.Length - 1);
        return result;
    }

    protected abstract void WriteMessageData(BinaryWriter writer);

    // Checksum algorithm not specified; a simple additive one is assumed.
    protected static byte GetChecksum(byte[] buffer, int offset, int count)
    {
        byte sum = 0;
        for (int i = offset; i < offset + count; i++)
            sum += buffer[i];
        return sum;
    }
}

Don't get too hung up on the specifics of this, or how beautiful the code is(n't). The idea is that we have, I don't know, 30 different types of messages to send that are all completely different but share some common functionality, like a content length header and a checksum. Now we have to start building the messages:

public class AddMessage : Message
{
    private const byte id = 0x9F;

    protected override void WriteMessageData(BinaryWriter writer)
    {
        writer.Write(id);    // 1-byte message type
        writer.Write(Num1);  // 4-byte little-endian int
        writer.Write(Num2);
    }

    public int Num1 { get; set; }
    public int Num2 { get; set; }
}

Again, don't think too hard about the details of the message. It doesn't really matter what it does. The point is, we had to write a class for it. We had to override some functionality. We didn't have to write a lot of code, but we had to write some code. I don't know about you, but the thought of writing 30 of these little one-off classes seems mind-numbing to me.
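For reference, the framing that every message inherits from the base class (a start byte, a content-length header and a checksum) fits in a few lines; it is the per-message encoders that multiply. This is a minimal Python sketch under stated assumptions, not the original wire format: the 0xFF start byte, one-byte length and 0x9F id come from the C# above, while the additive checksum and little-endian integers are assumed, and `frame` and `encode_add` are hypothetical names.

```python
import struct

START_BYTE = 0xFF
ADD_MESSAGE_ID = 0x9F  # taken from the AddMessage example


def checksum(data: bytes) -> int:
    """Assumed additive checksum: sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


def frame(payload: bytes) -> bytes:
    """Wrap a payload as [start byte][length][payload...][checksum]."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length header")
    body = bytes([START_BYTE, len(payload)]) + payload
    return body + bytes([checksum(body)])


def encode_add(num1: int, num2: int) -> bytes:
    """Hypothetical 'Add' encoder: id byte, then two little-endian int32 values."""
    return frame(struct.pack("<Bii", ADD_MESSAGE_ID, num1, num2))
```

Every additional message type needs another hand-written function like `encode_add`, which is exactly the repetition described above.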

But we're not even done yet. We have to create the message, send it, and receive the result:

public int Add(int num1, int num2)
{
    AddMessage msg = new AddMessage();
    msg.Num1 = num1;
    msg.Num2 = num2;
    MessagingSystem.SendMessage(msg);
    AddResultMessage result = MessagingSystem.Receive<AddResultMessage>();
    if (result == null)
    {
        throw new InvalidResultException("AddResultMessage");
    }
    return result.Sum;
}

Blah blah blah, whatever. This is kind of a best-case scenario. We're exposing a convenient little API but we have to keep writing these classes and methods to do it. As the number of messages grows to 10, 20, 50, 100, 1000... it starts to become a little ridiculous.

Wouldn't it be nice if instead of writing all of this boilerplate, we could just write down some "message definitions" somewhere?

Message(Add)
    Send: Num1 int, Num2 int
    Receive: Sum int

Message(Multiply)
    Send: Num1 int, Num2 int
    Receive: Product int

Message(Divide)
    Send: Dividend int, Divisor int
    Receive: Quotient int, Remainder int

OK, sure, you can define this in a data file and use some kludgy code where most of the validation and actual logic happens at runtime. But what we really want is to compile this data into something we can actually write code against, compile an application against, get compile-time type safety and testability. We want to go directly from the code above to the code below without doing any additional work:

MyMessagingSystem ms = new MyMessagingSystem();
int sum = ms.Add(3, 4).Sum;
int product = ms.Multiply(5, 6).Product;
DivideResult result = ms.Divide(10, 5);  // Contains Quotient and Remainder properties

Now if we wave our hands a little and forget about how the DSL is compiled (and it's not really that difficult, I've done it), we've eliminated about 20 lines of tedious error-prone OO code in favour of about 3 lines of easy-to-understand DSL code.
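To give a feel for why compiling the DSL is "not really that difficult", here is a minimal Python sketch of the parsing half. It assumes the exact layout of the message definitions shown earlier (a `Message(Name)` header followed by `Send:` and `Receive:` lines of comma-separated `Name type` pairs); `MessageSpec` and `parse_messages` are hypothetical names, not the answerer's actual implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, type name)


@dataclass
class MessageSpec:
    name: str
    send: List[Field] = field(default_factory=list)
    receive: List[Field] = field(default_factory=list)


HEADER = re.compile(r"^Message\((\w+)\)$")
SECTION = re.compile(r"^(Send|Receive):\s*(.+)$")


def parse_fields(text: str) -> List[Field]:
    """'Num1 int, Num2 int' -> [('Num1', 'int'), ('Num2', 'int')]"""
    fields = []
    for part in text.split(","):
        name, type_name = part.split()  # raises ValueError if malformed
        fields.append((name, type_name))
    return fields


def parse_messages(source: str) -> List[MessageSpec]:
    specs: List[MessageSpec] = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate definitions
        header = HEADER.match(line)
        if header:
            specs.append(MessageSpec(header.group(1)))
            continue
        section = SECTION.match(line)
        if section is None or not specs:
            raise ValueError(f"unexpected line: {raw!r}")
        spec = specs[-1]
        target = spec.send if section.group(1) == "Send" else spec.receive
        target.extend(parse_fields(section.group(2)))
    return specs
```

Malformed input is rejected with a `ValueError` before any code is generated, rather than surfacing as a runtime surprise.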

I've worked on a project like this. There were a lot of messages. It took a little while to perfect the DSL and code generation, but once it was done, it saved me hours - no, days of effort, of writing and debugging tedious useless code that just does the same thing over and over again.

So why is this (in my opinion) a "good" DSL? Because it's specific. It does exactly one thing: It defines the format of a series of similar but still independent messages that I want to be able to generate strongly-typed classes for.
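Generating strongly-typed classes from such definitions is plain text generation. This Python sketch takes each definition as a name plus lists of `(field, type)` pairs and emits C# source in the spirit of the `ms.Add(3, 4).Sum` usage above; the `AddRequest`/`AddResult` naming, the `sealed` class shape and the `Call<T>` helper the generated method relies on are all assumptions for illustration. It only knows about field names and types, which is what keeps the DSL narrow.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, DSL type name)

# Map DSL type names to C# type names (only 'int' appears in the examples).
CSHARP_TYPES = {"int": "int"}


def camel(name: str) -> str:
    """'Num1' -> 'num1', for use as a C# parameter name."""
    return name[0].lower() + name[1:]


def generate_class(class_name: str, fields: List[Field]) -> str:
    """Emit a C# class with one auto-property per (name, type) field."""
    lines = [f"public sealed class {class_name}", "{"]
    for name, type_name in fields:
        lines.append(f"    public {CSHARP_TYPES[type_name]} {name} {{ get; set; }}")
    lines.append("}")
    return "\n".join(lines)


def generate_method(name: str, send: List[Field]) -> str:
    """Emit a typed client method; Call<T> is a hypothetical hand-written helper."""
    params = ", ".join(f"{CSHARP_TYPES[t]} {camel(n)}" for n, t in send)
    inits = ", ".join(f"{n} = {camel(n)}" for n, _ in send)
    return (f"public {name}Result {name}({params}) =>\n"
            f"    Call<{name}Result>(new {name}Request {{ {inits} }});")


def generate_message(name: str, send: List[Field], receive: List[Field]) -> str:
    """Emit request class, result class and client method for one definition."""
    parts = [generate_class(f"{name}Request", send),
             generate_class(f"{name}Result", receive),
             generate_method(name, send)]
    return "\n\n".join(parts)
```

`generate_message("Add", [("Num1", "int"), ("Num2", "int")], [("Sum", "int")])` produces an `AddRequest` class, an `AddResult` class with a `Sum` property, and an `Add(int num1, int num2)` method. Encoding, validation and error handling are deliberately absent from the generator.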

A key aspect of this DSL is that there is no user-defined logic. It's defining a very narrow aspect of the overall application, specifically, what goes in a message, and some send/receive pairings. It says nothing about how to encode the message or how to send it. It says nothing about the semantics of a message or what order specific messages should be sent in. It says nothing about the valid values for any given message field or how errors should be handled.

Of course, all of these aforementioned "additional features" can be implemented in the DSL; but the more we add, the more we take away. The more complicated the language becomes, the less "domain-specific" it really is. A lousy DSL (again, IMO) looks like this:

Event: PaymentReceived(Payment)
    Validation:
        Condition: Amount > 0, "Invalid payment amount"
        Condition: Date > Today - 7d, "Cannot backdate > 7 days"
    Actions:
        Update: Account(AccountID)
            SetProperty: Account.Balance, Account.Balance - Payment.Amount
            SetProperty: Account.LastPaymentDate, Payment.Date
        Notify: Billing
            Template: PaymentReceived.xlt
                Field: CustName, CustomerName
                Field: PaymentAmount, Amount
                Field: PaymentDate, Date

And so on and so forth, I'm not going to belabor the point. This looks deceptively simple and seductively powerful. Hey, look how easy it is to change the validation!

But is it easy? Is it really? Tomorrow, some manager determines that some customers never have money in the bank; their cheques always bounce and we want to reject payments of that type from them. Easy, just add a flag, right? But how do we add this type of validation? We have to look up some piece of information about the customer, and as it stands, the Validation grammar is only equipped to handle validation on the Payment itself. So we have to come up with some sort of hackish update to the DSL in order to accommodate it:

Event: PaymentReceived(Payment)
    Validation:
        Condition: All(
            PaymentType = Cheque,
            Account(Payment.AccountID).DelinquentFlag = False
        ), "Cheques no longer allowed for this customer"

Cute, although some people who have experienced this before are probably already starting to get that "uh oh..." feeling. The next day, the manager says: Hey, validation's working great, but we want a notification to get sent for this as well.

Well, we didn't really build conditional notifications into the DSL, but I guess we can add them:

        Notify: Management
        Condition: All(
            PaymentType = Cheque,
            Account(Payment.AccountID).DelinquentFlag = False
        )
            Template: DelinquentCheque.xlt
                Field: CustName, CustomerName,
                ...

What's going on here? This "simple" set of conditions and actions is starting to look pretty ugly. Not only that, but we're now copying and pasting. We're trying to handle these complex conditions in areas that were never designed to handle them and the DSL really has no facility for re-use.

But that's not the whole story. What's the real problem here?

The real problem is that this DSL is describing a complex process. It doesn't read like a collection of attributes, it reads like a set of instructions, and we already have a tool for writing general-purpose instructions, it's called a programming language. And I'll leave the details as an exercise to the reader, but it should be pretty obvious at this point that after a few more "revisions" to our spec above, it's probably going to be easier to just rewrite in a normal, general-purpose language.
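For comparison, here is roughly what the `PaymentReceived` rules, including the delinquent-cheque validation and notification, look like in an ordinary general-purpose language. This is a minimal Python sketch: `Account`, `Payment`, `PaymentRejected`, the `notify` callback and the dictionary of accounts are hypothetical stand-ins for a real domain model, and the `delinquent` flag is assumed to mean "this customer's cheques bounce". The shared condition is an ordinary function, so the validation and the notification reuse it instead of copying and pasting it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, Optional


@dataclass
class Account:
    account_id: int
    customer_name: str
    balance: float = 0.0
    last_payment_date: Optional[date] = None
    delinquent: bool = False  # assumed: True means this customer's cheques bounce


@dataclass
class Payment:
    account_id: int
    amount: float
    payment_date: date
    payment_type: str  # e.g. "Cheque" or "Cash"


class PaymentRejected(Exception):
    pass


def is_delinquent_cheque(payment: Payment, account: Account) -> bool:
    # Defined once and reused by both the validation and the notification.
    return payment.payment_type == "Cheque" and account.delinquent


def payment_received(payment: Payment, accounts: Dict[int, Account],
                     notify: Callable[[str, str, str], None], today: date) -> None:
    account = accounts[payment.account_id]

    # Validation
    if payment.amount <= 0:
        raise PaymentRejected("Invalid payment amount")
    if payment.payment_date <= today - timedelta(days=7):
        raise PaymentRejected("Cannot backdate > 7 days")
    if is_delinquent_cheque(payment, account):
        notify("Management", "DelinquentCheque.xlt", account.customer_name)
        raise PaymentRejected("Cheques no longer allowed for this customer")

    # Actions
    account.balance -= payment.amount
    account.last_payment_date = payment.payment_date
    notify("Billing", "PaymentReceived.xlt", account.customer_name)
```

The next rule the manager asks for becomes a few more lines of code rather than a change to a grammar.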

The other real problem is that this DSL appears to be intended for consumption by non-technical users, not programmers, and yet it will eventually become far too complicated for anyone but a programmer to maintain. Processes aren't simple. That's why people hire us to analyze and code them and work out all the little inconsistencies. From what I've seen and read, DSLs intended for use by non-technical users usually end up not only not being used by said users, but also being very difficult for programmers to maintain, because they aren't sophisticated enough for the kinds of things a programmer needs to do.

Sure, this example above is technically a "domain-specific language", but it's not adding any value over just having a well-documented domain model and API. It's mashing together all sorts of different concepts and massively violating the principle of cohesion. Every time we need to add a new feature, we need to start screwing around with the DSL grammar instead of just adding a few lines of code. It's really making our lives harder, not easier. The "generic business process language" seems destined to grow and grow until it becomes a pale imitation of Turbo Pascal 1.0.

tl;dr version:

So, what makes a Domain-Specific Language a "WTF?" In my experience, it's:

Not being truly domain-specific. The design appears to employ a very liberal definition of either "domain", "specific", or both.
Targeting end-users rather than developers. It's very tempting to think of a DSL as a "front-end" API and many DSL tutorials even seem to hint at this being an appropriate use case. Maybe it is, but if so, I personally haven't witnessed it.
Defining an abstract process. DSLs only really work for process definitions when the potential conditions and actions are part of a rigidly-defined superset. Most business processes are not like this at all; they are full of highly complex conditional and/or sequential logic. They reflect the fickle, impulsive thoughts and behaviour of humans, not the concrete specifications of a computer system.
Adding programming idioms. If you find yourself even thinking about concepts like loops, subroutines, inheritance, that sort of thing, it's time to take a step back and ask what the DSL is really accomplishing.

Wow, that was a lot of writing. Congratulations to anyone who got this far!


但可醉心 2024-08-27 15:03:41

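Essentially? Nothing. (I'm reading your question as "what's the difference between a bad language and a bad library/API?", not "what's the difference between a bad language/library/API and a good one?".)

A sufficiently large library ends up being used like a dialect of the language, especially in some languages (consider MFC, Qt, GTK, etc.).

Libraries below that size threshold either aren't used often, or aren't used by enough people for anyone to get around to labeling them, or it would be silly to describe them as an API even though the term is technically correct (imagine a library with only one function, or with no functions at all, just a collection of useful types).

The only distinction you can draw is that libraries/APIs that don't try to bend the language don't get called "private languages" or "DSLs". Qt, for example, requires (does it still? It used to, very strongly) a special preprocessor added on top of C++, and at that point it goes beyond "dialect" status. MFC went down the same route using macros.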


轮廓§ 2024-08-27 15:03:41

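I think it has to do with the principle of least surprise. Well-designed DSLs and APIs do exactly what you expect them to (or at least most of the time). If you use a good API (and you're smart and experienced), you'll find yourself saying, "it should have a built-in way to do such-and-such," and lo and behold, the API developers were thinking the same thing. Bad APIs/private languages produce unexpected or inconsistent behavior and make simple things hard.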

