How do you disambiguate operator definitions between objects/classes in a programming language?

Posted on 2024-10-21 21:47:11

I'm designing my own programming language (called Lima; if you care, it's at www.btetrud.com), and I'm trying to wrap my head around how to implement operator overloading. I've decided to bind operators to specific objects (it's a prototype-based language). (It's also a dynamic language, where 'var' is like 'var' in JavaScript: a variable that can hold any type of value.)

For example, this would be an object with a redefined + operator:

x = 
{  int member

   operator + 
    self int[b]:
       ret b+self
    int[a] self:
       ret member+a
}

I hope it's fairly obvious what that does. The operator is defined both for when x is the left operand and for when it is the right operand (using self to denote x's position).

The problem is what to do when you have two objects that define an operator in an open-ended way like this. For example, what do you do in this scenario:

A = 
{ int x
  operator +
   self var[b]:
    ret x+b
}

B = 
{ int x
  operator +
   var[a] self:
    ret x+a
}

a+b   ;; is a's or b's + operator used?

So an easy answer to this question is "well duh, don't make ambiguous definitions", but it's not that simple. What if you include a module that has an A type of object, and then define a B type of object?

How do you create a language that guards against other objects hijacking what you want to do with your operators?

C++ has operator overloading defined as "members" of classes. How does C++ deal with ambiguity like this?

ま昔日黯然 2024-10-28 21:47:11

Most languages will give precedence to the class on the left. In C++, an operator defined as a class member only applies when that class is the left-hand operand: when you define operator+ as a member, you are defining addition for when this type is on the left, for anything on the right. (Handling the right-hand side takes a non-member function.)

In fact, it would not make sense if you allowed your operator + to work for when the type is on the right-hand side. It works for +, but consider -. If type A defines operator - in a certain way, and I do int x - A y, I don't want A's operator - to be called, because it will compute the subtraction in reverse!

In Python, which has more extensive operator overloading rules, there is a separate method for the reverse direction. For example, there is a __sub__ method which overloads the - operator when this type is on the left, and a __rsub__ which overloads the - operator when this type is on the right. This is similar to the capability, in your language, to allow the "self" to appear on the left or on the right, but it introduces ambiguity.

Python gives precedence to the thing on the left, which works better in a dynamic language. If Python encounters x - y, it first calls x.__sub__(y) to see if x knows how to subtract y. This can either produce a result, or return the special value NotImplemented. If Python finds that NotImplemented was returned, it then tries the other way: it calls y.__rsub__(x), which would have been programmed knowing that y was on the right-hand side. If that also returns NotImplemented, then a TypeError is raised, because the types were incompatible for that operation.
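The fallback sequence described above can be sketched with a small class. Meters is a hypothetical type invented for this illustration; only __sub__, __rsub__, NotImplemented and TypeError are real Python machinery.

```python
class Meters:
    """Hypothetical value type, used only to illustrate the protocol."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        # Called for `self - other`, i.e. this object is on the left.
        if isinstance(other, Meters):
            return Meters(self.value - other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value - other)
        return NotImplemented  # decline, so Python can try the right operand

    def __rsub__(self, other):
        # Called for `other - self`, only after the left operand declined.
        if isinstance(other, (int, float)):
            return Meters(other - self.value)
        return NotImplemented


print((Meters(10) - 3).value)  # handled by Meters.__sub__
print((10 - Meters(3)).value)  # int declines, so Meters.__rsub__ runs

try:
    Meters(1) - "a"            # neither side can handle this pair
except TypeError as exc:
    print("TypeError:", exc)
```

Because __rsub__ knows it is on the right, it computes other - self.value rather than the reverse, which avoids the backwards-subtraction problem mentioned earlier.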

I think this is the ideal operator overloading strategy for dynamic languages.

Edit: To give a bit of a summary, you have an ambiguous situation, so you really only have three choices:

1. Give precedence to one side or the other (usually the one on the left). This prevents a class with a right-side overload from hijacking a class with a left-side overload, but not the other way around. (This works best in dynamic languages, as the methods can decide whether they can handle it, and dynamically defer to the other one.)
2. Make it an error (as @dave is suggesting in his answer). If there is ever more than one viable choice, it is a compiler error. (This works best in static languages, where you can catch this kind of thing in advance.)
3. Only allow the left-most class to define operator overloads, as with member operators in C++. (Then your class B would be illegal.)

The only other option is to introduce a complex system of precedence to the operator overloads, but then you said you want to reduce the cognitive overhead.

少女七分熟 2024-10-28 21:47:11

I'm going to answer this question by saying "duh, don't make ambiguous definitions".

If I recreate your example in C++ (using a function f instead of the + operator and int/float instead of A/B, but there really isn't much difference)...

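#include <iostream>

// Matches any call whose first argument is an int.
template<class t>
void f(int a, t b)
{
    std::cout << "me! me! me!";
}

// Matches any call whose second argument is a float.
template<class t>
void f(t a, float b)
{
    std::cout << "no, me!";
}

int main(void)
{
    // Both templates match (int, float) equally well, so this call
    // is intentionally ambiguous and does not compile.
    f(1, 1.0f);
    return 0;
}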

...the compiler will tell me precisely that: error C2668: 'f' : ambiguous call to overloaded function

If you create a language powerful enough, it's always going to be possible to create things in it that don't make sense. When this happens, it's probably ok to just throw up your hands and say "this doesn't make sense".

怪我太投入 2024-10-28 21:47:11

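In C++, a op b means a.op(b), so it is unambiguous; the order settles it. Also in C++, if you want to define an operator whose left operand is a built-in type, the operator has to be a global function with two arguments rather than a member; again, though, the order of the operands determines which function is called. It is illegal to define an operator where both operands are built-in types.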

白昼 2024-10-28 21:47:11

I would suggest that given X + Y, the compiler should look for both X.op_plus(Y) and Y.op_added_to(X); each implementation should include an attribute indicating whether it should be a "preferred", "normal", or "fallback" implementation, and optionally also indicating that it is "common". If both implementations are defined, and their implementations are of different priorities (e.g. "preferred" and "normal"), use the priority to select between them. If both are defined to be of the same priority, and both are "common", favor the X.op_plus(Y) form. If both are defined with the same priority and they are not both "common", flag an error.
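As a rough sketch of those selection rules, here is how a compiler might pick between the two candidates. Everything here (Impl, RANK, choose, AmbiguousOperatorError) is a hypothetical name made up for illustration, and it assumes that when only one side defines an overload, that one is simply used.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical ranking of the three priority levels named above.
RANK = {"fallback": 0, "normal": 1, "preferred": 2}


class AmbiguousOperatorError(TypeError):
    """Two candidates tie and are not both marked 'common'."""


@dataclass
class Impl:
    func: Callable            # the overload body
    priority: str = "normal"  # 'preferred', 'normal', or 'fallback'
    common: bool = False      # author asserts both forms mean the same thing


def choose(left: Optional[Impl], right: Optional[Impl]) -> Impl:
    """Pick between X.op_plus(Y) (left) and Y.op_added_to(X) (right)."""
    if left is None and right is None:
        raise TypeError("no overload defined for these operands")
    if left is None or right is None:
        # Assumption: a lone candidate is used as-is.
        return left if right is None else right
    if RANK[left.priority] != RANK[right.priority]:
        return left if RANK[left.priority] > RANK[right.priority] else right
    if left.common and right.common:
        return left  # same meaning either way, so favor X.op_plus(Y)
    raise AmbiguousOperatorError("equal-priority overloads may differ in meaning")


plus = Impl(lambda x, y: "X.op_plus", "normal", common=True)
added_to = Impl(lambda y, x: "Y.op_added_to", "normal", common=True)
print(choose(plus, added_to).func(1, 2))  # tie, both common: left form wins
```

The "common" flag is what lets the author say that a tie is harmless, so the error is reserved for ties where the two overloads might genuinely disagree.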

The ability to prioritize overloads and conversions would IMHO be a very important feature for a language to have. It is not helpful for languages to squawk about ambiguous overloads in cases where both candidates would do the same thing, but languages should squawk in cases where two possible overloads would have different meanings, each of which would be useful in certain contexts. For example, given someFloat==someDouble or someDouble==someLong, a compiler should squawk, since it can be useful to know whether the numerical quantities represented by two values match, and it can also be useful to know whether the left-hand operand holds the best possible representation (for its type) of the value in the right-hand operand. Java and C# do not flag ambiguity in either case, opting instead to use the first meaning for the first expression and the second for the second, even though either meaning might be useful in either case. I would suggest that it would be better to reject such comparisons than to have them implement inconsistent semantics.

Overall, I'd suggest as a philosophy that a good language design should let a programmer indicate what's important and what isn't. If a programmer knows that certain "ambiguities" aren't problems, but other ones are, it should be easy to have the compiler flag the latter but not the former.

Addendum

I looked briefly through your proposal; it seems you're expecting bindings to be fully dynamic. I've worked with a language like that (HyperTalk, circa 1988) and it was "interesting". Consider, for example, that "2X" < "3" < 4 < 10 < "11" < "2X". Double dispatch can sometimes be useful, but only in cases where operator overloads with different semantics (e.g. string and numeric comparisons) are limited to operating on disjoint sets of things. Forbidding ambiguous operations at compile time is a good thing, since the programmer will be in a position to specify what's intended. Having such ambiguity trigger a run-time error is a bad thing, because the programmer may be long gone by the time an error surfaces. Consequently, I really can't offer any advice for how to do run-time double dispatch for operators except to say "don't", unless at compile time you restrict the operands to combinations where any possible overload would always have the same semantics.

For example, if you had an abstract "immutable list of numbers" type, with a member to report the length or return the number at a particular index, you could specify that two instances are equal if they have the same length and, for every index, they return the same number. While it would be possible to compare any two instances for equality by examining every item, that could be inefficient if e.g. one instance was a "BunchOfZeroes" type which simply held an integer N=1000000 and didn't actually store any items, and the other was an "NCopiesOfArray" which held N=500000 and {0,0} as the array to be copied. If many instances of those types are going to be compared, efficiency could be improved by having such comparisons invoke a method which, after checking overall array length, checks whether the "template" array contains any non-zero elements. If it doesn't, then it can be reported as equal to the bunch-of-zeroes array without having to perform 1,000,000 element comparisons. Note that the invocation of such a method by double dispatch would not alter the program's behavior; it would merely allow it to execute more quickly.
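A minimal sketch of that idea follows. The type names BunchOfZeroes and NCopiesOfArray come from the paragraph above; the NumberList base class and the fast_equals and all_zero hooks are assumptions introduced for this illustration.

```python
class NumberList:
    """Abstract immutable list of numbers: a length plus indexed access."""

    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, i):
        raise NotImplementedError

    def all_zero(self):
        return None  # unknown without scanning every element

    def fast_equals(self, other):
        return None  # no shortcut known; lengths are already equal here

    def __eq__(self, other):
        if not isinstance(other, NumberList):
            return NotImplemented
        if len(self) != len(other):
            return False
        # Second dispatch: either operand may offer a cheaper answer.
        for a, b in ((self, other), (other, self)):
            shortcut = a.fast_equals(b)
            if shortcut is not None:
                return shortcut
        # Generic fallback: same answer, just slower.
        return all(self[i] == other[i] for i in range(len(self)))


class BunchOfZeroes(NumberList):
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return 0

    def all_zero(self):
        return True

    def fast_equals(self, other):
        # Equal exactly when the other list is all zeroes, if it can tell cheaply.
        return other.all_zero()


class NCopiesOfArray(NumberList):
    def __init__(self, n, template):
        self.n = n
        self.template = tuple(template)

    def __len__(self):
        return self.n * len(self.template)

    def __getitem__(self, i):
        return self.template[i % len(self.template)]

    def all_zero(self):
        # Only the short template needs checking, not all n copies.
        return all(v == 0 for v in self.template)


print(BunchOfZeroes(1000000) == NCopiesOfArray(500000, [0, 0]))
```

The shortcut and the element-by-element loop always agree on the result, so which one runs only affects speed, which is the property the paragraph above requires of double dispatch.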
