Infix operators in Scala and Jython

Posted 2024-08-02 03:30:02

I'm evaluating languages for a computationally oriented app that needs an easy embedded scripting language for end users. I have been thinking of using Scala as the main underlying language and Jython for the scripting interface. One appeal of Scala is that I can define methods such as :* for elementwise multiplication of a matrix object and use them with infix syntax, as in a :* b. But :* is not a valid method name in Python. How does Jython deal with this?
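
For reference, here is a minimal sketch of the Scala side the question describes. `Matrix` and `elemMul` are hypothetical names. The sketch also shows the JVM-level name Scala compiles `:*` to, using `scala.reflect.NameTransformer`. That encoded name is what Java code sees, and presumably what Jython sees too. Since `$` is not legal in a Python identifier, a plainly named alias is one way to give script code something it can call normally.

```scala
// Hypothetical dense matrix, just enough to show a symbolic infix method.
final case class Matrix(rows: Int, cols: Int, data: Vector[Double]) {
  require(data.length == rows * cols, "data length must equal rows * cols")

  // Elementwise (Hadamard) product; `a :* b` is plain method-call syntax for `a.:*(b)`.
  def :*(that: Matrix): Matrix = {
    require(rows == that.rows && cols == that.cols, "matrix shapes must match")
    Matrix(rows, cols, data.zip(that.data).map { case (x, y) => x * y })
  }

  // Alphanumeric alias for callers that cannot spell `:*` (Java, and presumably Jython).
  def elemMul(that: Matrix): Matrix = this :* that
}

// The name Scala emits in bytecode for `:*`; this is what other JVM languages see.
val jvmName: String = scala.reflect.NameTransformer.encode(":*") // "$colon$times"

val a = Matrix(2, 2, Vector(1.0, 2.0, 3.0, 4.0))
val b = Matrix(2, 2, Vector(10.0, 20.0, 30.0, 40.0))
val c = a :* b // c.data == Vector(10.0, 40.0, 90.0, 160.0)
```

Scala assigns an operator's precedence from its first character, so `:*` binds more loosely than `+` or `*`: `a + b :* c` parses as `(a + b) :* c`. Operators ending in `:` are right-associative, but `:*` ends in `*`, so it associates to the left as users would expect.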

I would consider using Scala as the scripting language, due to its flexibility. But even with type inference, all the val and var keywords and required type definitions are too much for lay users accustomed to dynamic languages like MATLAB. By comparison, Boo has the -ducky option, which might work, but I'd like to stay on the JVM rather than .NET. I assume there is no -ducky for Scala.

More generally, consider the following DSL (from http://www.cs.utah.edu/~hal/HBC/) to model Latent Dirichlet Allocation:

model {
      alpha     ~ Gam(0.1,1)
      eta       ~ Gam(0.1,1)
      beta_{k}  ~ DirSym(eta, V)           , k \in [1,K]
      theta_{d} ~ DirSym(alpha, K)         , d \in [1,D]
      z_{d,n}   ~ Mult(theta_{d})          , d \in [1,D] , n \in [1,N_{d}]
      w_{d,n}   ~ Mult(beta_{z_{d,n}})     , d \in [1,D] , n \in [1,N_{d}]
}

result = model.simulate(1000)

This syntax is terrific (compared to PyMCMC, for instance) for users familiar with hierarchical Bayesian modeling. Is there any language on the JVM that would make it easy to define such syntax, while also providing access to a basic scripting language like Python?

Thoughts appreciated.

画▽骨i 2024-08-09 03:30:02

Personally, I think you overstate the overhead of Scala. For instance, this:

alpha     ~ Gam(10,10)
mu_{k}    ~ NorMV(vec(0.0,1,dim), 1, dim)     , k \in [1,K]
si2       ~ IG(10,10)
pi        ~ DirSym(alpha, K)
z_{n}     ~ Mult(pi)                          , n \in [1,N]
x_{n}     ~ NorMV(mu_{z_{n}}, si2, dim)       , n \in [1,N]

could be written as

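// Sketch: Gam, Vec, NorMV, IG, DirSym, Mult and dim are assumed to be defined elsewhere.
// 'K is a symbol literal (deprecated since Scala 2.13, dropped in Scala 3; newer code writes Symbol("K")).
def alpha =                   Gam(10, 10)
def mu    = 1 to 'K map (k => NorMV(Vec(0.0, 1, dim), 1, dim))
def si2   =                   IG(10, 10)
def pi    =                   DirSym(alpha, 'K)
def z     = 1 to 'N map (n => Mult(pi))
def x     = 1 to 'N map (n => NorMV(mu(z(n)), si2, dim))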

In this particular case, almost nothing was done except defining Gam, Vec, NorMV, etc., and creating an implicit conversion from Symbol to Int or Double that reads from a table where you'll store the actual values later on (such as with a loadM equivalent). Such implicit definitions would go like this:

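import scala.language.implicitConversions
import scala.reflect.Manifest

// Table of unknowns: each Symbol maps to (manifest of its type, value), filled in later.
val unknowns = scala.collection.mutable.HashMap[Symbol, (Manifest[_], Any)]()

// Converts a Symbol to an Int when the table holds an Int under that name.
implicit def getInt(s: Symbol)(implicit m: Manifest[Int]): Int = unknowns.get(s) match {
  case Some((`m`, x)) => x.asInstanceOf[Int]
  case _ => sys.error("Undefined unknown " + s)
}
// similarly to getInt for any other desired type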
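
To check that the def-style version really needs little more than stubs and one conversion, here is a self-contained sketch covering three of the lines above. `Gam`, `DirSym` and `Mult` are hypothetical stubs with no sampling logic. `symbolToInt` is a simplified stand-in for `getInt` that pattern-matches on the stored value instead of comparing manifests. `Symbol("K")` replaces the `'K` literal, which is deprecated in Scala 2.13 and unsupported in Scala 3.

```scala
import scala.collection.mutable
import scala.language.implicitConversions

// Hypothetical stand-ins for real distribution classes.
final case class Gam(shape: Double, scale: Double)
final case class DirSym(alpha: Gam, k: Int)
final case class Mult(theta: DirSym)

// Table of "unknowns" filled in later, e.g. after loading the user's data.
val unknowns = mutable.Map[Symbol, Any]()

// Implicit Symbol => Int view; fails loudly if the name is unbound or not an Int.
implicit def symbolToInt(s: Symbol): Int = unknowns.get(s) match {
  case Some(i: Int) => i
  case Some(other)  => sys.error(s"Unknown $s is bound to $other, not an Int")
  case None         => sys.error(s"Undefined unknown $s")
}

// The model as plain defs: re-evaluated on each call, so later bindings are seen.
def alpha = Gam(10, 10)
def theta = DirSym(alpha, Symbol("K"))
def z     = 1 to Symbol("N") map (_ => Mult(theta))

// Bind the unknowns, then use the model.
unknowns(Symbol("K")) = 3
unknowns(Symbol("N")) = 4
```

Because the model members are `def`s, binding `K` and `N` after the definitions works. With `val`s, the lookup would run at definition time, while the table is still empty, and fail.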

It could also be written like this:

// Model, With, in and of are hypothetical DSL methods (sketch only).
Model (
  'alpha    -> Gam(10, 10),
  'mu -> 'k -> NorMV(Vec(0.0, 1, dim), 1, dim)      With ('k in (1 to 'K)),
  'si2      -> IG(10, 10),
  'pi       -> DirSym('alpha, 'K),
  'z -> 'n  -> Mult('pi)                            With ('n in (1 to 'N)),
  'x -> 'n  -> NorMV('mu of ('z of 'n), 'si2, dim)  With ('n in (1 to 'N))
)

In that case, Gam, Mult, etc. would need to be defined a bit differently to handle the symbols being passed to them. The excess of "'" is definitely annoying, though.
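
Here is a minimal sketch of what "defined a bit differently" could mean, assuming a hypothetical `Param` type. Each constructor takes parameters that are either literals or symbolic references. Conversions in `Param`'s companion object let users write plain numbers and symbols, and references are resolved in a separate step. `Model`, `With`, `in` and `of` from the snippet above are not covered here.

```scala
import scala.language.implicitConversions

// A parameter is either a literal number or a reference to another model variable.
sealed trait Param
final case class Lit(value: Double) extends Param
final case class Ref(name: Symbol) extends Param

object Param {
  // Found via Param's implicit scope, so users can pass plain numbers or symbols.
  implicit def fromDouble(d: Double): Param = Lit(d)
  implicit def fromSymbol(s: Symbol): Param = Ref(s)
}

// Hypothetical constructors that accept either kind of parameter.
final case class Gam(shape: Param, scale: Param)
final case class DirSym(alpha: Param, k: Param)

// Resolve a parameter against values bound elsewhere (sampled or loaded).
def resolve(p: Param, env: Map[Symbol, Double]): Double = p match {
  case Lit(v)    => v
  case Ref(name) => env.getOrElse(name, sys.error(s"Undefined variable $name"))
}

val gam = Gam(10.0, 10.0)
val pi  = DirSym(Symbol("alpha"), Symbol("K"))
```

Keeping the model as data (a tree of `Lit` and `Ref` values) rather than evaluating it immediately is what lets the application inspect, validate or sample it later.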

It's not like HBC doesn't have its own idiosyncrasies, such as the occasional need for type declarations, underscores before indices, the occasional need to replace "~" with "\in", or even the backslash that needs to precede the latter. As long as there is a real benefit from using it instead of HBC, MATLAB, or whatever else the person is used to, they'll put up with a bit of trouble.

秋意浓 2024-08-09 03:30:02

EDIT:

After reading all the discussion, the best way to go is probably to define the grammar of your DSL and then parse it with Scala's parser combinator library (scala.util.parsing.combinator).

I'm not sure, though, what you are trying to achieve. Will your scripting language be more of a "what" type or a "how" type? The example you have given is a "what" type DSL: you describe what you are trying to achieve and don't care about the implementation. Such languages are best used to describe a problem, and given the domain you are building the app for, I think this is the best way to go. The user just describes the problem in a syntax very familiar from the problem domain, and the application parses this description and uses it as input to run the simulation. For this, building a grammar and parsing it with Scala's parsing utilities will probably be the best way to go (you only want to expose a small subset of features to the users).
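
Here is a rough sketch of the parse step. The answer suggests Scala's parser combinator library, which has shipped as the separate `scala-parser-combinators` module since Scala 2.11. To keep this sketch free of extra dependencies, it swaps in a plain regex-based line parser. It handles only a tiny, hypothetical subset of the HBC syntax: `name ~ Dist(arg, ...)` with numeric or identifier arguments, and no subscripts or index ranges. A combinator grammar would produce the same kind of AST and scale better once the full syntax is needed.

```scala
// Hypothetical AST for the subset:  name ~ Dist(arg, arg, ...)
sealed trait Arg
final case class Num(value: Double) extends Arg
final case class Var(name: String) extends Arg
final case class Stmt(lhs: String, dist: String, args: List[Arg])

object ModelParser {
  // Whole-line pattern: identifier, '~', distribution name, parenthesised argument list.
  private val StmtRe = """([A-Za-z_]\w*)\s*~\s*([A-Za-z_]\w*)\s*\((.*)\)""".r
  private val NumRe  = """-?\d+(?:\.\d+)?""".r
  private val IdRe   = """[A-Za-z_]\w*""".r

  private def parseArg(raw: String): Either[String, Arg] = raw.trim match {
    case t @ NumRe() => Right(Num(t.toDouble))
    case t @ IdRe()  => Right(Var(t))
    case t           => Left(s"bad argument '$t'")
  }

  // Parse one line into a statement or an error message.
  def parseLine(line: String): Either[String, Stmt] = line.trim match {
    case StmtRe(lhs, dist, rawArgs) =>
      val parts = if (rawArgs.trim.isEmpty) Nil else rawArgs.split(",").toList
      val args  = parts.map(parseArg)
      args.collectFirst { case Left(err) => err } match {
        case Some(err) => Left(s"$lhs: $err")
        case None      => Right(Stmt(lhs, dist, args.collect { case Right(a) => a }))
      }
    case other => Left(s"cannot parse '$other'")
  }

  // One statement per non-blank line; reports the first failing line in source order.
  def parseModel(src: String): Either[String, List[Stmt]] = {
    val lines = src.linesIterator.map(_.trim).filter(_.nonEmpty).toList
    lines.foldRight[Either[String, List[Stmt]]](Right(Nil)) { (line, acc) =>
      for { s <- parseLine(line); rest <- acc } yield s :: rest
    }
  }
}
```

The resulting `List[Stmt]` is the "what" description. The application decides separately how to simulate it.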

If you need a "how" script, then using an already established scripting language is the way to go (unless you want to implement loops, basic data structures, etc. yourself).

In designing a system, there are always trade-offs to be made. Here the trade-off is between the number of features you expose to the user and the terseness of your scripts. Myself, I would expose as few features as possible to get the job done, and get it done in a "what" way: the user doesn't need to know how you are going to simulate their problem, as long as the simulation gives correct results and runs in reasonable time.

If you expose a full scripting language to the user, your DSL will just be a small API in that scripting language, and the user will have to learn a full language to be able to use its full power. And you may not want a user to use its full power (it may wreak havoc on your app!). Why would you expose, for example, TCP socket support when your application doesn't need to connect to the internet? That could be a security hole.
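
One way to keep the exposed surface small is sketched below with hypothetical names. The application owns a whitelist of constructs and their implementations (the "how"), and a user's model can only refer to entries in it (the "what"). `Uniform` and `Normal` are stand-ins for the real distributions. There is simply no entry through which a script could open a socket.

```scala
import scala.util.{Random, Try}

// Hypothetical registry of the only constructs a model script may use.
object Registry {
  // A sampler takes numeric arguments and a random source and returns one draw.
  type Sampler = (List[Double], Random) => Double

  private val uniform: Sampler = {
    case (List(lo, hi), rng) => lo + (hi - lo) * rng.nextDouble()
    case (args, _)           => throw new IllegalArgumentException(s"Uniform(lo, hi) got $args")
  }

  private val normal: Sampler = {
    case (List(mu, sigma), rng) => mu + sigma * rng.nextGaussian()
    case (args, _)              => throw new IllegalArgumentException(s"Normal(mu, sigma) got $args")
  }

  // The whitelist: anything not listed here does not exist as far as the user is concerned.
  private val allowed: Map[String, Sampler] = Map("Uniform" -> uniform, "Normal" -> normal)

  // Unknown names and bad arguments both come back as Left(message).
  def sample(dist: String, args: List[Double], rng: Random): Either[String, Double] =
    allowed.get(dist) match {
      case Some(f) => Try(f(args, rng)).toEither.left.map(_.getMessage)
      case None    => Left(s"'$dist' is not available in model scripts")
    }
}
```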

-- The following section discusses possible scripting languages. My above answer advises against using them, but I have left the discussion for completeness.

I have no experience with it, but have a look at Groovy. It is a dynamically typed scripting language for the JVM (and JVM support for such languages will probably improve in JDK 7 thanks to invokedynamic). It also has good support for operator overloading and for writing DSLs. Unfortunately, as far as I know, it doesn't support user-defined operators.

I would still go with Scala, though (partially because I like static typing and I find its type inference good :). Its scripting support is quite good, and you can make almost anything look like native language support (for example, have a look at its actors library!). It also has very good support for functional programming, which can make scripts very short and concise. And as a bonus, you'll have all the power of the Java libraries at your disposal.
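
The following minimal sketch illustrates the point about making library code look like native syntax. A by-name parameter in a second parameter list lets an ordinary method take a brace-delimited block, so the call reads like a built-in loop. `repeat` is a hypothetical helper, not something in the standard library.

```scala
// `body` is a by-name parameter, so it is re-evaluated on every iteration.
def repeat(times: Int)(body: => Unit): Unit = {
  var i = 0
  while (i < times) {
    body
    i += 1
  }
}

// Usage reads like built-in syntax:
var total = 0
repeat(3) {
  total += 2
}
// total is now 6
```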

In order to use Scala as a scripting language, just put your script in a file ending with .scala and then run scala filename.scala. See Scala as a scripting Language for a discussion comparing Scala with JRuby.
