I am new to PMML: Predictive Model Markup Language (www.dmg.org), and I was wondering whether there is any Java support (open source or professional) for creating/parsing PMML files.
Initially I only have in mind the possibility of creating/parsing PMML files programmatically from Java environments.
I have been "googling" and I have found several possibilities:
Open source:
From Java.
- JDM (javax.datamining). It seems to be dead? Does anyone have more info?
Professional.
DIY
- Use a Java XML library and build a parser/writer for PMML files yourself
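To make the DIY option concrete, here is a minimal sketch of the parsing half using only the XML classes that ship with the JDK. The class name `PmmlFields`, the method `fieldNames`, and the sample document are made up for illustration; the sketch only reads the `name` attribute of each `DataField` element and ignores everything else in the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the field names declared in a PMML document.
public class PmmlFields {

    /** Returns the "name" attribute of every DataField element in the PMML text. */
    public static List<String> fieldNames(String pmml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // PMML files normally declare a default namespace, so parse namespace-aware.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(pmml)));

        // "*" matches any namespace, so this works across PMML versions.
        NodeList fields = doc.getElementsByTagNameNS("*", "DataField");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Tiny illustrative document, not a complete or validated PMML file.
        String pmml =
            "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">"
          + "<DataDictionary numberOfFields=\"2\">"
          + "<DataField name=\"sepal_length\" optype=\"continuous\" dataType=\"double\"/>"
          + "<DataField name=\"species\" optype=\"categorical\" dataType=\"string\"/>"
          + "</DataDictionary>"
          + "</PMML>";
        System.out.println(fieldNames(pmml)); // prints [sepal_length, species]
    }
}
```

Writing would be the mirror image (build a DOM tree and serialize it), but covering the whole schema this way is presumably a lot of work.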
I appreciate all your opinions.
Thanks in advance
Oscar
Comments (1)
You should realize that the answer may depend on the MODEL-ELEMENT that you want to work with. It is also very likely that your best options for creating PMML and parsing PMML will come from different software packages. I am going to assume that by 'creation of PMML' you mean of the document and not of the model. I've never heard of anyone integrating automatic model fitting with execution but perhaps it exists already. Certainly a PMML model could be passed using SOAP.
I can't speak to the other projects but the product offered by Zementis, called Adapa, is used only for the execution of PMML. This product assumes that there is a model fitting application that will do the creating by exporting a fitted model into PMML. There are already a lot of well developed model fitting applications so I think this is a reasonable assumption.
The version I have used (3.6) was generally fast but it couldn't handle ensembles of typical random forest size (500+ trees) without an especially large heap. I think they may have fixed this in newer versions. Though it isn't advertised, Zementis doesn't appear to offer a few of the models, namely Text Models, Sequences, Baseline Models, or Time Series (for which the PMML standard currently only has Exponential Smoothing anyway). My version also doesn't have K-Nearest Neighbors but I hear that more recent versions do.
Unless you are considering integrated fitting and execution (in which case you should consider online learning), my advice would be to consider these questions in order:
If you look at the list of members of the DMG you will find many commercial vendors that are either on the supply side (e.g. SAS, SPSS, Togaware, Rapid-I) or the demand side (too many to list).
Your list also doesn't mention Weka, which can execute some PMML models. There are also R/Java based solutions, so you could execute PMML->R imports (see fileToXMLNode) in a Java environment (though you could also just execute R).
Finally, if you have a very specific model in mind and you understand what it means mathematically to 'execute it' then it shouldn't be too difficult to build what you need yourself.
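As a rough illustration of that last point, here is a sketch that "executes" the simplest case I can think of: a linear regression stored as a single PMML `RegressionTable` with `NumericPredictor` children. The class name `LinearPmmlScorer` and the sample document are invented for the example. It assumes one table, numeric inputs only, and no transformations, missing-value handling, or normalization, all of which a real model may need.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical scorer for a single-table linear regression in PMML.
public class LinearPmmlScorer {

    /** Computes intercept + sum(coefficient * input^exponent) from the first RegressionTable. */
    public static double score(String pmml, Map<String, Double> inputs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmml)));

        NodeList tables = doc.getElementsByTagNameNS("*", "RegressionTable");
        if (tables.getLength() == 0) {
            throw new IllegalArgumentException("no RegressionTable found");
        }
        Element table = (Element) tables.item(0);
        double result = Double.parseDouble(table.getAttribute("intercept"));

        NodeList predictors = table.getElementsByTagNameNS("*", "NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            String name = p.getAttribute("name");
            Double x = inputs.get(name);
            if (x == null) {
                throw new IllegalArgumentException("missing input: " + name);
            }
            double coefficient = Double.parseDouble(p.getAttribute("coefficient"));
            // The exponent attribute is optional in PMML and defaults to 1.
            String exp = p.getAttribute("exponent");
            double exponent = exp.isEmpty() ? 1.0 : Double.parseDouble(exp);
            result += coefficient * Math.pow(x, exponent);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative fragment only; a real file also has a DataDictionary, MiningSchema, etc.
        String pmml =
            "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">"
          + "<RegressionModel functionName=\"regression\">"
          + "<RegressionTable intercept=\"1.5\">"
          + "<NumericPredictor name=\"x\" coefficient=\"2.0\"/>"
          + "<NumericPredictor name=\"z\" coefficient=\"-0.5\"/>"
          + "</RegressionTable>"
          + "</RegressionModel>"
          + "</PMML>";
        Map<String, Double> inputs = new HashMap<>();
        inputs.put("x", 3.0);
        inputs.put("z", 4.0);
        System.out.println(score(pmml, inputs)); // 1.5 + 2.0*3 - 0.5*4 = 5.5
    }
}
```

Anything beyond this (trees, ensembles, neural networks) follows the same pattern of reading the elements and applying the math, but the amount of code grows quickly.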