How to filter/sort/rank object model nodes?
I have some kind of object model and I need to filter and sort its nodes by some kind of property. What kinds of automated systems exist to generate and select properties of the object model that correlate with what I want? (I'm intentionally being abstract and non-specific.)
I'm thinking of a system that works somewhat like spam filters or supervised classification systems: given an example data set, it identifies rules that find nodes of interest. However, I'm looking for a more general system, in that it shouldn't require any design-time information about the object model. It should work equally well as a spam filter on e-mail, a bug finder on a code base, an interest filter in a newsgroup, or a bot-account finder on a social networking site. As long as it can explore the object model via reflection and be given a set of "interesting" nodes, it should be able to find rules that will find more nodes like them.
Comments (1)
It is highly unlikely that a single automated classification system could do everything you are asking. Additionally, I believe the bug finder application falls outside the scope of such a system, since the methods being used successfully in that domain largely revolve around syntactic analysis, data flow analysis, and other algorithmic methods highly tailored to software errors. Although machine learning research is being done there, classification systems in this domain are mostly used to augment rather than replace analytical methods (so far as I know).
For most non-trivial classification problems, careful selection and refinement of the problem representation is typically required to get useful and effective results via machine learning. Simply using the existing "raw" data object model without some sort of tailored transformation of the state space tends to lead to incomplete coverage of the distribution of input data values, poor generalization of the learned classifiers, or both. Additionally, parameters specific to the machine learning method being used may require trial-and-error tweaking to get decent results for a given problem. Not all methods have such parameters, but many do, such as neural networks, genetic algorithms, and Bayesian inference methods.
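To make the "raw object model" point concrete, here is a minimal sketch of what reflection-only feature extraction might look like in Python. Everything in it is an assumption for illustration: the `raw_features` helper, the `Email` and `Sender` types, and the crude rules for turning values into numbers are all hypothetical, not part of any existing system.

```python
from dataclasses import dataclass


def raw_features(node, prefix="", depth=2):
    """Flatten an object's attributes into {path: float} using reflection only.

    Hypothetical helper: it knows nothing about the domain, so it can only
    apply generic rules (numbers as-is, lengths for strings and containers).
    """
    features = {}
    for name, value in vars(node).items():
        path = f"{prefix}{name}"
        if isinstance(value, (bool, int, float)):
            features[path] = float(value)
        elif isinstance(value, str):
            # Crude stand-in for real text features: only the length survives.
            features[f"{path}.len"] = float(len(value))
        elif isinstance(value, (list, tuple, set, dict)):
            features[f"{path}.count"] = float(len(value))
        elif hasattr(value, "__dict__") and depth > 0:
            # Recurse into nested objects, up to a fixed depth.
            features.update(raw_features(value, prefix=f"{path}.", depth=depth - 1))
    return features


@dataclass
class Sender:  # hypothetical type, for illustration only
    address: str
    known_contact: bool


@dataclass
class Email:  # hypothetical type, for illustration only
    subject: str
    body: str
    links: list
    sender: Sender


mail = Email("WIN NOW", "Click here", ["http://example.com"],
             Sender("a@example.com", False))
features = raw_features(mail)
print(features)
```

Note what gets lost: the subject "WIN NOW" is reduced to its length, so the words themselves, which are probably the most useful spam signal, never reach the learner. Recovering them takes a representation designed for text, which is exactly the kind of tailored transformation a fully generic system would not have.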
What you are asking for is a nearly universal machine learning method, which does not currently exist. The most viable alternatives that I can see would be to (1) restrict yourself to a subset of problems that do not require this level of capability/sophistication, or (2) create a system which, rather than using just one classification technique, has a toolbox of different methods that it automatically tests against a given problem, and then uses the one that generates the best classification results under a supervised learning regime. The latter would still be quite a challenge to pull off effectively, though, and it does not eliminate the problem of how to represent/transform the state space for the data model.
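As a rough illustration of option (2), the sketch below tries a small toolbox of classifiers on labelled examples and keeps whichever scores best under cross-validation. It is a toy under stated assumptions, not a real system: the three classifiers, the `cross_val_accuracy` and `pick_best` helpers, and the data set are all made up for this example, and it assumes the nodes have already been turned into numeric feature vectors.

```python
import math
from collections import Counter


class MajorityClass:
    """Baseline: always predicts the most common training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predicts the label whose mean feature vector is closest."""
    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: math.dist(x, self.centroids[label]))


class OneNearestNeighbor:
    """Predicts the label of the single closest training point."""
    def fit(self, X, y):
        self.points = list(zip(X, y))
        return self

    def predict(self, x):
        return min(self.points, key=lambda p: math.dist(x, p[0]))[1]


def cross_val_accuracy(make_model, X, y, folds=4):
    """Fraction of held-out examples predicted correctly across all folds."""
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))  # every folds-th example
        train_X = [row for i, row in enumerate(X) if i not in test_idx]
        train_y = [lab for i, lab in enumerate(y) if i not in test_idx]
        model = make_model().fit(train_X, train_y)
        correct += sum(model.predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def pick_best(toolbox, X, y, folds=4):
    """Score every method in the toolbox and return the best name and all scores."""
    scores = {name: cross_val_accuracy(factory, X, y, folds)
              for name, factory in toolbox.items()}
    return max(scores, key=scores.get), scores


# Toy, made-up data: two well-separated clusters of "nodes".
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["boring"] * 4 + ["interesting"] * 4

toolbox = {
    "majority": MajorityClass,
    "nearest_centroid": NearestCentroid,
    "one_nearest_neighbor": OneNearestNeighbor,
}
best, scores = pick_best(toolbox, X, y)
print(best, scores)
```

The selection loop is the easy part. It still takes `X` as given, so it does nothing about the representation problem, and a real toolbox would also have to search each method's own parameters.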