Learning optimal parameters to maximize a reward
I have a set of examples, each annotated with feature data. The examples and features describe the settings of an experiment in an arbitrary domain (e.g. number-of-switches, number-of-days-performed, number-of-participants, etc.). Certain features are fixed (i.e. static), while others are variable, meaning I can set them manually in future experiments. Each example also has a "reward" feature, a continuous number bounded between 0 and 1 that indicates the success of the experiment as determined by an expert.
Based on this example set, and given a set of static features for a future experiment, how would I determine the optimal value to use for a specific variable so as to maximise the reward?
Also, does this process have a formal name? I've done some research, and this sounds similar to regression analysis, but I'm still not sure if it's the same thing.
Comments (1)
The process is called "design of experiments." Various techniques can be used, depending on the number of parameters and on whether you can do computations between trials or have to pick all your treatments in advance.
Once you've built a regression model from the data in your experiments, you can find an optimum by applying the usual numerical optimization techniques.
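As a rough illustration of that last step, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for the example rather than something taken from the question: there is one static feature `s` and one variable feature `v`, the data are synthetic, the regression model is a full quadratic surface fitted by least squares, and the "numerical optimization" is a plain grid search over the allowed range of `v`. The function names are hypothetical. With real data you would likely want a richer model and some form of validation.

```python
import numpy as np

def design_matrix(s, v):
    """Full quadratic terms in one static feature s and one variable feature v."""
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.column_stack([np.ones_like(s), s, v, s * v, s ** 2, v ** 2])

def fit_reward_model(s, v, reward):
    """Fit the quadratic reward model by ordinary least squares."""
    X = design_matrix(s, v)
    coef, *_ = np.linalg.lstsq(X, np.asarray(reward, dtype=float), rcond=None)
    return coef

def best_variable_value(coef, s_fixed, v_min, v_max, n_grid=1001):
    """Grid-search the variable feature for a given static feature value."""
    candidates = np.linspace(v_min, v_max, n_grid)
    predicted = design_matrix(np.full(n_grid, s_fixed), candidates) @ coef
    i = int(np.argmax(predicted))
    return float(candidates[i]), float(predicted[i])

# Synthetic past experiments (assumed for illustration only).
rng = np.random.default_rng(0)
s_obs = rng.uniform(10, 50, size=200)   # static feature, e.g. participants
v_obs = rng.uniform(0, 10, size=200)    # variable feature, e.g. days performed
# Hidden "true" reward peaks at v = 2 + 0.1 * s, plus a little noise.
reward_obs = 0.9 - 0.01 * (v_obs - (2 + 0.1 * s_obs)) ** 2
reward_obs = np.clip(reward_obs + rng.normal(0, 0.01, size=200), 0.0, 1.0)

coef = fit_reward_model(s_obs, v_obs, reward_obs)

# For a future experiment with s = 30, pick v within its allowed range.
best_v, best_reward = best_variable_value(coef, s_fixed=30, v_min=0, v_max=10)
print(f"suggested v: {best_v:.2f}, predicted reward: {best_reward:.3f}")
```

Note that a least-squares model like this one does not know the reward is bounded between 0 and 1, so its predictions can fall outside that range. Restricting the search to the range of `v` actually covered by past experiments also avoids trusting the model where it has seen no data.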