Good feature engineering involves two components. The first is an understanding of the properties of the task you're trying to solve and how they might interact with the strengths and limitations of the classifier you're using. The second is experimental work, where you test your expectations and find out what actually works and what doesn't.
This can be done iteratively: your top-down understanding of the problem motivates experiments, and then the bottom-up information you learn from those experiments helps you obtain a better understanding of the problem. That deeper understanding can in turn drive more experiments.
Fitting Features to Your Classifier
Let's say you're using a simple linear classifier like logistic regression or an SVM with a linear kernel. If you think there might be interesting interactions between the various attributes you can measure and provide as input to the classifier, you'll need to manually construct and provide features that capture those interactions. However, if you're using an SVM with a polynomial or Gaussian kernel, interactions between the input variables will already be captured by the structure of the model.
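As a sketch of what "manually constructing interaction features" might look like (the helper name and data here are made up for illustration), you could append pairwise products of the raw attributes before handing them to a linear model:

```python
# Hypothetical sketch: hand-built pairwise interaction features for a
# linear classifier. Each sample's raw attributes are augmented with the
# product of every pair, so a linear model can pick up on interactions.
from itertools import combinations

def add_interactions(features):
    """Append the product of every pair of features to each sample."""
    augmented = []
    for row in features:
        interactions = [a * b for a, b in combinations(row, 2)]
        augmented.append(list(row) + interactions)
    return augmented

X = [[1.0, 2.0, 3.0],
     [0.5, 4.0, 1.0]]
print(add_interactions(X))
```

Note that the number of added features grows quadratically with the number of raw attributes, which is part of why kernel methods, which capture these interactions implicitly, can be attractive.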
Similarly, SVMs can perform poorly if some input variables take on a much larger range of values than others (e.g., most features take on a value of 0 or 1, but one feature takes on values between -1000 and 1000). So, when you're doing feature engineering for an SVM, you might want to try normalizing the values of your features before providing them to the classifier. However, if you're using decision trees or random forests, such normalization isn't necessary, as these classifiers are robust to differences in magnitude between the values that various features take on.
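One common form of normalization is standardization: rescaling each feature column to zero mean and unit variance. A minimal stdlib-only sketch (the function name and example data are just for illustration; in practice a library routine would do this):

```python
# Sketch: per-column standardization (zero mean, unit variance) before
# feeding features to an SVM. Data values are illustrative only.
def standardize_columns(rows):
    """Rescale each feature column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = var ** 0.5 or 1.0  # guard against constant columns
        scaled_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*scaled_cols)]

# One binary-ish feature next to one feature ranging over [-1000, 1000]:
X = [[0, -1000], [1, 0], [0, 1000]]
print(standardize_columns(X))
```

After this transformation both columns occupy a comparable range, so neither dominates the SVM's distance computations.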
Notes Specifically on Puzzle Solving
If you're looking at solving a problem with a complex state space, you might want to use a reinforcement learning approach like Q-learning. Q-learning is well suited to structuring learning tasks in which the system must reach some goal through a series of intermediate steps.
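To make the idea concrete, here is a toy tabular Q-learning sketch. The "puzzle" is just a chain of five states with a goal at the right end; the environment, reward, and learning-rate choices are all illustrative assumptions, not a recipe:

```python
# Toy tabular Q-learning on a chain of states 0..4 with the goal at 4.
# Actions: 0 = move left, 1 = move right. All parameters are illustrative.
import random

N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Toy deterministic environment: reward 1 on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def greedy(qvals):
    """Pick an argmax action, breaking ties at random."""
    best = max(qvals)
    return random.choice([a for a, q in enumerate(qvals) if q == best])

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):  # training episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration
        action = random.randrange(2) if random.random() < EPSILON else greedy(Q[state])
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

print([greedy(Q[s]) for s in range(GOAL)])  # learned policy for non-goal states
```

After training, the greedy policy should prefer moving right toward the goal from every non-goal state, because the goal's reward has been propagated backward through the intermediate states by the bootstrapped update.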