Good feature engineering involves two components. The first is an understanding of the properties of the task you're trying to solve and how they might interact with the strengths and limitations of the classifier you're using. The second is experimental work, where you test your expectations and find out what actually works and what doesn't.
This can be done iteratively: your top-down understanding of the problem motivates experiments, and then the bottom-up information you learn from those experiments helps you obtain a better understanding of the problem. That deeper understanding can in turn drive more experiments.
Fitting Features to Your Classifier
Let's say you're using a simple linear classifier like logistic regression or an SVM with a linear kernel. If you think there might be interesting interactions between the various attributes you can measure and provide as input to the classifier, you'll need to manually construct and provide features that capture those interactions. However, if you're using an SVM with a polynomial or Gaussian kernel, interactions between the input variables will already be captured by the structure of the model.
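As a sketch of what "manually constructing interaction features" might look like (the helper name and data here are made up for illustration), you could append pairwise products of the raw attributes before handing them to a linear model:

```python
# Hypothetical sketch: hand-built pairwise interaction features for a
# linear classifier. Each sample's raw attributes are augmented with the
# product of every pair, so a linear model can pick up on interactions.
from itertools import combinations

def add_interactions(features):
    """Append the product of every pair of features to each sample."""
    augmented = []
    for row in features:
        interactions = [a * b for a, b in combinations(row, 2)]
        augmented.append(list(row) + interactions)
    return augmented

X = [[1.0, 2.0, 3.0],
     [0.5, 4.0, 1.0]]
print(add_interactions(X))
```

Note that the number of added features grows quadratically with the number of raw attributes, which is part of why kernel methods, which capture these interactions implicitly, can be attractive.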
Similarly, SVMs can perform poorly if some input variables take on a much larger range of values than others (e.g., most features take on a value of 0 or 1, but one feature takes on values between -1000 and 1000). So, when you're doing feature engineering for an SVM, you might want to try normalizing the values of your features before providing them to the classifier. However, if you're using decision trees or random forests, such normalization isn't necessary, as these classifiers are robust to differences in magnitude between the values that various features take on.
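One common form of normalization is standardization: rescaling each feature column to zero mean and unit variance. A minimal stdlib-only sketch (the function name and example data are just for illustration; in practice a library routine would do this):

```python
# Sketch: per-column standardization (zero mean, unit variance) before
# feeding features to an SVM. Data values are illustrative only.
def standardize_columns(rows):
    """Rescale each feature column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = var ** 0.5 or 1.0  # guard against constant columns
        scaled_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*scaled_cols)]

# One binary-ish feature next to one feature ranging over [-1000, 1000]:
X = [[0, -1000], [1, 0], [0, 1000]]
print(standardize_columns(X))
```

After this transformation both columns occupy a comparable range, so neither dominates the SVM's distance computations.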
Notes Specifically on Puzzle Solving
If you're looking at solving a problem with a complex state space, you might want to use a reinforcement learning approach like Q-learning. Q-learning is well suited to structuring learning tasks in which the system must reach some goal through a series of intermediate steps.
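To make the idea concrete, here is a toy tabular Q-learning sketch. The "puzzle" is just a chain of five states with a goal at the right end; the environment, reward, and learning-rate choices are all illustrative assumptions, not a recipe:

```python
# Toy tabular Q-learning on a chain of states 0..4 with the goal at 4.
# Actions: 0 = move left, 1 = move right. All parameters are illustrative.
import random

N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Toy deterministic environment: reward 1 on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def greedy(qvals):
    """Pick an argmax action, breaking ties at random."""
    best = max(qvals)
    return random.choice([a for a, q in enumerate(qvals) if q == best])

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):  # training episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration
        action = random.randrange(2) if random.random() < EPSILON else greedy(Q[state])
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

print([greedy(Q[s]) for s in range(GOAL)])  # learned policy for non-goal states
```

After training, the greedy policy should prefer moving right toward the goal from every non-goal state, because the goal's reward has been propagated backward through the intermediate states by the bootstrapped update.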