RandomForestRegressor: Found input variables with inconsistent numbers of samples
This is for a project that's due soon, so help would be greatly appreciated. I've never done ML before, so sorry if the mistake is an absolute smooth brain one.
I have a dataset that's a bunch of tweets along with personality scores, and I need to train a model to predict the scores.
This is what I've done so far, by following a bunch of tutorials and stitching together what I learned.
train = pandas.read_csv('../dataset/cleaner_dataset.csv')
train['tweet'] = train['tweet'].str.lower()
train['tweet'] = train['tweet'].replace('[^a-zA-Z0-9]', ' ', regex = True)
X = train['tweet']
y = train['neuroticism']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
vectorizer = TfidfVectorizer(min_df=5)
X_vectorized = vectorizer.fit_transform(X_train)
vectorizer = TfidfVectorizer(min_df=5)
X_test_vec = vectorizer.fit_transform(X_train)
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_vectorized, y_train)
model.score(X_test_vec, y_test)
However, I'm getting an error on the last line of code when I run it in the notebook.
ValueError: Found input variables with inconsistent numbers of samples: [495, 1980]
Full error message: https://i.sstatic.net/cff5w.jpg
Comments (1)
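You are using X_train for both the training and the test features, which is why you are getting the error: X_test_vec ends up with 1980 rows (the training set), while y_test has only 495.
Try transforming X_test with the vectorizer you already fitted on X_train, rather than creating a second vectorizer and fitting it again.
As pointed out below, we don't fit the test set.
BUT you still need to use X_test with y_test.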
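A minimal sketch of that fix, for illustration only. The `tweets` and `scores` lists are hypothetical stand-in data (the original CSV is not available here), and `n_estimators=10` and `random_state=0` are assumptions added just to keep the run small and repeatable. The part that matters is that the vectorizer is fitted once on X_train and then reused with `transform` on X_test, so the test features have one row per entry in y_test.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; replace with train['tweet'] and train['neuroticism'].
tweets = [
    f"tweet number {i} about {'cats' if i % 2 else 'dogs'} today"
    for i in range(50)
]
scores = [float(i % 5) for i in range(50)]

X_train, X_test, y_train, y_test = train_test_split(
    tweets, scores, test_size=0.2, random_state=0
)

# Fit the vocabulary and IDF weights on the training text only.
vectorizer = TfidfVectorizer(min_df=5)
X_train_vec = vectorizer.fit_transform(X_train)

# Reuse the same fitted vectorizer on the test text: transform, not fit_transform.
X_test_vec = vectorizer.transform(X_test)

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train_vec, y_train)

# Row counts now match: one row of X_test_vec per value in y_test.
r2 = model.score(X_test_vec, y_test)
print(X_train_vec.shape, X_test_vec.shape, r2)
```

Reusing the fitted vectorizer also keeps the feature columns identical between train and test. A second vectorizer fitted on different text could produce a different vocabulary, and the model would then see columns that mean something else.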