Active learning with modAL - invalid shape

Posted on 2025-02-06 20:47:05

I am trying to implement active learning in Python. My classification problem currently takes Word2vec vector representations and feeds them into a Random Forest.

I have a tiny initial training dataset, and I would like to use the modAL package to grow it through active learning.

Here is what I've tried so far:

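import numpy as np
import modAL.uncertainty
from modAL.models import ActiveLearner
from sklearn.ensemble import RandomForestClassifier

# Random forest wrapped in a modAL learner, seeded with the small labelled set
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=modAL.uncertainty.uncertainty_sampling,
    X_training=X_train0, y_training=y_train
)

test = test.reset_index()
for i in range(20):
    # Ask the learner which unlabelled example it is least certain about
    query_idx, query_instance = learner.query(X_test0)
    y_new = input('Classify:')
    y_new = np.array([y_new])
    # This call raises the ValueError shown below
    learner.teach(np.array(X_test0[query_idx]).reshape(-1, 1), y_new)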

Here X_test0 is a pandas DataFrame with shape 1056 x 100 (i.e. 1056 examples with 100 features each, which are the Word2vec representations). I treat it as unlabelled so that I can check performance on it later.
Similarly, y_train is another pandas DataFrame containing the binary labels (0s or 1s) for the training data.

My issue is that I want modAL to understand that each example has multiple features, so that there is exactly one label per 100-length vector. In the example above, the following error appears:

ValueError: Found input variables with inconsistent numbers of samples: [100, 1]

It seems to me that modAL does not understand that those 100 features correspond to a single label...

Any clue on how to solve it?
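
For reference, the numbers in the error message ([100, 1]) are consistent with the usual scikit-learn convention that rows are samples and columns are features. The following is a minimal NumPy-only sketch of that convention, not a confirmed diagnosis of the code above; the `vector` and `label` arrays are made-up placeholders.

```python
import numpy as np

# One hypothetical 100-dimensional Word2vec vector (placeholder values).
vector = np.zeros(100)

# reshape(-1, 1): 100 rows, 1 column -> read as 100 samples with 1 feature each.
as_column = vector.reshape(-1, 1)

# reshape(1, -1): 1 row, 100 columns -> read as 1 sample with 100 features.
as_row = vector.reshape(1, -1)

label = np.array([1])  # a single label for a single example

print(as_column.shape, label.shape)  # (100, 1) (1,)
print(as_row.shape, label.shape)     # (1, 100) (1,)

# Consistency rule: the number of rows in X must equal the number of labels.
print(as_column.shape[0] == len(label))  # False
print(as_row.shape[0] == len(label))     # True
```

Under that reading, a single queried example would need to be passed as a (1, 100) array together with a length-1 label array.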

EDIT: I thought the problem might be the reshape call. Since teach seems to expect an array as input, I also tried modifying the last line as follows:

learner.teach(X_test0.iloc[query_idx].values, np.array(y_new))

which now produces the following error:

TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid

Removing .values, so that a DataFrame is passed instead, also produces an error:

TypeError: <class 'pandas.core.series.Series'> datatype is not supported
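
Both TypeErrors mention a mix of pandas and NumPy types, so one thing worth trying is to use NumPy arrays everywhere: for the initial training data (for example `X_train0.to_numpy()` and `y_train.to_numpy().ravel()` when constructing the learner), for the pool, and for each new label. That the mixed types are the cause is an assumption based on the messages, not something verified here. The sketch below shows such a loop. `run_queries` and `ask_label` are hypothetical names, and `learner` is only assumed to expose the `query(X)` and `teach(X, y)` methods used in the question.

```python
import numpy as np


def run_queries(learner, X_pool, ask_label, n_queries=20):
    """Query `learner` n_queries times and teach it the newly labelled rows.

    `learner` is assumed to expose modAL-style `query(X)` and `teach(X, y)`.
    `ask_label` is a hypothetical callable returning 0 or 1 for one row.
    Returns the pool with the queried rows removed.
    """
    X_pool = np.asarray(X_pool)  # e.g. X_test0.to_numpy() for a DataFrame
    for _ in range(n_queries):
        query_idx, _ = learner.query(X_pool)
        idx = np.atleast_1d(query_idx)

        # Index with an array so the result keeps shape (n_queried, n_features).
        X_new = X_pool[idx]

        # One integer label per queried row; input() would return a string,
        # so the label is cast to int to match the 0/1 training labels.
        y_new = np.array([int(ask_label(row)) for row in X_new])

        learner.teach(X_new, y_new)

        # Drop the labelled rows so they are not queried again.
        X_pool = np.delete(X_pool, idx, axis=0)
    return X_pool
```

A call might look like `run_queries(learner, X_test0.to_numpy(), lambda row: input('Classify:'))`. Removing queried rows from the pool is a design choice of this sketch, so that the same example is not offered for labelling twice.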