Active learning with modAL - invalid shape

Posted on 2025-02-06 20:47:05 · 1454 words

I am trying to implement active learning in Python. My classification pipeline takes Word2vec vector representations and feeds them into a random forest.

I have a tiny initial training dataset, and I would like to use the modAL package to apply active learning and grow it.

Here is what I've tried so far:

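```python
import numpy as np
import modAL.uncertainty
from modAL.models import ActiveLearner
from sklearn.ensemble import RandomForestClassifier

# X_train0, y_train, X_test0 and test are defined earlier (described below)
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=modAL.uncertainty.uncertainty_sampling,
    X_training=X_train0, y_training=y_train
)

test = test.reset_index()
for i in range(20):
    # Ask the learner for the pool sample it is least certain about
    query_idx, query_instance = learner.query(X_test0)
    # Label it by hand; input() returns a string, so convert it to 0/1
    y_new = input('Classify:')
    y_new = np.array([int(y_new)])
    # reshape(-1, 1) turns the queried (1, 100) row into a (100, 1) array
    learner.teach(np.array(X_test0[query_idx]).reshape(-1, 1), y_new)
```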

Here X_test0 is a pandas DataFrame with shape (1056, 100), i.e. 1056 examples with 100 features each (the Word2vec representations). I treat it as unlabelled so that I can check performance later.
Similarly, y_train is a pandas DataFrame holding the binary labels (0 or 1) of the training data.
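Because X_test0 is a DataFrame, plain square brackets such as X_test0[query_idx] select columns by label, not rows. A small pandas sketch contrasting that with .iloc, and converting to the array shapes scikit-learn estimators usually expect (X as (n_samples, n_features), y as a 1-D vector). The data here are hypothetical stand-ins with the shapes described above, and query_idx is assumed to be a one-element index array:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins with the shapes described in the question
X_test0 = pd.DataFrame(np.random.default_rng(0).normal(size=(1056, 100)))
y_train = pd.DataFrame({"label": [0, 1, 1, 0]})

query_idx = np.array([5])  # assumed: an index array with one element

# Square brackets on a DataFrame select *columns* by label...
by_column = X_test0[query_idx]      # shape (1056, 1): column 5
# ...while .iloc selects *rows* by position
by_row = X_test0.iloc[query_idx]    # shape (1, 100): sample 5

# Plain NumPy arrays with the shapes estimators expect
X_pool = X_test0.to_numpy()         # (1056, 100)
y_vec = y_train.to_numpy().ravel()  # (4,) instead of (4, 1)
```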

My issue is that I want modAL to understand that each example has multiple features, so there is exactly one label per 100-length vector. With the code above, the following error appears:

ValueError: Found input variables with inconsistent numbers of samples: [100, 1]

It seems that it does not understand that those 100 features correspond to a single label...

Any clue on how to solve it?
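The "inconsistent numbers of samples" message comes from scikit-learn's input validation, which compares the number of rows in X with the number of labels in y. A minimal NumPy sketch of the shapes involved, assuming X_test0 has already been converted to a NumPy array and that the query returns a one-element index array (modAL's default of one instance per query):

```python
import numpy as np

# Hypothetical stand-in for X_test0 once it is a NumPy array
X_pool = np.zeros((1056, 100))
query_idx = np.array([42])    # index array with one element

row = X_pool[query_idx]       # (1, 100): one sample with 100 features
wrong = row.reshape(-1, 1)    # (100, 1): 100 samples with 1 feature each
y_new = np.array([1])         # a single label

# A scalar index drops the sample axis; reshape(1, -1) restores it
flat = X_pool[42]             # (100,)
fixed = flat.reshape(1, -1)   # (1, 100)

print(row.shape, wrong.shape, fixed.shape, len(y_new))
# prints (1, 100) (100, 1) (1, 100) 1
```

So the row returned by indexing with query_idx is already in the (1, 100) shape that matches one label; reshape(-1, 1) is what turns it into 100 samples.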

EDIT: I suspected the reshape call. Since modAL seems to want an array as input, I also tried changing the last line to:

```python
learner.teach(X_test0.iloc[query_idx].values, np.array(y_new))
```

which now produces the following error:

TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid
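The .iloc[query_idx].values call does select the right thing, a (1, 100) row. My reading (an assumption about modAL's internals, not something the post confirms) is that teach then appends that row to the stored training data, and because X_train0 was passed in as a DataFrame the append goes through pandas, which rejects a bare NumPy array. This sketch reproduces the pandas side of that in isolation, using hypothetical stand-in data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: training data and pool both kept as DataFrames
X_train0 = pd.DataFrame(np.zeros((3, 100)))
X_test0 = pd.DataFrame(np.ones((5, 100)))
query_idx = np.array([2])

new_row = X_test0.iloc[query_idx].values  # (1, 100) NumPy array

message = ""
try:
    pd.concat([X_train0, new_row])         # DataFrame + ndarray: rejected
except TypeError as exc:
    message = str(exc)

# With both pieces as NumPy arrays the rows stack as expected
stacked = np.concatenate([X_train0.to_numpy(), new_row])
```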

Removing .values, so that a DataFrame is passed instead, also produces an error:

TypeError: <class 'pandas.core.series.Series'> datatype is not supported
About the author: 够运

暂无简介

文章

28 人气

关注发私信

友情链接

文江博客

用模态积极学习 - 形状无效

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

关于作者

相关话题

热门标签

推荐作者

浪子阿飞

JK.Yang

人间不值得

静待花开

只涨不跌

污浊的双黑

友情链接

用模态积极学习 - 形状无效

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

关于作者

相关话题

热门标签

推荐作者

浪子阿飞

JK.Yang

人间不值得

静待花开

只涨不跌

污浊的双黑

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。