Convert an array of strings (categories) to an array of ints from a pandas dataframe
I am trying to do something very similar to a previous question, but I get an error.
I have a pandas dataframe containing features and labels, and I need to do some conversion to send the feature and label variables into a machine learning object:
import pandas
import milk
from scikits.statsmodels.tools import categorical
Then I have:
trainedData=bigdata[bigdata['meta']<15]
untrained=bigdata[bigdata['meta']>=15]
#print trainedData
#extract two columns from trainedData
#convert to numpy array
features=trainedData.ix[:,['ratio','area']].as_matrix(['ratio','area'])
un_features=untrained.ix[:,['ratio','area']].as_matrix(['ratio','area'])
print 'features'
print features[:5]
##label is a string:single, touching,nuclei,dust
print 'labels'
labels=trainedData.ix[:,['type']].as_matrix(['type'])
print labels[:5]
#convert single to 0, touching to 1, nuclei to 2, dusts to 3
#
tmp=categorical(labels,drop=True)
targets=categorical(labels,drop=True).argmax(1)
print targets
The console first prints:
features
[[ 0.38846334 0.97681855]
[ 3.8318634 0.5724734 ]
[ 0.67710876 1.01816444]
[ 1.12024943 0.91508699]
[ 7.51749674 1.00156707]]
labels
[[single]
[touching]
[single]
[single]
[nuclei]]
I then get the following error:
Traceback (most recent call last):
  File "/home/claire/Applications/ProjetPython/projet particule et objet/karyotyper/DAPI-Trainer02-MILK.py", line 83, in <module>
    tmp=categorical(labels,drop=True)
  File "/usr/local/lib/python2.6/dist-packages/scikits.statsmodels-0.3.0rc1-py2.6.egg/scikits/statsmodels/tools/tools.py", line 206, in categorical
    tmp_dummy = (tmp_arr[:,None]==data).astype(float)
AttributeError: 'bool' object has no attribute 'astype'
Is it possible to convert the categorical variable 'type' within the dataframe to an int type? 'type' can take the values 'single', 'touching', 'nuclei' and 'dusts', and I need to convert them to int values such as 0, 1, 2, 3.
The previous answers are outdated, so here is a solution for mapping strings to numbers that works with version 0.18.1 of Pandas.
For a Series:
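The code for this step did not survive in this copy of the answer. A minimal sketch, assuming `pandas.factorize` is the intended tool (the sample values come from the question; the variable names and the `mapping` dict are illustrative):

```python
import pandas as pd

# Sample labels taken from the question; the Series itself is illustrative.
s = pd.Series(['single', 'touching', 'single', 'single', 'nuclei', 'dusts'])

# factorize returns integer codes plus the unique values, numbered in
# order of first appearance.
codes, uniques = pd.factorize(s)
print(codes)          # one integer code per row
print(list(uniques))  # the label that each code stands for

# If a specific numbering is required, an explicit mapping is safer than
# relying on order of appearance.
mapping = {'single': 0, 'touching': 1, 'nuclei': 2, 'dusts': 3}
targets = s.map(mapping)
```

`factorize` is convenient when any consistent numbering will do; the explicit `map` guarantees that 'single' is always 0, and so on, regardless of row order.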
For a DataFrame:
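The DataFrame example is also missing from this copy. As a sketch under the same assumption (`pandas.factorize`), with a hypothetical frame shaped like the one in the question, each string column can be encoded in turn:

```python
import pandas as pd

# Hypothetical frame; column names follow the question, values are illustrative.
df = pd.DataFrame({
    'ratio': [0.39, 3.83, 0.68, 1.12, 7.52],
    'type': ['single', 'touching', 'single', 'single', 'nuclei'],
})

# Columns holding string categories; listed explicitly so that numeric
# columns are left untouched.
label_columns = ['type']

encoded = df.copy()
for col in label_columns:
    # factorize(...)[0] is the array of integer codes for that column.
    encoded[col] = pd.factorize(encoded[col])[0]

print(encoded)
```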
If you have a vector of strings or other objects and you want to give it categorical labels, you can use the Factor class (available in the pandas namespace):
The factor has attributes labels and levels:
This is intended for 1D vectors, so I'm not sure whether it can be applied directly to your problem, but have a look.
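The snippets that accompanied this answer are missing here, and the Factor class is no longer present in recent pandas releases. As a rough modern equivalent, assuming `Categorical` is its successor, with `codes` and `categories` playing the roles of labels and levels:

```python
import pandas as pd

# Labels taken from the output shown in the question.
values = ['single', 'touching', 'single', 'single', 'nuclei']

cat = pd.Categorical(values)

print(cat.categories)  # the distinct values, in sorted order
print(cat.codes)       # position of each value within cat.categories
```

Unlike numbering by order of appearance, the codes here follow the sorted order of the categories.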
BTW I recommend that you ask these questions on the statsmodels and / or scikit-learn mailing list since most of us are not frequent SO users.
I am answering the question for pandas 0.10.1. Factor.from_array seems to do the trick.
Because none of these work for arrays with more than one dimension, I wrote some code that works for a numpy array of any dimensionality:
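The code from this answer is missing in this copy. One way to get the same effect, offered as a sketch rather than the author's original (the function name `encode_labels` is made up for illustration), is to flatten the array, let `numpy.unique` assign the codes, and reshape the result:

```python
import numpy as np

def encode_labels(arr):
    """Return (codes, levels); codes has the same shape as arr."""
    arr = np.asarray(arr)
    # Flatten first so the inverse indices are 1-D, then restore the shape.
    levels, inverse = np.unique(arr.ravel(), return_inverse=True)
    return inverse.reshape(arr.shape), levels

# A column-shaped (n, 1) array like the `labels` array in the question.
labels = np.array([['single'], ['touching'], ['single'], ['single'], ['nuclei']])
codes, levels = encode_labels(labels)

print(codes.shape)  # same shape as the input
print(levels)       # sorted unique labels; codes index into this array
```

Calling `codes.ravel()` gives the flat target vector that most classifiers expect.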