Splitting a list into named sub-arrays

Posted on 2025-01-22 20:29:40

Splitting Arrays for Test Train

Essentially, I am trying to convert a pandas DataFrame into NumPy arrays so that I can run it through a train/test split.

My goal is to split the columns into groups of dependent and independent variables on which to run the train/test split.

I am able to convert the DataFrame into an array of lists with

x = df.values

This effectively gives me a list of lists: one inner list per row, holding every value in that row.

If I use np.split() on this array to divide it into groups, it splits along the rows (its default axis), so it only groups certain rows together instead of splitting by column.

The simplest example of what I aim to do (using the already partitioned iris dataset rather than my own) looks like this:

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)

Here, data and target are sub-arrays of the iris dataset. How can I turn my one array of lists into multiple named sub-arrays of lists?
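For illustration, here is a minimal sketch of the row-versus-column behaviour described in the question. The array x is a hypothetical stand-in for df.values, and the choice of "first three columns are features, last column is the target" is an assumption made only for this example.

```python
import numpy as np

# Hypothetical stand-in for df.values: 6 rows, 4 columns.
x = np.arange(24).reshape(6, 4)

# np.split works along axis=0 by default, so it groups rows together.
top, bottom = np.split(x, 2)

# Passing axis=1 splits by column instead: here, columns 0-2 vs. column 3.
data, target = np.split(x, [3], axis=1)

# Plain slicing gives the same feature columns, and a 1-D target.
data_sliced = x[:, :3]
target_sliced = x[:, 3]

print(top.shape, bottom.shape)   # (3, 4) (3, 4)
print(data.shape, target.shape)  # (6, 3) (6, 1)
print(target_sliced.shape)       # (6,)
```

Slicing is often the simpler option when the column positions are known, since it yields ordinary named variables such as data and target that can be passed on to train_test_split.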

围归者 2025-01-29 20:29:40

I ended up keeping it as a pandas DataFrame and just broke the columns up into two separate new DataFrames:

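from sklearn.model_selection import train_test_split
import xgboost as xgb

# Drop the first column, pick the feature columns, then drop rows with missing values
df2 = df.iloc[:, 1:]
features = list(df2.columns[1:18])
df2 = df2.dropna()

# 'Vehicle' is the target column; the columns in features are the inputs
df_x = df2[['Vehicle']]
df_y = df2[features]
target = df_x.values
data = df_y.values

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

train = xgb.DMatrix(X_train, label=y_train)
test = xgb.DMatrix(X_test, label=y_test)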

I was overcomplicating things. Thank you, everyone, for your help.
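As a rough, self-contained sketch of the same idea (selecting columns by name in pandas before converting to NumPy), the following may help. Everything in the DataFrame except the column name Vehicle is hypothetical: the id, speed and weight columns and all values are made up for illustration, and selecting the features as "every column except the target" is an assumed variation on the positional slice used in the answer.

```python
import pandas as pd

# Hypothetical data; only the column name 'Vehicle' comes from the answer.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Vehicle": [0, 1, 0, 2, 1],
    "speed": [50.0, 40.0, None, 30.0, 45.0],
    "weight": [1.2, 8.0, 1.1, 12.0, 7.5],
})

# Drop the first column, then drop rows containing missing values.
df2 = df.iloc[:, 1:].dropna()

# Name the target column and treat every other column as a feature.
target_col = "Vehicle"
feature_cols = [c for c in df2.columns if c != target_col]

# Convert each group of columns to its own named NumPy array.
data = df2[feature_cols].to_numpy()
target = df2[target_col].to_numpy()

print(feature_cols)   # ['speed', 'weight']
print(data.shape)     # (4, 2)
print(target.shape)   # (4,)
```

Selecting the target with a single column name (rather than a list of one name) gives a 1-D array, which matches the shape of iris.target in the question. The resulting data and target can then be passed to train_test_split in the same way as in the answer.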
