Error reading a CSV file: converting a column from string to float

Posted 2025-02-06 19:48:18


I am trying to read a CSV file that contains a column, SpType, holding string values. That column is read with object dtype, but I need it to be float.
Here's the snippet:

data = pd.read_csv("/content/Star3642_balanced.csv")

X_orig = data[["Vmag", "Plx", "e_Plx", "B-V", "SpType", "Amag"]].to_numpy()

Here's what's giving me the error:

X = torch.tensor(X_orig, dtype=torch.float32)

The error reads "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool."

I tried this right after reading the CSV file, but it didn't help:

data["SpType"] = data.SpType.astype(float)

Can someone please tell me what can be done about this?


一向肩并 2025-02-13 19:48:18

Strings have to be encoded as numeric values before they can go into a tensor. The easiest way is pandas one-hot encoding (this creates a lot of extra columns in this case, but a neural network should handle them without much trouble):

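# dtype=float keeps the dummy columns numeric; recent pandas versions default to bool,
# and mixing bool with float columns makes to_numpy() return an object array again
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data[ohe.columns] = ohe
data = data.drop(["SpType"], axis=1)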
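To check the idea end to end, here is a minimal sketch. It assumes a tiny made-up DataFrame in place of Star3642_balanced.csv (the values are hypothetical and only the column names come from the question), and it uses pd.concat rather than column assignment, which is just one of several ways to attach the dummy columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Star3642_balanced.csv; the values are made up.
data = pd.DataFrame({
    "Vmag": [5.99, 8.70, 8.50],
    "Plx": [13.73, 2.31, 5.12],
    "SpType": ["K5III", "B1II", "K5III"],
})

# With a string column in the selection, the array falls back to dtype object,
# which is what torch.tensor refuses to convert.
before = data[["Vmag", "Plx", "SpType"]].to_numpy()

# One-hot encode SpType as floats and replace the original column.
ohe = pd.get_dummies(data["SpType"], dtype=float)
encoded = pd.concat([data.drop(columns=["SpType"]), ohe], axis=1)

# Every column is numeric now, so a float32 array can be requested directly.
X_orig = encoded.to_numpy(dtype=np.float32)

# X = torch.tensor(X_orig)  # should work at this point, since X_orig is float32
```

If before.dtype is object and X_orig.dtype is float32, the conversion did what was needed. The torch call is left as a comment so the sketch runs without PyTorch installed.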

Alternatively, you can use the sklearn encoders or the category_encoders library. With more complex encodings you may need to process the test set separately to avoid target leakage.
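A rough sketch of the "process the test set separately" pattern with scikit-learn, assuming a hypothetical train/test split with made-up SpType values. OneHotEncoder itself does not look at the target, so it is only a stand-in here, but target-based encoders follow the same fit-on-train, transform-on-test shape.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical split; the SpType values are made up for illustration.
train = pd.DataFrame({"SpType": ["K5III", "B1II", "K5III"]})
test = pd.DataFrame({"SpType": ["B1II", "M0V"]})  # "M0V" never appears in train

# Learn the categories from the training set only.
enc = OneHotEncoder(handle_unknown="ignore")
train_ohe = enc.fit_transform(train[["SpType"]]).toarray()

# Reuse the fitted encoder on the test set; unseen categories become all-zero rows.
test_ohe = enc.transform(test[["SpType"]]).toarray()
```

Both arrays are float64, so they can be stacked with the other numeric columns before building the tensor.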
