C# data-analysis data-mining neural-network

Segmenting a data set with discrete and continuous values into one of two groups without using Analysis Services?

Posted on 2024-09-11 02:14:51 | 747 characters | 13 views | 0 comments

Say I have a table with the following schema (note: this example is hypothetical, though the real use case is similar).

Type      | Name         | Notes
=====================================================================================
Gender    | Gender       | Either Male or Female (not null)
GeoCoord  | Location     | Latitude and longitude coordinates
string    | FullName     | 
Date      | BirthDate    | 
bool?     | LikesToParty | Data from a survey (null for people who didn't answer)

Looking at the data manually, I know there is a strong correlation between LikesToParty and certain specific combinations of the other values. For example, men who have Wells as their middle name, are between 15 and 30 years old, and come from the LA area almost certainly have true in LikesToParty. I would like to predict the value of LikesToParty for users who didn't answer the survey.
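To make the example concrete, here is a minimal sketch of that one hand-observed rule written as a C# predicate. Everything in it is an assumption for illustration: the type names (`Person`, `GeoCoord`, `ObservedRule`), the rough bounding box used for "the LA area", and the choice to treat any name part between the first and last as a middle name. It is not the real schema, and it only covers this single rule, which is exactly why I want something that finds such rules automatically.

```csharp
using System;

// Hypothetical types mirroring the table above; all names are illustrative.
public enum Gender { Male, Female }

public sealed record GeoCoord(double Latitude, double Longitude);

public sealed record Person(
    Gender Gender,
    GeoCoord Location,
    string FullName,
    DateTime BirthDate,
    bool? LikesToParty);

public static class ObservedRule
{
    // Rough bounding box for the LA area. These numbers are an assumption,
    // not something taken from the survey data.
    private const double MinLat = 33.7, MaxLat = 34.4;
    private const double MinLon = -118.7, MaxLon = -117.6;

    // Age in whole years on the given date.
    public static int AgeOn(DateTime birthDate, DateTime today)
    {
        int age = today.Year - birthDate.Year;
        // Subtract one year if the birthday has not been reached yet.
        if (birthDate.Date > today.Date.AddYears(-age)) age--;
        return age;
    }

    // Treats every name part between the first and the last as a middle name.
    public static bool HasMiddleName(string fullName, string middleName)
    {
        string[] parts = fullName.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        for (int i = 1; i < parts.Length - 1; i++)
        {
            if (string.Equals(parts[i], middleName, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    public static bool InLaArea(GeoCoord c) =>
        c.Latitude >= MinLat && c.Latitude <= MaxLat &&
        c.Longitude >= MinLon && c.Longitude <= MaxLon;

    // True when a person fits the configuration described above:
    // male, middle name Wells, aged 15 to 30, located in the LA area.
    public static bool Matches(Person p, DateTime today)
    {
        int age = AgeOn(p.BirthDate, today);
        return p.Gender == Gender.Male
            && HasMiddleName(p.FullName, "Wells")
            && age >= 15 && age <= 30
            && InLaArea(p.Location);
    }
}
```

For a row with a null LikesToParty, `ObservedRule.Matches(person, DateTime.Today)` returning true would mean "predict true" under this rule. Rows that do not match are simply not covered by it.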

How do I mine this data using C# without having to buy an expensive package like Analysis Services? Are there any free libraries for C#?

I've already built a neural network that handles most of what I describe in the example above, but it is extremely slow to train, and I'm not sure whether this is the right approach. Is there a better, more efficient way to segment the data?
