Segmenting a set of data with discrete and continuous values into one of two groups without using Analysis Services?

Posted on 2024-09-11 02:14:51

Say I have a table with the following schema (note: this example is hypothetical, though the real use case is similar).

Type      | Name         | Notes
=====================================================================================
Gender    | Gender       | Either Male or Female (not null)
GeoCoord  | Location     | Latitude and longitude coordinates
string    | FullName     | 
Date      | BirthDate    | 
bool?     | LikesToParty | Data from a survey (null for people who didn't answer)

Looking at the data manually, I know there is a strong correlation between LikesToParty and certain specific configurations of the other values. For example, men who have Wells as their middle name, are between 15 and 30 years old, and come from the LA area almost certainly have true in LikesToParty. I would like to predict the value of LikesToParty for users who didn't answer the survey.

How do I mine this data using C# without having to buy an expensive package like Analysis Services? Are there any free libraries for C#?

I've already built a neural network that can do most of what I describe in the example above, but it is extremely slow to train and I'm not sure whether this is the right way to go. Maybe there is a better, more efficient way to segment the data?

Comments (2)

ぶ宁プ宁ぶ 2024-09-18 02:14:51

Because you are using both discrete and continuous data, you might use a decision tree (C4.5, CART). There are several libraries that implement them; don't be put off by Java libraries, as you can use IKVM, an implementation of Java for .NET. For example, I have used the Weka API from C#.
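To make the decision-tree suggestion concrete, here is a minimal sketch of the step such trees repeat at every node: choosing the split with the highest information gain over a mix of discrete and continuous features. It is written in standard-library Python only to keep it short and runnable; it is not Weka's API and not a full C4.5 or CART implementation (no recursion, pruning, or gain ratio). The feature names and sample rows are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def split_gain(rows, labels, partition):
    """Information gain from grouping rows by the key that partition returns."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(partition(row), []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, discrete, continuous):
    """Return (gain, feature, threshold) for the best single split.

    threshold is None for a discrete feature (one branch per value).
    """
    best = (0.0, None, None)
    for f in discrete:
        gain = split_gain(rows, labels, lambda r, f=f: r[f])
        if gain > best[0]:
            best = (gain, f, None)
    for f in continuous:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            gain = split_gain(rows, labels, lambda r, f=f, t=t: r[f] <= t)
            if gain > best[0]:
                best = (gain, f, t)
    return best

# Hypothetical sample: gender is discrete, age is continuous.
rows = [
    {"gender": "M", "age": 22},
    {"gender": "M", "age": 27},
    {"gender": "F", "age": 24},
    {"gender": "M", "age": 45},
    {"gender": "F", "age": 51},
    {"gender": "F", "age": 38},
]
labels = [True, True, True, False, False, False]  # LikesToParty

gain, feature, threshold = best_split(rows, labels, ["gender"], ["age"])
print(feature, threshold)  # age 32.5
```

A full tree would apply `best_split` recursively to each resulting subset until the labels are pure or too few rows remain. Derived features such as age (from BirthDate) or distance to a city (from Location) would have to be computed before this step.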

一笑百媚生 2024-09-18 02:14:51

What you describe is a standard problem in machine learning called data classification.

Methods for data classification include neural networks (as you mention), support vector machines (see, for example, LIBSVM), and decision trees (as mentioned in the previous answer). The output of these methods, while it can be very accurate, can be difficult to interpret. You can also look at probabilistic graphical models such as Bayesian networks to answer deeper questions like: what is the probability that a male from Southern California who likes to party is in his mid-twenties?
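As a rough illustration of that kind of query, the sketch below estimates a conditional probability by simple counting over records. This is a plain relative-frequency estimate, not a Bayesian network: a real network would factor the joint distribution so that sparse combinations can still be estimated. The column names, the age bands, and the records themselves are hypothetical.

```python
from fractions import Fraction

def conditional_probability(records, target, given):
    """Estimate P(target | given) as a relative frequency.

    target and given are dicts mapping column name -> required value.
    Returns None when no record satisfies the condition.
    """
    def matches(record, conditions):
        return all(record[k] == v for k, v in conditions.items())

    conditioned = [r for r in records if matches(r, given)]
    if not conditioned:
        return None
    hits = sum(1 for r in conditioned if matches(r, target))
    return Fraction(hits, len(conditioned))

# Hypothetical, hand-made records; the columns are illustrative assumptions.
records = [
    {"gender": "M", "region": "SoCal", "party": True, "age_band": "20s"},
    {"gender": "M", "region": "SoCal", "party": True, "age_band": "20s"},
    {"gender": "M", "region": "SoCal", "party": True, "age_band": "40s"},
    {"gender": "M", "region": "SoCal", "party": False, "age_band": "20s"},
    {"gender": "F", "region": "SoCal", "party": True, "age_band": "20s"},
    {"gender": "M", "region": "NorCal", "party": True, "age_band": "30s"},
]

p = conditional_probability(
    records,
    target={"age_band": "20s"},
    given={"gender": "M", "region": "SoCal", "party": True},
)
print(p)  # 2/3
```

With many columns, most combinations will have few or no matching records, which is where a structured model such as a Bayesian network becomes useful.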
