Ideas for survey statistics charts

Posted on 2024-10-09 16:49:58 · 503 words · 6 views · 0 comments

I've got some homework tasks on the topic of surveys and diagrams. The first task is to normalize the input of a survey, because the structure of the data changes from time to time.
The survey fields come in three types:

  • static fields, where text is stored
  • dynamic ones, where the user can select one option
  • and multiselect fields, where the user can select multiple options

I'm not really a statistics person, so I have no idea what I can do with the incoming data.

The data is stored in a huge XML file. From it I can easily get how many times a survey was filled in and how many times each field was filled in, so I could, for example, show the ratio of filled to unfilled fields on a pie chart.
The second idea is to show the relationship between the options of a multi-option element using a bar chart or similar.
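
To make the fill-rate idea concrete, here is a minimal sketch in Python. The XML layout (`<surveys>`, `<survey date=...>`, `<field name=...>`) is purely an assumption for illustration, since the real file's structure is not shown; the element and attribute names would need to be adapted.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real file's element and attribute names are unknown.
SAMPLE_XML = """
<surveys>
  <survey date="2001-03-14">
    <field name="age">34</field>
    <field name="comment"></field>
  </survey>
  <survey date="2003-07-02">
    <field name="age">51</field>
    <field name="comment">Too long</field>
  </survey>
  <survey date="2005-11-20">
    <field name="age"></field>
  </survey>
</surveys>
"""

def fill_counts(xml_text):
    """Return (number of surveys, {field name: times filled in})."""
    root = ET.fromstring(xml_text)
    surveys = root.findall("survey")
    filled = {}
    for survey in surveys:
        for field in survey.findall("field"):
            name = field.get("name")
            filled.setdefault(name, 0)
            # A field counts as filled only if it holds non-blank text.
            if field.text and field.text.strip():
                filled[name] += 1
    return len(surveys), filled

total, filled = fill_counts(SAMPLE_XML)
for name, count in filled.items():
    print(f"{name}: filled {count} of {total} ({count / total:.0%})")
```

A field that is missing from a survey altogether is treated the same as an empty one here, because the rate is computed against the total number of surveys.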

For the multi-option elements, I had the idea of showing the data that goes along with one particular option being selected.
But the question is: what could be shown?

The other problem is the static elements (text fields and the like). What data could be represented from a single field?

The data in the XML file was collected from 2001 to 2005, so maybe I can work with the dates of the surveys. But as I said, I don't really know how to process the data so as to extract as much as possible and create a large number of diagrams.
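
One way the survey dates could be used is a simple count of responses per year. The sketch below assumes the dates are available as ISO strings (`YYYY-MM-DD`), which is a guess about the file; the sample values are made up.

```python
from collections import Counter
from datetime import date

# Hypothetical ISO-formatted survey dates; the real XML may store them differently.
survey_dates = ["2001-03-14", "2001-09-30", "2003-07-02", "2005-11-20"]

def responses_per_year(date_strings):
    """Count responses per calendar year, including years with no responses."""
    years = Counter(date.fromisoformat(s).year for s in date_strings)
    if not years:
        return {}
    # Fill the gaps so a time-series chart shows empty years as zero.
    return {y: years[y] for y in range(min(years), max(years) + 1)}

per_year = responses_per_year(survey_dates)
for year, n in per_year.items():
    print(year, n)
```

Keeping the empty years in the result avoids a misleading chart in which 2001, 2003 and 2005 appear to be adjacent.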

Comments (2)

心安伴我暖 2024-10-16 16:49:59

I would not recommend pie charts. Use bar charts or dot plots instead: they make it much easier to distinguish similar frequencies across categories. Ordering the categories by frequency is almost always a good idea, too. Here you can find a short article about why Pie Charts Are Bad.
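
As a rough illustration of the "sort by frequency" advice, here is a small Python sketch that prints a text bar chart with the most frequent category first. The answer values are invented, and the text bars only stand in for whatever plotting tool is actually used.

```python
from collections import Counter

# Made-up answers to a single-choice question.
answers = ["bus", "car", "bike", "car", "bus", "car", "walk", "bus", "car"]

def sorted_bars(values, width=30):
    """Return one text bar per category, most frequent category first."""
    counts = Counter(values).most_common()
    if not counts:
        return []
    top = counts[0][1]
    label_width = max(len(str(category)) for category, _ in counts)
    lines = []
    for category, n in counts:
        bar = "#" * round(n / top * width)  # scale bars to the largest count
        lines.append(f"{category:<{label_width}} {bar} {n}")
    return lines

lines = sorted_bars(answers)
print("\n".join(lines))
```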

3D diagrams might look nice and are a neat way of impressing people with little knowledge of data visualization (which might be what you need ;-) ). But most experts consider them bad practice, because extra dimensions that are not absolutely needed distract the reader from the actual data.

Personally, I think that cross tables (crosstabs) and scatter plots are pretty self-explanatory ways of displaying relationships between two dimensions of data.
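
A cross table can be built with nothing more than a counter over pairs of answers. This is only a sketch; the field names `group` and `transport` and the sample rows are hypothetical.

```python
from collections import Counter

# Made-up responses with two single-choice fields.
responses = [
    {"group": "student", "transport": "bus"},
    {"group": "student", "transport": "bike"},
    {"group": "staff", "transport": "car"},
    {"group": "student", "transport": "bus"},
    {"group": "staff", "transport": "bus"},
]

def crosstab(rows, row_key, col_key):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

table = crosstab(responses, "group", "transport")
for row_label, cells in table.items():
    print(row_label, cells)
```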

It is often a good idea to report some basic statistics along with diagrams, but make sure you only do this when it's appropriate. See this Wikipedia article if you want to learn which univariate statistics (a mean, for example) are appropriate for which data.
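
Presumably the point is about levels of measurement: a mode makes sense for unordered categories, a median additionally needs an order, and a mean needs numeric values with meaningful distances. A small sketch of that rule of thumb, with `summarize` and its `level` labels being names invented for this example:

```python
import statistics

def summarize(values, level):
    """Return only the statistics that suit the given level of measurement."""
    if level == "nominal":    # unordered categories, e.g. transport type
        return {"mode": statistics.mode(values)}
    if level == "ordinal":    # ordered codes, e.g. a 1-5 rating
        return {
            "mode": statistics.mode(values),
            # median_low always returns a value that occurs in the data
            "median": statistics.median_low(values),
        }
    if level == "interval":   # numeric values, e.g. age in years
        return {
            "median": statistics.median(values),
            "mean": statistics.mean(values),
        }
    raise ValueError(f"unknown level: {level}")

print(summarize(["bus", "car", "car"], "nominal"))
print(summarize([1, 2, 2, 4, 5], "ordinal"))
print(summarize([20, 30, 40], "interval"))
```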

If you seriously want to learn about data visualization, I can highly recommend the books by Edward Tufte about the topic. They are really a pleasure to read. While they stand on a solid scientific base, they are easy to understand, even with little or no background in the field.

Good luck,
Alex

乖乖 2024-10-16 16:49:58

After normalizing your data (which could be more difficult than the visualization part of your job), you might do the following:

  • To show your static fields, which contain text, you could recode the texts into new variables with fewer categories, which could then fit on a graph/plot. Another way is to make word clouds of the texts, like on wordle.net.
  • The dynamic fields, each containing only one answer, are probably the easiest to display. You could make a pie chart to show the percentages of the attributes, or better a bar chart, which can also display percentages/densities or frequencies (e.g. see the ggplot2 package in R).
  • To display the multiselect fields, you should restructure the data into an appropriate format (I do not know what it looks like now). This could be done with separate tables that show the counts (frequencies) for every category in all variables. E.g.: 187 people ate chocolate, 160 ate bread and 50 people ate pizza yesterday. You could then easily show the values in a bar chart. Watch out: the sum of these values will not equal the sample size, as anyone could select multiple values, so a pie chart would be a really bad choice.
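
The restructuring for multiselect fields might look roughly like the following Python sketch. The respondent data is invented, and it assumes each respondent's selections have already been pulled out of the XML as a list.

```python
from collections import Counter

# Made-up multiselect answers: one list of chosen options per respondent.
respondents = [
    ["chocolate", "bread"],
    ["chocolate"],
    ["bread", "pizza", "chocolate"],
    [],  # this respondent selected nothing
]

def option_counts(selections):
    """Count, per option, how many respondents selected it."""
    counts = Counter()
    for chosen in selections:
        counts.update(set(chosen))  # count each option once per respondent
    return counts

counts = option_counts(respondents)
sample_size = len(respondents)
# Share of respondents per option; these shares need not add up to 100%.
shares = {option: n / sample_size for option, n in counts.items()}

for option, n in counts.most_common():
    print(f"{option}: {n} of {sample_size} ({shares[option]:.0%})")
print("sum of counts:", sum(counts.values()), "sample size:", sample_size)
```

Here the counts add up to 6 for only 4 respondents, which is exactly why the values suit a bar chart and not a pie chart.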

I hope I could help.
