Creating charts/statistics for a selected MySQL table with Python
I'd like to start by asking for your opinion on how I should tackle this task, rather than simply asking how to structure my code.
Here is what I'm trying to do: I have a lot of data loaded into a MySQL table for a large number of unique name + date combinations (the date is a separate field). My goal is to be able to select a particular name (using raw_input, and perhaps a drop-down menu in the future) and see a monthly trend, with a moving average and perhaps other stats, for one of the fields (revenue, revenue per month, clicks, etc.). What is your advice: should I move this data to an Excel workbook via Python, or is there a way to display this information in Python (with charts comparable to Excel's, of course)?
Thanks!
Comments (2)
Analyzing this kind of data (name, date) can be seen as issuing ad-hoc SQL queries to get time series information.
You 'sample' your information by a date/time frame (day/week/month/year, or in more detail by hour/minute) depending on how large your dataset is.
I often use this kind of query, where the date field is truncated to the sample rate. In MySQL the DATE_FORMAT function works well for that (PostgreSQL and Oracle use date_trunc and trunc respectively).
What you want to see in your data goes in your WHERE conditions.
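The query described above could be sketched roughly as follows. The table and column names (`stats`, `revenue`, `date`, `name`) and the helper name `build_monthly_query` are assumptions made for illustration, not part of the original answer. The `%%` doubling assumes a driver that uses printf-style `%s` placeholders, such as PyMySQL, where a literal `%` inside the SQL has to be escaped.

```python
def build_monthly_query(table, value_column, date_column="date", name_column="name"):
    """Build a monthly aggregation query for one name (hypothetical schema)."""
    # Identifiers cannot be bound as query parameters, so accept only
    # plain names here to avoid building SQL from arbitrary input.
    for ident in (table, value_column, date_column, name_column):
        if not ident.isidentifier():
            raise ValueError("unexpected identifier: %r" % ident)
    # DATE_FORMAT truncates each date to 'YYYY-MM'; '%%' becomes a
    # literal '%' once the driver substitutes the %s placeholder.
    return (
        "SELECT DATE_FORMAT({d}, '%%Y-%%m') AS month, SUM({v}) AS total "
        "FROM {t} WHERE {n} = %s "
        "GROUP BY month ORDER BY month"
    ).format(d=date_column, v=value_column, t=table, n=name_column)
```

With a DB-API cursor this would be run as `cursor.execute(build_monthly_query("stats", "revenue"), (name,))`, so the selected name is passed as a bound parameter rather than pasted into the SQL text.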
Execute this query and output the result to a CSV file. You could use direct mysql commands for that, but I recommend writing a Python script that executes the query; you can use getopt options for output formatting (with or without column headers, a separator different from the default one, etc.). You can even build the query dynamically based on some options.
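A minimal sketch of the output half of such a script is shown here. The option names (`-n`/`--no-header`, `-d`/`--delimiter`) and the function names are hypothetical choices for illustration, and the rows are expected to come from whatever cursor ran the query.

```python
import csv
import getopt


def parse_options(argv):
    """Parse formatting options from a list of command-line arguments."""
    opts, _ = getopt.getopt(argv, "nd:", ["no-header", "delimiter="])
    settings = {"header": True, "delimiter": ","}
    for flag, value in opts:
        if flag in ("-n", "--no-header"):
            settings["header"] = False
        elif flag in ("-d", "--delimiter"):
            settings["delimiter"] = value
    return settings


def write_csv(columns, rows, out, header=True, delimiter=","):
    """Write query rows to a file-like object as CSV."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(columns)
    writer.writerows(rows)
```

In a script this might be called as `write_csv(["month", "total"], cursor.fetchall(), sys.stdout, **parse_options(sys.argv[1:]))`.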
To plot this kind of information, look at time series tools. If you have missing data (dates that won't appear in the result of such a SQL query), take that into account when choosing a tool. I don't think Excel is the right one for that (or I haven't mastered it enough), but it could be a start.
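One way to deal with the missing dates before plotting is to fill the gaps in Python. The sketch below assumes the query returns sorted `('YYYY-MM', total)` pairs and that an absent month should count as zero; both are assumptions, and a different fill value may suit your data better. It also includes a simple trailing moving average, since gaps would otherwise distort it.

```python
def month_range(first, last):
    """Yield 'YYYY-MM' labels from first to last, inclusive."""
    year, month = map(int, first.split("-"))
    end = tuple(map(int, last.split("-")))
    while (year, month) <= end:
        yield "%04d-%02d" % (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def fill_missing_months(rows, fill=0.0):
    """Return the (month, total) pairs with absent months filled in."""
    if not rows:
        return []
    totals = dict(rows)
    return [(m, totals.get(m, fill)) for m in month_range(rows[0][0], rows[-1][0])]


def moving_average(values, window=3):
    """Trailing mean over at most `window` of the most recent values."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result
```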
Personally I found dygraph, a JavaScript library, really good for time series plotting, and it can be used with a CSV file as its source. Be careful with this configuration: due to cross-domain security constraints, the CSV file and the HTML page that displays the Dygraph object should be on the same server (or satisfy whatever security constraints your browser is willing to accept).
I used to build this kind of web app with Django, as it's my favourite web framework, where I wrap the URL calls like this:
The first URL calls a view that simply outputs a template file with a variable that references the URL of the CSV file for the Dygraph object.
The second URL calls a view that generates the SQL query and outputs the result as text/csv to be rendered by Dygraph.
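The Django code itself is not shown here, so the following is only a standard-library stand-in for the two-URL pattern, written as a plain WSGI app rather than as Django views. The paths `/chart/` and `/data.csv`, the script location `/static/dygraph.js`, and the `fetch_rows` placeholder are all assumptions for illustration.

```python
import csv
import io

# Page for the first URL: it only needs to know where the CSV lives.
CHART_PAGE = """<!DOCTYPE html>
<html><body>
<div id="chart"></div>
<script src="{script_url}"></script>
<script>new Dygraph(document.getElementById("chart"), "{csv_url}");</script>
</body></html>"""


def fetch_rows():
    """Placeholder for the SQL query; returns fixed sample rows."""
    return [("2011-01-01", 120.5), ("2011-02-01", 98.0)]


def app(environ, start_response):
    """WSGI app with one route for the chart page and one for the CSV."""
    path = environ.get("PATH_INFO", "/")
    if path == "/chart/":
        body = CHART_PAGE.format(script_url="/static/dygraph.js", csv_url="/data.csv")
        content_type = "text/html; charset=utf-8"
    elif path == "/data.csv":
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["Date", "Revenue"])  # header row, date column first
        writer.writerows(fetch_rows())
        body = out.getvalue()
        content_type = "text/csv; charset=utf-8"
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", content_type)])
    return [body.encode("utf-8")]
```

For a quick local try, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves both URLs from the same origin, which sidesteps the cross-domain constraint. In Django, the second route would presumably become a view returning `HttpResponse(body, content_type="text/csv")`.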
It's "home made": it can stay simple or be extended, runs easily on any desktop computer, and could be extended to output JSON for use by other JavaScript libraries/frameworks.
Otherwise there are open source tools for this kind of reporting (though their time series capabilities are often not enough for my needs), like Pentaho, JasperReport, SOFA. You define the query as a data source inside a report in such a tool and build a graph that outputs the time series.
I find that today's web techniques, with the right JavaScript library/framework, are really starting to challenge the old-fashioned reporting of these classical BI tools, and they make things interactive :-)
Your problem can be broken down into two main pieces: analyzing the data, and presenting it. I assume that you already know how to do the data analysis part, and you're wondering how to present it.
This seems like a problem that's particularly well suited to a web app. Is there a reason why you would want to avoid that?
If you're very new to web programming and programming in general, then something like web2py could be an easy way to get started. There's a simple tutorial here.
For a desktop database-heavy app, have a look at Dabo. It makes things like creating views on database tables really simple. wxPython, on which it's built, also has lots of simple graphing features.