Python: web log data mining for frequent patterns
I need to develop a tool for web log data mining.
Given many sequences of URLs, each requested within a particular user session (retrieved from web-application logs), I need to figure out the usage patterns and the groups (clusters) of users of the website.
I am new to data mining and have been searching Google a lot.
I found some useful information; for example, searching for "Frequent Pattern Mining in Web Log Data" turns up studies that address almost exactly this problem.
So my questions are:
- Are there any Python-based tools that do what I need, or at least something similar?
- Can the Orange toolkit be of any help?
- Can reading the book Programming Collective Intelligence be of any help?
- What should I Google for, what should I read, and which relatively simple algorithms would work best?
I am very limited in time (around a week), so any help would be extremely valuable. I need someone to point me in the right direction and advise me on how to accomplish the task in the shortest time.
Thanks in advance!
Comments (2)
1&2: Orange has a frequent pattern mining module. It also supports clustering.
3. I have just checked the contents of the book. It has no chapter on frequent pattern mining. Still, it is generally a good book for beginners in data mining, and you will find it very useful for defining your problem precisely.
4. You need to understand the input and output of clustering and of frequent pattern mining/association rule mining. So Google these algorithms, or find a good data mining textbook to read.
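To make the "input and output" of frequent pattern mining concrete, here is a minimal sketch in plain Python. It is not Orange's API and not a real Apriori implementation: it simply enumerates every small set of pages in each session and keeps the sets that occur in at least `min_support` sessions. The function name `frequent_itemsets`, its parameters, and the sample sessions are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sessions, min_support, max_size=3):
    """Return {itemset (sorted tuple of URLs): number of sessions containing it}.

    Brute-force counting, fine for short sessions; a real miner such as
    Apriori or FP-growth prunes candidates instead of enumerating them all.
    """
    counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # ignore order and repeats within a session
        for size in range(1, max_size + 1):
            for itemset in combinations(pages, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical input: one list of requested URLs per user session.
# Real sessions would come from the parsed web-application logs.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart", "/checkout"],
]

# Output: every set of pages seen together in at least 2 sessions.
result = frequent_itemsets(sessions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    print(support, itemset)
```

So the input is a list of sessions plus a minimum support threshold, and the output is a table of page sets with their support counts. Note that this sketch treats a session as an unordered set; if the order of requests matters, you would look into sequential pattern mining instead.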
The Pattern module might be what you are looking for.
http://www.clips.ua.ac.be/pages/pattern