Microsoft Business Intelligence: is what I want to do possible?
I have been charged with analysing the log table of my company's website. This table contains each user's click path through the website for a given session. My company wants to spot trends in these click paths and, in doing so, identify groups of users who follow a particular click path, based on age, geography and so on.
As you can tell from the title, I am completely new to BI and its capabilities, so I was wondering:
- Are our objectives attainable?
- How should I go about doing this?
I am currently reading books online, as well as other e-books I have found. All signs suggest this is possible via sequence clustering, although the exact implementation and tuning involved are currently lost on me. If anyone has first-hand experience with such an undertaking, it would be awesome if you could share it here.
Cheers!
Comments (4)
What you're looking for is called association rule mining. I'm not particularly familiar with BI, but I suggest you take a look at Weka, which contains several implementations of the Apriori algorithm and its variations.
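To give a feel for what association rule mining does with click data, here is a minimal sketch in plain Python rather than Weka. It only handles page pairs, it ignores the order of clicks (as association rules generally do), and the `sessions` data and the helper names `frequent_pairs` and `confidence` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each is the set of pages one visitor viewed.
sessions = [
    {"home", "search", "product", "cart"},
    {"home", "product", "cart"},
    {"home", "search"},
    {"home", "product"},
    {"search", "product", "cart"},
]

def frequent_pairs(sessions, min_support):
    """Return {pair: support} for page pairs seen in at least min_support of sessions."""
    n = len(sessions)
    # Pass 1: count single pages and keep only the frequent ones.
    page_counts = Counter(page for s in sessions for page in s)
    frequent_pages = {p for p, c in page_counts.items() if c / n >= min_support}
    # Pass 2: count pairs built only from frequent pages (the Apriori pruning idea).
    pair_counts = Counter()
    for s in sessions:
        for pair in combinations(sorted(s & frequent_pages), 2):
            pair_counts[pair] += 1
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

def confidence(sessions, antecedent, consequent):
    """Fraction of sessions containing `antecedent` that also contain `consequent`."""
    with_a = [s for s in sessions if antecedent in s]
    return sum(consequent in s for s in with_a) / len(with_a) if with_a else 0.0

pairs = frequent_pairs(sessions, min_support=0.6)
cart_given_product = confidence(sessions, "product", "cart")
print(pairs)
print(cart_given_product)
```

A rule such as "visitors who view `product` also view `cart`" is then judged by its support (how often the pair occurs at all) and its confidence (how often the second page follows from the first). If click order matters, sequence-based methods are a better fit than this.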
This won't help you with your existing log files, but it is an alternative if your search for an answer fails.
Google Analytics is free, and you can set up several custom variables (age, etc.) and see where the traffic goes. You won't be able to see what an individual user does. It is not exactly what you are trying to do, but it is free and can be made to come close to what you're looking for.
If you want really good analytics, look into Omniture. It is expensive, but it's top-notch for building complex website reporting. It is used in many e-commerce scenarios to track how a user arrives at and interacts with a site, and much more.
There are plenty of website analytics packages out there. Before "rolling" your own, look into some of them; they might help you focus on your own goals.
It seems that you could use neural networks for that task, possibly perceptrons.
I have some experience with neural networks but I'm not an expert.
I strongly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications. Check it out even if you don't know Python.
First off, start with an open-source or commercial web analytics software package (search Google for one), as reading web server log files is non-trivial.
Some allow mapping data to other tables (your user table with age, etc.), or you can build your own solution to map web session logs to other data.
Other than that, normal SQL queries will solve your analytics problem, e.g.
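A query of this kind might look like the following minimal sketch, run here through Python's built-in `sqlite3` module so it is self-contained. The `users` and `click_log` tables, their columns, the age bands and the sample rows are all assumptions for illustration, not the real schema. It counts page-to-page transitions per age band by joining the click log to itself on consecutive steps.

```python
import sqlite3

# Hypothetical schema and sample data; real table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, country TEXT);
    CREATE TABLE click_log (session_id TEXT, user_id INTEGER, step INTEGER, page TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, 24, "UK"), (2, 27, "UK"), (3, 45, "US")])
conn.executemany("INSERT INTO click_log VALUES (?, ?, ?, ?)", [
    ("s1", 1, 1, "home"), ("s1", 1, 2, "search"), ("s1", 1, 3, "product"),
    ("s2", 2, 1, "home"), ("s2", 2, 2, "search"), ("s2", 2, 3, "product"),
    ("s3", 3, 1, "home"), ("s3", 3, 2, "product"),
])

# Pair each click with the next click in the same session, then count
# how often each transition occurs within each age band.
QUERY = """
    SELECT CASE WHEN u.age < 30 THEN 'under 30' ELSE '30 and over' END AS age_band,
           a.page AS from_page,
           b.page AS to_page,
           COUNT(*) AS hits
    FROM click_log AS a
    JOIN click_log AS b
      ON b.session_id = a.session_id AND b.step = a.step + 1
    JOIN users AS u
      ON u.user_id = a.user_id
    GROUP BY age_band, from_page, to_page
    ORDER BY hits DESC, age_band, from_page, to_page
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Swapping the `CASE` expression for `u.country` would give the same breakdown by geography.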
Loading the raw data into a BI framework may not make it much easier. Loading the results of queries like this into a BI framework would make sense.
Depending on your web application, you may have trouble identifying actual sessions if they have long-running session IDs or changing session IDs. If that is an issue, you need to roll your web analytics into the actual web server code so you can simulate long-running state and record that instead.