Strategy for choosing a BI module

Posted on 2024-10-05 14:48:39

The company I work for produces a content management system (CMS) with various add-ons for publishing, e-commerce, online printing, etc. We are now in the process of adding a "reporting module" and I need to investigate which strategy we should follow. The "reporting module" is otherwise known as Business Intelligence, or BI.

The module is supposed to track item downloads and executed searches and produce various reports from them. Actually, it is not that important what kind of data is being churned, as in the long term we might want to be able to push in whatever we think is needed and get a report out of it.

Roughly speaking, we have two options.

Option 1 is to write a solution based on Apache Solr (specifically, using https://issues.apache.org/jira/browse/SOLR-236). Pros of this approach:

  • free / open source / good quality
  • we use Solr/Lucene elsewhere so we know the domain quite well
  • total flexibility over what is being indexed, as we could take incoming data (in XML format), push it through XSLT and feed it to Solr
  • total flexibility over how search results are shown. As in the step above, we could have a custom XSLT search template and present results in any format we think is necessary
  • our frontend developers are proficient in XSLT, so adapting this mechanism for a different customer should be relatively easy
  • Solr offers real-time / full-text / faceted search, which are absolutely necessary for us. A quick prototype (based on Solr, 1M records) was able to deliver search results in 55ms. Our estimated maximum is about 1bn rows (this isn't a lot for a typical BI app) and if worst comes to worst, we can always look at SolrCloud, etc.
  • there are companies doing very similar things using Solr (Honeycomb Lexicon, for example)
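
The XML-through-XSLT-into-Solr step from the list above could look roughly like the sketch below. It is only an illustration: the `<event>` input format, the field names and the class name `SolrFeedSketch` are assumptions, not something from our actual system. The output uses Solr's XML update format (`<add><doc><field name="...">`), and the transformation relies only on the JDK's built-in `javax.xml.transform` API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SolrFeedSketch {

    // Hypothetical stylesheet: maps an assumed <event> element onto Solr's XML update format.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/event'>"
      + "<add><doc>"
      + "<field name='id'><xsl:value-of select='@id'/></field>"
      + "<field name='type'><xsl:value-of select='type'/></field>"
      + "<field name='item'><xsl:value-of select='item'/></field>"
      + "</doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms one incoming event document into a Solr <add> message. */
    static String toSolrAddXml(String eventXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(eventXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers can stay simple in this sketch.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String event = "<event id='42'><type>download</type><item>brochure.pdf</item></event>";
        // The resulting string would then be POSTed to the Solr update handler.
        System.out.println(toSolrAddXml(event));
    }
}
```

Adapting this for another customer would then mostly mean swapping the stylesheet, which is the flexibility the bullets above refer to.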

Cons of this approach:

  • SOLR-236 might or might not be stable; moreover, it's not yet clear when or if it will be part of an official release
  • there would possibly be some stuff we'd have to write to get some BI-specific features working. This sounds a bit like reinventing the wheel
  • the biggest problem is that we don't know what we might need in the future (such as integration with some piece of BI software, export to Excel, etc.)

Option 2 is to do an integration with some free or commercial piece of BI software. So far I have looked at Wabit and will have a look at QlikView, possibly others. Pros of this approach:

  • no need to reinvent the wheel, software is (hopefully) tried and tested
  • would save us time we could spend solving problems we specialize in

Cons:

  • as we are a Java shop and our solution is cross-platform, we'd have to eliminate a lot of the options on the market
  • I am not sure how flexible BI software can be. It would take time to go through some BI offerings to see if they can do flexible indexing, real-time / full-text search, fully customizable results, etc.
  • I was told that open source BI offerings are not mature enough, whereas commercial BI products (SAP, others) cost a fortune, with licenses starting from tens of thousands of pounds/dollars. While I am not against a commercial choice per se, it will add to the overall price, which can easily become just too big
  • not sure how well BI is made to work with schema-less data

I am definitely not the best candidate to find the most appropriate integration option on the market (mainly because of my lack of knowledge in the BI area), however a decision needs to be made quickly.

Has anybody been in a similar situation and could advise on which route to take, or even better - advise on possible pros/cons of option #2? The biggest problem here is that I don't know what I don't know ;)


Comments (3)

花开雨落又逢春i 2024-10-12 14:48:39

I have spent some time playing with both QlikView and Wabit and, I have to say, I am quite disappointed.

I had expected that the whole BI industry actually had some science behind it, but from what I found it is a mere buzzword. This MSDN article was actually an eye opener. The whole business of BI consists of taking data from well-normalized schemas (they call it OLTP), putting it into less-normalized schemas (OLAP, snowflake- or star-type) and creating indices for every aspect you want (the industry jargon for this is data cube). The rest is just some scripting to get the pretty graphs.
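
To make the "index for every aspect" reading of a data cube concrete, here is a toy sketch in Java. It is a deliberate simplification under my own assumptions (the class name `CubeSketch`, the two dimensions and the `*` wildcard notation are all made up for illustration): for each fact row it pre-computes a count for every combination of dimension values, so any roll-up later becomes a single lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CubeSketch {

    /**
     * Counts fact rows for every subset of dimensions (2^n group-bys).
     * In a key, "*" means "any value" for that dimension.
     */
    static Map<String, Integer> buildCube(List<String[]> facts) {
        Map<String, Integer> cube = new TreeMap<>();
        for (String[] row : facts) {
            int n = row.length;
            // Each bit of the mask decides whether a dimension is kept or rolled up.
            for (int mask = 0; mask < (1 << n); mask++) {
                List<String> key = new ArrayList<>();
                for (int d = 0; d < n; d++) {
                    key.add((mask & (1 << d)) != 0 ? row[d] : "*");
                }
                cube.merge(String.join("|", key), 1, Integer::sum);
            }
        }
        return cube;
    }

    public static void main(String[] args) {
        // Hypothetical fact rows: (event type, country).
        List<String[]> facts = Arrays.asList(
            new String[] {"download", "UK"},
            new String[] {"download", "US"},
            new String[] {"search", "UK"});
        Map<String, Integer> cube = buildCube(facts);
        System.out.println(cube.get("download|*")); // all downloads: 2
        System.out.println(cube.get("*|UK"));       // all UK events: 2
        System.out.println(cube.get("*|*"));        // grand total: 3
    }
}
```

Real OLAP engines are far more careful about which aggregates they materialize, but the lookup-instead-of-scan idea is the same.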

OK, I know I am oversimplifying things here. I know I might have missed many different aspects (nice reports? export to Excel? predictions?), but from a computer science point of view I simply cannot see anything beyond a database index here.

I was told that some BI tools support compression. Lucene supports that, too. I was told that some BI tools are capable of keeping the whole index in memory. For that there is a Lucene cache.

Speaking of the two candidates (Wabit and QlikView): the first is simply immature (I got dozens of exceptions when trying to step outside what their demo suggested), whereas the other only works under Windows (not very nice, but I could live with that) and the integration would likely require me to write some VBScript (yuck!). I had to spend a couple of hours on the QlikView forums just to get a simple date range control working, and failed because the Personal Edition I had did not support the downloadable demo projects available on their site. Don't get me wrong, they're both good tools for what they were built for, but I simply don't see any point in integrating with them as I wouldn't gain much.

To address the (arguable) immaturity of Solr I will define an abstract API so I can move all the data to a database which supports full-text queries if anything goes wrong. And if worst comes to worst, I can always write stuff on top of Solr/Lucene if I need to.
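
A minimal sketch of what such an abstract API might look like, in Java. Everything here is hypothetical (the `ReportStore` interface, its two methods and the in-memory implementation are illustrative names of mine, not existing code): the point is only that callers depend on the interface, so a Solr-backed implementation could later be swapped for a database-backed one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical storage-neutral API: callers never see Solr or SQL types. */
interface ReportStore {
    void add(Map<String, String> record);

    /** Returns how many stored records carry each value of the given field. */
    Map<String, Integer> facetCounts(String field);
}

/** In-memory stand-in; a Solr- or database-backed class would implement the same interface. */
class InMemoryReportStore implements ReportStore {
    private final List<Map<String, String>> records = new ArrayList<>();

    @Override
    public void add(Map<String, String> record) {
        records.add(record);
    }

    @Override
    public Map<String, Integer> facetCounts(String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> record : records) {
            String value = record.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}

class ReportStoreDemo {
    public static void main(String[] args) {
        ReportStore store = new InMemoryReportStore();
        store.add(Collections.singletonMap("type", "download"));
        store.add(Collections.singletonMap("type", "download"));
        store.add(Collections.singletonMap("type", "search"));
        System.out.println(store.facetCounts("type")); // {download=2, search=1}
    }
}
```

The interface is intentionally tiny; full-text queries and date ranges would need further methods, and those are exactly the places where a Solr backend and a relational backend are most likely to differ.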

笑饮青盏花 2024-10-12 14:48:39

If you're truly in a scenario where you're not sure what you don't know, I think it's best to explore an open-source tool and evaluate its usefulness before diving into your own implementation. It could very well be that using the open-source solution will help you further crystallise your own understanding and required features.
I had previously worked with an open-source solution called Pentaho. I seriously felt that I understood a whole lot more by learning to use Pentaho's features. Of course, as is the case with most open-source solutions, Pentaho seemed a bit intimidating at first, but I managed to get a good grip on it in a month's time. We also worked with the Kettle ETL tool and Mondrian cubes - which I think most of the serious BI tools these days build on top of.
Earlier, all these components were independent, but of late I believe Pentaho took ownership of all these projects.

But once you're confident about what you need and what you don't, I'd suggest building some basic reporting tool of your own on top of a Mondrian implementation. Customising a sophisticated open-source tool can indeed be a big issue. Besides, there are licenses to be wary of. I believe Pentaho is GPL, though you might want to check on that.

衣神在巴黎 2024-10-12 14:48:39

First you should make clear what your reports should show. Which reporting features do you need? Which output formats do you want? Do you want to show it in the browser (HTML), as PDF, or with an interactive viewer (Java/Flash)? Where is the data (database, Java, etc.)? Do you need ad hoc reporting or only some hard-coded reports? These are only some of the questions.

Without answers to these questions it is difficult to give a real recommendation, but my general recommendation would be i-net Clear Reports (used to be called i-net Crystal-Clear). It is a Java tool. It is a commercial tool, but the cost is lower than that of SAP and co.
