Can someone explain data mining, SSIS, BI, ETL, and related technologies?
I was talking with a co-worker yesterday about a situation where he used SSIS (or something like that) to do a really cool thing with an SSIS package. He passed in a name like "Dr. Reginald Williams, PhD.", and based on some weighting scheme the system was smart enough to figure out how to tokenize it and store it in the database as "Salutation - First Name - Last Name - Suffix". He threw out some buzzwords like BI, SSIS, ETL, and data mining. I really wanted more information, but didn't even know where to begin to ask.
I'm a .NET developer and thoroughly versed in C#, VB.NET, WPF, etc., but I have no idea what these technologies are, how to add them to my skill set, or whether they are something I really should be focusing on. Any and all direction would be helpful.
Comments (3)
SSIS is SQL Server Integration Services, an Extract, Transform, and Load (ETL) tool. It is a far superior implementation of what was Data Transformation Services (DTS) in the SQL Server 7.0 and SQL Server 2000 era. It is a great tool for expressing workflow processes in which data moves from point A to point B (and C, D, and so on) and changes along the way, for example through consolidation into a denormalized design or through data cleansing.
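To make the extract, transform, and load steps concrete, here is a minimal sketch in plain Python. It is not SSIS and does not use any SSIS API; the CSV text, the `sales` table, and the function names are all assumptions made up for illustration. It only shows the shape of the workflow: pull rows from a source, cleanse them, and load them into a destination database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or another database.
SOURCE_CSV = """customer,amount
 alice ,10.50
BOB,3
alice,4.50
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize names, convert amounts to numbers."""
    return [(r["customer"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the cleansed rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(conn, text=SOURCE_CSV):
    load(conn, transform(extract(text)))

conn = sqlite3.connect(":memory:")  # in-memory destination, for the sketch only
run_etl(conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
```

After the run, `totals` holds one row per cleansed customer name, so the inconsistently formatted "alice" entries end up consolidated under a single "Alice".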
BI, or Business Intelligence, is a moniker for an entire category in the tech world, and it is a great place to be right now. BI skills are highly valued and hard to come by. One reason is that it is hard to recreate a true BI case in a lab, so teaching is almost always done in a real-world situation.
From a high level, BI projects usually involve an end point of reporting. As developers we are often used to transactional report writing, such as the details of a PO, but BI can get into very broad reports that cover product sales trends over decades and deal with hundreds of millions of records. The way we design databases for applications is not ideal for this kind of reporting, so other tools and technologies were invented and are used in the BI space. These are things like cubes, which you often hear called OLAP cubes. OLAP cubes usually originate from a data warehouse, which is nothing more than another database, but a typical warehouse contains data that came from more than one, and often dozens, of other application databases. Your inventory app, purchasing app, HR app, and a whole bunch of others all contain bits and pieces of data that together create a complete picture of the business. A BI architect will use something like SSIS to pull the data from all these systems, massage it, and store it in the data warehouse, which uses a different kind of design that is better suited to reporting. Once the data is in the warehouse, the architect will use Analysis Services to create cubes on that data and something like Reporting Services to show you reports over that data.
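As a rough illustration of what a cube gives you, the sketch below sums a small set of warehouse-style fact rows along whichever dimensions you ask for. This is plain Python, not Analysis Services; the sample rows, the dimension names, and the `roll_up` function are hypothetical. A real OLAP cube precomputes and indexes aggregates like these so they stay fast over hundreds of millions of rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales_amount)
FACTS = [
    (2007, "East", "Widget", 100),
    (2007, "West", "Widget", 150),
    (2008, "East", "Widget", 120),
    (2008, "East", "Gadget", 80),
]
DIMENSIONS = ("year", "region", "product")

def roll_up(facts, by):
    """Sum sales grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        # The key is the tuple of values for the requested dimensions.
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

by_year = roll_up(FACTS, ["year"])
by_region_product = roll_up(FACTS, ["region", "product"])
```

The same fact rows answer both "sales per year" and "sales per region and product", which is the kind of slicing a transactional schema is not designed for.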
Edit: sorry, I forgot data mining. It is another non-specific term that describes a concept or a process rather than a tool. Put simply, it is a methodical approach to identifying patterns in data. In the past a good business analyst would look through data for trends, but with modern databases the datasets are far too large to comb through manually. Data mining lets you instruct the computer to comb through that data and identify patterns of interest.
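For a toy example of having the computer look for patterns, the sketch below counts which pairs of items show up together in the same purchase and keeps the pairs that occur at least a given number of times. The basket data and the `frequent_pairs` function are invented for illustration, and simple pair counting like this is far cruder than real data mining algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: each set is one customer's basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same (a, b) ordering.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(BASKETS, min_support=2)
```

Here `pairs` ends up containing bread with milk and bread with butter, the two combinations that occur in two baskets each.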
Hope that helps
What your coworker did might be better described as "intelligent parsing" of a string. That could be done at many levels of sophistication -- for example, using statistical models to give you the likelihood that "Dr." is a salutation and not a first name. Or it could just use a simple lookup list of common salutations, in which case it's just regular procedural code, nothing more.
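The simple lookup-list variant might look something like the sketch below. This is only a guess at the general idea, not what the coworker's SSIS package actually did; the `SALUTATIONS` and `SUFFIXES` lists and the `parse_name` function are hypothetical, and it uses no weighting at all.

```python
import re

# Hypothetical lookup lists; a real system would use much larger ones.
SALUTATIONS = {"dr", "mr", "mrs", "ms", "prof"}
SUFFIXES = {"phd", "md", "jr", "sr", "ii", "iii"}

def _key(token):
    """Normalize a token for lookup: drop periods, lowercase."""
    return token.replace(".", "").lower()

def parse_name(raw):
    """Split a full name into salutation, first name, last name, and suffix."""
    tokens = [t for t in re.split(r"[,\s]+", raw.strip()) if t]
    parts = {"salutation": "", "first_name": "", "last_name": "", "suffix": ""}
    if tokens and _key(tokens[0]) in SALUTATIONS:
        parts["salutation"] = tokens.pop(0)
    if tokens and _key(tokens[-1]) in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        # Whatever remains: first token is the first name, the rest the last name.
        parts["first_name"] = tokens[0]
        parts["last_name"] = " ".join(tokens[1:])
    return parts

parsed = parse_name("Dr. Reginald Williams, PhD.")
```

For the name in the question, `parsed` maps "Dr." to the salutation, "Reginald" to the first name, "Williams" to the last name, and "PhD." to the suffix. A statistical version would replace the fixed lists with scores for how likely each token is to play each role.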
SSIS is short for SQL Server Integration Services. It's basically DTS on steroids; some people love it, and some people hate it. It'd be tricky to use SSIS by itself to do the kind of thing you're talking about; it's mainly for taking data from various sources, combining it, transforming it, and loading it somewhere else. It can do some nifty things, many of which tend to be data-mining-like, but ultimately it's a production tool for cramming data in one direction or another. It isn't particularly well respected in the data mining community.
Data Mining is an entire academic discipline, focused on using some (typically large) quantity of data to either predict future answers or better understand patterns in existing data. It's definitely a great area to get into, but not something you can just pick up and do without some intensive study of math and algorithms. A good book on the subject is this one.
"Business Intelligence" is really more of a buzzword than a specific technology, and can mean different things to different people. At base, the idea suggests doing less dumb stuff with business data, and generally it refers to analysis of trends over time, often using OLAP. It may also include data mining or AI algorithms, but since there's no rigorous definition, just about anybody who wants to sell you something will tell you it offers "Business Intelligence", and hope you don't dig any further.
SSIS is SQL Server Integration Services. It is useful for doing the ETL (Extract, Transform, and Load) work that is the front end of many data warehousing/business intelligence solutions, which integrate data into easy-to-use dimensional models. SSIS is also useful on smaller projects as a convenient way to load legacy data or data from other repositories or files.
Data mining usually implies using the data from the integrated sources to infer information that would not be obvious from transactional data alone (integrating multiple sources gives the data more "dimensions").
BI is a huge topic, so it may not be something to focus on unless you want to get into that field, but SSIS can be useful on smaller projects and is worth learning about in any event.