使用什么论据来解释为什么 SQL Server 比平面文件好得多
我公司的高层被好朋友告知,平面文件是正确的出路,我们所做的一切都应该从 SQL Server 切换到它们。我们拥有 300 多台服务器和数百个不同的数据库。仅从我参与的少数几个案例来看,我们就已经>其中相当多的记录有 100 亿条,每天有超过 10 万条新记录,谁知道有多少更新……我和其他几个人需要给出一个回应,说明为什么我们不应该这样做。我们的大部分内容都是 ASP.NET 以及一些遗留的 ASP。我们认为制作一个简单的控制台应用程序来测试/计算平面文件(存储在网络上)和通过网络的 SQL 之间的相同交互,执行大量插入、搜索、更新等以及网络随机断开等操作。这将向他们展示平面文件有多么糟糕,尤其是当您处理数百万条记录时。
我应该在回复中使用哪些内容?我应该如何使用我的演示代码来说明这一点?
到目前为止我的排序列表:
- 安全性
- 并发访问
- 大量数据的性能 进行
- 如此大规模重写/切换的时间量和巨大的成本
- 缺乏事务
- PITA 将关系数据映射到平面文件
- NTFS 不支持大量文件目录良好
- 缺乏临时数据搜索/操作
- 强制数据完整性
- 从网络中断中恢复 等待
- 其他客户端更改提交时出现客户端延迟
- 大多数人很久以前就出于充分的理由停止使用平面文件进行此类存储
- 负载平衡/复制
我担心如果我现在无法阻止的话,有一天这将是《Daily WTF》上的一篇很棒的文章。
此外
有谁知道有关 HIPPA 的任何内容是否可以在这场战斗中使用?我们的许多记录都是患者记录...
The higher-ups in my company were told by good friends that flat files are the way to go, and we should switch from SQL Server to them for everything we do. We have over 300 servers and hundreds of different databases. From just the few I'm involved with we have > 10 billion records in quite a few of them with upwards of 100k new records a day and who knows how many updates... Me and a couple others need to come up with a response saying why we shouldn't do this. Most of our stuff is ASP.NET with some legacy ASP. We thought that making a simple console app that tests/times the same interactions between a flat file (stored on the network) and SQL over the network doing large inserts, searches, updates etc along with things like network disconnects randomly. This would show them how bad flat files can be, especially when you are dealing with millions of records.
What things should I use in my response? What should I do with my demo code to illustrate this?
My sort list so far:
- Security
- Concurrent access
- Performance with large amounts of data
- Amount of time to do such a massive rewrite/switch and huge $ cost
- Lack of transactions
- PITA to map relational data to flat files
- NTFS doesn't support tons of files in a directory well
- Lack of Adhoc data searching/manipulation
- Enforcing data integrity
- Recovery from network outage
- Client delay while waiting for other clients changes to commit
- Most everybody stopped using flat files for this type of storage long ago for good reason
- Load balancing/replication
I fear that this will be a great post on the Daily WTF someday if I can't stop it now.
Additionally
Does anyone know if anything about HIPPA could be used in this fight? Many of our records are patient records...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(19)
数据完整性。首先,您可以在数据库中强制执行它,但不能在平面文件中强制执行。其次,您可以确保不同实体之间具有引用完整性,以防止孤立行。数据
存储效率取决于数据的性质。如果数据自然地分解为实体,那么从在平面文件的情况下需要编写附加代码来连接数据的角度来看,数据库将比大量平面文件更有效。
本机查询功能。您可以本机查询数据库,但不能使用平面文件。对于平面文件,您必须将文件加载到其他环境(例如 C# 应用程序)并使用其功能对其进行查询。
格式完整性。数据库格式更加严格,这意味着更加一致。平面文件很容易发生更改,从而导致读取平面文件的代码崩溃。差异与#3 有关。在数据库中,如果架构发生更改,您仍然可以使用本机工具对其进行查询。如果平面文件格式发生变化,您必须有效地进行搜索,因为读取它的代码可能会被破坏。
“通用”语言。 SQL 在某种程度上是普遍存在的,因为平面文件的结构更具可塑性。
Data integrity. First, you can enforce it in a database and cannot in a flat file. Second, you can ensure you have referential integrity between different entities to prevent orphaning rows.
Efficiency in storage depending on the nature of the data. If the data is naturally broken into entities, then a database will be more efficient than lots of flat files from the standpoint of the additional code that will need to be written in the case of flat files in order to join data.
Native query capabilities. You can query against a database natively whereas you cannot with a flat file. With a flat file you have to load the file into some other environment (e.g. a C# application) and use its capabilities to query against it.
Format integrity. The database format is more rigid which means more consistent. A flat file can easily change in a way that the code that reads the flat file(s) will break. The difference is related to #3. In a database, if the schema changes, you can still query against it using native tools. If the flat file format changes, you have to effectively do a search because the code that reads it will likely be broken.
"Universal" language. SQL is somewhat ubiquitous where as the structure of the flat file is far more malleable.
我还要提到数据损坏。大多数现代 SQL 数据库可能会在服务器上断电,或者服务器实例崩溃,但您不会(不应该)丢失数据。平面文件实际上并非如此。
我还要提到搜索时间。甚至可以编写一个包含 100 万个条目的简单平面文件数据库,并显示与 MS SQL 的搜索时间。使用索引,您应该能够以数千倍的速度搜索 SQL 数据库。
我还会注意你注销平面文件的速度。我什至会说“对于很多情况来说这是一个好主意,但就我们而言......”。这样你就不会让人觉得你没有在听其他人的观点。在这种情况下,机智是需要考虑的一个重要问题。他们可能错得很离谱,但你必须让你的老板相信这一点。
I'd also mention data corruption. Most modern SQL databases can have the power killed on the server, or have the server instance crash and you won't (shouldn't) loose data. Flat files aren't really that way.
Also I'd mention search times. Perhaps even write a simple flat file database with 1mil entries and show search times vs MS SQL. With indexes you should be able to search a SQL database thousands of times faster.
I'd also be careful how quickly you write off flat files. Id go so far as saying "it's a good idea for many cases, but in our case....". This way you won't sound like you're not listening to the other views. Tact in situations like this is a major thing to consider. They may be horribly wrong, but you have to convince your boss of that.
他们从使用平面文件中获得什么?转换过程将需要数百个小时——他们是付费的。平面文件多快可以产生积极的投资回报?提供粗略的成本估算。将技术考虑因素转化为金钱(成本),并将问题置于他们的角度。
除了数据转换之外,还添加了复制数据库功能的隐藏成本...
What do they gain from using flat files? The conversion process will be hundreds of hours - hours they pay for. How quickly can flat files generate a positive return on that investment? Provide a rough cost estimate. Translate the technical considerations into money (costs), and it puts the problem in their perspective.
On top of just the data conversion, add in the hidden costs for duplicating a database's capabilities...
数据库允许您轻松地为数据建立索引,以便通过搜索任意数量的不同列来找到特定记录或记录组。
对于平面文件,您必须编写自己的索引机制。当数据库已经为您完成所有工作时,无需再次执行所有工作。
Databases allow you to easily index your data to be able to particular records or groups of records by searching any number of different columns.
With flat files you have to write your own indexing mechanisms. There is no need to do all that work again when the database does it for you already.
如果您使用“文本文件”,则需要在其之上构建一个接口,微软已经为您完成了该接口,并将其称为 SQL Server。
询问你的经理,对于你的公司来说,花费所有这些资源构建一个自制的数据库系统是否有意义(因为事实就是如此),或者这些资源是否应该更好地用于专注于业务。
性能:SQL Server 专为存储方便的可搜索数据而构建。它优化了内存中的数据结构,在构建时考虑了搜索/插入/删除。由于定期查询的数据保存在内存中,因此降低了磁盘的使用率。
业务合作伙伴:如果您计划与第 3 方公司进行 B2B,SQL Server 具有称为链接服务器的内置功能。如果您只有一堆文件,您的业务合作伙伴就会放弃您,因为无法进行数据互连。除非您想再次重新发明轮子,并为您拥有的每个业务合作伙伴构建一个界面。
集群:您可以轻松地在 SQL Server 中对服务器进行集群,以实现高可用性和速度,这远远超过基于文本的解决方案所能实现的效果。
集群
If you use "text files", you'll need to build an interface on top of it which Microsoft has already done for you and called it SQL Server.
Ask your managers if it makes sense to your company to spend all these resources building a home-made database system (because really that's what it is), or would these resources be better spent focusing on the business.
Performance: SQL Server is built for storing conveniently searchable data. It has optimized data structures in memory built with searching/inserting/deleting in mind. Usage of the disk is lowered, as data regularly queried is kept in memory.
Business partners: if you ever plan to do B2B with 3rd party companies, SQL Server has built-in functionality for it called Linked Servers. If you have only a bunch of files, your business partner will give up on you as no data interconnection is possible. Unless you want to re-invent the wheel again, and build an interface for each business partner you have.
Clustering: you can easily cluster servers in SQL Server for high availability and speed, a lot more than what's possible with text based solution.
你的清单有一个良好的开端。我要添加的项目包括:
如果您想花时间构建数据引擎,可以复制这些项目,但有什么意义呢? SQL 引擎已经提供了这些好处。
You have a nice start to your list. The items I would add include:
These items can be replicated if you want to take the time to build a data engine, but what would be the point? SQL engines already provide these benefits.
我想我什至无法开始列出原因。我觉得我的头要爆炸了。不过,我会冒着风险尝试帮助您...
I don't think I can even start to list the reasons. I think my head is going to explode. I'll take the risk though to try to help you...
如果您是一家上市公司,那么股东如果知道正在认真考虑这一点,将会大有裨益。 “我们”都知道,考虑到您的运营规模和范围,这是一个荒谬的建议。 必须保护患者记录,不仅防止安全漏洞,而且防止 不负责任的损失 - 生命可能取决于数据。如果高管们真正关心患者,这应该是他们最关心的问题。
我从 74 年开始就使用 IBM 370 大型机,DB2 取代普通旧平面文件、VSAM 和 ISAM 的那一天是一个里程碑式的日子。在我使用 4 种类型的 RDBMS 的 25 年里,除了流数据之外,还没有回顾过平面文件存储。
如果我持有“你”的股票,那么在项目启动的那一刻匆忙抛售似乎是合适的……
If you are a public company, the shareholders would be well served to know this is being seriously contemplated. "We" all know this is a ridiculous suggestion given the size and scope of your operation. Patient records must be protected, not only from security breaches but from irresponsible exposure to loss - lives may depend up the data. If the Executives care at all about the patients, THIS should be their highest concern.
I worked with IBM 370 mainframes from '74 onwards and the day that DB2 took over from plain old flat files, VSAM and ISAM was a milestone day. Haven't looked back to flat-file storage, except for streaming data, in my 25 years with RDBMSs of 4 flavors.
If I owned stock in "you", dumping it in a hurry the moment the project took off would seem appropriate...
您的清单是坚持使用数据库的一个很好的开始。
但是,我建议,如果您正在与技术人员交谈,请在建议中避免提及技术原因,因为它们可能会被认为有偏见。
以下是我反对平面文件数据存储的两点:
1) 安全性 - HIPPA 审核要求患者数据保留在安全的环境中。常见的数据库系统(Oracle、Microsoft SQL、MySQL)都有实现符合 HIPPA 的安全访问的方法。在平面文件上执行此操作充其量也是很困难的。
旁注:我还看到过对数据库中的患者姓名进行加密的医疗实践,以添加额外的保护层和安全性。合规性,以确保即使数据库受到损害,患者记录也不会面临风险。
2) 报告 - 来自任何结构化数据库系统的报告都是简单且常见的。有数十万开发人员可以执行此任务。平面文件报告需要高于平均水平的开发人员。而且,由于没有普遍接受的方法来从平面文件数据库进行报告,因此一个开发人员可能会做与另一个开发人员不同的事情。这可能会影响能够在本土平面文件系统上工作的人才库,并最终因必须支持此类系统而导致成本上升。
我希望这有帮助。
Your list is a great start of reasons for sticking with a database.
However, I would recommend that if you're talking to a technical person, to shy away from technical reasons in a recommendation because they might come across as biased.
Here are my 2 points against flat file data storage:
1) Security - HIPPA audits require that patient data remain in a secure environment. The common database systems (Oracle, Microsoft SQL, MySQL) have methods for implementing HIPPA compliant security access. Doing so on a flat-file would be difficult, at best.
Side note: I've also seen medical practices that encrypt the patient name in the database to add extra layers of protection & compliance to ensure even if their DB is compromised that the patient records are not at risk.
2) Reporting - Reporting from any structured database system is simple and common. There are hundreds of thousands of developers that can perform this task. Reporting from flat-files will require an above-average developer. And, because there is no generally accepted method for doing reporting off of a flat-file database, one developer might do things different than another. This could impact the talent pool able to work on a home-grown flat-file system, and ultimately drive costs up by having to support that type of a system.
I hope that helps.
如何使用纯文本文件创建关系模型?
或者您计划为每个实体使用不同的文件?
How do you create a relational model with plain text files?
Or are you planning to use a different file for each entity?
Pro 文件系统:
sort
可以比 SQL 的order by
更快)因此,例如,您选择了一个文件系统来创建日志文件。除非您需要对数据进行复杂的分析,否则登录数据库是没有用的。
Pro DB:
数据库适合您。
Pro file system:
sort
can be faster than SQL'sorder by
)So you'd chose a filesystem to create log files, for example. Logging into a DB is useless unless you need to do complex analysis of the data.
Pro DB:
So if you need to add data rarely but search it often, select parts of it by certain criteria or aggregate values, a DB is for you.
NTFS 不能很好地支持大量 .txt 文件。根据平面文件系统的开发方式,硬盘驱动器的运行状况可能会成为一个问题。许多较旧的文件系统使用大量的小 .txt 文件来存储数据。这是一个糟糕的设计,但随着平面文件系统的老化,往往会发生这种情况。
碎片成为一个问题,您会到处丢失文本文件,从而导致丢失少量数据。在数据库设计方面,硬盘驱动器的健康状况不应成为问题。
NTFS does not support mass amounts of .txt files well. Depending on how a flat file system is developed, the health of a harddrive can become an issue. A lot of older file systems use mass amount of small .txt files to store data. It's bad design, but tends to happen as a flat file system gets older.
Fragmentation becomes an issue, and you lose a text file here and there, causing you to lose small amounts of data. Health of a hard drive should not be an issue when it comes to database design.
对于你的雇主来说,如果他认真地建议用平面文件处理所有事情,这确实是一个重大的问题......
你已经知道原因了(哦 - 将复制/负载平衡添加到你的列表中) - 你现在需要做的是让他相信这些。我对此的方法有两个方面。
首先,我会在您当前使用的任何工具中编写一个脚本,以使用 SQL 执行基本操作,并对其进行计时。然后,我会编写另一个脚本,在其中您真诚地尝试让平面文本解决方案发挥作用,然后突出显示性能差异。给他两组代码,这样他就知道你没有作弊。
指出技术在不断发展,仅仅因为某人在 20 年前取得了成功,并不意味着他们现在就自动获得可信的意见。
您可能还想提及文本文件中的解码/编码信息的错误范围,有人窃取它们是微不足道的,以及调整当前代码库以使用文本文件的成本(证明您的估计是合理的)。
然后,我会向管理层提出严肃的问题 - 其中最重要的是,我会直接问,“为什么你准备根据另一个人的意见推翻你的技术人员的技术问题” - 特别是当该人不太熟悉时按照我们现在的设置...
然后我也会使用这句话“我并不是有意贬低你,但我真的觉得为了公司的利益我必须在这一点上进行干预......”
另一种方法- 扭转局面 - 让 Wonderful 先生提供论据来解释为什么文本文件是前进的方向。然后你要么a)学到一些东西(不太可能),要么b)能够彻底摧毁他的论点。
祝你好运 - 我感受到你的痛苦......
马丁
This is indeed, on the part of your employer, a MAJOR WTF if he's seriously proposing flat files for everything...
You already know the reasons (oh - add Replication / Load Balancing to your list) - what you need to do now is to convince him of them. My approach on this would two fold.
First of all, I would write a script in whatever tool you currently use to perform a basic operation using SQL, and have it timed. I would then write another script in which you sincerely try to get a flat text solution working, and then highlight the difference in performance. Give him both sets of code so he knows you aren't cheating.
Point out that technology evolves, and that just because someone was successful 20 years ago, this does not automatically entitle them to a credible opinion now.
You might also want to mention the scope for errors in decoding / encoding information in text files, that it would be trivial for someone to steal them, and the costs (justify your estimate) in adapting the current code base to use text files.
I would then ask serious questions of management - foremost amongst them, and I would ask this DIRECTLY, is "Why are you prepared to overrule your technical staff on technical matters" based on one other individual's opinion - especially when said individual is not as familiar with our set up as we are...
I'd also then use the phrase "I do not mean to belittle you, but I seriously feel I have to intervene at this point for the good of the company..."
Another approach - turn the tables - have Mr. Wonderful supply arguments as to why text files are the way forward. You'll then either a) Learn something (not likely), or b) Be in a position to utterly destroy his arguments.
Good luck with this - I feel your pain...
Martin
我建议你先进行报复,现在就在 Daily WTF 上发帖。
至于你的问题:商业原因是为什么你的老板想要重写你的所有系统。实际上,您必须从头开始编写自己的数据库系统。
由于开发原因,您将无法访问 SQL Server 生态系统、所有库、工具、实用程序。
也许提出这个建议的人实际上是在考虑与你的公司竞争。
I suggest you get your retalliation in first, post on Daily WTF now.
As to your question: a business reason would be why does your boss want to rewrite all your systems. From scratch as you would, effectively, have to write your own database system.
For a development reason, you would lose access to the SQL server ecosystem, all the libraries, tools, utilities.
Perhaps the guy that suggested this is actually thinking of going into competition with your company.
反驳这一论点的最简单方法 - 列出一家使用平面文件处理如此规模数据的财富 500 强公司的名称?
现在请举出一家不使用关系数据库的财富 500 强公司的名字……
案件已结。
Simplest way to refute this argument - name a fortune 500 company that processes data on this scale using flat files?
Now name a fortune 500 company that doesn't use a relational database...
Case closed.
这里确实有些可疑。对于那些掌握了正确术语(“平面文件”)但不知道这个想法有多么愚蠢的人来说,它就是不成立。我愿意成为你的经理是非技术性的,但与你的经理交谈的人是非技术性的。这听起来更像是一个迷失在翻译问题中的问题。
您确定它们不是指 no-SQL,就好像您处于以文档为中心的环境中一样,从某些方面来说,放弃关系数据库实际上确实有意义,同时仍然具有传统 RDBMS 的许多优点。
因此,我不会证明为什么 SQL 比平面文件更好,而是会反转问题并询问平面文件旨在解决哪些问题。我敢打赌这是一个沟通问题。
如果不是,并且您的公司实际上正在考虑根据“朋友”的推荐用自制的平面文件系统替换其数据库,那么说服您的经理为什么他错了是您最不用担心的。相反,掸去灰尘并开始传播你的简历。
Something is really fishy here. For someone to get the terminology right ( "flat file" ) but not know how overwhelmingly stupid an idea that is, it just doesn't add up. I would be willing to be your manager is non-technical, but the person your manager is talking to is. This sounds more like a lost in translation problem.
Are you sure they don't mean no-SQL, as if you are in a document centric environment, moving away from a relational database actually does make sense in some regards, while still having many of the positives of a tradition RDBMS.
So, instead of justifying why SQL is better than flat files, I would invert the problem and ask what problems flat files are meant to solve. I would put odds on money that this is a communication problem.
If its not and your company is actually considering replacing its DB with a home grown flat file system off the recommendation of "a friend", convincing your manager why he is wrong is the least of your worries. Instead, dust off and start circulating your resume.
这不仅仅是时间的问题,而且还引入了新的错误。重写这些比例将导致当前工作的事情被破坏。
我建议给他一个成本估算,估算仅对一个系统进行重写所需的时间,然后估算需要更改的系统的数量。一旦他们有了成本估算,他们就会尽快摆脱这种情况。
经理喜欢数字,因此需要进行正式的书面决策分析。通过收益和风险并用数值并排比较这两个提案。当你的维护成本为 0,转换成本为 100,000,000 时,他们就会明白这一点。
It's not just amount of time it is the introduction of new bugs. A re-write of these proportions would cause things that currenty work to break.
I'd suggest a giving him a cost estimate of the hours to do such a rewrite for just one system and then the number of systems that would need to change. Once they have a cost estimate, they will run from this as fast as they can.
Managers like numbers, so do a formal written decision analysis. Compare the two proposals by benefits and risks, side by side with numeric values. When you get to cost 0 to maintain and 100,000,000 to convert they will get the point.
不区分平面文件和 sql 的人,不理解你之前所说的所有参数。
解释必须尽可能简单,如下所示:
SQL 是平面文件的某种搜索/并发包装器。
即使公司从零开始编写包装器,当前存在的所有问题也将继续存在。
此外,您还必须提供其他方法来解决当前问题,使用高级 BLL 等智能词或安装/卸载脚本环境。 :)
The people that doesn't distinguish between flat files and sql, doesnt understand all arguments that you say before.
The explanation must simple as possible, something like this:
SQL is a some kind of search/concurrency wrapper around the flat files.
All the problems that exist currently, will stay even the company going to write the wrapper from zero.
Also you must to give some other way to resolve the current problems, use smart words like advanced BLL or install/uninstall scripting environment. :)
你必须讲行政话。不用说出来,让他们意识到他们已经无法应对了。这里有一些弹药:
数据库理论是核心计算机科学。我们正在讨论构建一个可扩展的系统,该系统可以处理数百万条记录并容忍灾难,而不会导致每个人都停业。
这是博士级专家的工作。他们已经对该领域进行了 20 年的完善,其伟大之处在于:它使我们能够专注于构建业务系统。
如果必须的话,请站出来说企业中没有这样做。这样做的成本会很高,而且结果也会很差。这正是开发人员喜欢重新发明的轮子,在我看来,您唯一应该这样做的时候就是结果将是您可以销售的产品或服务。但事实并非如此。
You have to speak executive. Without saying it, make them realize they're in way over their heads here. Here's some ammunition:
Database theory is hardcore computer science. We're talking about building a scalable system that can handle millions of records and tolerate disasters without putting everyone out of business.
This is the work of PhD-level specialists. They've been refining the field for a good 20 years now, and the great thing about that is this: it allows us to specialize in building business systems.
If you have to, come right out and say that this just isn't done in the enterprise. It would be costly and the result would be inferior. It's exactly the kind of wheel that developers love to reinvent, and in my opinion the only time you should is if the result is going to be a product or service that you can sell. And it won't be.