How do I modernize an enormous legacy database?
I have a question, just looking for suggestions here.
My project is 'modernizing' a desktop application by converting it to a web application, with an ICEFaces UI and a server side written in Java. However, they are keeping the same Oracle database, which at current count has about 700-900 tables and probably a billion records in total. Some individual tables have 250 million rows, and many have over 25 million.
Needless to say, the database is not scaling well. As a result, the performance of the application is looking to be abysmal. The architects / decision makers-that-be have all either refused or are unwilling to restructure the persistence. So, basically we are putting a fresh coat of paint on a functional desktop application that currently serves most user needs and does so with relative ease. The actual database performance is pretty slow in the desktop app now. The quick performance I referred to earlier was non-database related stuff (sorry I misspoke there). I am having trouble sleeping at night thinking of how poorly this application is going to perform and how difficult it is going to be for everyday users to do their job.
So, my question is, what options do I have to mitigate this impending disaster? Is there some type of intermediate layer I can put in between the database and the Java code to speed up performance while at the same time keeping the database structure intact? Caching is obviously an option, but I don't see that as being a cure-all. Is it possible to layer a NoSQL DB in between or something?
Comments (10)
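I don't understand how to reconcile two things you said: that the database is not scaling well, and that the existing desktop application serves most user needs with relative ease.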
You don't say you are adding new users or new function, just making the same function accessible via a web interface.
So why is there a problem? Your web app will be doing more or less the same database work as before.
In fact, introducing a web tier could well give you new caching opportunities, thereby reducing the work the DB is doing.
If your early pieces of web app development are showing poor performance then I would start by trying to understand how the queries you are doing in the web app differ from those done by the existing app. Is it possible that you are using some tooling which is taking a somewhat naive approach to generating queries?
If the current app performs well and your new java app doesn't, the problem is not in the database layer, but in your application layer. If performance is as bad as you say, they should notice fairly early and have the option of going back to the Desktop application.
The DBA should be able to readily identify the additional workload on the database from your application. Assuming the logic hasn't changed it is unlikely to be doing more writes. It could be reads or it could be 'chattier' (moving the same amount of information but in smaller parcels). Chatty applications can use a lot of CPU. A lot of architects try to move processing from the database layer into the application layer because "work on the database is expensive" but actually make things worse due to the overhead of the "to-and-fro".
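The difference between a 'chatty' access pattern and a batched one can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for Oracle, and the `orders` table, its columns, and the function names are hypothetical.

```python
import sqlite3

def make_db(n_orders=1000):
    # In-memory SQLite stands in for the real database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
        [(i, i % 50, float(i)) for i in range(n_orders)],
    )
    return conn

def totals_chatty(conn, order_ids):
    """One statement per id: the same data, moved in many small parcels."""
    result = {}
    for oid in order_ids:
        row = conn.execute(
            "SELECT total FROM orders WHERE id = ?", (oid,)
        ).fetchone()
        if row is not None:
            result[oid] = row[0]
    return result, len(order_ids)  # (data, number of statements issued)

def totals_batched(conn, order_ids, chunk=500):
    """One statement per chunk of ids."""
    result = {}
    statements = 0
    for start in range(0, len(order_ids), chunk):
        part = order_ids[start:start + chunk]
        placeholders = ",".join("?" * len(part))
        rows = conn.execute(
            f"SELECT id, total FROM orders WHERE id IN ({placeholders})", part
        )
        result.update(rows.fetchall())  # rows are (id, total) pairs
        statements += 1
    return result, statements
```

Both functions return the same data; only the number of round trips differs. Against a real networked database each statement costs a round trip, so the chunk size is something to tune (Oracle, for instance, limits an `IN` list to 1000 literal expressions).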
PS.
There's nothing 'bad' about having 250 million rows in a table. Generally you access a table through an index. There are typically 2 or 3 hops from the top of an index to the bottom (and then one more to the table). I've got a 20 million row table with a BLEVEL of 2 and a 120+ million row table with a BLEVEL of 3.
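To see why the depth grows so slowly, here is a rough back-of-the-envelope estimate. The figure of 300 entries per index block is purely an assumption for illustration; the real number depends on block size and key width, and `estimated_blevel` is a hypothetical helper, not an Oracle function.

```python
def estimated_blevel(rows, entries_per_leaf=300, branch_fanout=300):
    """Estimate the number of branch levels above the leaf blocks.

    Assumes every block is full and holds the given number of entries,
    which is a simplification of how a real B-tree index is laid out.
    """
    # Leaf blocks needed to hold one entry per row (integer ceiling division).
    blocks = -(-rows // entries_per_leaf)
    blevel = 0
    # Each branch level points at up to `branch_fanout` blocks below it.
    while blocks > 1:
        blocks = -(-blocks // branch_fanout)
        blevel += 1
    return blevel

for n in (20_000_000, 120_000_000, 250_000_000):
    print(n, estimated_blevel(n))
```

With these assumed figures the estimate comes out at 2 levels for 20 million rows and 3 levels for both 120 million and 250 million rows, which is in line with the BLEVEL values quoted above. Going from 120 million to 250 million rows does not add a level at all.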
Indexing means that you rarely hit more than a small proportion of your data blocks. The frequently used index blocks (and data blocks) get cached in the database server's memory. The DBA would be able to see if this memory area is too small for the workload (i.e. a lot of physical disk IO).
If your app is getting a lot of information that it doesn't really need, this can put pressure on the memory space. Don't be greedy. If you only need three columns from a row, don't grab the whole row.
What you describe is something that Oracle should be capable of handling very easily if you have the right equipment and database design. It should scale well if you get someone on your team who is a specialist in performance tuning large applications.
Redoing the database from scratch would cost a fortune and would introduce new bugs, and the potential for loss of critical information is huge. It is almost never a better idea to rewrite the database at this point. Usually those kinds of projects fail miserably after costing the company thousands or even millions of dollars. Your architects made the right choice. Learn to accept that what you want isn't always the best way. The data is far more important to the company than the app. There are many reasons why people have learned not to try to redesign the database from scratch.
Now, there are ways to improve database performance. The first thing I would consider with a database this size is partitioning the data. I would also consider archiving old data to a data warehouse and doing most reporting from that. Other things to consider would be upgrading your servers to higher-performing models, profiling to find the slowest-running queries and fixing them individually, looking at indexing, and updating statistics and indexes (not sure if this is what you do on Oracle, I'm a SQL Server gal, but your DBAs would know). There are some good books on refactoring old legacy databases. The one below is not database specific.
http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1275577997&sr=8-1
There are also some good books on performance tuning (look for ones specific to Oracle; what works for SQL Server or MySQL is not what is best for Oracle).
Personally, I would get those and read them from cover to cover before designing a plan for how you are going to fix the poor performance. I would also include the DBAs in all your planning; they know things that you do not about the database and why some things are designed the way they are.
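As a rough illustration of the archiving idea mentioned above, the sketch below moves rows older than a cutoff date into a separate archive table. It uses Python's built-in `sqlite3` as a stand-in for Oracle; the `events` and `events_archive` tables and both function names are hypothetical, and a real data warehouse load would involve far more than this.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical tables; dates are stored as ISO-8601 text so that
    # string comparison matches chronological order.
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)")
    conn.execute("CREATE TABLE events_archive (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)")

def archive_old_rows(conn, cutoff):
    """Copy rows created before `cutoff` to the archive, then delete them.

    Both statements run inside one transaction: committed together on
    success, rolled back together if either fails.
    """
    with conn:
        conn.execute(
            "INSERT INTO events_archive SELECT * FROM events WHERE created < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM events WHERE created < ?", (cutoff,))
    return cur.rowcount  # number of rows moved out of the live table
```

The live table stays small for day-to-day queries, while reporting can run against the archive. Whether that helps depends on how much of the data the application actually touches, which the DBAs are best placed to judge.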
If you have a lot of lookups for items that are not in the database, you can reduce the number by using a Bloom filter. Add every key in the database to the Bloom filter, then check the filter before you do a lookup. Only if the filter reports the key as present do you need to bother the database. The filter will produce some false positives, but you can design it for the 'size vs false positive' trade-off that best suits you.
Google uses this strategy in its Bigtable database and has reported that it significantly improves performance.
http://en.wikipedia.org/wiki/Bloom_filter
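A minimal sketch of the idea, assuming string keys. The class, the `lookup` helper, and the `query_db` callback are hypothetical names for illustration; a production system would more likely use an existing library than hand-rolled code like this.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one SHA-256
        # digest (double hashing), instead of k independent hash functions.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )

def lookup(key, bloom, query_db):
    """Only call the database when the filter says the key may exist."""
    if not bloom.might_contain(key):
        return None
    return query_db(key)
```

For n keys and a target false-positive rate p, the usual sizing is about m = -n ln(p) / (ln 2)^2 bits with k = (m / n) ln 2 hash functions. Note that the filter has to be updated whenever rows are inserted, and a plain Bloom filter cannot remove keys, so deletes only cost you some extra false positives until the filter is rebuilt.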
Good luck, working on tasks you don't believe in is tough.
So you put a fresh coat of paint on a functional and quick desktop application and then the system becomes slow?
And then you say that "it is needless to say that the database isn't scaling well"?
I don't get it. I think that there is something wrong with your fresh coat of paint, not with the database.
Don't be put down by this sort of thing. See it as a challenge, rather than something to be losing sleep over! I know it's tempting as a programmer to want to rip everything out and start over again, but from a business perspective, it's just not always viable. For example, by using the same database, the business can continue to use the old application while the new one is being developed and switch over customers in groups, rather than having to switch everyone over at the same time.
As for what you can do about performance, it depends a lot on the usage pattern. Caching can help greatly with mostly read-only databases. Even with a read/write database, it can still be a boon if correctly designed. A NoSQL database might help with write-heavy stuff, but it might also be more trouble than it's worth if the data has to end up in a regular database anyway.
In the end, it all depends greatly on your application's architecture and usage patterns.
Good luck!
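For the mostly-read case, a cache in front of the database can be sketched like this. It is a minimal single-process example under assumptions of my own: the `ReadThroughCache` class and the `loader` callback are hypothetical, and a real deployment would need a size bound and some thought about consistency across several app servers.

```python
import time

class ReadThroughCache:
    """Serve repeated reads from memory; go to the loader only on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that reads one key from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit, no database work
        value = self._loader(key)  # miss or expired: read through
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after writing the key so readers do not see stale data.
        self._entries.pop(key, None)
```

The time-to-live bounds how stale a read can be, and `invalidate` covers writes made through the same process. Writes made by the old desktop application would bypass this cache entirely, which is one reason the right design depends so much on the usage pattern.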
Well, without knowing too much about what kinds of queries are mostly done (I would expect lookups to be more common), perhaps you should try caching first. Cache at different layers: at the layer in front of the app server if possible, and of course, as you suggested, at the layer between the app server and the database.
Caching works well for read data and it might not be as bad as you think.
Have you looked at Terracotta? They do have some caching and scaling stuff that might be relevant to you.
Take it as a challenge!
The way to 'mitigate this impending disaster' is to do what you should be doing anyway. If you follow best practices, the pain of switching out your persistence layer at a later stage will be minimal.
Until you have valid performance benchmarks and have identified the bottlenecks in the system, talk of performance is premature. In any case, I would be surprised if many of the 'intermediate layer' strategies aren't already implemented at the database level.
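One lightweight way to start collecting such numbers from the application side is to time each labelled database call and rank the labels by total time spent. This is only a sketch; `QueryTimer` and its method names are hypothetical, and the database's own tracing and profiling tools will give a far more complete picture.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class QueryTimer:
    """Record wall-clock durations per label and report the most expensive."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._samples = defaultdict(list)  # label -> list of durations (seconds)

    @contextmanager
    def measure(self, label):
        start = self._clock()
        try:
            yield
        finally:
            # Recorded even if the wrapped call raises.
            self._samples[label].append(self._clock() - start)

    def slowest(self, n=5):
        """Return up to n (label, total_seconds, call_count), largest total first."""
        totals = [(label, sum(s), len(s)) for label, s in self._samples.items()]
        return sorted(totals, key=lambda t: t[1], reverse=True)[:n]
```

Usage would look like `with timer.measure("load_customer"): ...` around each data-access call. Ranking by total time rather than by single slowest call tends to surface the cheap query that runs thousands of times per page as well as the obviously slow report.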
If the database is legacy and enormous, then:
1) It cannot be changed in a way that changes the interface, as this will break too many existing applications. Or, if you do change the interface, the change has to be coordinated with modifying multiple applications, with the associated testing.
2) If the issue is performance, then there are probably many changes that can be made to optimize the database without changing the interface.
3) Views can be used to maintain the existing interfaces while restructuring tables for more efficiency, or possibly to allow more efficient access in the future.
4) Standard database optimizations, such as performance analysis, indexing, and caching, can probably greatly increase efficiency and performance without changing the interface.
There's a lot more that can be done, but you get the idea. It can't really be updated in one single big change. Changes have to be incremental, or transparent to the applications that use it.
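Point 3 can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for Oracle; the `customer` table, the two tables it is split into, and the function names are all hypothetical.

```python
import sqlite3

def create_legacy(conn):
    # The "existing interface": one wide table that applications query by name.
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Ada', '1 Main St', 'Leeds')")
    conn.commit()

def restructure(conn):
    """Split the wide table in two, then recreate the old name as a view."""
    conn.execute("ALTER TABLE customer RENAME TO customer_old")
    conn.execute("CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE customer_address "
        "(customer_id INTEGER PRIMARY KEY, street TEXT, city TEXT)"
    )
    conn.execute("INSERT INTO customer_core SELECT id, name FROM customer_old")
    conn.execute(
        "INSERT INTO customer_address SELECT id, street, city FROM customer_old"
    )
    conn.execute("DROP TABLE customer_old")
    # Existing SELECTs against "customer" keep working through this view.
    conn.execute(
        "CREATE VIEW customer AS "
        "SELECT c.id, c.name, a.street, a.city "
        "FROM customer_core c JOIN customer_address a ON a.customer_id = c.id"
    )
    conn.commit()

def legacy_query(conn, customer_id):
    # Stands for a query baked into an application that cannot be changed.
    return conn.execute(
        "SELECT name, city FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
```

The unchanged `legacy_query` returns the same row before and after `restructure`. This sketch only covers reads; writes through a join view generally need extra work, such as `INSTEAD OF` triggers, which is part of why such changes have to be made incrementally and tested against every application that uses the database.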
The database is PART of the application. Don't consider them to be separate; they aren't.
As a developer, you need to be free to make schema changes as necessary, and to suggest data changes that improve performance or functionality in production (for example, archiving old data).
Your development system presumably does not have that much data, but has the exact same schema.
In order to do performance testing, you will need a system with the same hardware and same size data (same data if possible) as production. You should explain to management that performance testing is absolutely necessary as you feel the app isn't going to perform.
Of course making schema changes (adding / removing indexes, splitting tables out etc) may affect other parts of the system - which you should consider as parts of a SYSTEM - and hence do the necessary regression testing and fixing.
If you need to modify the database schema, and make changes to the desktop client accordingly, to make the web app perform, that is what you have to do - justify your design decision to the management.