Here is my situation: I am trying to follow the 3-tier pattern (i.e. Presentation, Business and Data layers) as closely as I can. When I need data from the DB, the Business layer calls the Data layer, which returns the information. The Data layer never returns a SqlDataReader or DataTable object, but rather an enumeration of custom objects known to the Data Access Layer. This works pretty well when the Data layer has to return a list with only a few objects.
I am now facing this problem: my application (the business layer) must process 500,000 records. I could simply add another method to my Data layer and return an IEnumerable, but that sounds very bad to me. I don't want to load half a million records into memory.
My question is: considering the 3-tier model, how should I handle this case? If I weren't following the 3-tier pattern, I would simply use a SqlDataReader in my business classes. Any suggestions?
UPDATE: The data will not be displayed, so this is not a paging issue (the presentation layer is not involved at all here). I simply have to analyze each record and then keep some of them.
Thanks
I assume you're not displaying 500,000 records to the front end at once? You're probably doing some pagination, right? So, only return one page's worth of data from the database at a time.
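Roughly what that could look like with ADO.NET: a data-layer method that returns one page at a time instead of the whole table. This is only a sketch, assuming SQL Server 2012+ (for OFFSET/FETCH); the dbo.Orders table, the Order class and the column names are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

public class OrderRepository
{
    private readonly string _connectionString;

    public OrderRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Returns a single page of records instead of the whole table.
    public List<Order> GetPage(int pageIndex, int pageSize)
    {
        var page = new List<Order>();
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            @"SELECT Id, CustomerId, Total
              FROM dbo.Orders
              ORDER BY Id
              OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY", connection))
        {
            command.Parameters.AddWithValue("@Offset", pageIndex * pageSize);
            command.Parameters.AddWithValue("@PageSize", pageSize);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    page.Add(new Order
                    {
                        Id = reader.GetInt32(0),
                        CustomerId = reader.GetInt32(1),
                        Total = reader.GetDecimal(2)
                    });
                }
            }
        }
        return page;
    }
}
```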
Yes, your instinct is correct.
I'm betting that your UI client does not want to look at half a million records at once. Google doesn't return every hit in a single page; you won't, either.
You have a choice as to where and when your application processes those half a million records. You can chunk them into smaller units of work; you can process them asynchronously; you can write a stored procedure and process them in the database without bringing them all over to the middle tier.
The MVC pattern is wonderful, but it's not holy writ. Make the choices that work for your app.
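As a rough sketch of the "smaller units of work" idea: walk the table in fixed-size chunks keyed on the primary key, so only one chunk is ever held in memory. The IRecordRepository interface, GetBatchAfter, MarkAsKept and the Record type below are hypothetical names standing in for whatever your data layer exposes:

```csharp
using System.Collections.Generic;

public class Record
{
    public int Id { get; set; }
    public double Score { get; set; }
}

public interface IRecordRepository
{
    // Returns up to batchSize records whose Id is greater than lastId, ordered by Id.
    IReadOnlyList<Record> GetBatchAfter(int lastId, int batchSize);
    void MarkAsKept(int id);
}

public class RecordAnalyzer
{
    private readonly IRecordRepository _repository;

    public RecordAnalyzer(IRecordRepository repository)
    {
        _repository = repository;
    }

    // Processes the whole table one chunk at a time.
    public void AnalyzeAll(int batchSize = 1000)
    {
        int lastSeenId = 0;
        while (true)
        {
            IReadOnlyList<Record> batch = _repository.GetBatchAfter(lastSeenId, batchSize);
            if (batch.Count == 0)
                break;

            foreach (var record in batch)
            {
                if (ShouldKeep(record))
                    _repository.MarkAsKept(record.Id);
            }

            lastSeenId = batch[batch.Count - 1].Id;
        }
    }

    private bool ShouldKeep(Record record)
    {
        // Placeholder for the real business rule.
        return record.Score > 0.5;
    }
}
```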
A piece of paper can never trump reality. If your specific problem requires breaking the 3-tier paradigm, do it.
In some cases, you have to break the 3-tier boundaries. But before you do, you could ask yourself:
When you "analyze each record and then keep some of them," is that really part of the business logic? Or is it a data-access function? It might be the case that this belongs in the data access layer.
If it is part of the business logic, do you need all 500000 records in order to make a decision about whether to "keep" any individual record? It might be that the business layer should be processing one record at a time. Making 500000 consecutive database calls is not pretty, but if that is what the app should be doing from a conceptual standpoint, there are ways to mitigate that.
I don't recommend doing anything dumb just to keep the 3 tiers separate. But sometimes, when you think you have to cross the line, it's because there is something in the design that needs a second look.
--
bmb
You can build an abstraction on top of the SqlDataReader class. That way you don't have to pass the SqlDataReader around directly, but you can still process the objects one at a time.
Think Iterators.
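For example, the data layer can expose an iterator that streams rows as it reads them, so the SqlDataReader never leaves the data layer while the business layer still only ever holds one object at a time. The Record type, table and column names here are placeholders:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class Record
{
    public int Id { get; set; }
    public double Score { get; set; }
}

public class RecordRepository
{
    private readonly string _connectionString;

    public RecordRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Lazily yields one record at a time; the reader and connection are
    // disposed when the caller finishes (or abandons) the enumeration.
    public IEnumerable<Record> StreamAll()
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("SELECT Id, Score FROM dbo.Records", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return new Record
                    {
                        Id = reader.GetInt32(0),
                        Score = reader.GetDouble(1)
                    };
                }
            }
        }
    }
}
```

The business layer can then simply foreach over StreamAll() and decide per record what to keep, without ever materializing the full half-million rows.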
Do the filtering in the database. There is no need to bring over 500,000 records that you are going to filter out anyway. Why bring them all to the middle tier just to remove them? Take care of the operation on the data as early as possible, using the SQL engine in the back end (a sproc). That is the most efficient approach, similar to performing basic input checks in the presentation layer before sending anything to IIS.
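For instance, the data layer could expose a single call that hands the whole job to a stored procedure, so the rows never leave the server; dbo.AnalyzeAndKeepRecords and its @Threshold parameter are hypothetical stand-ins for whatever rule you can express in T-SQL:

```csharp
using System.Data;
using System.Data.SqlClient;

public class RecordRepository
{
    private readonly string _connectionString;

    public RecordRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Runs the set-based filtering entirely in the database and
    // returns the number of rows the procedure affected.
    public int AnalyzeAndKeep(decimal threshold)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("dbo.AnalyzeAndKeepRecords", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@Threshold", threshold);
            connection.Open();
            return command.ExecuteNonQuery();
        }
    }
}
```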
This is not an uncommon problem and occurs frequently in situations where you need to consolidate large amounts of data and present summaries to the user (reports are a typical example). Your solution should be designed with these considerations in mind. It does not make sense to ignore the efficiencies offered by SQL readers (or similar tools) when strict adherence to some particular architectural model makes your application inefficient. It is often possible to overcome some of these problems by adapting an architectural model to your needs. Generic architectural models are rarely applicable out of the box. They are guidelines that should be applied to your particular needs.
There is no shame in doing whatever analysis you need to at the database level. If you can slice and dice what you need with a stored procedure, or make the necessary correlations with stored procedures and use the application for the more complex operations, you should be fine.
The question is: does the user expect to push a button, process all 500K records, and see a result? If so, are they willing to sit and watch a spinning gif, or will it be satisfactory to receive some type of notification when the process is complete? If processing the 500K records is of the utmost importance, does your data model need alteration to support this process? There are processing approaches such as Hadoop and message queues that are geared toward this kind of volume, but do you need to go to that extent? You might be able to set the expectations of your users before pulling your hair out over performance.
If I am understanding this correctly, you want to "analyze" the records and then keep some of them and get rid of the rest. In that case I think it would be best to handle this within the database itself (PL/SQL or T-SQL). Requirements like these should take priority over the architecture. Since you are not displaying the data, just analyzing it, it's best to do this in the procedure itself.