Looking for information on building large enterprise systems

Posted on 2024-07-05 16:21:35

How do you organize the DB layer, business logic and cross-platform API of your information management system if uploading and processing 500000 data records in one session is a normal operation (C# .NET 3.5 + MS SQL 2005)?

I’m specifically interested in production-proven paging patterns that behave well with respect to concurrency, scalability and reliability.

Does anybody have any ideas about which direction to dig in?

  • Open Source Projects (don’t care about the language or platform, as long as it is not Ook)
  • books
  • articles
  • Google keywords
  • forums or newsgroups

Any help would be greatly appreciated!

Update:

  • Simple paging (i.e. ROW_NUMBER in SQL 2005) does not work, since there are a lot of concurrent changes to the database. An item that is deleted or inserted between page requests automatically invalidates the current page index.
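To make the failure mode concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself is about C# and MS SQL). It simulates position-based paging over a sorted list of keys and shows a row being skipped after a concurrent delete. The `keyset_page` function is not something proposed in this thread; it is one commonly used alternative, shown here as an assumption, that pages by "last key seen" instead of by position. All names are hypothetical.

```python
def offset_page(rows, page, size):
    """Return page `page` (0-based) by position, as ROW_NUMBER-style paging does."""
    ordered = sorted(rows)
    return ordered[page * size:(page + 1) * size]


def keyset_page(rows, after_key, size):
    """Return the next `size` keys strictly greater than `after_key` (assumed alternative)."""
    ordered = sorted(rows)
    return [k for k in ordered if after_key is None or k > after_key][:size]


ids = list(range(1, 11))                 # keys 1..10 in the "table"
first = offset_page(ids, 0, 3)           # the user sees keys 1, 2, 3

ids.remove(2)                            # a concurrent delete between page requests

# Position-based paging now starts page 2 one row too late: key 4 is never shown.
second_offset = offset_page(ids, 1, 3)

# Paging by the last key seen continues right after key 3.
second_keyset = keyset_page(ids, first[-1], 3)

print(first, second_offset, second_keyset)
```

Whether key-based paging fits depends on having a stable, unique sort key; it also gives up the ability to jump to an arbitrary page number.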

Comments (6)

早乙女 2024-07-12 16:21:35

This is a good book to start with:

Patterns of Enterprise Application Architecture by Martin Fowler

鹿港小镇 2024-07-12 16:21:35

When it comes to DB optimization for huge amounts of data, you’ll most probably benefit from using the “BigTable” technique. I found the article here very useful. In short, the idea is to use DB denormalization to trade disk space for better performance.

For paging in MS SQL 2005 you’ll want to find more info on using the ROW_NUMBER function. Here is just a simple example; you’ll find tons of them using Google (keywords: ROW_NUMBER paging SQL 2005). Do not dig too deep though: there is no magic in the implementation, but rather in how you are going to use and present the paging itself. Google search is a good example.
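As a rough illustration of the ROW_NUMBER approach, the sketch below computes the row range for a page and holds a T-SQL query of the usual shape (a CTE that numbers rows, then a BETWEEN filter). It is written in Python only to keep it short and checkable. The table `dbo.Records`, its columns `Id` and `Name`, and the ODBC-style `?` placeholders are all assumptions made for illustration, not anything taken from the linked example.

```python
def row_number_bounds(page, page_size):
    """Map a 1-based page number to the inclusive (first_row, last_row) range."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    first_row = (page - 1) * page_size + 1
    return first_row, first_row + page_size - 1


# T-SQL in the shape commonly used on SQL Server 2005.
# dbo.Records, Id and Name are hypothetical names.
PAGE_QUERY = """
WITH Numbered AS (
    SELECT Id, Name,
           ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
    FROM dbo.Records
)
SELECT Id, Name
FROM Numbered
WHERE RowNum BETWEEN ? AND ?
ORDER BY RowNum;
"""

# Parameters for page 3 with 50 rows per page; these would be passed
# to the database driver together with PAGE_QUERY.
params = row_number_bounds(3, 50)
print(params)
```

Note that the row numbers are recomputed on every request, which is exactly why this form of paging shifts when rows are inserted or deleted between requests.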

Note: we found the NHibernate framework’s native paging support insufficient for our solution.

Also, you’ll probably be interested in creating a FULLTEXT index and using full-text search. Here is an MSDN article on creating a full-text index, and some info on full-text search.

Good luck.

野侃 2024-07-12 16:21:35

The implementation is done. I was informed recently that one of the uploads contained about 2148849 records. The tiers successfully dealt with a couple of broken connections and dozens of deadlocks at the DB level during this upload.
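The post does not say how the tiers recovered from these failures. One common way to survive deadlocks and dropped connections during a long upload, assumed here purely for illustration, is to retry each batch with an increasing delay. A minimal sketch in Python; `TransientDbError`, `run_with_retry` and `flaky_batch` are hypothetical names, and a real system would map its driver's deadlock and connection errors onto the retryable case.

```python
import time


class TransientDbError(Exception):
    """Hypothetical stand-in for a deadlock-victim or broken-connection error."""


def run_with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on TransientDbError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDbError:
            if attempt == attempts:
                raise                      # out of retries: surface the error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a batch that fails twice before succeeding. The sleep function is
# replaced with a recorder so the example does not actually wait.
calls = {"count": 0}
delays = []


def flaky_batch():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TransientDbError("simulated deadlock")
    return "ok"


result = run_with_retry(flaky_batch, sleep=delays.append)
print(result, calls["count"], delays)
```

Retrying is only safe if a batch can be re-run without duplicating rows, for example by wrapping each batch in its own transaction.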

In case somebody else needs some info:

静赏你的温柔 2024-07-12 16:21:35

I look after an enterprise data warehouse which uploads some feeds of hundreds of thousands of records.
I'm not sure if this is your scenario, but we:

  • Receive text files which we upload to a Sybase database.
  • Format the different feeds using awk so they're in a common format.
  • Load them into a denormalised intermediate table using bcp.
  • Run stored procedures to populate the normalised database structure.
  • Delete from the denormalised intermediate table.

This runs fairly well, but we force our uploads to be sequential. I.e. when feeds arrive they go into a queue, and we process the feed at the head of the queue entirely before looking at the rest.
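The steps above can be sketched roughly as follows. This is not the Sybase/awk/bcp setup itself: it uses Python with an in-memory SQLite database as a stand-in, `executemany` in place of bcp, and plain INSERT ... SELECT statements in place of the stored procedures. The table and column names (`staging`, `customers`, `orders`) are invented for the example.

```python
import sqlite3
from collections import deque


def process_feed(conn, rows):
    """Load one feed into staging, normalise it, then clear staging."""
    with conn:  # one transaction per feed: commit on success, roll back on error
        # Bulk load into the denormalised intermediate table (stand-in for bcp).
        conn.executemany(
            "INSERT INTO staging (customer, item, qty) VALUES (?, ?, ?)", rows)
        # Populate the normalised structure (stand-in for the stored procedures).
        conn.execute(
            "INSERT OR IGNORE INTO customers (name) "
            "SELECT DISTINCT customer FROM staging")
        conn.execute(
            "INSERT INTO orders (customer_id, item, qty) "
            "SELECT c.id, s.item, s.qty "
            "FROM staging s JOIN customers c ON c.name = s.customer")
        # Empty the intermediate table for the next feed.
        conn.execute("DELETE FROM staging")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (customer TEXT, item TEXT, qty INTEGER);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE orders (customer_id INTEGER REFERENCES customers(id),
                     item TEXT, qty INTEGER);
""")

# Feeds wait in a queue and are processed strictly one at a time.
feeds = deque([
    [("acme", "bolt", 10), ("acme", "nut", 5)],
    [("zenith", "bolt", 2), ("acme", "gear", 1)],
])
while feeds:
    process_feed(conn, feeds.popleft())

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Processing one feed at a time keeps the staging table free of rows from other feeds, which is what makes the final DELETE safe.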

Is any of that helpful?

森林迷了鹿 2024-07-12 16:21:35

dandikas,

Thank you for mentioning partial denormalization. Yes, that's the approach I'm considering for improving the performance of some queries.

Unfortunately, the NHibernate ORM does not fit into the solution because of the performance overhead it adds. The same goes for SQL paging: it does not work in the scenario of numerous concurrent edits (as detected by stress testing).

杀手六號 2024-07-12 16:21:35

“Same with the SQL paging - it does not work in the scenario of numerous concurrent edits (as detected by the stress-testing)”

As I mentioned, there is no magic in implementing paging: you either use ROW_NUMBER or a temporary table. The magic here is in evaluating your most common real-world usage scenario. Using a temporary table along with user tracking might help a bit in overcoming the concurrent-edits scenario. Though I sense that you’ll gain more by answering these questions:

  1. How long does a user stay on one page before moving to another one?
  2. How often does a user move from the first page to any other page?
  3. How many pages does a user typically look through?
  4. How critical is it if some information changes while the user is moving from one page to another and back?
  5. How critical is it if some information gets deleted while the user is on the page that shows it?

Try not to concentrate on questions like “How do I handle every possible concurrent-edit scenario while paging?” until you have answered the questions above; then handle only the situations that really matter.
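For what the temporary-table idea might look like, here is a minimal sketch in Python. It is an interpretation, not the answerer's actual design: each user session gets a frozen list of keys on its first request (the role a temporary table would play), and later pages are cut from that list, so page boundaries stay stable while the underlying data changes. `SnapshotPager` and the dictionary standing in for the table are hypothetical.

```python
class SnapshotPager:
    """Serve pages from a per-session snapshot of the ordered keys."""

    def __init__(self, page_size):
        self.page_size = page_size
        self._snapshots = {}   # session id -> keys frozen at the first request

    def page(self, session_id, table, page):
        """Return (key, current_value) pairs for a 0-based page number.

        A value of None marks a row deleted since the snapshot was taken.
        """
        if session_id not in self._snapshots:
            self._snapshots[session_id] = sorted(table)
        keys = self._snapshots[session_id]
        start = page * self.page_size
        return [(k, table.get(k)) for k in keys[start:start + self.page_size]]


table = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
pager = SnapshotPager(page_size=2)

page0 = pager.page("user-1", table, 0)        # snapshot taken here

del table[2]                                   # concurrent delete
table[6] = "f"                                 # concurrent insert

page1 = pager.page("user-1", table, 1)        # boundaries unchanged
page0_again = pager.page("user-1", table, 0)  # deleted row is flagged, not shifted

print(page0, page1, page0_again)
```

Rows inserted after the snapshot stay invisible to that session until it is refreshed, which is where the questions above come in: how long a session lives and how much staleness users tolerate decide whether this trade-off is acceptable.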

Another point is the UI. Check out as many paging UIs as you can find, as there are much better solutions than just left and right arrows or a row of page numbers. Some of these solutions help to hide or work around paging scenarios that cannot be solved technically.

P.S. If this answer is useful I’ll combine it with my first one.
