Table access performance
We have an application which is written completely in C. For table access inside the code, such as fetching values from a table, we use Pro*C. To improve the application's performance, we also preload some tables into memory. In general, we take some input fields and fetch the corresponding output fields from a table.
A table usually holds around 30,000 entries, and at most it sometimes reaches 100,000.
But if a table grows to around 10 million entries, I suspect it would seriously hurt the performance of the application.
Am I wrong somewhere? If it really does affect performance, is there any way to keep the application's performance stable?
Given the way the application works with its tables, what is a possible workaround if the number of rows grows to 10 million?
Comments (4)
If you are not sorting the table, you'll get a proportional increase in search time: assuming nothing else is coded wrong, in your example (30K vs. 10M rows) searches will take roughly 333 times longer. I'm assuming you're iterating incrementally (i++ style) through the table.
However, if it's somehow possible to sort the table, then you can greatly reduce search times. That is possible because a search over sorted data does not have to examine every element until it finds the one sought: it can use auxiliary structures (trees, hashes, etc.), which are usually much faster to search, to pinpoint the sought element, or at least get a much closer estimate of where it is in the master table.
Of course, that comes at the expense of having to keep the table sorted, paying the cost either when you insert or remove elements, or when you perform a search.
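For illustration, here is a minimal C sketch of the two approaches, assuming the preloaded table is an in-memory array of rows (the row layout and field names are made up for the example). With the table kept sorted on the key, bsearch() needs only about 24 comparisons to find one row among 10 million, versus up to 10 million probes for the linear scan:

    #include <stdlib.h>

    /* Hypothetical in-memory row: one input key, one output value. */
    struct row {
        int  key;
        char value[32];
    };

    static int cmp_row(const void *a, const void *b)
    {
        const struct row *ra = a, *rb = b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

    /* O(n): cost grows in proportion to the row count. */
    struct row *find_linear(struct row *t, size_t n, int key)
    {
        size_t i;
        for (i = 0; i < n; i++)
            if (t[i].key == key)
                return &t[i];
        return NULL;
    }

    /* O(log n): ~24 comparisons for 10 million rows, but the table
       must first be sorted, e.g. qsort(t, n, sizeof *t, cmp_row). */
    struct row *find_sorted(struct row *t, size_t n, int key)
    {
        struct row probe;
        probe.key = key;
        return bsearch(&probe, t, n, sizeof *t, cmp_row);
    }

Sorting once with qsort() right after the preload and then paying O(log n) per lookup is usually the right trade when reads vastly outnumber inserts.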
Maybe you can look at Google's hash implementation (their sparsehash library, for example) and study it, although it is written in C++.
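In plain C, the same idea can be sketched with a small open-addressing hash table; the names and sizes below are illustrative, not taken from Google's code. Lookups average O(1) regardless of row count, as long as the load factor stays moderate:

    #include <stdlib.h>
    #include <string.h>

    #define NBUCKETS (1u << 24)   /* power of two, comfortably above 10M rows */

    /* Hypothetical entry: integer key -> fixed-size value. */
    struct entry {
        int  key;
        char value[32];
        int  used;
    };

    static struct entry *table;

    int ht_init(void)
    {
        table = calloc(NBUCKETS, sizeof *table);
        return table ? 0 : -1;
    }

    static unsigned hash_key(int key)
    {
        /* Knuth multiplicative hash, masked to the table size. */
        return (unsigned)key * 2654435761u & (NBUCKETS - 1);
    }

    /* Linear probing; stops at the first empty slot. */
    struct entry *ht_find(int key)
    {
        unsigned i;
        for (i = hash_key(key); table[i].used; i = (i + 1) & (NBUCKETS - 1))
            if (table[i].key == key)
                return &table[i];
        return NULL;   /* hit an empty slot: key absent */
    }

    /* Sketch only: no resizing, so keep the load factor well below 1. */
    void ht_insert(int key, const char *value)
    {
        unsigned i = hash_key(key);
        while (table[i].used && table[i].key != key)
            i = (i + 1) & (NBUCKETS - 1);
        table[i].key  = key;
        table[i].used = 1;
        strncpy(table[i].value, value, sizeof table[i].value - 1);
    }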
It might be that you are hitting too many cache misses once the table grows beyond 1 MB, or whatever your cache size is.
If you iterate over the table multiple times, or access its elements randomly, you can also incur a lot of cache misses.
http://en.wikipedia.org/wiki/CPU_cache#Cache_Misses
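A quick C experiment makes the effect visible (the sizes here are illustrative): scanning 10 million ints sequentially is prefetcher-friendly, while the same number of random probes misses the cache on almost every access:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 10000000u   /* 10 million ints ~ 40 MB, far larger than any CPU cache */

    int main(void)
    {
        int *t = calloc(N, sizeof *t);
        unsigned long long x = 88172645463325252ull;  /* PRNG state */
        long sum = 0;
        size_t i;
        clock_t t0, t1, t2;

        if (!t) return 1;

        /* Sequential pass: the hardware prefetcher keeps misses rare. */
        t0 = clock();
        for (i = 0; i < N; i++)
            sum += t[i];
        t1 = clock();

        /* Random probes (simple LCG, since rand() may have a small RAND_MAX):
           nearly every access is a cache -- and often TLB -- miss. */
        for (i = 0; i < N; i++) {
            x = x * 6364136223846793005ull + 1442695040888963407ull;
            sum += t[x % N];
        }
        t2 = clock();

        printf("sequential %.2fs, random %.2fs (sum=%ld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        free(t);
        return 0;
    }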
Well, it really depends on what you are doing with the data. If you have to load the whole kit and caboodle into memory, then a reasonable approach would be to use a large bulk size, so that the number of Oracle round trips that need to occur is small.
If you don't really have the memory to hold the whole result set, a large bulk size will still help with the Oracle overhead: get a reasonably sized chunk of records into memory, process them, then fetch the next chunk, as sketched below.
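A minimal Pro*C sketch of that chunked pattern (the table, column, and batch names are hypothetical, and error handling is omitted): declaring host arrays makes each FETCH pull up to a whole batch of rows in a single round trip.

    EXEC SQL INCLUDE sqlca;

    #define BATCH 1000

    EXEC SQL BEGIN DECLARE SECTION;
    static int  keys[BATCH];
    static char vals[BATCH][64];
    EXEC SQL END DECLARE SECTION;

    void fetch_in_chunks(void)
    {
        int total = 0, n;

        EXEC SQL DECLARE cur CURSOR FOR
            SELECT key_col, val_col FROM lookup_tab;
        EXEC SQL OPEN cur;

        EXEC SQL WHENEVER NOT FOUND CONTINUE;
        for (;;) {
            /* Host arrays: one FETCH returns up to BATCH rows per round trip. */
            EXEC SQL FETCH cur INTO :keys, :vals;
            n = sqlca.sqlerrd[2] - total;   /* rows in this batch */
            total = sqlca.sqlerrd[2];       /* sqlerrd[2] is cumulative */
            /* ... process the n rows just fetched ... */
            if (sqlca.sqlcode == 1403)      /* no more rows; last batch may be partial */
                break;
        }
        EXEC SQL CLOSE cur;
    }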
Without more information about your actual runtime environment and business goals, that is about as specific as anyone can get.
Can you tell us more about the issue?