如何获取最频繁的项目
我正在开发一个应用程序,该应用程序有一个包含数字行的大数组,
transNum[20000][200]//this is the 2d array containing the numbers and always keep track of the line numbers
我正在使用嵌套循环来查找最频繁的项目。当
for(int i=0/*,lineitems=0*/;i<lineCounter;i++)
{
for(int j=0,shows=1;j<lineitem1[i];j++)
{
for(int t=i+1;t<lineCounter;t++)
{
for(int s=0;s<lineitem1[t];s++)
{
if(transNum[i][j]==transNum[t][s])
shows++;
}
}
if(shows/lineCounter>=0.2)
{
freItem[i][lineitem2[i]]=transNum[i][j];
lineitem2[i]++;
}
}
}
我使用像 test[200][200] 这样的小输入数组进行测试时,这个循环工作正常并且计算时间是可以接受的,但是当我尝试处理包含 12000 行的数组时,计算时间太长,所以我在想是否有其他方法来计算频繁项而不是使用这个循环。我刚刚在 10688 行上进行了测试,获得所有频繁项的时间是 825805ms,这太昂贵了。
I am working on an application which has a large array containing lines of numbers,
transNum[20000][200]//this is the 2d array containing the numbers and always keep track of the line numbers
I am using a nested loop to look for the most frequent items. which is
for(int i=0/*,lineitems=0*/;i<lineCounter;i++)
{
for(int j=0,shows=1;j<lineitem1[i];j++)
{
for(int t=i+1;t<lineCounter;t++)
{
for(int s=0;s<lineitem1[t];s++)
{
if(transNum[i][j]==transNum[t][s])
shows++;
}
}
if(shows/lineCounter>=0.2)
{
freItem[i][lineitem2[i]]=transNum[i][j];
lineitem2[i]++;
}
}
}
when I was doing tests using small input arrays like test[200][200], this loop works fine and the computing time is acceptable, but when I try to process the array contains 12000 lines, the computing time is too long, so I am thinking if there are other ways to compute the frequent items rather than using this loop.I just ran a test on 10688 lines, and the time to get all the frequent item is 825805ms, which is way to expensive.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
请记住,这最多是一个 O(n^2) 算法,而且可能会更糟。这意味着操作数量与项目数量的平方成正比。经过一定数量的行后,性能将迅速下降,除了改进算法之外,您无能为力。
Bear in mind this is an O(n^2) algorithm at best and could be worse. That means the number of operations is proportional to the count of the items squared. After a certain number of lines, performance will degrade rapidly and there's nothing you can do about it except to improve the algorithm.
取决于您的输入。如果您还在同一代码中插入数据,那么您可以在插入时计算频繁项。
下面是一个伪 C 解决方案:
编辑 #2: 确保将数组中的所有项目初始化为零,以免出现意外结果。
Depends on your input. If you are also inserting the data in the same code then you can count frequent items as you insert them.
Here is a pseudo-C solution:
EDIT #2: Make sure, so you don't get unexpected results, to initialize all the items in the array to zero.
Google 的 Multiset 实现Guava 项目在这种情况下可能会很有用。您可以将项目存储在那里,然后检索包含每次出现次数的值集。
The Multiset implementation from Google Guava project might be useful in such cases. You could store items there and then retrieve set of values with count of each occurrence.
对这个算法进行了一些思考。这是我想出的解决方案:
Gave the algorithm for this one some thought. Here's the solution I came up with: