Calculate the median sales for each customer in an array of customers (newbie)

Posted 2024-11-30 14:51:04

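I have an array of customer objects generated from a CSV file:

Date, Name, Sales
03/01, Alpha, 110
03/23, Alpha, 25
01/02, Beta, 135
...
and need an efficient way to create a new array of unique customers with their median sales and export it back to CSV. There could be as many as 500,000 records and 100,000 unique customers!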


锦欢 2024-12-07 14:51:04

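- Split the source data into one collection per customer.

For each customer:

-- Sort by sales

-- If the number of records is odd, return the sales at the middle index

-- If the number of records is even, return the average of the two records on either side of the middle

- Put the returned value into your result array.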
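The steps above can be sketched roughly as follows. This is only a minimal illustration: the `Median` helper, the `records` list, and the tuple field names are made up for the example, and the rows are assumed to be already parsed from the CSV.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: sort a copy of the sales, then take the middle element
// (odd count) or the average of the two middle elements (even count).
static double Median(IEnumerable<int> sales)
{
    int[] sorted = sales.OrderBy(s => s).ToArray();
    if (sorted.Length == 0)
        throw new ArgumentException("At least one sale is required.", nameof(sales));

    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2.0;
}

// Assumed in-memory rows standing in for the parsed CSV.
var records = new List<(string Name, int Sales)>
{
    ("Alpha", 110), ("Alpha", 25), ("Beta", 135),
};

// Split into one group per customer, then compute one median per group.
var medians = records
    .GroupBy(r => r.Name)
    .Select(g => (Name: g.Key, MedianSales: Median(g.Select(r => r.Sales))))
    .ToList();

foreach (var (name, medianSales) in medians)
    Console.WriteLine($"{name},{medianSales}");
```

With about 500,000 records spread over 100,000 customers, each group averages only five rows, so the per-customer sort is cheap and the overall cost is dominated by the grouping pass.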


嘿看小鸭子会跑 2024-12-07 14:51:04

In cases like this I would use:

Dictionary<string, List<int>> dict;

The keys are the customer names (assuming they are unique, otherwise assign a unique ID of some sort?)
The values are lists of sales for each customer. After you have filled this dictionary you may proceed by either sorting and getting the middle element (as mentioned above) ~~or summing and dividing by the number of elements to get the median.~~ (this is wrong: summing and dividing gives the mean, not the median)
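As a rough sketch of how such a dictionary might be filled, assuming the input is plain `Date,Name,Sales` lines with a header row and no quoted fields (the `GroupSalesByCustomer` helper and the sample `lines` array are hypothetical, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: groups sales by customer name from raw CSV lines.
// Assumes "Date,Name,Sales" columns, a header on the first line,
// and no quoted fields or embedded commas.
static Dictionary<string, List<int>> GroupSalesByCustomer(IEnumerable<string> csvLines)
{
    var result = new Dictionary<string, List<int>>();
    bool isHeader = true;
    foreach (string line in csvLines)
    {
        if (isHeader) { isHeader = false; continue; }   // skip the header row
        if (string.IsNullOrWhiteSpace(line)) continue;  // skip blank lines

        string[] fields = line.Split(',');
        string name = fields[1].Trim();
        int sales = int.Parse(fields[2].Trim());

        // Create the customer's list on first sight, then append the sale.
        if (!result.TryGetValue(name, out var list))
        {
            list = new List<int>();
            result[name] = list;
        }
        list.Add(sales);
    }
    return result;
}

var lines = new[]
{
    "Date,Name,Sales",
    "03/01,Alpha,110",
    "03/23,Alpha,25",
    "01/02,Beta,135",
};
Dictionary<string, List<int>> dict = GroupSalesByCustomer(lines);
Console.WriteLine($"{dict.Count} customers, {dict["Alpha"].Count} sales for Alpha");
```

Filling the dictionary is a single pass over the records, so it should scale comfortably to a few hundred thousand rows.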

Sorting (using a comparison-based method) takes O(n log n) time, where n is the length of the list to be sorted.

There are selection algorithms which can return the kth smallest value in O(n) time; see the Wikipedia article on selection algorithms.
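One such selection algorithm is quickselect. The sketch below is an illustration rather than the answer's own code: `SelectKth` and `MedianBySelection` are hypothetical names, the pivot is chosen at random, and the running time is O(n) only in expectation (it can degrade to quadratic, for example on lists with many equal values).

```csharp
using System;
using System.Collections.Generic;

// Quickselect: returns the k-th smallest value (0-based) of `values`.
// Works on a copy so the caller's list is not reordered.
static int SelectKth(IReadOnlyList<int> values, int k, Random rng)
{
    if (k < 0 || k >= values.Count) throw new ArgumentOutOfRangeException(nameof(k));
    int[] a = new int[values.Count];
    for (int i = 0; i < a.Length; i++) a[i] = values[i];

    int lo = 0, hi = a.Length - 1;
    while (true)
    {
        if (lo == hi) return a[lo];

        // Pick a random pivot and move it to the end (Lomuto partition).
        int p = rng.Next(lo, hi + 1);
        (a[p], a[hi]) = (a[hi], a[p]);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)
        {
            if (a[i] < pivot)
            {
                (a[i], a[store]) = (a[store], a[i]);
                store++;
            }
        }
        (a[store], a[hi]) = (a[hi], a[store]);   // pivot lands at its final index

        if (k == store) return a[store];
        if (k < store) hi = store - 1; else lo = store + 1;
    }
}

// Median via selection: one selection for odd counts, two for even counts.
static double MedianBySelection(IReadOnlyList<int> sales, Random rng)
{
    int n = sales.Count;
    int upper = SelectKth(sales, n / 2, rng);
    if (n % 2 == 1) return upper;
    int lower = SelectKth(sales, n / 2 - 1, rng);
    return (lower + upper) / 2.0;
}

var rng = new Random();
Console.WriteLine(MedianBySelection(new[] { 110, 25 }, rng));
```

For this data set the lists are short (about five sales per customer on average), so a plain sort is likely just as fast in practice; selection mainly pays off for customers with very long sales lists.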
