Price filter grouping algorithm
I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort products pulled from the database into halfway appropriate price groups. I have tried simply dividing the highest price by 4 and basing each group on that. I also tried standard deviations around the mean. Both can produce price ranges that no product falls into, which isn't a useful filtering option.
I also tried taking quartiles of the products, but my problem is that prices range from $1 items to $4,000 items. The $4,000 items almost never sell and are far less important, but they keep skewing my results.
Any thoughts? I should have paid more attention in stats class ...
Update:
I ended up combining methods a bit. I used the quartile/bucket method, but hacked it a bit by hardcoding certain ranges within which a greater number of price groups would appear.
//Price range algorithm
$priceranges = array();
sort($prices);
//Divide the number of prices into four groups
$quartilelength = count($prices)/4;
//Round to the nearest ...
$simplifier = 10;
//Get the total range of the prices
$range = max($prices)-min($prices);
//Assuming we actually are working with multiple prices
if ($range>0)
{
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($prices) > 10)
    {
        // Lower quartile, rounded down to a multiple of $simplifier
        $priceranges[0] = floor($prices[(int) floor($quartilelength)]/$simplifier)*$simplifier;
    }
    // Always grab the median price
    $priceranges[1] = floor($prices[(int) floor($quartilelength*2)]/$simplifier)*$simplifier;
    // Same condition as for the lower quartile, checked against the same $prices array
    if ($range>20 && count($prices) > 10)
    {
        // Upper quartile, rounded down to a multiple of $simplifier
        $priceranges[2] = floor($prices[(int) floor($quartilelength*3)]/$simplifier)*$simplifier;
    }
}
Comments (4)
Here is an idea: basically you would sort the prices into buckets of 10, with each price as a key in the bucket's array and the value being a count of how many products are at that price point.
From the result, you can use reset() and end() to get the min/max of each bucket.
Kinda brute force, but might be useful...
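The code that originally accompanied this answer is not shown here, so what follows is only a rough sketch of the idea as described. It assumes "buckets of 10" means buckets that are 10 wide in price; the function names bucketPrices and bucketRange are made up for illustration.

```php
<?php
// Hypothetical helper: group prices into buckets of width $width.
// Each bucket maps price => number of products at that price.
function bucketPrices(array $prices, int $width = 10): array
{
    $buckets = [];
    foreach ($prices as $price) {
        $bucket = (int) floor($price / $width);
        // String key so that fractional prices such as 19.99 are not truncated
        $key = (string) $price;
        $buckets[$bucket][$key] = ($buckets[$bucket][$key] ?? 0) + 1;
    }
    ksort($buckets);
    foreach ($buckets as &$counts) {
        // Order the prices inside each bucket from lowest to highest
        ksort($counts, SORT_NUMERIC);
    }
    unset($counts);
    return $buckets;
}

// Min and max price of one bucket, read with reset() and end()
function bucketRange(array $counts): array
{
    $keys = array_keys($counts);
    return [(float) reset($keys), (float) end($keys)];
}
```

Buckets that contain no products never appear in the result, which avoids the empty-range problem from the question.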
Here is an idea, following the line of thought of my comment:
I assume you have a set of products, each tagged with a price and a sales volume estimate (as a percentage of total sales). First, sort all products by price. Next, start splitting: traverse the ordered list and accumulate sales volume. Each time you reach about 25%, cut there. If you do so 3 times, you end up with 4 subsets that have disjoint price ranges and similar sales volumes.
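A minimal sketch of that splitting step, assuming each product is an array with hypothetical 'price' and 'share' keys (the share being its estimated part of total sales). The function name splitBySalesVolume is invented here, and the arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical input format: ['price' => number, 'share' => number].
function splitBySalesVolume(array $products, int $groups = 4): array
{
    // Sort all products by price, cheapest first
    usort($products, fn($a, $b) => $a['price'] <=> $b['price']);
    $total = array_sum(array_column($products, 'share'));

    $result = [[]];
    $accumulated = 0.0;
    $current = 0;
    foreach ($products as $product) {
        $result[$current][] = $product;
        $accumulated += $product['share'];
        // Cut once the running total reaches the next 1/$groups mark
        if ($current < $groups - 1
            && $accumulated >= $total * ($current + 1) / $groups) {
            $current++;
            $result[$current] = [];
        }
    }
    // Drop a trailing empty group if the last product triggered a cut
    return array_values(array_filter($result));
}
```

Because the cuts follow sales volume instead of price, a rarely sold $4,000 item simply lands in the last group without stretching the other ranges.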
What exactly are you looking for as your end result (could you give us an example grouping)? If your only goal is for every group to contain a significant number of important enough products, then even a perfect algorithm for your current data set will not necessarily work with tomorrow's data set. Depending on how many groups you need, I would simply define arbitrary groups that fit your needs instead of using an algorithm, e.g. ($1 - $25, $25 - $100, $100+). From a consumer's perspective, my mind naturally sorts products into three different price categories (cheap, midrange and expensive).
I think you're thinking too much.
If you know your products and you like fine-grained results, I would simply hard-code those price ranges.
If you think $1 to $10 makes sense for what you are selling, put it in; you don't need an algorithm. Just add a check so that you only show ranges that have results.
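As a sketch of that check, assuming the hand-picked ranges are stored as [min, max] pairs with the lower bound inclusive, the upper bound exclusive, and null for an open-ended top range. The example ranges and the function name rangesWithResults are illustrative only.

```php
<?php
// Hypothetical hard-coded ranges; null means "and up".
$ranges = [[1, 10], [10, 25], [25, 100], [100, null]];

// Return only the ranges that contain at least one price,
// together with the number of matching products.
function rangesWithResults(array $ranges, array $prices): array
{
    $visible = [];
    foreach ($ranges as [$min, $max]) {
        $count = 0;
        foreach ($prices as $price) {
            if ($price >= $min && ($max === null || $price < $max)) {
                $count++;
            }
        }
        if ($count > 0) {
            $visible[] = ['min' => $min, 'max' => $max, 'count' => $count];
        }
    }
    return $visible;
}
```

With prices such as 3, 7, 12, 150 and 4000, the $25 to $100 range would be skipped because nothing falls into it.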
If you don't know your products, I would just sort all the products by price and divide them into 4 groups with an equal number of products in each.
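A minimal sketch of the equal-count split, with an invented function name (equalCountGroups). When the number of products does not divide evenly, group sizes differ by at most one.

```php
<?php
// Sort the prices and cut the list into $groups slices of (nearly) equal size.
function equalCountGroups(array $prices, int $groups = 4): array
{
    sort($prices);
    $n = count($prices);
    $result = [];
    for ($i = 0; $i < $groups; $i++) {
        // Integer boundaries of the i-th slice in the sorted list
        $start = intdiv($i * $n, $groups);
        $end = intdiv(($i + 1) * $n, $groups);
        if ($end > $start) {
            $result[] = array_slice($prices, $start, $end - $start);
        }
    }
    return $result;
}
```

The first and last price of each slice give the range to display. Note that identical prices can end up on both sides of a cut, so the displayed boundaries may need rounding or merging.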