Grouping stock prices when they are within 0.5% of each other
Thanks for the answers. I have not used StackOverflow before, so I was surprised by the number of answers and the speed of them; it's fantastic.
I have not been through the answers properly yet, but thought I should add some information to the problem specification. See the image below.
I can't post an image here because I don't have enough points, but you can see it at http://journal.acquitane.com/2010-01-20/image003.jpg
This image may describe more closely what I'm trying to achieve. The horizontal lines across the page are price points on the chart. Where you get a clustering of lines within 0.5% of each other, this is considered to be a good thing, which is why I want to identify those clusters automatically. You can see on the chart that there is a cluster at S2 & MR1, R2 & WPP1.
So every day I produce these price points, and I can manually identify those that are within 0.5%, but the purpose of this question is how to do it with a Python routine.
I have reproduced the list again (see below) with labels. Just be aware that the price points in the list don't match the price points in the image because they are from two different days.
[YR3,175.24,8]
[SR3,147.85,6]
[YR2,144.13,8]
[SR2,130.44,6]
[YR1,127.79,8]
[QR3,127.42,5]
[SR1,120.94,6]
[QR2,120.22,5]
[MR3,118.10,3]
[WR3,116.73,2]
[DR3,116.23,1]
[WR2,115.93,2]
[QR1,115.83,5]
[MR2,115.56,3]
[DR2,115.53,1]
[WR1,114.79,2]
[DR1,114.59,1]
[WPP,113.99,2]
[DPP,113.89,1]
[MR1,113.50,3]
[DS1,112.95,1]
[WS1,112.85,2]
[DS2,112.25,1]
[WS2,112.05,2]
[DS3,111.31,1]
[MPP,110.97,3]
[WS3,110.91,2]
[50MA,110.87,4]
[MS1,108.91,3]
[QPP,108.64,5]
[MS2,106.37,3]
[MS3,104.31,3]
[QS1,104.25,5]
[SPP,103.53,6]
[200MA,99.42,7]
[QS2,97.05,5]
[YPP,96.68,8]
[SS1,94.03,6]
[QS3,92.66,5]
[YS1,80.34,8]
[SS2,76.62,6]
[SS3,67.12,6]
[YS2,49.23,8]
[YS3,32.89,8]
I did make a mistake with the original list: Group C is wrong and should not be included. Thanks for pointing that out.
Also, the 0.5% is not fixed; this value will change from day to day. I have just used 0.5% as an example for spec'ing the problem.
Thanks again.
Mark
PS. I will get cracking on checking the answers now.
Hi:
I need to do some manipulation of stock prices. I have just started using Python (but I think I would have trouble implementing this in any language). I'm looking for some ideas on how to implement this nicely in Python.
Thanks
Mark
Problem:
I have a list of lists (FloorLevels, see below) where each sublist has two items (stockprice, weight). I want to put the stockprices into groups when they are within 0.5% of each other. A group's strength will be determined by its total weight. For example:
Group-A
115.93,2
115.83,5
115.56,3
115.53,1
-------------
TotalWeight:11
-------------
Group-B
113.50,3
112.95,1
112.85,2
-------------
TotalWeight:6
-------------
FloorLevels = [
    [175.24, 8],
    [147.85, 6],
    [144.13, 8],
    [130.44, 6],
    [127.79, 8],
    [127.42, 5],
    [120.94, 6],
    [120.22, 5],
    [118.10, 3],
    [116.73, 2],
    [116.23, 1],
    [115.93, 2],
    [115.83, 5],
    [115.56, 3],
    [115.53, 1],
    [114.79, 2],
    [114.59, 1],
    [113.99, 2],
    [113.89, 1],
    [113.50, 3],
    [112.95, 1],
    [112.85, 2],
    [112.25, 1],
    [112.05, 2],
    [111.31, 1],
    [110.97, 3],
    [110.91, 2],
    [110.87, 4],
    [108.91, 3],
    [108.64, 5],
    [106.37, 3],
    [104.31, 3],
    [104.25, 5],
    [103.53, 6],
    [99.42, 7],
    [97.05, 5],
    [96.68, 8],
    [94.03, 6],
    [92.66, 5],
    [80.34, 8],
    [76.62, 6],
    [67.12, 6],
    [49.23, 8],
    [32.89, 8],
]
Comments (7)
I suggest a repeated use of k-means clustering; let's call it KMC for short. KMC is a simple and powerful clustering algorithm, but it needs to "be told" how many clusters, k, you're aiming for. You don't know that in advance (if I understand you correctly): you just want the smallest k such that no two items "clustered together" are more than X% apart from each other. So, start with k equal to 1 (everything bunched together, no clustering pass needed ;-) and check the diameter of the cluster (a cluster's "diameter", from the use of the term in geometry, is the largest distance between any two members of a cluster).
If the diameter is > X%, set k += 1, perform KMC with k as the number of clusters, and repeat the check, iteratively. In pseudo-code:
assuming of course we have suitable diameter and Kmc Python functions.
Does this sound like the kind of thing you want? If so, then we can move on to show you how to write diameter and Kmc (in pure Python if you have a relatively limited number of items to deal with, otherwise maybe by exploiting powerful third-party add-on frameworks such as numpy), but it's not worthwhile to go to such trouble if you actually want something pretty different, whence this check! -)
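The loop behind the repeated k-means suggestion could be sketched roughly as follows. This is only an illustration in pure Python, not the answerer's code: the names diameter, kmc and cluster_within, the deterministic choice of starting centres, and measuring the diameter as a percentage of the lowest price in the cluster are all assumptions made here. Prices are assumed to be positive. Note also that k-means does not promise the truly smallest k; the loop simply stops at the first k whose clusters are all narrow enough.

```python
def diameter(cluster):
    """Largest gap between any two prices in a cluster, as a percent of the lowest."""
    lo, hi = min(cluster), max(cluster)
    return (hi - lo) / lo * 100.0


def kmc(items, k, iterations=100):
    """Plain 1-D k-means (Lloyd's algorithm) on a list of prices."""
    data = sorted(items)
    # Assumption: deterministic start, k centres spread across the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for x in data:
            nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        clusters = [c for c in clusters if c]  # drop empty clusters
        new_centres = [sum(c) / len(c) for c in clusters]
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return clusters


def cluster_within(items, max_pct):
    """Raise k until no cluster is wider than max_pct percent."""
    k = 1
    clusters = [list(items)]
    while max(diameter(c) for c in clusters) > max_pct and k < len(items):
        k += 1
        clusters = kmc(items, k)
    return clusters


prices = [116.73, 116.23, 115.93, 115.83, 115.56, 115.53, 114.79, 114.59]
for cluster in cluster_within(prices, 0.5):
    print(cluster)
```

For larger inputs a library implementation of k-means would probably be a better fit than this hand-rolled one.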
A stock s belongs in a group G if, for each stock t in G, s * 1.05 >= t and s / 1.05 <= t, right?
How do we add the stocks to each group? If we have the stocks 95, 100, 101, and 105, and we start a group with 100, then add 101, we will end up with {100, 101, 105}. If we did 95 after 100, we'd end up with {100, 95}.
Do we just need to consider all possible permutations? If so, your algorithm is going to be inefficient.
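The order dependence described in this answer is easy to reproduce with a small first-fit sketch. The helper names within and first_fit are made up for illustration, and "within 5%" is taken here to mean "the difference is at most 5% of the larger price", which is an assumption and slightly different from the inequality quoted in the answer. Integer arithmetic keeps the boundary cases (such as 95 versus 100) exact.

```python
def within(s, t, pct):
    """True if s and t differ by at most pct percent of the larger price."""
    return abs(s - t) * 100 <= pct * max(s, t)


def first_fit(prices, pct):
    """Add each price to the first group in which it fits every member."""
    groups = []
    for s in prices:
        for group in groups:
            if all(within(s, t, pct) for t in group):
                group.append(s)
                break
        else:
            groups.append([s])  # no existing group accepted this price
    return groups


# Same four prices, two different insertion orders.
print(first_fit([100, 101, 105, 95], 5))
print(first_fit([100, 95, 101, 105], 5))
```

The two calls return different partitions of the same prices, so a single greedy pass is only well defined once the processing order (for example, sorted by price) is fixed.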
You need to specify your problem in more detail. Just what does "put the stockprices into groups when they are within 0.5% of each other" mean?
Possibilities:
(1) each member of the group is within 0.5% of every other member of the group
(2) sort the list and split it where the gap is more than 0.5%
Note that 116.23 is within 0.5% of 115.93 -- abs((116.23 / 115.93 - 1) * 100) < 0.5 -- but you have put one number in Group A and one in Group C.
Simple example:
a, b, c = (0.996, 1, 1.004)
... Note that a and b fit, b and c fit, but a and c don't fit. How do you want them grouped, and why? Is the order in the input list relevant?
Possibility (1) produces ab,c or a,bc ... tie-breaking rule, please
Possibility (2) produces abc (no big gaps, so only one group)
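The checks in this answer can be run directly. The small helper below, pct_diff (a name chosen here for illustration), wraps the expression quoted in the answer and applies it to the a, b, c example and to the 116.23 / 115.93 pair.

```python
def pct_diff(x, y):
    """Difference of x from y in percent, using the expression from the answer."""
    return abs((x / y - 1) * 100)


a, b, c = (0.996, 1, 1.004)

print(pct_diff(a, b) < 0.5)   # a and b fit
print(pct_diff(b, c) < 0.5)   # b and c fit
print(pct_diff(a, c) < 0.5)   # a and c do not fit
print(pct_diff(116.23, 115.93) < 0.5)
```

Because "within 0.5%" is not transitive, possibility (1) needs a tie-breaking rule, while possibility (2) simply chains neighbours together.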
You won't be able to classify them into hard "groups". If you have prices (1.0, 1.05, 1.1) then the first and second should be in the same group, and the second and third should be in the same group, but not the first and third.
A quick, dirty way to do something that you might find useful:
Usage:
For a given set of stock prices, there is probably more than one way to group stocks that are within 0.5% of each other. Without some additional rules for grouping the prices, there's no way to be sure an answer will do what you really want.
Apart from the proper way to pick which values fit together, this is a problem where a little object orientation can make it a lot easier to deal with.
I made two classes here, with a minimum of desirable behaviors, which can make the classification a lot easier: you get a single point to play with it, on the Group class.
I can see the code below is incorrect, in the sense that the limits for group inclusion vary as new members are added. Even if the separation criteria remain the same, you have to rewrite the get_groups method to use a multi-pass approach. It should not be hard, but the code would be too long to be helpful here, and I think this snippet is enough to get you going:
testing:
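As a rough illustration of the object-oriented idea (not the answerer's own snippet), here is a minimal sketch with a single hypothetical Group class. It assumes the strict rule that every member must be within pct percent of every other member, and it makes one greedy pass over the levels sorted by price, so it shares the single-pass limitation mentioned in this answer. Only the names Group and get_groups come from the answer; everything else is an assumption.

```python
class Group:
    """(price, weight) levels that all lie within pct percent of each other."""

    def __init__(self, pct):
        self.pct = pct
        self.members = []

    def accepts(self, price):
        # The newcomer must be close enough to every existing member.
        return all(abs(price - p) / min(price, p) * 100 <= self.pct
                   for p, _ in self.members)

    def add(self, price, weight):
        self.members.append((price, weight))

    @property
    def total_weight(self):
        return sum(w for _, w in self.members)


def get_groups(levels, pct):
    """Single greedy pass over the levels, highest price first."""
    groups = []
    for price, weight in sorted(levels, reverse=True):
        if not groups or not groups[-1].accepts(price):
            groups.append(Group(pct))
        groups[-1].add(price, weight)
    return groups


levels = [[116.73, 2], [116.23, 1], [115.93, 2], [115.83, 5],
          [115.56, 3], [115.53, 1], [114.79, 2]]
for group in get_groups(levels, 0.5):
    print(group.members, "TotalWeight:", group.total_weight)
```

Keeping the membership test inside Group.accepts means a different rule (say, distance to the group's average price) could be swapped in without touching get_groups.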
For the grouping element, could you use itertools.groupby()? As the data is sorted, a lot of the work of grouping it is already done, and then you could test if the current value in the iteration was different to the last by <0.5%, and have itertools.groupby() break into a new group every time your function returned false.
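One way to try the itertools.groupby() idea is a stateful key that changes value whenever the gap to the previous price exceeds the threshold. The sketch below is only an illustration: GapKey and group_by_gaps are hypothetical names, and the gap is measured as a percentage of the lower of the two neighbouring prices, which is an assumption. It relies on groupby calling the key once per element, in order.

```python
from itertools import groupby


class GapKey:
    """Stateful groupby key: the key changes whenever the gap exceeds pct."""

    def __init__(self, pct):
        self.pct = pct
        self.previous = None
        self.group_id = 0

    def __call__(self, level):
        price = level[0]
        if self.previous is not None:
            gap = abs(self.previous - price) / min(self.previous, price) * 100
            if gap > self.pct:
                self.group_id += 1  # start a new group at this price
        self.previous = price
        return self.group_id


def group_by_gaps(levels, pct):
    """Sort by price (highest first) and split wherever the gap is too large."""
    ordered = sorted(levels, reverse=True)
    return [list(members) for _, members in groupby(ordered, key=GapKey(pct))]


levels = [[118.10, 3], [116.73, 2], [116.23, 1], [115.93, 2],
          [115.83, 5], [115.56, 3], [115.53, 1], [114.79, 2]]
for group in group_by_gaps(levels, 0.5):
    print(group, "TotalWeight:", sum(w for _, w in group))
```

This chains neighbours together, so the first and last members of a group can be more than 0.5% apart even though every adjacent pair is within the threshold.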