Spark 构建聚类模型
聚类算法有很多种,MLlib 目前提供了 K-means 聚类算法,该算法将一系列样本分割成 K 个不同的类族(其中 K 是模型的输入参数),目的是最小化所有类族中的方差之和,其形式化的目标函数称为类族内的方差和(within cluster sum of squared errors, WCSS)。
val PATH = "file:///Users/lzz/work/SparkML/"
本章节分为以下几个部分
- 一、从数据中提取正确的特征
- 二、训练聚类模型
- 三、使用聚类模型进行预测
- 四、评估聚类模型的性能
- 五、聚类模型参数调优
一、从数据中提取正确的特征
类似大多数机器学习模型,K-均值聚类需要数值向量作为输入,于是用于分类和回归的特征提取和变换方法也适用于聚类。
K-均值和最小方差回归一样使用方差函数作为优化目标,因此容易受到离群值(outlier)和较大方差的特征影响。
对于回归和分类问题来说,上述问题可以通过特征的归一化和标准化来解决,同时可能有助于提升性能。但是某些情况我们可能不希望数据被标准化,比如根据某个特定的特征找到对应的类族。
从 MovieLens 数据集提取特征
我们使用 ml-100k 电影评分数据集,第一个是电影的打分的数据集(u.data),第二个是用户数据(u.user),第三个是电影数据(u.item)。除此之外,我们从题材文件中获取来每个电影的题材(u.genre)。
输出电影数据集的首行:
val movies = sc.textFile( PATH+"data/ml-100k/u.item")
println( movies.first )
1|Toy Story (1995)|01-Jan-1995|| http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) |0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
1、提取电影的题材标签
在进一步处理之前,我们先从 u.genre 文件中提取题材的映射关系。根据之前对数据集的输出结果来看,需要将题材的数字编号映射到可读的文字版本。查看 u.genre 开始几行数据:
val genres = sc.textFile( PATH + "data/ml-100k/u.genre")
genres.take(5).foreach(println)
unknown|0
Action|1
Adventure|2
Animation|3
Children's|4
上面输出的数字表示相关题材的索引,比如 0 是 unknown 的索引。索引对应来每部电影关于题材的特征二值子向量(既前面数据中 0 和 1)。
为例提取题材的映射关系,我们对每一行数据进行分割,得到具体的<题材,索引>键值对。注意处理过程中需要处理最后的空行,不然会抛出异常
val genreMap = genres.filter( !_.isEmpty ).map( line => line.split("\\|")).
map(array => (array(1),array(0))).collectAsMap
println(genreMap)
Map(2 -> Adventure, 5 -> Comedy, 12 -> Musical, 15 -> Sci-Fi, 8 -> Drama, 18 -> Western, 7 -> Documentary, 17 -> War, 1 -> Action, 4 -> Children's, 11 -> Horror, 14 -> Romance, 6 -> Crime, 0 -> unknown, 9 -> Fantasy, 16 -> Thriller, 3 -> Animation, 10 -> Film-Noir, 13 -> Mystery)
接下来,我们需要为电影数据和题材映射关系创建新的 RDD,其中包含电影 ID,标题和题材。当我们用聚类模型评估每个电影的类别时,可以哟过生成的 RDD 得到可读的输出。
接下来,我们对每部电影提取相应的题材(是 String 形式而不是 Int 索引)。使用 zipWithIndex 方法统计包含题材索引的集合,这样就能将集合中的索引映射到对应的文本信息。最后,输出 RDD 第一条记录:
val titlesAndGenres = movies.map(_.split("\\|")).map{ array =>
val genres = array.toSeq.slice(5, array.size)
val genresAssigned = genres.zipWithIndex.filter{
case (g, idx) =>
g == "1"
}.map{
case (g, idx) => genreMap(idx.toString)
}
(array(0).toInt, (array(1), genresAssigned))
}
println(titlesAndGenres.first)
(1,(Toy Story (1995),ArrayBuffer(Animation, Children's, Comedy)))
2、训练推荐模型
要获取用户和电影因素向量,首先需要训练一个新的推荐模型。
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating
val rawData = sc.textFile( PATH + "data/ml-100k/u.data" )
val rawRatings = rawData.map( _.split("\t").take(3) )
val ratings = rawRatings.map{ case Array(user, movie, rating) =>
Rating(user.toInt, movie.toInt, rating.toDouble)
}
ratings.cache
val alsModel = ALS.train(ratings, 50, 10, 0.1)
最小二乘法(Alternating Least Squares,ALS)模型返回例两个键值 RDD(user-Features 和 productFeatures)。这两个 RDD 的键为用户 ID 或者电影 ID,值为相关因素。我们还需要提取相关的因素并转化到 MLlib 的 Vector 中作为聚类模型的训练输入。
import org.apache.spark.mllib.linalg.Vectors
val movieFactors = alsModel.productFeatures.map{
case(id, factor) => (id, Vectors.dense(factor) )
}
val movieVectors = movieFactors.map( _._2 )
val userFactors = alsModel.userFeatures.map {
case( id, factor ) => (id, Vectors.dense(factor) )
}
val userVectors = userFactors.map( _._2 )
3、归一化
观察输入数据的相关因素特征向量的分布,这可以分析出是否需要对训练模型进行归一化,具体做法是用 MLlib 的 RowMatrix 进行各种统计
import org.apache.spark.mllib.linalg.distributed.RowMatrix
val movieMatrix = new RowMatrix( movieVectors )
val movieMatrixSummary = movieMatrix.computeColumnSummaryStatistics()
val userMatrix = new RowMatrix(userVectors)
val userMatrixSummary = userMatrix.computeColumnSummaryStatistics()
println("Movie factors mean: " + movieMatrixSummary.mean )
println( "Movie factors variance: " + movieMatrixSummary.variance )
println( "User factors mean: " + userMatrixSummary.mean )
println( "User factors variance: " + userMatrixSummary.variance )
Movie factors mean: [-0.010956087400287078,-0.3503618170650178,0.06974321778051942,-0.1316383504373148,-0.3175018092607278,-0.1322647106693895,0.2537778433328245,-0.2029801309890879,-0.12961386425209565,-0.07082069156194899,0.06205439440201192,0.29001069000651375,-0.18597369739464903,-0.23937735319862308,-0.14342491847545685,0.18101149542654485,-0.13804024072643173,0.2850367666048985,0.12990755598051892,-0.2940282667527131,0.10936497810207288,-0.056700820289192294,0.008779148148335511,-0.09952976911271119,-0.12124526842150213,-0.1047346330659357,-0.01800057373980534,0.20805219400947975,-0.025770092278983504,-0.16276854660068052,0.14352873442556296,-0.04190886803719547,0.4564002303815088,-0.13991199983838468,0.24892268343949558,0.03683881854085137,-0.12161360237334293,0.05101734977006077,-0.1716588720612823,-0.25528384724181186,-0.24128689111316304,0.3161922110050902,-0.37518932943231786,-0.07739395342787778,-0.033145745157435264,-0.3182618436521061,0.31724803911912525,-0.2905448332232923,0.36738315872593574,0.13193467412994847]
Movie factors variance: [0.02847547379860743,0.034429323937631894,0.03541099641104011,0.02779076987367977,0.0321276449712961,0.029947444675825843,0.03242868550798178,0.039347960853505906,0.02815699974277408,0.02692815457119609,0.02488491712004288,0.02559692541139792,0.023945636981118366,0.031020952075181397,0.03132315027548169,0.035870256785388265,0.02851751964788822,0.04620547992724916,0.03380289844777866,0.031091692410413388,0.021186030887494347,0.029213350564186794,0.050307866701349374,0.044956598276744766,0.0424446420079256,0.03194608330673911,0.026328461762162834,0.0253256641155015,0.031652973406873494,0.030222699930634506,0.031900114062967,0.03872073047068598,0.04991808865595217,0.044734612835475605,0.03364714898105391,0.038311443739419325,0.02410161805983497,0.0401048595214041,0.025789107830994574,0.04632346173485563,0.03552631766769574,0.03783401339895872,0.03159678936979922,0.02683464941234709,0.030330439234670475,0.03772209007708639,0.02665627405364145,0.02749866564144796,0.03200456692029321,0.03650790161620346]
User factors mean: [-0.05751559450488312,-0.4448120511129047,0.0713609331130042,-0.17917024957318325,-0.3826748083004184,-0.19223372122788382,0.3632476031933324,-0.2364812957264952,-0.19962428454376366,-0.07632151798228856,0.08903429789422984,0.383942200858422,-0.27293308765248614,-0.26623289142040263,-0.233437972079052,0.2538256827495698,-0.15522545645197883,0.349404562342147,0.1806732257073824,-0.38893391578094744,0.16593499820393504,-0.06710115633257581,-0.017345290067732368,-0.20949683323916607,-0.1601489291121825,-0.16986250662740388,-0.05781946879504935,0.2865412557388491,-0.024831910570243948,-0.20662699206032092,0.18652401495406898,-0.06649108603970134,0.6535680813987444,-0.21079021362263867,0.3260026183902151,0.06555009494738517,-0.1735054115813684,0.07794975765240933,-0.2775845974124522,-0.29566121177475246,-0.32616511866519926,0.38253079266173795,-0.47239610716929675,-0.12113794657958946,-0.07352200683403164,-0.4026863060385226,0.43411897430897606,-0.38576911478713244,0.48377577518582077,0.2066876481924257]
User factors variance: [0.03006117333724318,0.03354006827825398,0.039108302046688186,0.032201992301106534,0.04506693196383176,0.039036736840331855,0.034931968883547486,0.0505435451564481,0.030504822000355265,0.03324296029782829,0.027961905370123507,0.027063883815182134,0.02615328600544832,0.04179091702487263,0.032054714672808446,0.04922635221829571,0.029109522334038112,0.04271209764723774,0.03775251320871111,0.03389380373914219,0.030460527394892786,0.03476278969008775,0.04752899062613328,0.04007252902950609,0.03922446417861663,0.037339192133838,0.0347846709705567,0.03237472204890549,0.040638649535702114,0.04341519347865484,0.03636010330315978,0.04543226010052758,0.0351695955172831,0.0517739986187769,0.028613794956775195,0.04422945825213233,0.030347073787900896,0.04410405326573856,0.0297825291988751,0.04543785362555905,0.04285539737783578,0.03525584701651021,0.03145131386719714,0.03110984108499151,0.030893034980592086,0.031203931086231408,0.031215879098869487,0.029136075370307876,0.0373928223800029,0.03745186152634357]
二、训练聚类模型
在 Mllib 中训练 K-均值的方法和其他模型类似,只要把包含训练数据的 RDD 传入 KMeans 对象的 train 方法即可。注意,因为聚类不需要标签,所以不用 LabeldPoint 实例,而是使用特征向量接口,既 RDD 的 Vector 数组即可。
用 MovieLens 数据集训练聚类模型
MLlib 的 K-均值提供了随机和 K-means||两种初始化方法,后者是默认的初始化。因为两种方法都试随机选择,所以每次模型训练的结果都不一样。
K-均值通常不能收敛到全局最优解,所以实际应用中需要多次训练并选择最优模型。MLlib 提供了完成多次模型训练的方法。经过损失函数的评估,将性能最好的一次训练选定为最终的模型。
设置模型参数:K(numClusters),最大迭代次数(numIteration),和训练次数(numRuns):
import org.apache.spark.mllib.clustering.KMeans
val numClusters = 5
val numiterations = 10
val numRuns = 3
val movieClusterModel = KMeans.train( movieVectors, numClusters, numiterations, numRuns)
val movieClusterModelConverged = KMeans.train( movieVectors, numClusters, 100)
val userClusterModel = KMeans.train( userVectors, numClusters, numiterations, numRuns)
三、使用聚类模型进行预测
使用训练的 K-means 模型进行预测,下面对单独对样本进行预测:
val movie1 = movieVectors.first
val movieCluster = movieClusterModel.predict(movie1)
println( movieCluster )
3
也可以通过传入一个 RDD[Vector]数组对多个输入样本进行预测:
val predictions = movieClusterModel.predict( movieVectors )
println( predictions.take(10).mkString(",") )
3,1,0,1,0,1,1,0,0,1
用 MovieLens 数据集解释类别预测
前面我们已经介绍了如何对一系列输入数据进行预测,但是如何对预测结果进行评估呢?
解释电影类族
因为 K-means 最小化对目标函数是样本到其类中心对欧拉距离之和,我们便可以将“最靠近类中心”定义为最小的欧拉距离。下面我们定义这个度量函数,注意引入 Breeze 库(MLlib 的一个依赖库)用于线性代数和向量运算:
import breeze.linalg._
import breeze.numerics.pow
def computeDistance( v1: DenseVector[Double], v2: DenseVector[Double] ) = pow( v1 - v2, 2 ).sum
下面我们利用上面的函数对每个电影计算其特征向量与所属类族中心向量的距离。为了让结果具有可读性,输出结果中添加了电影的标题和题材数据:
val titlesWithFactors = titlesAndGenres.join(movieFactors)
val moviesAssigned = titlesWithFactors.map{
case ( id, ((title, genres), vector)) =>
val pred = movieClusterModel.predict( vector )
val clusterCentre = movieClusterModel.clusterCenters( pred )
val dist = computeDistance( DenseVector( clusterCentre.toArray), DenseVector(vector.toArray) )
(id, title, genres.mkString(" "), pred, dist)
}
val clusterAssignments = moviesAssigned.groupBy{
case (id, title, genres, cluster, dist) => cluster
}.collectAsMap
我们得到一个 RDD,其中每个元素是关于某个类族的键值对,健是类族的标识,值是若干电影和相关信息组成的集合。电影的信息为:电影 ID,标题,题材,类别索引,以及电影的特征向量和类中心的距离。
for( (k, v) <- clusterAssignments.toSeq.sortBy(_._1) ){
println( s"Cluster $k:" )
val m = v.toSeq.sortBy( _._5 )
println( m.take(20).map{
case( _, title, genres, _, d) => (title, genres, d)
}.mkString("\n"))
println("---------\n")
}
Cluster 0:
(King of the Hill (1993),Drama,0.16150193866972645)
(Witness (1985),Drama Romance Thriller,0.2274559199018781)
(All Over Me (1997),Drama,0.2629440718963916)
(Scream of Stone (Schrei aus Stein) (1991),Drama,0.26750415112992426)
(Ed's Next Move (1996),Comedy,0.33834019947892024)
(I Can't Sleep (J'ai pas sommeil) (1994),Drama Thriller,0.34065424343733147)
(Wings of Courage (1995),Adventure Romance,0.3574027323494375)
(Silence of the Palace, The (Saimt el Qusur) (1994),Drama,0.3667946097617337)
(Land and Freedom (Tierra y libertad) (1995),War,0.3667946097617337)
(Normal Life (1996),Crime Drama,0.3667946097617337)
(Eighth Day, The (1996),Drama,0.3667946097617337)
(Two Friends (1986) ,Drama,0.3667946097617337)
(Dadetown (1995),Documentary,0.3667946097617337)
(Girls Town (1996),Drama,0.3667946097617337)
(Big One, The (1997),Comedy Documentary,0.3667946097617337)
(Hana-bi (1997),Comedy Crime Drama,0.3667946097617337)
(� k�ldum klaka (Cold Fever) (1994),Comedy Drama,0.3667946097617337)
(All Things Fair (1996),Drama,0.3707388429089414)
(Love and Other Catastrophes (1996),Romance,0.38414798812685813)
(Gate of Heavenly Peace, The (1995),Documentary,0.38636059836541414)
---------
Cluster 1:
(Last Time I Saw Paris, The (1954),Drama,0.14822758457217547)
(Substance of Fire, The (1996),Drama,0.24585534548590937)
(Beans of Egypt, Maine, The (1994),Drama,0.3540939044139249)
(Mamma Roma (1962),Drama,0.36333562926761115)
(Casablanca (1942),Drama Romance War,0.36927649682568275)
(Angel and the Badman (1947),Western,0.3902508632571331)
(African Queen, The (1951),Action Adventure Romance War,0.4238761044379039)
(Cosi (1996),Comedy,0.42555389008515493)
(Wife, The (1995),Comedy Drama,0.42949266835003536)
(They Made Me a Criminal (1939),Crime Drama,0.4296124182245654)
(Spellbound (1945),Mystery Romance Thriller,0.4312065777958864)
(Quiz Show (1994),Drama,0.44127164623983645)
(Vertigo (1958),Mystery Thriller,0.44392755316299)
(Farewell to Arms, A (1932),Romance War,0.4468163574719917)
(Third Man, The (1949),Mystery Thriller,0.4542871822986375)
(Sleepover (1995),Comedy Drama,0.4816254556546338)
(Love Is All There Is (1996),Comedy Drama,0.4816254556546338)
(Century (1993),Drama,0.4816254556546338)
(Blue Angel, The (Blaue Engel, Der) (1930),Drama,0.504428036163756)
(Rear Window (1954),Mystery Thriller,0.5086655758108783)
---------
Cluster 2:
(Amityville: A New Generation (1993),Horror,0.07174667008615719)
(Amityville 1992: It's About Time (1992),Horror,0.07174667008615719)
(Machine, The (1994),Comedy Horror,0.07470914528196525)
(Gordy (1995),Comedy,0.08500223031709787)
(Amityville: Dollhouse (1996),Horror,0.08530715248740668)
(Venice/Venice (1992),Drama,0.09074685321791381)
(Somebody to Love (1994),Drama,0.10815213233906609)
(Boys in Venice (1996),Drama,0.10815213233906609)
(Falling in Love Again (1980),Comedy,0.11821965345110176)
(Babyfever (1994),Comedy Drama,0.11834879322559375)
(Beyond Bedlam (1993),Drama Horror,0.12035274976413142)
(Mighty, The (1998),Drama,0.13176301477426255)
(Getting Away With Murder (1996),Comedy,0.1317752552860245)
(3 Ninjas: High Noon At Mega Mountain (1998),Action Children's,0.1344905921446417)
(Police Story 4: Project S (Chao ji ji hua) (1993),Action,0.13713067224632922)
(Johnny 100 Pesos (1993),Action Drama,0.13871130533890016)
(New Age, The (1994),Drama,0.14310064847546822)
(King of New York (1990),Action Crime,0.14377979158343374)
(Homage (1995),Drama,0.14524644167678138)
(JLG/JLG - autoportrait de d�cembre (1994),Documentary Drama,0.14524644167678138)
---------
Cluster 3:
(Angela (1995),Drama,0.274256147783538)
(Outlaw, The (1943),Western,0.3265778001579165)
(Intimate Relations (1996),Comedy,0.3579099409760045)
(Mr. Wonderful (1993),Comedy Romance,0.3604207967631547)
(Outbreak (1995),Action Drama Thriller,0.39346831918517666)
(Johns (1996),Drama,0.41252121942197745)
(River Wild, The (1994),Action Thriller,0.43155043466730697)
(Commandments (1997),Romance,0.43795404881512345)
(Moonlight and Valentino (1995),Drama Romance,0.44736686806438075)
(Wedding Gift, The (1994),Drama,0.467745563566281)
(Blue Chips (1994),Drama,0.47356275872784537)
(Mr. Jones (1993),Drama Romance,0.4806250366393131)
(Mirage (1995),Action Thriller,0.49324309668351984)
(Abyss, The (1989),Action Adventure Sci-Fi Thriller,0.5104370326093443)
(Prefontaine (1997),Drama,0.512270686088461)
(Touch (1997),Romance,0.5167337205926995)
(Sword in the Stone, The (1963),Animation Children's,0.5179883110738394)
(Courage Under Fire (1996),Drama War,0.5186235113720746)
(Target (1995),Action Drama,0.5208891762845684)
(Apollo 13 (1995),Action Drama Thriller,0.5308890920780177)
---------
Cluster 4:
(Fausto (1993),Comedy,0.4513242477311543)
(Ill Gotten Gains (1997),Drama,0.45505659971631696)
(Stag (1997),Action Thriller,0.4598472078701221)
(Robocop 3 (1993),Sci-Fi Thriller,0.5562657311152144)
(For Love or Money (1993),Comedy,0.5609453108521848)
(Chasers (1994),Comedy,0.5925102836478976)
(Pagemaster, The (1994),Action Adventure Animation Children's Fantasy,0.5944746207862281)
(Day the Sun Turned Cold, The (Tianguo niezi) (1994),Drama,0.6013975218563021)
(Cliffhanger (1993),Action Adventure Crime,0.6089288478833363)
(House Party 3 (1994),Comedy,0.6167182299081162)
(Cops and Robbersons (1994),Comedy,0.6198779188072274)
(Air Up There, The (1994),Comedy,0.6402214987304738)
(Shooter, The (1995),Action,0.6417839012588517)
(Life with Mikey (1993),Comedy,0.6820798678883125)
(Scarlet Letter, The (1926),Drama,0.6847617954346426)
(Farmer & Chase (1995),Comedy,0.706420417137171)
(Window to Paris (1994),Comedy,0.706420417137171)
(Big Bully (1996),Comedy Drama,0.7204497741848093)
(Truth or Consequences, N.M. (1997),Action Crime Romance,0.723805949405134)
(Scout, The (1994),Drama,0.7469999547125613)
---------
四、评估模型的性能
聚类的评估通常分为两个部分:内部评估和外部评估。内部评估表示评估过程使用训练模型时使用的训练数据,外部评估则使用训练数据之外的数据。
内部评估指标
通常的内部评价指标包括 WCSS,Davies-Bouldin 指数、Dunns 指数和轮廓系数(silhouette coefficient)。所有这些度量指标都是使用聚类内部的样本距离尽可能接近,不同类族的样本相对较远。
外部评价指标
因为聚类被认为是无监督分类,如果有一些带标注的数据,便可以用这些标签来评估聚类模型。可以使用聚类模型预测类族(类标签),使用分类模型中类似的方法评估预测值和真实标签的误差(既真假阳性率和真假阴性率)。
具体方法包括 Rand measure、 F-measure、雅卡尔系数(Jaccard index)等。
在 MovieLens 数据集计算性能
MLlib 提供的函数 computeCost 可以方便的计算出给定输入数据的 RDD[Vector]的 WCSS。下面我们使用这个方法计算电影和用户训练数据的性能:
val movieCost = movieClusterModel.computeCost( movieVectors )
val userCost = userClusterModel.computeCost( userVectors )
println( "WCSS for movies: " + movieCost )
println( "WCSS for movies: " + userCost )
WCSS for movies: 2282.7384462397395
WCSS for movies: 1476.204833283481
五、聚类模型参数调优
通过交叉验证选择 K
我们按 6:4 将数据集分割为训练集和测试集,然后在训练集上训练模型,在测试集上评估感兴趣的指标的性能
val trainTestSplitMovies = movieVectors.randomSplit( Array(0.6, 0.4), 123 )
val trainMovies = trainTestSplitMovies(0)
val testMovies = trainTestSplitMovies(1)
val costsMovies = Seq(2 ,3 ,4 ,5 ,10 ,20).map{
k => (k, KMeans.train(trainMovies, numiterations, k, numRuns).computeCost(testMovies) )
}
println( "Movie clustering cross-validation:")
costsMovies.foreach{ case( k, cost) => println(f"WCSS for K=$k id $cost%2.2f") }
Movie clustering cross-validation:
WCSS for K=2 id 884.89
WCSS for K=3 id 867.80
WCSS for K=4 id 872.01
WCSS for K=5 id 879.41
WCSS for K=10 id 863.81
WCSS for K=20 id 860.85
从结果可以看出,随着类中心数目增加,WCSS 值会出现下降,然后又开始增大。另外一个现象,K-means 在交叉验证的情况,WCSS 随着 K 的增大持续减小,但是达到某个值后,下降的速度突然会变得很平缓。这时的 K 通常为最优的 K 值(拐点)。
根据预测结果,我们选择最优的 K=10.需要说明是,模型计算的类族需要人工解释。尽管较大的 K 值从数学的角度可以得到更优的解,但是类族太多就会变得难以理解和解释。
为了实验的完整性,我们还计算了用户聚类在交叉验证下的性能:
val trainTestSplitUsers = userVectors.randomSplit( Array(0.6, 0.4), 123)
val trainUsers = trainTestSplitUsers(0)
val testUsers = trainTestSplitUsers(1)
val costsUsers = Seq( 2, 3, 4, 5 , 10, 20).map{
k => (k, KMeans.train(trainUsers, numiterations, k, numRuns).computeCost(testUsers))
}
println( "User clustering cross-validation:")
costsUsers.foreach{
case(k, cost) => println( f"WCSS for K=$k id $cost%2.2f")
}
User clustering cross-validation:
WCSS for K=2 id 591.78
WCSS for K=3 id 594.46
WCSS for K=4 id 598.01
WCSS for K=5 id 594.43
WCSS for K=10 id 596.87
WCSS for K=20 id 594.06
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
上一篇: FP-Growth 频繁模式增长算法
下一篇: 彻底找到 Tomcat 启动速度慢的元凶
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论