How to perform K-Means clustering and visualize the labels

Posted on 2025-02-09 06:18:46

I have two DataFrames structured like the examples below. I'd like to merge them and run K-Means clustering, using the station codes as labels, so that I can associate each soil station with its corresponding air stations and plot the results. Is this possible? How should I go about it?

I'm working in Python.

This is the first DataFrame:


          Air Station Code  Humidity  Temperature
time  
00:06:00        St.1           20         10
00:06:00        St.2            4         15
00:06:00        St.3           16         21
00:06:00        St.4           38          8
00:07:00        St.1           10         18
00:07:00        St.2           40          4
00:07:00        St.3           10         13
00:07:00        St.4           46         11
00:08:00        St.1           28          9
00:08:00        St.2           14         22
00:08:00        St.3            5         40
00:08:00        St.4           11         10
00:09:00        St.1           61         35
00:09:00        St.2           23         29
00:09:00        St.3           35         12
00:09:00        St.4           31          7

And this is the second:

          Soil Station Code  Soil Moisture 
time  
00:06:00        St.1             21         
00:06:00        St.2             40         
00:07:00        St.1             10        
00:07:00        St.2             47         
00:08:00        St.1             18          
00:08:00        St.2             34         
00:09:00        St.1             16       
00:09:00        St.2             30 

Comments (1)

爱*していゐ 2025-02-16 06:18:47

Do you want to do a merge, or a concatenation?

import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [True, False, True, True],
    'C': ['C1', 'C2', 'C3', 'C4']
})

df2 = pd.DataFrame({
    'A': [5, 7, 8, 5],
    'B': [False, False, True, False],
    'C': ['C1', 'C3', 'C5', 'C8']
})

# concat with axis=1 places the two frames side by side, aligned on the row index
concat = pd.concat([df1, df2], axis=1)
print(concat)


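# merge joins the frames on the key column 'C'; an inner join keeps only
# the keys present in both frames (here 'C1' and 'C3')
merge = df1.merge(df2, on='C', how='inner')
print(merge)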


See this link for more info.

https://towardsdatascience.com/3-key-differences-between-merge-and-concat-functions-of-pandas-ab2bab224b59

Here's a simple, and probably canonical, example of how to do clustering.

import statsmodels.api as sm
from sklearn.cluster import KMeans
from matplotlib import pyplot

# load the mtcars dataset; .data is already a pandas DataFrame
df_cars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
print(df_cars.head())

# define dataset; .copy() avoids a SettingWithCopyWarning when the
# cluster column is added below
X = df_cars[['mpg', 'hp']].copy()

# define the model
model = KMeans(n_clusters=8)

# fit the model and assign a cluster to each example
X['kmeans'] = model.fit_predict(X[['mpg', 'hp']])

# plot the two features, colored by cluster label
pyplot.scatter(X['mpg'], X['hp'], c=X['kmeans'], cmap='rainbow', s=50, alpha=0.8)
pyplot.xlabel('mpg')
pyplot.ylabel('hp')
pyplot.show()

See this link for additional details.

https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20Algorithms%20Compared.ipynb
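
To tie this back to the question, here is a minimal sketch that clusters the merged station readings and writes the station codes next to each point. Several things are assumptions rather than anything stated above: the frames are merged on `time`, the features are standardized with `StandardScaler`, `n_clusters=3` is an arbitrary illustrative value, and the `pair` column and `summary` table are hypothetical helpers for reading the result.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

times = ["00:06:00", "00:07:00", "00:08:00", "00:09:00"]

# Data copied from the question.
air = pd.DataFrame({
    "time": [t for t in times for _ in range(4)],
    "Air Station Code": ["St.1", "St.2", "St.3", "St.4"] * 4,
    "Humidity": [20, 4, 16, 38, 10, 40, 10, 46, 28, 14, 5, 11, 61, 23, 35, 31],
    "Temperature": [10, 15, 21, 8, 18, 4, 13, 11, 9, 22, 40, 10, 35, 29, 12, 7],
})
soil = pd.DataFrame({
    "time": [t for t in times for _ in range(2)],
    "Soil Station Code": ["St.1", "St.2"] * 4,
    "Soil Moisture": [21, 40, 10, 47, 18, 34, 16, 30],
})

# Pair each air reading with the soil readings from the same timestamp.
merged = air.merge(soil, on="time", how="inner")

# Standardize so that no single variable dominates the Euclidean distance.
features = ["Humidity", "Temperature", "Soil Moisture"]
X = StandardScaler().fit_transform(merged[features])

# n_clusters=3 is an arbitrary choice for illustration, not a tuned value.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
merged["cluster"] = kmeans.fit_predict(X)

# Text label for each point: "air station / soil station".
merged["pair"] = merged["Air Station Code"] + " / " + merged["Soil Station Code"]

# Scatter plot colored by cluster, with the station pair written next to each point.
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(merged["Humidity"], merged["Soil Moisture"],
           c=merged["cluster"], cmap="rainbow", s=50, alpha=0.8)
for _, row in merged.iterrows():
    ax.annotate(row["pair"], (row["Humidity"], row["Soil Moisture"]),
                xytext=(3, 3), textcoords="offset points", fontsize=7)
ax.set_xlabel("Humidity")
ax.set_ylabel("Soil Moisture")
ax.set_title("K-Means clusters labeled by air / soil station")

# Count how often each station pair lands in each cluster.
summary = pd.crosstab(merged["pair"], merged["cluster"])
print(summary)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. Note that K-Means ignores the station codes; they are used only as labels, so whether a soil station is "associated" with an air station has to be read from the `summary` table, and with only four timestamps any such grouping is tentative at best.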
