RFM explanation: Plotly Python

Posted on 2025-01-11 14:41:09

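I'm learning RFM analysis on Kaggle and found something interesting. This 'treemap' for RFM analysis turns out not to be generated by Plotly. Somehow the size of the box for each category is determined by `rfm_coordinates`, and I don't know where those values come from. I tried swapping `rfm_coordinates` values (e.g. swapping Champions with At-Risk): the descriptions changed, but the box sizes did not. So where does this premade 'treemap' chart come from?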

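import matplotlib.pyplot as plt
import seaborn as sns

# rfm_table (one row per customer, with "Segment" and "Monetary" columns)
# is built earlier in the Kaggle notebook
rfm_coordinates = {"Champions": [3, 5, 0.8, 1],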
                   "Loyal Customers": [3, 5, 0.4, 0.8],
                   "Cannot lose them": [4, 5, 0, 0.4],
                   "At-Risk": [2, 4, 0, 0.4],
                   "Hibernating": [0, 2, 0, 0.4],
                   "About To Sleep": [0, 2, 0.4, 0.6],
                   "Promising": [0, 1, 0.6, 0.8],
                   "New Customers": [0, 1, 0.8, 1],
                   "Potential Loyalists": [1, 3, 0.6, 1],
                   "Need Attention": [2, 3, 0.4, 0.6]}

fig, ax = plt.subplots(figsize = (19, 15))

ax.set_xlim([0, 5])
ax.set_ylim([0, 5])

plt.rcParams["axes.facecolor"] = "white"
palette = ["#282828", "#04621B", "#971194", "#F1480F",  "#4C00FF", 
           "#FF007B", "#9736FF", "#8992F3", "#B29800", "#80004C"]

for key, color in zip(rfm_coordinates.keys(), palette[:10]):
    
    coordinates = rfm_coordinates[key]
    ymin, ymax, xmin, xmax = coordinates[0], coordinates[1], coordinates[2], coordinates[3]
    
    ax.axhspan(ymin = ymin, ymax = ymax, xmin = xmin, xmax = xmax, facecolor = color)
    
    users = rfm_table[rfm_table.Segment == key].shape[0]
    users_percentage = (rfm_table[rfm_table.Segment == key].shape[0] / rfm_table.shape[0]) * 100
    avg_monetary = rfm_table[rfm_table.Segment == key]["Monetary"].mean()
    
    user_txt = "\n\nTotal Users: " + str(users) + "(" +  str(round(users_percentage, 2)) + "%)"
    monetary_txt = "\n\n\n\nAverage Monetary: " + str(round(avg_monetary, 2))
    
    x = 5 * (xmin + xmax) / 2
    y = (ymin + ymax) / 2
    
    plt.text(x = x, y = y, s = key, ha = "center", va = "center", fontsize = 18, color = "white", fontweight = "bold")
    plt.text(x = x, y = y, s = user_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    plt.text(x = x, y = y, s = monetary_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    
    ax.set_xlabel("Recency Score")
    ax.set_ylabel("Frequency Score")
    
sns.despine(left = True, bottom = True)
plt.show()


囍笑 2025-01-18 14:41:09
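  • You say you are using Kaggle, so I've based this on https://www.kaggle.com/regivm/rfm-analysis-tutorial and used the sample data referenced there.
  • You have tagged plotly, so you are looking for a Plotly solution.
  • https://futurice.com/blog/know-your-customers-with-rfm explains how to map an RFM strategy to a text classification.
  • All the examples of how to map recency, frequency and monetary values to quartiles use a complex strategy rather than a simple pandas `qcut()`.
  • The blocks can be moved around by modifying `rfm_coordinates`
    • however, this means the axes no longer represent the RFM axes
    • swapping the lists of two segments (e.g. Champions and At-Risk) only swaps which label is drawn in which block; the same set of blocks is drawn, so the layout looks unchanged
    • all you have to remember is that each list represents two things:
      1. the R and F score ranges the segment covers
      2. the position of the block in the grid: ymin, ymax, xmin, xmax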
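To see why the swap in the question changes the labels but not the picture, here is a minimal pure-Python check using the `rfm_coordinates` dict from the question. The helpers `box_area` and `swap_segments` are hypothetical, written only for this illustration, and assume the x axis spans 0 to 5 as in the matplotlib code. It also shows that the ten blocks exactly fill the 5 × 5 plotting area.

```python
# rfm_coordinates as in the question: [ymin, ymax, xmin, xmax]
# ymin/ymax are Frequency-score values (data units, 0..5);
# xmin/xmax are fractions (0..1) of the x-axis width.
rfm_coordinates = {
    "Champions": [3, 5, 0.8, 1],
    "Loyal Customers": [3, 5, 0.4, 0.8],
    "Cannot lose them": [4, 5, 0, 0.4],
    "At-Risk": [2, 4, 0, 0.4],
    "Hibernating": [0, 2, 0, 0.4],
    "About To Sleep": [0, 2, 0.4, 0.6],
    "Promising": [0, 1, 0.6, 0.8],
    "New Customers": [0, 1, 0.8, 1],
    "Potential Loyalists": [1, 3, 0.6, 1],
    "Need Attention": [2, 3, 0.4, 0.6],
}


def box_area(coords, x_range=5):
    """Area of one block in data units (hypothetical helper; x axis spans 0..x_range)."""
    ymin, ymax, xmin, xmax = coords
    return (ymax - ymin) * (xmax - xmin) * x_range


def swap_segments(coords, a, b):
    """Return a copy of coords with the blocks of segments a and b exchanged."""
    swapped = dict(coords)
    swapped[a], swapped[b] = coords[b], coords[a]
    return swapped


swapped = swap_segments(rfm_coordinates, "Champions", "At-Risk")

# The set of rectangles drawn is identical before and after the swap...
same_boxes = sorted(map(tuple, rfm_coordinates.values())) == sorted(
    map(tuple, swapped.values())
)
print(same_boxes)  # True

# ...only the label attached to each rectangle changes.
print(round(box_area(rfm_coordinates["Champions"]), 2),
      round(box_area(swapped["Champions"]), 2))  # 2.0 4.0

# The ten blocks together cover the whole 5 x 5 area.
print(round(sum(box_area(c) for c in rfm_coordinates.values()), 9))  # 25.0
```

In other words, the block sizes are hand-picked numbers; nothing in the chart is computed from the data except the text inside each block.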

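generate figure in plotly

  • this depends on the data sourcing code having been run and on `rfm_coordinates` and `palette` being defined (both come from the matplotlib code)
  • I have used https://plotly.com/python/shapes/ to define the boxes for the strategies in a similar way to the matplotlib code
  • the text uses a scatter trace with `mode="text"`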
import plotly.graph_objects as go

fig = go.Figure()

df_shp = pd.DataFrame(rfm_coordinates).T.rename(
    columns={0: "y0", 1: "y1", 2: "x0", 3: "x1"}
)
df_shp["fillcolor"] = palette
df_shp.loc[:, ["x0", "x1"]] = df_shp.loc[:, ["x0", "x1"]] * 5


for segment, r in df_shp.iterrows():
    fig.add_shape(**r.to_dict(), opacity=0.6)
fig.update_layout(
    xaxis=dict(range=[0, 5], dtick=1, showgrid=False),
    yaxis=dict(range=[0, 5], showgrid=False),
    margin={"l": 0, "r": 0, "b": 0, "t": 0},
    paper_bgcolor="rgba(0,0,0,0)",
    plot_bgcolor="rgba(0,0,0,0)",
)

df_txt = (
    rfm_table.groupby("Segment")
    .agg(avg_monetary=("Monetary", "mean"), number=("Monetary", "size"))
    .join(df_shp, how="right")
    .fillna(0)
)
fig.add_trace(
    go.Scatter(
        x=df_txt.loc[:, ["x0", "x1"]].mean(axis=1),
        y=df_txt.loc[:, ["y0", "y1"]].mean(axis=1),
        text=df_txt.index,
        customdata=df_txt.loc[:, ["avg_monetary", "number"]].astype(int).values,
        mode="text",
        texttemplate="<b>%{text}</b><br>Total Users:%{customdata[1]}<br>Average Monetary:%{customdata[0]}",
    )
)

fig.show()
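The `* 5` applied to `x0`/`x1` in `df_shp` is the only real transformation: matplotlib's `axhspan` takes `xmin`/`xmax` as fractions of the axes width, while the Plotly shapes here are placed in data coordinates. A dependency-free sketch of the same conversion for a single block, assuming an x range of 0 to 5; `to_rect_kwargs` is a hypothetical helper that builds keyword arguments of the kind `fig.add_shape()` accepts (`type`, `xref`, `yref`, `x0`, `x1`, `y0`, `y1`, `fillcolor`, `opacity`).

```python
def to_rect_kwargs(coords, fillcolor, x_range=(0, 5), opacity=0.6):
    """Hypothetical helper: convert one rfm_coordinates entry
    [ymin, ymax, xmin, xmax] (xmin/xmax as axes fractions) into
    data-coordinate keyword arguments for a Plotly rectangle shape."""
    ymin, ymax, xmin, xmax = coords
    x_lo, x_hi = x_range
    width = x_hi - x_lo
    return {
        "type": "rect",
        "xref": "x",  # x values are data coordinates
        "yref": "y",  # y values are data coordinates
        "x0": x_lo + xmin * width,  # axes fraction -> data value
        "x1": x_lo + xmax * width,
        "y0": ymin,  # already in data units
        "y1": ymax,
        "fillcolor": fillcolor,
        "opacity": opacity,
    }


kwargs = to_rect_kwargs([3, 5, 0.8, 1], "#282828")  # the "Champions" block
center_x = (kwargs["x0"] + kwargs["x1"]) / 2  # where the label text goes
print(kwargs["x0"], kwargs["x1"], kwargs["y0"], kwargs["y1"], center_x)

# With plotly installed, the shape could then be drawn with:
#   fig.add_shape(**kwargs)
```

For an x range of 0 to 5, `center_x` equals the matplotlib code's `x = 5 * (xmin + xmax) / 2`, which is the same conversion written inline.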


data sourcing

import requests
import io
import pandas as pd

# use requests so that there are no issues with utf-8 encoding
df = pd.read_csv(
    io.StringIO(
        requests.get(
            "https://raw.githubusercontent.com/joaolcorreia/RFM-analysis/master/sample-orders.csv"
        ).text
    )
)
df["order_date"] = pd.to_datetime(df["order_date"])
NOW = df["order_date"].max()

rfm = df.groupby("customer").agg(
    recency=("order_date", lambda x: (NOW - x.max()).days),
    frequency=("order_id", "size"),
    Monetary=("grand_total", "sum"),
)

# the segment names must match the keys of rfm_coordinates exactly,
# otherwise those segments show 0 users in both charts
segt_map = {
    r"[1-2][1-2]": "Hibernating",
    r"[1-2][2-3]": "At-Risk",
    r"[1-2]4": "Cannot lose them",
    r"2[1-2]": "About To Sleep",
    r"22": "Need Attention",
    r"[2-3][3-4]": "Loyal Customers",
    r"31": "Promising",
    r"41": "New Customers",
    r"[3-4][1-2]": "Potential Loyalists",
    r"4[3-4]": "Champions",
}

q = [0, 0.25, 0.5, 0.75, 1]
# higher score = better: fewest days since last order -> R = 4,
# most orders -> F = 4, highest spend -> M = 4
rfm["R"] = pd.qcut(rfm["recency"], q=q, labels=[4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"], q=q, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["Monetary"], q=q, labels=[1, 2, 3, 4]).astype(int)

rfm["Segment"] = (rfm["R"].map(str) + rfm["F"].map(str)).replace(segt_map, regex=True)
rfm_table = rfm
rfm_table
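In the `rfm_coordinates` layout a higher score is better: Champions sit top right (high Recency and Frequency scores) and Hibernating bottom left. So the customers with the fewest days since their last order should get R = 4, and those with the most orders F = 4. Because `pd.qcut` hands out `labels` from the lowest quartile to the highest, the label order decides the direction. A small check on made-up values (illustrative only, not taken from sample-orders.csv):

```python
import pandas as pd

q = [0, 0.25, 0.5, 0.75, 1]

# Illustrative values only: days since each customer's last order
recency_days = pd.Series([1, 10, 30, 90, 200, 365, 400, 500])
# Reversed labels: the lowest quartile (most recent customers) gets 4
r_score = pd.qcut(recency_days, q=q, labels=[4, 3, 2, 1]).astype(int)
print(r_score.tolist())  # [4, 4, 3, 3, 2, 2, 1, 1]

# Illustrative values only: number of orders per customer
orders = pd.Series([1, 1, 2, 3, 5, 8, 13, 21])
# Ascending labels: more orders means a higher F score
f_score = pd.qcut(orders, q=q, labels=[1, 2, 3, 4]).astype(int)
print(f_score.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Note that `qcut` raises a `ValueError` when repeated values make two quantile edges identical, which can happen with integer order counts on other data.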

matplotlib code

import matplotlib.pyplot as plt
import seaborn as sns

rfm_coordinates = {"Champions": [3, 5, 0.8, 1],
                   "Loyal Customers": [3, 5, 0.4, 0.8],
                   "Cannot lose them": [4, 5, 0, 0.4],
                   "At-Risk": [2, 4, 0, 0.4],
                   "Hibernating": [0, 2, 0, 0.4],
                   "About To Sleep": [0, 2, 0.4, 0.6],
                   "Promising": [0, 1, 0.6, 0.8],
                   "New Customers": [0, 1, 0.8, 1],
                   "Potential Loyalists": [1, 3, 0.6, 1],
                   "Need Attention": [2, 3, 0.4, 0.6]}

fig, ax = plt.subplots(figsize = (19, 15))

ax.set_xlim([0, 5])
ax.set_ylim([0, 5])

plt.rcParams["axes.facecolor"] = "white"
palette = ["#282828", "#04621B", "#971194", "#F1480F",  "#4C00FF", 
           "#FF007B", "#9736FF", "#8992F3", "#B29800", "#80004C"]

for key, color in zip(rfm_coordinates.keys(), palette[:10]):
    
    coordinates = rfm_coordinates[key]
    ymin, ymax, xmin, xmax = coordinates[0], coordinates[1], coordinates[2], coordinates[3]
    
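    # ymin/ymax are in data units (Frequency Score axis), but xmin/xmax are
    # fractions (0-1) of the axes width, hence x = 5 * (xmin + xmax) / 2 below
    ax.axhspan(ymin = ymin, ymax = ymax, xmin = xmin, xmax = xmax, facecolor = color)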
    
    users = rfm_table[rfm_table.Segment == key].shape[0]
    users_percentage = (rfm_table[rfm_table.Segment == key].shape[0] / rfm_table.shape[0]) * 100
    avg_monetary = rfm_table[rfm_table.Segment == key]["Monetary"].mean()
    
    user_txt = "\n\nTotal Users: " + str(users) + "(" +  str(round(users_percentage, 2)) + "%)"
    monetary_txt = "\n\n\n\nAverage Monetary: " + str(round(avg_monetary, 2))
    
    x = 5 * (xmin + xmax) / 2
    y = (ymin + ymax) / 2
    
    plt.text(x = x, y = y, s = key, ha = "center", va = "center", fontsize = 18, color = "white", fontweight = "bold")
    plt.text(x = x, y = y, s = user_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    plt.text(x = x, y = y, s = monetary_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    
    ax.set_xlabel("Recency Score")
    ax.set_ylabel("Frequency Score")
    
sns.despine(left = True, bottom = True)
plt.show()
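Both charts look up each segment by name (`rfm_table.Segment == key` in the matplotlib loop, the `join` in the Plotly code), so a segment name that is not spelled exactly like a key of `rfm_coordinates` shows up as 0 users. A quick pure-Python check; `missing_labels` is a hypothetical helper and the labels below are illustrative near-misses, not output from the sample data:

```python
def missing_labels(segment_labels, coordinates):
    """Hypothetical helper: return (labels with no block, blocks with no label)."""
    labels = set(segment_labels)
    keys = set(coordinates)
    return sorted(labels - keys), sorted(keys - labels)


# Illustrative subset of rfm_coordinates and some similar-looking segment names
coordinates = {
    "Champions": [3, 5, 0.8, 1],
    "At-Risk": [2, 4, 0, 0.4],
    "Loyal Customers": [3, 5, 0.4, 0.8],
}
labels = ["Champions", "At risk", "Loyal customers"]

no_block, empty_block = missing_labels(labels, coordinates)
print(no_block)     # ['At risk', 'Loyal customers']
print(empty_block)  # ['At-Risk', 'Loyal Customers']
```

In the notebook this could be run as `missing_labels(rfm_table["Segment"].unique(), rfm_coordinates)`; both lists should come back empty.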