
Python plotly

RFM Explanation: Plotly Python

Published 2025-01-11 14:41:09, 2211 words, 0 views, 0 comments

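I'm learning RFM analysis on Kaggle and found something interesting. This 'treemap' for RFM analysis turns out not to be generated by Plotly: somehow the size of the box for each category is determined by rfm_coordinates, and I don't know where those values come from. I tried swapping rfm_coordinates values (e.g. swapping Champions with At-Risk); the description changed, but the size of the boxes did not. So where did this premade 'treemap' chart come from?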

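import matplotlib.pyplot as plt
import seaborn as sns

# rfm_table: a DataFrame with "Segment" and "Monetary" columns, built elsewhere in the notebook
rfm_coordinates = {"Champions": [3, 5, 0.8, 1],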
                   "Loyal Customers": [3, 5, 0.4, 0.8],
                   "Cannot lose them": [4, 5, 0, 0.4],
                   "At-Risk": [2, 4, 0, 0.4],
                   "Hibernating": [0, 2, 0, 0.4],
                   "About To Sleep": [0, 2, 0.4, 0.6],
                   "Promising": [0, 1, 0.6, 0.8],
                   "New Customers": [0, 1, 0.8, 1],
                   "Potential Loyalists": [1, 3, 0.6, 1],
                   "Need Attention": [2, 3, 0.4, 0.6]}

fig, ax = plt.subplots(figsize = (19, 15))

ax.set_xlim([0, 5])
ax.set_ylim([0, 5])

plt.rcParams["axes.facecolor"] = "white"
palette = ["#282828", "#04621B", "#971194", "#F1480F",  "#4C00FF", 
           "#FF007B", "#9736FF", "#8992F3", "#B29800", "#80004C"]

for key, color in zip(rfm_coordinates.keys(), palette[:10]):
    
    coordinates = rfm_coordinates[key]
    ymin, ymax, xmin, xmax = coordinates[0], coordinates[1], coordinates[2], coordinates[3]
    
    ax.axhspan(ymin = ymin, ymax = ymax, xmin = xmin, xmax = xmax, facecolor = color)
    
    users = rfm_table[rfm_table.Segment == key].shape[0]
    users_percentage = (rfm_table[rfm_table.Segment == key].shape[0] / rfm_table.shape[0]) * 100
    avg_monetary = rfm_table[rfm_table.Segment == key]["Monetary"].mean()
    
    user_txt = "\n\nTotal Users: " + str(users) + "(" +  str(round(users_percentage, 2)) + "%)"
    monetary_txt = "\n\n\n\nAverage Monetary: " + str(round(avg_monetary, 2))
    
    x = 5 * (xmin + xmax) / 2
    y = (ymin + ymax) / 2
    
    plt.text(x = x, y = y, s = key, ha = "center", va = "center", fontsize = 18, color = "white", fontweight = "bold")
    plt.text(x = x, y = y, s = user_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    plt.text(x = x, y = y, s = monetary_txt, ha = "center", va = "center", fontsize = 14, color = "white")    
    
    ax.set_xlabel("Recency Score")
    ax.set_ylabel("Frequency Score")
    
sns.despine(left = True, bottom = True)
plt.show()



囍笑 2025-01-18 14:41:09

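You say you are using Kaggle, so I've based this on https://www.kaggle.com/regivm/rfm-analysis-tutorial and used the sample data referenced there.
You have tagged plotly, so you are looking for a Plotly solution.
https://futurice.com/blog/know-your-customers-with-rfm describes how to map an RFM strategy to a text classification (the segment names).
The examples of how to map recency, frequency and monetary values to quartiles all use a more complex strategy; I've simply used pandas qcut().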
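As a rough illustration of those two steps (quartile scores first, then score to segment text), here is a minimal sketch on four made-up customers. It assumes score 4 is the best quartile, so recency (fewer days is better) gets reversed labels. The first-match lookup with re.fullmatch and its two rules are a simplified, hypothetical stand-in for a full regex segment map, not the tutorial's exact method.

```python
import re

import pandas as pd

# Hypothetical per-customer values, not taken from the Kaggle data
rfm = pd.DataFrame(
    {"recency": [1, 10, 20, 30], "frequency": [9, 5, 3, 1]},
    index=["a", "b", "c", "d"],
)

q = [0, 0.25, 0.5, 0.75, 1]
# Fewer days since the last order is better, so the lowest quartile scores 4
rfm["R"] = pd.qcut(rfm["recency"], q=q, labels=[4, 3, 2, 1]).astype(int)
# More orders is better, so the highest quartile scores 4
rfm["F"] = pd.qcut(rfm["frequency"], q=q, labels=[1, 2, 3, 4]).astype(int)

# Ordered (pattern, segment) rules; the first full match wins (hypothetical subset)
rules = [
    (r"4[3-4]", "Champions"),
    (r"[1-2][1-2]", "Hibernating"),
]


def segment(rf: str) -> str:
    """Return the first segment whose pattern fully matches the 'RF' string."""
    for pattern, name in rules:
        if re.fullmatch(pattern, rf):
            return name
    return "Other"


rfm["Segment"] = (rfm["R"].astype(str) + rfm["F"].astype(str)).map(segment)
print(rfm)
```

Customer b scores "33", which neither illustrative rule covers, so it falls through to "Other"; a real map needs a rule for every RF combination you care about.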
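The blocks can be moved around by modifying rfm_coordinates
- however, this means the axes no longer represent the RFM axes
- all you have to remember is that each list represents two things
  1. the RF values
  2. the position of the block in the grid: ymin, ymax, xmin, xmax (ymin/ymax are in data units, 0 to 5; xmin/xmax are fractions of the axes width, 0 to 1, because that is how axhspan interprets them, hence the 5 * (xmin + xmax) / 2 used to centre the text)

So this is not a real treemap: the box sizes are hard-coded in rfm_coordinates and do not depend on how many customers fall into each segment. Swapping two entries only changes which label sits in which box; the layout stays the same.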
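To see that the layout is fixed by the numbers alone, here is a small plain-Python sketch using the rfm_coordinates dict from the code. It assumes the 0 to 5 axis limits set with set_xlim/set_ylim; because axhspan takes xmin/xmax as fractions of the axes width, x is scaled by 5 while y is already in data units. No customer data is involved at any point.

```python
# Copy of rfm_coordinates: [ymin, ymax, xmin, xmax]
rfm_coordinates = {"Champions": [3, 5, 0.8, 1],
                   "Loyal Customers": [3, 5, 0.4, 0.8],
                   "Cannot lose them": [4, 5, 0, 0.4],
                   "At-Risk": [2, 4, 0, 0.4],
                   "Hibernating": [0, 2, 0, 0.4],
                   "About To Sleep": [0, 2, 0.4, 0.6],
                   "Promising": [0, 1, 0.6, 0.8],
                   "New Customers": [0, 1, 0.8, 1],
                   "Potential Loyalists": [1, 3, 0.6, 1],
                   "Need Attention": [2, 3, 0.4, 0.6]}

AXIS_MAX = 5  # assumes ax.set_xlim([0, 5]) as in the plotting code


def to_data_units(coords):
    """Return (x0, x1, y0, y1) in data units.

    y values are already data units; x values are axes fractions (0 to 1),
    as axhspan expects, so they are scaled by the x-axis width.
    """
    ymin, ymax, xmin, xmax = coords
    return xmin * AXIS_MAX, xmax * AXIS_MAX, ymin, ymax


areas = {}
for name, coords in rfm_coordinates.items():
    x0, x1, y0, y1 = to_data_units(coords)
    areas[name] = round((x1 - x0) * (y1 - y0), 6)

for name, area in areas.items():
    print(f"{name:20s} {area:4.1f}")
# The ten boxes tile the whole 5 x 5 grid
print("total:", round(sum(areas.values()), 6))
```

The areas (1, 2 or 4 grid cells) are the same on every run, whatever rfm_table contains.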

Generate figure in Plotly

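This depends on the data sourcing code having been run and on the dict rfm_coordinates (and the palette list, both from the matplotlib code) being defined.
I've used https://plotly.com/python/shapes/ to define the boxes for the strategies, in a similar way to the matplotlib code.
The text uses a scatter trace with mode="text".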

import pandas as pd
import plotly.graph_objects as go

# Requires rfm_table (data sourcing) and rfm_coordinates / palette (matplotlib code)
fig = go.Figure()

# One row per segment; [ymin, ymax, xmin, xmax] become Plotly's y0, y1, x0, x1
df_shp = pd.DataFrame(rfm_coordinates).T.rename(
    columns={0: "y0", 1: "y1", 2: "x0", 3: "x1"}
)
df_shp["fillcolor"] = palette
# x values are axes fractions (0 to 1) in the matplotlib version; scale to the 0-5 axis
df_shp.loc[:, ["x0", "x1"]] = df_shp.loc[:, ["x0", "x1"]] * 5

# Draw one rectangle per segment, in data coordinates
for _, r in df_shp.iterrows():
    fig.add_shape(type="rect", **r.to_dict(), opacity=0.6)
fig.update_layout(
    xaxis=dict(range=[0, 5], dtick=1, showgrid=False),
    yaxis=dict(range=[0, 5], showgrid=False),
    margin={"l": 0, "r": 0, "b": 0, "t": 0},
    paper_bgcolor="rgba(0,0,0,0)",
    plot_bgcolor="rgba(0,0,0,0)",
)

# Per-segment stats; the right join keeps every box and matches on segment name
df_txt = (
    rfm_table.groupby("Segment")
    .agg(avg_monetary=("Monetary", "mean"), number=("Monetary", "size"))
    .join(df_shp, how="right")
    .fillna(0)
)
# Text-only scatter trace placed at the centre of each box
fig.add_trace(
    go.Scatter(
        x=df_txt.loc[:, ["x0", "x1"]].mean(axis=1),
        y=df_txt.loc[:, ["y0", "y1"]].mean(axis=1),
        text=df_txt.index,
        customdata=df_txt.loc[:, ["avg_monetary", "number"]].astype(int).values,
        mode="text",
        texttemplate="<b>%{text}</b><br>Total Users: %{customdata[1]}<br>Average Monetary: %{customdata[0]}",
    )
)
fig.show()
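One pitfall in the df_txt step: the right join matches segment names exactly, and fillna(0) then turns any mismatch into 0 users without a warning. A minimal pandas sketch with made-up rows (hypothetical data: "At risk" in the data versus "At-Risk" in rfm_coordinates) shows the effect and a quick check for it:

```python
import pandas as pd

# Segment labels as produced by some segmentation step (hypothetical sample)
rfm_table = pd.DataFrame(
    {"Segment": ["Champions", "Champions", "At risk"],
     "Monetary": [100.0, 300.0, 50.0]}
)
# Index as built from rfm_coordinates (two-entry subset for illustration)
df_shp = pd.DataFrame({"x0": [4.0, 0.0]}, index=["Champions", "At-Risk"])

df_txt = (
    rfm_table.groupby("Segment")
    .agg(avg_monetary=("Monetary", "mean"), number=("Monetary", "size"))
    .join(df_shp, how="right")  # keeps every box, matched by exact name
    .fillna(0)                  # unmatched boxes silently get 0
)
print(df_txt)

# Names present on only one side reveal the mismatch before plotting
unmatched = sorted(set(rfm_table["Segment"]).symmetric_difference(df_shp.index))
print(unmatched)
```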


Data sourcing

import requests
import io
import pandas as pd

# use requests so that there are no issues with utf-8 encoding
df = pd.read_csv(
    io.StringIO(
        requests.get(
            "https://raw.githubusercontent.com/joaolcorreia/RFM-analysis/master/sample-orders.csv"
        ).text
    )
)
df["order_date"] = pd.to_datetime(df["order_date"])
NOW = df["order_date"].max()

rfm = df.groupby("customer").agg(
    recency=("order_date", lambda x: (NOW - x.max()).days),
    frequency=("order_id", "size"),
    Monetary=("grand_total", "sum"),
)

# Segment names must match the rfm_coordinates keys exactly,
# otherwise the plots show 0 users for that box
segt_map = {
    r"[1-2][1-2]": "Hibernating",
    r"[1-2][2-3]": "At-Risk",
    r"[1-2]4": "Cannot lose them",
    r"2[1-2]": "About To Sleep",
    r"22": "Need Attention",
    r"[2-3][3-4]": "Loyal Customers",
    r"31": "Promising",
    r"41": "New Customers",
    r"[3-4][1-2]": "Potential Loyalists",
    r"4[3-4]": "Champions",
}

q = [0, 0.25, 0.5, 0.75, 1]
# Score 4 = best quartile: fewest days since last order, most orders, highest spend
rfm["R"] = pd.qcut(rfm["recency"], q=q, labels=[4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"], q=q, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["Monetary"], q=q, labels=[1, 2, 3, 4]).astype(int)

rfm["Segment"] = (rfm["R"].map(str) + rfm["F"].map(str)).replace(segt_map, regex=True)
rfm_table = rfm
rfm_table
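The aggregation step can also be tried without downloading the CSV. A small sketch with four made-up orders (hypothetical data, using the same column names customer, order_id, order_date and grand_total that the code above reads) shows what each named aggregation produces:

```python
import pandas as pd

# Hypothetical orders with the same column names as sample-orders.csv
df = pd.DataFrame(
    {
        "customer": ["Ann", "Ann", "Bob", "Cid"],
        "order_id": [1, 2, 3, 4],
        "order_date": pd.to_datetime(
            ["2024-01-01", "2024-03-01", "2024-02-20", "2023-12-01"]
        ),
        "grand_total": [10.0, 30.0, 25.0, 5.0],
    }
)
NOW = df["order_date"].max()  # reference date: the latest order in the data

rfm = df.groupby("customer").agg(
    recency=("order_date", lambda x: (NOW - x.max()).days),  # days since last order
    frequency=("order_id", "size"),                           # number of orders
    Monetary=("grand_total", "sum"),                          # total spend
)
print(rfm)
```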

matplotlib code

import matplotlib.pyplot as plt
import seaborn as sns

rfm_coordinates = {"Champions": [3, 5, 0.8, 1],
                   "Loyal Customers": [3, 5, 0.4, 0.8],
                   "Cannot lose them": [4, 5, 0, 0.4],
                   "At-Risk": [2, 4, 0, 0.4],
                   "Hibernating": [0, 2, 0, 0.4],
                   "About To Sleep": [0, 2, 0.4, 0.6],
                   "Promising": [0, 1, 0.6, 0.8],
                   "New Customers": [0, 1, 0.8, 1],
                   "Potential Loyalists": [1, 3, 0.6, 1],
                   "Need Attention": [2, 3, 0.4, 0.6]}

# rcParams only affect axes created afterwards, so set them before plt.subplots
plt.rcParams["axes.facecolor"] = "white"
palette = ["#282828", "#04621B", "#971194", "#F1480F", "#4C00FF",
           "#FF007B", "#9736FF", "#8992F3", "#B29800", "#80004C"]

fig, ax = plt.subplots(figsize=(19, 15))

ax.set_xlim([0, 5])
ax.set_ylim([0, 5])

for key, color in zip(rfm_coordinates.keys(), palette):

    # Each list is [ymin, ymax, xmin, xmax]: ymin/ymax are data units (0 to 5),
    # but axhspan takes xmin/xmax as fractions of the axes width (0 to 1).
    # Box sizes come only from these numbers, not from the data.
    ymin, ymax, xmin, xmax = rfm_coordinates[key]

    ax.axhspan(ymin=ymin, ymax=ymax, xmin=xmin, xmax=xmax, facecolor=color)

    segment = rfm_table[rfm_table.Segment == key]
    users = segment.shape[0]
    users_percentage = users / rfm_table.shape[0] * 100
    avg_monetary = segment["Monetary"].mean()

    # Leading newlines push these lines below the bold segment name
    user_txt = f"\n\nTotal Users: {users} ({round(users_percentage, 2)}%)"
    monetary_txt = f"\n\n\n\nAverage Monetary: {round(avg_monetary, 2)}"

    # Scale the x fractions to data units to centre the labels in each box
    x = 5 * (xmin + xmax) / 2
    y = (ymin + ymax) / 2

    ax.text(x=x, y=y, s=key, ha="center", va="center", fontsize=18, color="white", fontweight="bold")
    ax.text(x=x, y=y, s=user_txt, ha="center", va="center", fontsize=14, color="white")
    ax.text(x=x, y=y, s=monetary_txt, ha="center", va="center", fontsize=14, color="white")

ax.set_xlabel("Recency Score")
ax.set_ylabel("Frequency Score")

sns.despine(left=True, bottom=True)
plt.show()
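If you would rather not keep the mixed units of axhspan in mind when moving blocks, one alternative (a sketch, not what the Kaggle notebook does) is to convert each entry once and draw it as a matplotlib.patches.Rectangle, so both x and y are in data units. It uses a two-entry subset of rfm_coordinates and the headless Agg backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Subset of rfm_coordinates: [ymin, ymax, xmin (fraction), xmax (fraction)]
rfm_coordinates = {
    "Champions": [3, 5, 0.8, 1],
    "Hibernating": [0, 2, 0, 0.4],
}
palette = ["#282828", "#04621B"]

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlim(0, 5)
ax.set_ylim(0, 5)

for (name, (ymin, ymax, xmin, xmax)), color in zip(rfm_coordinates.items(), palette):
    x0, x1 = xmin * 5, xmax * 5  # fractions -> data units, so both axes share units
    ax.add_patch(Rectangle((x0, ymin), x1 - x0, ymax - ymin, facecolor=color))
    ax.text((x0 + x1) / 2, (ymin + ymax) / 2, name,
            ha="center", va="center", color="white")

ax.set_xlabel("Recency Score")
ax.set_ylabel("Frequency Score")
# Call plt.show() or fig.savefig(...) to display the result
```

With all ten entries the rectangles tile the same 0 to 5 grid as the axhspan version, but every number in the dict is now in axis units.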



About the author

久而酒知
