How can I optimize this Bokeh CustomJS plot so that changes load faster?
Here is an MRE for a plot I made using Bokeh 2.3.2. The actual data set contains around a hundred projects, each with up to 10 tasks, stored in a df of approximately 40k rows. When I save this plot to .html, the file size is near 100k KB. Each time a user picks a new project via the dropdown, load times can exceed 10 seconds. I spent quite a bit of time getting the CustomJS code to work the way it does now, but would like to reduce the load time if possible. Any ideas on how to do this? Bokeh Server is not an option at this time.
import itertools
import pandas as pd
from pandas import Timestamp
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.models import GroupFilter, CDSView
from bokeh.models import CustomJS
from bokeh.models import Legend
from bokeh.models.widgets import Select
from bokeh.layouts import row, column
from bokeh.palettes import Set1 as palette
data = {'date': {0: Timestamp('2021-09-26 00:00:00'),
1: Timestamp('2021-09-26 00:00:00'),
2: Timestamp('2021-09-26 00:00:00'),
3: Timestamp('2021-10-03 00:00:00'),
4: Timestamp('2021-10-03 00:00:00'),
5: Timestamp('2021-10-03 00:00:00'),
6: Timestamp('2021-10-10 00:00:00'),
7: Timestamp('2021-10-10 00:00:00'),
8: Timestamp('2021-10-10 00:00:00'),
9: Timestamp('2021-10-03 00:00:00'),
10: Timestamp('2021-10-03 00:00:00'),
11: Timestamp('2021-09-26 00:00:00'),
12: Timestamp('2021-09-26 00:00:00'),
13: Timestamp('2021-10-10 00:00:00'),
14: Timestamp('2021-10-10 00:00:00')},
'TB': {0: 4.1,
1: 8.5,
2: 0.2,
3: 0.2,
4: 5.1,
5: 8.5,
6: 8.5,
7: 6.1,
8: 0.2,
9: 7.0,
10: 12.5,
11: 5.9,
12: 10.1,
13: 6.9,
14: 12.6},
'project': {0: 'Project_A',
1: 'Project_A',
2: 'Project_A',
3: 'Project_A',
4: 'Project_A',
5: 'Project_A',
6: 'Project_A',
7: 'Project_A',
8: 'Project_A',
9: 'Project_B',
10: 'Project_B',
11: 'Project_B',
12: 'Project_B',
13: 'Project_B',
14: 'Project_B'},
'project_sub': {0: 'TASK_1',
1: 'TASK_2',
2: 'TASK_3',
3: 'TASK_3',
4: 'TASK_1',
5: 'TASK_2',
6: 'TASK_2',
7: 'TASK_1',
8: 'TASK_3',
9: 'TASK_2',
10: 'TASK_1',
11: 'TASK_2',
12: 'TASK_1',
13: 'TASK_2',
14: 'TASK_1'}}
df = pd.DataFrame(data)
dfp_dict = {}
for p in df['project'].unique().tolist():
df_slice = df.loc[df['project'].isin([p])]
dfp = df_slice.pivot_table(values='TB', index=df_slice.date, columns='project_sub', aggfunc='max')
dfp.columns = [p + ', ' + x for x in dfp.columns]
dfp['project'] = p
dfp_dict[p] = dfp
# make one big df where each col is 'project name, task'
df_cat = pd.concat([dfp_dict[x] for x in list(dfp_dict)], axis=0)
# this is used for varea plot
df_cat['y1'] = 0
# move project column to end, making it easier to iterate over only columns we want to plot
df_cat = df_cat[[ col for col in df_cat.columns if col != 'project'] + ['project']]
# bokeh datetime axis misbehaving, so also keep a date string for the x-axis
df_cat["date_string"] = df_cat.index.strftime("%Y-%m-%d")
#make a nice list of contrasting colors that is big enough for all possible tasks
colors = itertools.cycle(palette[9])
color_list = []
for color in zip(range(df_cat.shape[1]), colors):
color_list.append(color[1])
y1 = range(df_cat.shape[0])
source = ColumnDataSource(df_cat)
p = figure(plot_height = 500,
plot_width = 900,
x_axis_type="datetime",
sizing_mode="scale_both",
active_drag="pan")
project_filter = GroupFilter(column_name='project',
group=df_cat['project'].unique().tolist()[0])
# this creates a subset of df_cat by selecting rows from the project column that match dropdown
view = CDSView(source=source, filters=[project_filter])
custom_js_dict = {}
legend_list = []
for i, col in enumerate(df_cat.columns[0:-3]):
x = p.varea(x='date', y1='y1', y2=col, source=source, view=view, name=col, alpha=0.5, color=color_list[i])
# save the column name and its renderer so we can create bokeh Legend items
legend_list.append((col, [x]))
p.yaxis.axis_label="TB"
p.xaxis.axis_label = "Date"
custom_js_dict['f'] = view.filters[0]
custom_js_dict['p'] = p
custom_js_dict['view'] = view
custom_js_dict['projects'] = df_cat['project'].unique().tolist()
# add each project's legend items to the custom_js_dict as bokeh legend item objects
for x in df_cat['project'].unique():
custom_js_dict[x] = Legend(items=[y for y in legend_list if y[0].split(',')[0] == x])
p.add_layout(custom_js_dict[df_cat['project'].unique().tolist()[0]])
p.legend.click_policy="hide"
p.legend.label_text_font_size = "10pt"
for f in view.filters:
f.js_on_change('group', CustomJS(args=custom_js_dict,
code="""
const allProjects = projects
for (var element of allProjects) {
if(element === f.group){
eval(element).visible = true;
p.add_layout(eval(f.group));
eval(f.group).click_policy="hide"
} else {eval(element).visible = false;
}
}
view.properties.filters.change.emit()
"""))
project_select = Select(title="Project:", value="", options=df['project'].unique().tolist())
project_select.js_link('value', project_filter, 'group')
show(row(p, column(project_select)))
I would organize the code differently, to avoid adding layouts and making eval calls in the CustomJS. Here is the relevant portion of the code:

This works the same as the original code does, AFAICT, and is about as stripped-down as I can imagine making it. If this still performs poorly with the full data set, then you would need to somehow synthesize (or link to) data that can be used to actually reproduce the situation locally (and/or update the MRE to include whatever important pertinent details are missing).
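The code block the answer refers to did not survive extraction. Below is a sketch of what the described reorganization might look like (my own reconstruction, not the answerer's verbatim code): build every per-project Legend up front, pass the whole dict of legends to the callback via args, and have the CustomJS only toggle Legend.visible, so the callback needs neither p.add_layout nor eval. It uses the same Bokeh 2.3.2 API as the question, on a toy frame standing in for df_cat.

```python
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import GroupFilter, CDSView, CustomJS, Legend

# Toy stand-in for the question's df_cat: one 'project, task' column per area.
df_cat = pd.DataFrame({
    'date': pd.to_datetime(['2021-09-26', '2021-10-03']),
    'Project_A, TASK_1': [4.1, 5.1],
    'Project_B, TASK_1': [12.5, 7.0],
    'y1': [0, 0],
    'project': ['Project_A', 'Project_B'],
})
source = ColumnDataSource(df_cat)
projects = df_cat['project'].unique().tolist()

p = figure(plot_height=500, plot_width=900, x_axis_type="datetime")
project_filter = GroupFilter(column_name='project', group=projects[0])
view = CDSView(source=source, filters=[project_filter])

# Same renderer/legend bookkeeping as the question's MRE.
legend_list = []
for col in ['Project_A, TASK_1', 'Project_B, TASK_1']:
    r = p.varea(x='date', y1='y1', y2=col, source=source, view=view,
                name=col, alpha=0.5)
    legend_list.append((col, [r]))

# Build every per-project Legend up front; only the active one starts visible.
legends = {}
for proj in projects:
    items = [item for item in legend_list if item[0].split(',')[0] == proj]
    legends[proj] = Legend(items=items, visible=(proj == projects[0]),
                           click_policy="hide")
    p.add_layout(legends[proj])

# The callback only flips Legend.visible -- no add_layout, no eval.
callback = CustomJS(args=dict(f=project_filter, legends=legends, source=source),
                    code="""
    for (const [name, legend] of Object.entries(legends)) {
        legend.visible = (name === f.group)
    }
    source.change.emit()
""")
project_filter.js_on_change('group', callback)
```

The dropdown wiring stays exactly as in the question (project_select.js_link('value', project_filter, 'group')); since every Legend already exists in the document, switching projects is a property flip rather than a layout mutation.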