How can I optimize this Bokeh CustomJS plot so that changes load faster?
Here is an MRE for a plot I made using Bokeh 2.3.2. The actual data set contains around a hundred projects, each with up to 10 tasks, stored in a df of approximately 40k rows. When I save this plot to .html, the file size is near 100k KB. Each time a user picks a new project via the dropdown, load times can exceed 10 seconds. I spent quite a bit of time getting the CustomJS code to work the way it does now, but would like to reduce the load time if possible. Any ideas on how to do this? Bokeh Server is not an option at this time.
import itertools
import pandas as pd
from pandas import Timestamp
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.models import GroupFilter, CDSView
from bokeh.models import CustomJS
from bokeh.models import Legend
from bokeh.models.widgets import Select
from bokeh.layouts import row, column
from bokeh.palettes import Set1 as palette
data = {'date': {0: Timestamp('2021-09-26 00:00:00'),
1: Timestamp('2021-09-26 00:00:00'),
2: Timestamp('2021-09-26 00:00:00'),
3: Timestamp('2021-10-03 00:00:00'),
4: Timestamp('2021-10-03 00:00:00'),
5: Timestamp('2021-10-03 00:00:00'),
6: Timestamp('2021-10-10 00:00:00'),
7: Timestamp('2021-10-10 00:00:00'),
8: Timestamp('2021-10-10 00:00:00'),
9: Timestamp('2021-10-03 00:00:00'),
10: Timestamp('2021-10-03 00:00:00'),
11: Timestamp('2021-09-26 00:00:00'),
12: Timestamp('2021-09-26 00:00:00'),
13: Timestamp('2021-10-10 00:00:00'),
14: Timestamp('2021-10-10 00:00:00')},
'TB': {0: 4.1,
1: 8.5,
2: 0.2,
3: 0.2,
4: 5.1,
5: 8.5,
6: 8.5,
7: 6.1,
8: 0.2,
9: 7.0,
10: 12.5,
11: 5.9,
12: 10.1,
13: 6.9,
14: 12.6},
'project': {0: 'Project_A',
1: 'Project_A',
2: 'Project_A',
3: 'Project_A',
4: 'Project_A',
5: 'Project_A',
6: 'Project_A',
7: 'Project_A',
8: 'Project_A',
9: 'Project_B',
10: 'Project_B',
11: 'Project_B',
12: 'Project_B',
13: 'Project_B',
14: 'Project_B'},
'project_sub': {0: 'TASK_1',
1: 'TASK_2',
2: 'TASK_3',
3: 'TASK_3',
4: 'TASK_1',
5: 'TASK_2',
6: 'TASK_2',
7: 'TASK_1',
8: 'TASK_3',
9: 'TASK_2',
10: 'TASK_1',
11: 'TASK_2',
12: 'TASK_1',
13: 'TASK_2',
14: 'TASK_1'}}
df = pd.DataFrame(data)
dfp_dict = {}
for p in df['project'].unique().tolist():
df_slice = df.loc[df['project'].isin([p])]
dfp = df_slice.pivot_table(values='TB', index=df_slice.date, columns='project_sub', aggfunc='max')
dfp.columns = [p + ', ' + x for x in dfp.columns]
dfp['project'] = p
dfp_dict[p] = dfp
# make one big df where each col is 'project name, task'
df_cat = pd.concat([dfp_dict[x] for x in list(dfp_dict)], axis=0)
# this is used for varea plot
df_cat['y1'] = 0
# move project column to end, making it easier to iterate over only columns we want to plot
df_cat = df_cat[[ col for col in df_cat.columns if col != 'project'] + ['project']]
# bokeh datetime axis misbehaving, so also keep a date string for the x-axis
df_cat["date_string"] = df_cat.index.strftime("%Y-%m-%d")
#make a nice list of contrasting colors that is big enough for all possible tasks
colors = itertools.cycle(palette[9])
color_list = []
for color in zip(range(df_cat.shape[1]), colors):
color_list.append(color[1])
y1 = range(df_cat.shape[0])
source = ColumnDataSource(df_cat)
p = figure(plot_height = 500,
plot_width = 900,
x_axis_type="datetime",
sizing_mode="scale_both",
active_drag="pan")
project_filter = GroupFilter(column_name='project',
group=df_cat['project'].unique().tolist()[0])
# this creates a subset of df_cat by selecting rows from the project column that match dropdown
view = CDSView(source=source, filters=[project_filter])
custom_js_dict = {}
legend_list = []
for i, col in enumerate(df_cat.columns[0:-3]):
x = p.varea(x='date', y1='y1', y2=col, source=source, view=view, name=col, alpha=0.5, color=color_list[i])
# save the column name and its renderer so we can create bokeh Legend items
legend_list.append((col, [x]))
p.yaxis.axis_label="TB"
p.xaxis.axis_label = "Date"
custom_js_dict['f'] = view.filters[0]
custom_js_dict['p'] = p
custom_js_dict['view'] = view
custom_js_dict['projects'] = df_cat['project'].unique().tolist()
# add each project's legend items to the custom_js_dict as bokeh legend item objects
for x in df_cat['project'].unique():
custom_js_dict[x] = Legend(items=[y for y in legend_list if y[0].split(',')[0] == x])
p.add_layout(custom_js_dict[df_cat['project'].unique().tolist()[0]])
p.legend.click_policy="hide"
p.legend.label_text_font_size = "10pt"
for f in view.filters:
f.js_on_change('group', CustomJS(args=custom_js_dict,
code="""
const allProjects = projects
for (var element of allProjects) {
if(element === f.group){
eval(element).visible = true;
p.add_layout(eval(f.group));
eval(f.group).click_policy="hide"
} else {eval(element).visible = false;
}
}
view.properties.filters.change.emit()
"""))
project_select = Select(title="Project:", value="", options=df['project'].unique().tolist())
project_select.js_link('value', project_filter, 'group')
show(row(p, column(project_select)))
I would organize the code differently, to avoid adding layouts and making eval calls in the CustomJS. Here is the relevant portion of the code:

This works the same as the original code does, AFAICT, and is about as stripped-down as I can imagine making it. If this still performs poorly with the full data set, then you would need to somehow synthesize (or link to) data that can be used to actually reproduce the situation locally (and/or update the MRE to include whatever important pertinent details are missing).
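The code block the answer refers to did not survive extraction. Below is a sketch of what the described reorganization might look like (my own reconstruction, not the answerer's verbatim code): build every per-project Legend up front, pass the whole dict of legends to the callback via args, and have the CustomJS only toggle Legend.visible, so the callback needs neither p.add_layout nor eval. It uses the same Bokeh 2.3.2 API as the question, on a toy frame standing in for df_cat.

```python
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import GroupFilter, CDSView, CustomJS, Legend

# Toy stand-in for the question's df_cat: one 'project, task' column per area.
df_cat = pd.DataFrame({
    'date': pd.to_datetime(['2021-09-26', '2021-10-03']),
    'Project_A, TASK_1': [4.1, 5.1],
    'Project_B, TASK_1': [12.5, 7.0],
    'y1': [0, 0],
    'project': ['Project_A', 'Project_B'],
})
source = ColumnDataSource(df_cat)
projects = df_cat['project'].unique().tolist()

p = figure(plot_height=500, plot_width=900, x_axis_type="datetime")
project_filter = GroupFilter(column_name='project', group=projects[0])
view = CDSView(source=source, filters=[project_filter])

# Same renderer/legend bookkeeping as the question's MRE.
legend_list = []
for col in ['Project_A, TASK_1', 'Project_B, TASK_1']:
    r = p.varea(x='date', y1='y1', y2=col, source=source, view=view,
                name=col, alpha=0.5)
    legend_list.append((col, [r]))

# Build every per-project Legend up front; only the active one starts visible.
legends = {}
for proj in projects:
    items = [item for item in legend_list if item[0].split(',')[0] == proj]
    legends[proj] = Legend(items=items, visible=(proj == projects[0]),
                           click_policy="hide")
    p.add_layout(legends[proj])

# The callback only flips Legend.visible -- no add_layout, no eval.
callback = CustomJS(args=dict(f=project_filter, legends=legends, source=source),
                    code="""
    for (const [name, legend] of Object.entries(legends)) {
        legend.visible = (name === f.group)
    }
    source.change.emit()
""")
project_filter.js_on_change('group', callback)
```

The dropdown wiring stays exactly as in the question (project_select.js_link('value', project_filter, 'group')); since every Legend already exists in the document, switching projects is a property flip rather than a layout mutation.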