How to visualize attention vectors using a text heatmap?
I am working on an NLP research project and I want to visualize the output of the attention vector.
For example, the data looks like this:
def sample_data():
    sent = '''the USS Ronald Reagan - an aircraft carrier docked in Japan - during his tour of the region, vowing to "defeat any attack and meet any use of conventional or nuclear weapons with an overwhelming and effective American response".'''
    # Split on whitespace; punctuation stays attached to its token.
    words = sent.split()
    word_num = len(words)
    # Dummy scores: a linear ramp ending at 100 for the last token.
    attention = [(x+1.)/word_num*100 for x in range(word_num)]
    return {'text': words, 'attention': attention}
which looks like this:
{'text': ['the', 'USS', 'Ronald', 'Reagan', '-', 'an', 'aircraft', 'carrier', 'docked', 'in', 'Japan', '-', 'during', 'his', 'tour', 'of', 'the', 'region,', 'vowing', 'to', '"defeat', 'any', 'attack', 'and', 'meet', 'any', 'use', 'of', 'conventional', 'or', 'nuclear', 'weapons', 'with', 'an', 'overwhelming', 'and', 'effective', 'American', 'response".'], 'attention': [2.564102564102564, 5.128205128205128, 7.6923076923076925, 10.256410256410255, 12.82051282051282, 15.384615384615385, 17.94871794871795, 20.51282051282051, 23.076923076923077, 25.64102564102564, 28.205128205128204, 30.76923076923077, 33.33333333333333, 35.8974358974359, 38.46153846153847, 41.02564102564102, 43.58974358974359, 46.15384615384615, 48.717948717948715, 51.28205128205128, 53.84615384615385, 56.41025641025641, 58.97435897435898, 61.53846153846154, 64.1025641025641, 66.66666666666666, 69.23076923076923, 71.7948717948718, 74.35897435897436, 76.92307692307693, 79.48717948717949, 82.05128205128204, 84.61538461538461, 87.17948717948718, 89.74358974358975, 92.3076923076923, 94.87179487179486, 97.43589743589743, 100.0]}
Each token is assigned one float value (its attention score). What are the options for visualizing this data? Are there any libraries or tools available in R, Python, or JS?
Comments (1)
A solution that handles long sentences well is to print a colored sentence in the console. You can do so by printing ANSI escape sequences. For example,
\033[38;2;255;0;0m test \033[0m
prints a red "test" in the console (RGB code (255, 0, 0)). Using this idea, we can build a gradient from green to red (low to high attention) and print the text:
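A minimal sketch of this idea, assuming a terminal that supports 24-bit ("truecolor") ANSI sequences and attention scores in the range 0 to 100, as in the sample data. The helper names `attention_to_rgb` and `colorize` are hypothetical, chosen only for illustration:

```python
def attention_to_rgb(score, low=0.0, high=100.0):
    """Map a score onto a green-to-red gradient and return (r, g, b)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Normalize to [0, 1] and clamp out-of-range scores.
    t = min(max((score - low) / (high - low), 0.0), 1.0)
    # t = 0 gives pure green, t = 1 gives pure red.
    return int(round(255 * t)), int(round(255 * (1 - t))), 0


def colorize(words, attention, low=0.0, high=100.0):
    """Wrap each word in a 24-bit foreground color escape sequence."""
    pieces = []
    for word, score in zip(words, attention):
        r, g, b = attention_to_rgb(score, low, high)
        # \033[38;2;R;G;Bm sets the foreground color, \033[0m resets it.
        pieces.append(f"\033[38;2;{r};{g};{b}m{word}\033[0m")
    return " ".join(pieces)


if __name__ == "__main__":
    # Shortened stand-in for the sample sentence, with the same linear ramp.
    words = "the USS Ronald Reagan docked in Japan".split()
    attention = [(i + 1.0) / len(words) * 100 for i in range(len(words))]
    print(colorize(words, attention))
```

If your scores are not on a 0 to 100 scale, pass the actual minimum and maximum as `low` and `high`.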
This solution would output something like this on the console: [image: the sentence printed with a green-to-red color gradient]
I find this effective as a visualizer, but the fact that it only works in the console may be a drawback.
Another way would be to do a heatmap:
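One way to sketch this, assuming matplotlib is installed. The function name `plot_attention_heatmap`, the default file name, and the `YlOrRd` colormap are illustrative choices, not taken from the original answer:

```python
def plot_attention_heatmap(words, attention, path="attention_heatmap.png",
                           width=16, height=2, cmap="YlOrRd"):
    """Draw a one-row heatmap with one cell per word and save it to `path`."""
    if len(words) != len(attention):
        raise ValueError("words and attention must have the same length")

    # Imported here so the argument check above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; writes files only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(width, height))
    # A 1 x N grid: a single row whose columns are the tokens.
    im = ax.imshow([attention], cmap=cmap, aspect="auto")
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=90)
    ax.set_yticks([])
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Example call with data shaped like sample_data():
# plot_attention_heatmap(data['text'], data['attention'])
```

The `width`, `height`, and `cmap` arguments are the knobs to adjust if the default figure does not suit your sentence length.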
This gives the following result (you can adjust parameters such as the height, width, and colors): [image: heatmap of the attention scores with the words as tick labels]
For long sentences, though, this may not be the best solution, because the tick labels become hard to read.