Google App Engine datastore tag cloud with Python
We have some unstructured textual data in our app engine datastore. I wanted to create a 'one off' tag cloud of one property on a subset of the datastore objects. After a look around, I can't see any framework that will allow me to do this without writing it myself.
The way I had in mind was:
- Write a map (as in MapReduce) function to go over every object of the particular kind in the datastore,
- Split the text string into words
- For each word increment a counter
- Use the final counts to generate the tag cloud with some third party software (offline - any suggestions here welcome)
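The split-and-count steps above can be sketched roughly as follows. This is only a minimal illustration, not tied to any App Engine API: `count_words`, the regular expression, and the optional stop-word set are all hypothetical names and choices, and the input is assumed to be any iterable of plain strings pulled from the property in question.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_words(texts, stopwords=frozenset()):
    """Split each text into lowercase words and tally them."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = ["The cat sat", "the cat ran"]
    print(count_words(sample, stopwords={"the"}).most_common())
```

The resulting counts can then be handed to whatever tool draws the cloud.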
As I've never done this before, I was wondering firstly whether there is some framework around that does this for me (please) or, if not, whether I am approaching it in the right way. I.e., please feel free to point out gaping holes in the plan.
Comments (1)
Feed TagCloud and PyTagCloud are two possibilities.
Feed TagCloud Generator Gadget for Google App Engine might fit your needs. Unfortunately, it's undocumented. Fortunately, it's rather simple, though I'm not sure how well-suited it is to your needs. It operates on a feed and appears to be somewhat flexible, so if you have a feed of your site, it might not be too much trouble to integrate, though all processing will be online.
PyTagCloud is also worth a look. You'll be able to do the processing offline, and it generates rather handsome clouds.
All you'll have to do to get this working is export your datastore; the counting and splitting will be done for you, as PyTagCloud can operate on text files. Following the instructions in the App Engine docs about Uploading and Downloading Data will show you how to export the datastore to your local machine. You'll want to write an "Exporter Class" and have PyTagCloud operate on the output.
If you decide to roll your own, you probably want to skip the online processing and use the offline method from Uploading and Downloading Data above, unless you want a dynamically updated cloud. Iterating over your entire datastore and doing the counts online is the most annoying and expensive part of the task, so it only makes sense if you want or need a dynamic tag cloud. As above, I'd recommend writing an "Exporter Class" and operating on its output locally.
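If you do roll your own, the local half might look something like the sketch below. It assumes the export ends up as a CSV file with a header row and that the text lives in a column called `body`; both are assumptions for illustration, as are the helper names `counts_from_csv` and `tag_sizes`. It does not use PyTagCloud at all; it only counts words and scales the most frequent ones linearly to font sizes that a renderer could consume.

```python
import csv
import io
import re
from collections import Counter


def counts_from_csv(csv_file, column):
    """Count words in one column of an exported CSV (file-like object)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        text = (row.get(column) or "").lower()
        counts.update(re.findall(r"[a-z0-9']+", text))
    return counts


def tag_sizes(counts, top_n=50, min_size=10, max_size=40):
    """Map the top_n words to font sizes, scaled linearly by count."""
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts tie
    return [
        (word, min_size + (count - lowest) * (max_size - min_size) // span)
        for word, count in top
    ]


if __name__ == "__main__":
    # Stand-in for an exported file; replace with open("export.csv", newline="").
    sample = io.StringIO("key,body\n1,red red blue\n2,red green\n")
    for word, size in tag_sizes(counts_from_csv(sample, "body"), top_n=3):
        print(word, size)
```

Running everything locally like this keeps the datastore work down to a single export, which is the point of the offline approach.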