How do I write a program to extract my data from Google Code?
I'm about to start writing a program that will attempt to extract data from a Google Code site so that it can be imported into another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).
Unfortunately, Google don't provide an API for this, nor do they have an export feature, so to me the only option looks to be extracting the data from the actual HTML (yuck). Does anyone have any suggestions on "best practice" for parsing data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone think of a better way, or has someone already done this?
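To make the HTML-parsing idea concrete, here is a minimal sketch using only Python's standard-library html.parser. It collects the text of every element carrying a given class. The class names issue-description and issue-comment and the sample markup are hypothetical placeholders, not the actual Google Code markup, so they would need to be replaced after inspecting a real issue page. The sketch also assumes every non-void tag is explicitly closed; real-world pages may need a more forgiving parser.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # one cleaned-up string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            if self.depth and tag == "br":
                self.current.append("\n")  # keep line breaks as whitespace
            return
        if self.depth:
            self.depth += 1  # nested tag inside a matching element
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.current = []

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, for illustration only.
sample = """
<div class="issue-description"><pre>Crash on <b>startup</b></pre></div>
<div class="issue-comment"><pre>Same here.<br>Version 1.2</pre></div>
<div class="issue-comment"><pre>Fixed in r42 &amp; later.</pre></div>
"""

description = extract_by_class(sample, "issue-description")
comments = extract_by_class(sample, "issue-comment")
```

Keying on a class or id rather than on the position of an element in the page tends to survive small layout changes better, which matters when the site's HTML is outside your control.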
Also, I'm aware of the CSV export feature on the issue page; however, this does not give complete data about issues (but could be a useful starting point).
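As a rough sketch of using the CSV export as a starting point, the snippet below reads issue IDs from CSV text and builds one detail-page URL per issue, which could then be fetched and parsed. The column name ID, the project name my-project, the sample rows, and the URL template are all assumptions for illustration and should be checked against a real export and a real issue page.

```python
import csv
import io


def issue_ids_from_csv(csv_text, id_column="ID"):
    """Return the numeric issue IDs found in the given column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value.isdigit():  # skip blank or malformed rows
            ids.append(int(value))
    return ids


def detail_urls(project, ids,
                template="https://code.google.com/p/{project}/issues/detail?id={id}"):
    """Build one detail-page URL per issue ID (the template is an assumption)."""
    return [template.format(project=project, id=i) for i in ids]


# Hypothetical export contents, for illustration only.
sample_csv = (
    "ID,Type,Status,Summary\n"
    '12,Defect,New,"Crash, on startup"\n'
    "15,Enhancement,Accepted,Add export\n"
)

ids = issue_ids_from_csv(sample_csv)
urls = detail_urls("my-project", ids)
```

Using the csv module rather than splitting on commas keeps quoted summaries that contain commas intact.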
Comments (1)
I just finished a program called google-code-export (hosted on GitHub). This allows you to export your Google Code project to an XML file, for example:
... will create a file named synergy-plus_google-code-export.xml.