Version control for reports (git)
I have a particular report that I am asked to run from time to time. The details are slightly different each time - different date ranges, different selection criteria - but structurally, the report is fairly stable. I do make some structural changes from time to time, however.
I have two hopes for these reports:
- To be able to reproduce any report at a later date.
- To be able to review the structural changes made to the report over time.
Right now, I just have a folder with a master script, which I modify for every iteration of the report, and subfolders where I save a snapshot of the master script and the data for each run.
Maybe that's good enough. But I've started using git to manage my (much more complex) data analysis scripts, and I was wondering if there was a way to use it here (and for myriad similar reports) that would allow for more robust version control.
I can think of a few different ways to do so: make a branch for each report, but only merge structural changes back onto the master; clone the master into the subfolder for a new report, make changes there, push back structural changes; etc. But I really don't even know enough to be able to separate insane ideas from plausible ones, much less good ones. What do you think?
Comments (3)
It obviously depends on the report and how it changes, but from what you describe, it seems to me you could write a single, meaningful SAS macro program that takes all your selection criteria as parameters.
In the SAS macro code you can then evaluate the parameters and make the structural change, if necessary.
So you would have one .sas file with just one big macro in it. Depending on the parameters you use to call the macro, it can reproduce all the reports you want.
Does this make sense to you? If it doesn't, let me know, and I can provide some SAS macro examples to get you started if you are not familiar with them.
I'd personally go for your first suggestion (a branch for each report, with only structural changes merged back onto the master):
This is by far the easiest conceptually, and by merging the structural changes into the head revision, you can apply them to the other branches as and when required. The only downside is the number of branches you'll leave lying around, but it sounds like an infrequent request, and a good naming scheme should sort that out.
If you can anticipate which fields change each time, I would say make a generic report that prompts you for this data each time the report is run. You should be able to do this in just about any reporting software. The report itself can be tracked in git, and you won't have to worry about having 50,000 branches in your repository.
If it's unpredictable what fields need to be custom each time, give most of the fields useful default values.
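As a rough illustration of the "generic report with useful defaults" idea, here is a minimal Python sketch. Everything in it is hypothetical: the `ReportParams` fields (`start`, `end`, `region`, `min_amount`), the row layout, and the helper names are assumptions for the example, not anything from the original report. The point is only that the script stays fixed in git while each run is described by a small set of parameters that can be saved next to the output.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass(frozen=True)
class ReportParams:
    """Hypothetical per-run settings; every field has a default."""
    start: date = date(2010, 1, 1)
    end: date = date(2010, 12, 31)
    region: str = "ALL"       # "ALL" means no region filter
    min_amount: float = 0.0


def select_rows(rows, params):
    """Keep rows that satisfy the date range and selection criteria."""
    return [
        r for r in rows
        if params.start <= r["date"] <= params.end
        and (params.region == "ALL" or r["region"] == params.region)
        and r["amount"] >= params.min_amount
    ]


def params_to_json(params):
    """Serialize the parameters so a run can be reproduced later."""
    return json.dumps(asdict(params), default=str, sort_keys=True)
```

Saving the output of `params_to_json` alongside each result, together with the commit the script was run from, would be one way to make a run reproducible without a branch per report.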
If you run this report a lot, and are specifically interested in keeping track of the various result sets, I'd suggest a different approach. I don't know what your report generates, but let's say it's a PDF. I would make a directory structure somewhere, and you could store each run in results/year/month/date.pdf. This way you will have a record of the data pulled on May 5, 2010 (or with May 5, 2010 as a parameter).
Edit: You might consider tags instead of branches for those things you can't combine into a single report. If you have a version you think you're going to need quick access to, tag it. Any time you need to get back to it, just check out the tag and run the report.
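A small Python sketch of the results/year/month/date.pdf layout and of a tag naming scheme might look like the following. The helper names `result_path` and `tag_name`, the zero-padded numeric format, and the example report name are assumptions made for illustration; the original answer only gives the general shape of the path.

```python
from datetime import date
from pathlib import PurePosixPath


def result_path(run_date, root="results", ext="pdf"):
    """Build results/<year>/<month>/<day>.<ext> for a given run date."""
    return (
        PurePosixPath(root)
        / f"{run_date:%Y}"
        / f"{run_date:%m}"
        / f"{run_date:%d}.{ext}"
    )


def tag_name(report, run_date):
    """Hypothetical naming scheme for a git tag marking one run."""
    return f"{report}-{run_date:%Y-%m-%d}"


print(result_path(date(2010, 5, 5)))            # results/2010/05/05.pdf
print(tag_name("sales", date(2010, 5, 5)))      # sales-2010-05-05
```

A name produced this way could be passed to `git tag` when the report is run and to `git checkout` when that version of the script is needed again.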