Using org-mode to structure an analysis

Posted on 2024-10-05 18:25:49

I am trying to make better use of org-mode for my projects. I think literate programming is especially applicable to the realm of data analysis and org-mode lets us do some pretty awesome literate programming.

I think most of you will agree that the workflow for writing an analysis is different from most other types of programming. I don't just write a program; I explore the data. And while many of these explorations are dead ends, I don't want to delete or ignore them completely. I just don't want to re-run them every time I execute the org file. I also tend to find or develop chunks of useful code that I would like to put into an analytic template, but some of these chunks won't be relevant for every project, and I'd like to know how to make org-mode ignore them when I execute the entire buffer. Here's a simplified example.

* Import
  - I want org-mode to ignore import-sql.
#+srcname: import-data
#+begin_src R :exports none :noweb yes
<<import-csv>>
#+end_src

#+srcname: import-csv
#+begin_src R :exports none
data <- read.csv("foo-clean.csv")
#+end_src

#+srcname: import-sql
#+begin_src R :exports none
library(RSQLite)
blah blah blah
#+end_src

* Clean
  - This is run on foo.csv, producing foo-clean.csv
  - Fixes the mess of -9 and -13 to NA for my sanity.
  - This only needs to be run once, and after that, reference.
  - How can I tell org-mode to skip this?
#+srcname: clean-csv
#+begin_src sh :exports none
sed .....
#+end_src

* Explore

** Explore by a factor (1)
   - Dead end. Did not pan out. Ignore.
   - Produces a couple of charts showing there is not interaction.
#+srcname: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes
#+end_src

** Explore by a factor (2)
   - A useful exploration that I will reference later in a report.
   - Produces a couple of charts showing the interaction of my variables.
#+srcname: explore-by-a-factor-2
#+begin_src R :exports none :noweb yes
#+end_src

I would like to be able to use org-babel-execute-buffer and have org-mode somehow know to skip over the code blocks import-sql, clean-csv and explore-by-a-factor-1. I want them in the org file because they are relevant to the project. After all, tomorrow someone might want to know why I was so sure explore-by-a-factor-1 was not useful. I want to keep that code around so I can bang out the plot or the analysis or whatever and move on, but not have it run every time I rerun everything, because there's no reason to run it. Ditto with the clean-csv stuff. I want it around to document what I did to the data (and why), but I don't want to re-run it every time. I'll just import foo-clean.csv.

I Googled all over for this and read a bunch of org-mode mailing list archives. I found a couple of ideas, but not what I want. EXPORT_SELECT_TAGS and EXPORT_EXCLUDE_TAGS are great when exporting the file, and the :tangle header works well when creating the actual source files. I don't want to do either of these; I just want to execute the buffer. I would like to be able to mark code blocks, in a similar fashion, as to be executed or ignored. I guess I would like to find a way to have an org variable such as:

EXECUTE_SELECT_TAGS

This way I could simply tag my various code blocks and be done with it. It would be even nicer if I could then run the file using only the source blocks with specific tags. I can't find a way to do this, and I thought I would ask here before asking/begging for a new feature in org-mode.

Comments (2)

淡莣 2024-10-12 18:25:50

While I never did get an answer to my question, the discussion was interesting, and apparently an org-mode based template for R strikes a few people as an interesting idea. I downloaded the org-mode source code and looked at org-babel-execute-buffer. It is, as I feared, a naive function which does precisely what it says it does and nothing more. It is not (currently) possible to pass it any additional parameters to affect its behavior. (Unless I am badly misreading the lisp, which is entirely possible.)

Eventually, I decided org-babel-execute-buffer is not necessary for a useful R template system. Babel's noweb functionality is really flexible and I think it is possible to build a workable solution using noweb, rather than trying to develop a complex tagging schema to define how/when to run things.
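
A minimal sketch of that noweb approach, reusing the block names from the question. The driver block name run-analysis is hypothetical, and #+name: is used on the assumption of a recent Org release, where it replaces the older #+srcname:. Executing only the driver (C-c C-c with point inside it) pulls in the referenced blocks and never touches import-sql or explore-by-a-factor-1:

#+name: run-analysis
#+begin_src R :exports none :noweb yes
<<import-csv>>
<<explore-by-a-factor-2>>
#+end_src

Noweb references are expanded as plain text, so this only works for blocks written in the driver's language; the sh block clean-csv could not be spliced into an R driver this way.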

For tangling/export it should still be possible to use tags to create usable/sane output.

For anyone who is interested: LiterateR

It's probably a little rude to use this thread to put this out there, but this is why I asked the question in the first place. TemplateR is my attempt to make R a little easier to use. Right now it is just a template with two simplistic functions, and I consider it a proof of concept at this point. Eventually, I want to develop something that does more to help people develop R projects more quickly. TemplateR will accomplish this by:
1. Providing a strong structure to develop around.
2. Providing built-in functions that support common tasks, especially in the realm of reproducible research.
3. Providing snippets of tested code that can be rapidly re-purposed for the current project.

Right now, all it provides is a basic structure/framework and two simple functions, which:
1. Identify which R packages are missing (based on what is manually entered into a table).
2. Create the project directories (plots, data, reports).

More will come in future versions. The README.org and TODO.org go into further detail.

后来的我们 2024-10-12 18:25:49

I figured it out. From the Org manual (since updated):

The :eval header argument can be used to limit the evaluation of specific code blocks. :eval accepts two arguments “never” and “query”. :eval never will ensure that a code block is never evaluated, this can be useful for protecting against the evaluation of dangerous code blocks. :eval query will require a query for every execution of a code block regardless of the value of the org-confirm-babel-evaluate variable.

So you just have to add :eval never to the header of the blocks that you don’t want to execute, and voilà!
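
As a sketch of how that looks on the blocks from the question (written with #+name:, which recent Org releases expect in place of #+srcname:), the dead-end exploration would become:

#+name: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes :eval never
#+end_src

The same header argument on import-sql and clean-csv keeps them in the file as documentation while org-babel-execute-buffer passes over them. Assuming a reasonably recent Org, it should also be possible to switch off every block under a heading at once through the header-args property, which comes close to the tag-based selection the question asked for:

** Explore by a factor (1)
   :PROPERTIES:
   :header-args: :eval never
   :END:
   - Dead end. Did not pan out. Ignore.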
