While loop in R quantstrat code - how to make it faster?

Posted 2024-12-09 01:42:51

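In the quantstrat package I have located one of the main culprits for the slowness of the applyRules function, and I wonder whether there is a more efficient way to write the while loop. Any feedback would be helpful, especially from anyone with experience wrapping this part in parallel R.

Would apply work in place of while? Or should I rewrite this part as new functions, such as ruleProc and nextIndex? I am also looking into Rcpp, but that may be a stretch. Any help and constructive advice is much appreciated.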

while (curIndex) {
    timestamp = Dates[curIndex]
    # && short-circuits, so holdtill is not compared once it has been set to NULL
    if (isTRUE(hold) && holdtill < timestamp) {
        hold = FALSE
        holdtill = NULL
    }
    types <- sort(factor(names(strategy$rules), levels = c("pre",
        "risk", "order", "rebalance", "exit", "enter", "entry",
        "post")))
    for (type in types) {
        switch(type, pre = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules$pre, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, risk = {
            if (length(strategy$rules$risk) >= 1) {
              ruleProc(strategy$rules$risk, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, order = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            } else {
              if (isTRUE(path.dep)) {
                timespan <- paste("::", timestamp, sep = "")
              } else timespan = NULL
              ruleOrderProc(portfolio = portfolio, symbol = symbol,
                mktdata = mktdata, timespan = timespan)
            }
        }, rebalance = , exit = , enter = , entry = {
            if (isTRUE(hold)) next
            if (type == "exit") {
              if (getPosQty(Portfolio = portfolio, Symbol = symbol,
                Date = timestamp) == 0) next
            }
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
            if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
              symbol = symbol, status = "open", timespan = timestamp,
              which.i = TRUE))) {
            }
        }, post = {
            if (length(strategy$rules$post) >= 1) {
              ruleProc(strategy$rules$post, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        })
    }
    if (isTRUE(path.dep))
        curIndex <- nextIndex(curIndex)
    else curIndex = FALSE
}



记忆里有你的影子 2024-12-16 01:42:51

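Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.

The applyRules function in quantstrat is absolutely where most of the time is spent.

The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review it for SO posterity.

We construct a dimension reduction index inside applyRules so that we don't have to observe and check every timestamp. We only take note of the specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.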
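As a rough illustration of the reduced-index idea (a toy sketch in base R, not quantstrat's actual implementation), the index can be pictured as the sorted set of row positions where a signal fires or a resting order could fill. The loop then hops between those positions instead of visiting every bar. The names `signal`, `order_fill_idx`, `dindex` and `next_index` are hypothetical and chosen only for this example.

```r
# Toy data: 10 bars, with a signal that is TRUE on only a few of them.
signal <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Hypothetical positions where a resting order might get filled.
order_fill_idx <- c(4L, 8L)

# Reduced index: only the bars where something can actually happen.
dindex <- sort(unique(c(which(signal), order_fill_idx)))

# Return the next position to visit after `cur`, or FALSE when none remain,
# so the result can drive `while (cur_index)` directly.
next_index <- function(cur, dindex) {
  remaining <- dindex[dindex > cur]
  if (length(remaining) == 0L) FALSE else remaining[1L]
}

visited <- integer(0)
cur_index <- dindex[1L]
while (cur_index) {
  visited <- c(visited, cur_index)  # a real loop would process rules here
  cur_index <- next_index(cur_index, dindex)
}

print(visited)
```

In this toy case the loop body runs on 5 of the 10 bars; the sparser the signals, the fewer iterations the state machine needs.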

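This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.

From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.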
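For readers unfamiliar with the pattern, a minimal state machine written as a while loop might look like the sketch below. It is a deliberately tiny toy with two states ("flat" and "long") and invented event names; it shows only the shape of the loop, not anything quantstrat does internally.

```r
# Events arriving in time order; the names are illustrative only.
events <- c("buy", "buy", "sell", "sell", "buy", "none", "sell")

state <- "flat"            # current state, carried between iterations
history <- character(0)    # state after each event, for inspection
i <- 1L
while (i <= length(events)) {
  # The transition depends on BOTH the current state and the event.
  state <- switch(state,
    flat = if (events[i] == "buy") "long" else "flat",
    long = if (events[i] == "sell") "flat" else "long"
  )
  history <- c(history, state)
  i <- i + 1L
}

print(history)
```

The second "buy" and the second "sell" change nothing, because the outcome of each step depends on the state left behind by the previous one.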

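The question asks whether using apply would help. apply statements in R are implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help, because this is inside a state-dependent part of the code. Evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.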
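A small base-R comparison, using made-up signals, shows why judging each time point in isolation gives a different (and wrong) answer once position matters. The stateless version is what an apply-style or parallel evaluation would effectively compute.

```r
signals <- c("buy", "buy", "sell", "sell", "buy")

# Stateless evaluation: every bar is judged on its own, so each "buy"
# looks like an entry regardless of the current position.
is_buy <- vapply(signals, function(s) s == "buy", logical(1), USE.NAMES = FALSE)
stateless_entries <- which(is_buy)

# Stateful evaluation: an entry is taken only when currently flat.
position <- 0L
stateful_entries <- integer(0)
for (i in seq_along(signals)) {
  if (signals[i] == "buy" && position == 0L) {
    position <- 1L
    stateful_entries <- c(stateful_entries, i)
  } else if (signals[i] == "sell" && position == 1L) {
    position <- 0L
  }
}

print(stateless_entries)
print(stateful_entries)
```

The stateless pass reports an entry on bar 2 that the stateful loop correctly skips, since a position is already open at that point.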

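The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call that calls the rule function. The rest of the for loop simply locates and matches arguments for these functions and, according to code profiling, takes very little time. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.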
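The type ordering can be seen in the question's own code, which sorts a factor whose levels list the rule types in evaluation order. The sketch below reproduces just that step on a hypothetical `rules` list; the rule contents are placeholders.

```r
# Evaluation order, taken from the factor levels in the question's code.
rule_order <- c("pre", "risk", "order", "rebalance", "exit", "enter",
                "entry", "post")

# A hypothetical strategy whose rule slots were defined in arbitrary order.
rules <- list(enter = list("buy rule"),
              exit  = list("stop rule"),
              risk  = list("limit check"))

# Sorting a factor orders by level, not alphabetically.
types <- as.character(sort(factor(names(rules), levels = rule_order)))

print(types)
```

Assuming the set of rule types does not change while the loop runs, this vector could be computed once before the while loop rather than on every iteration, although any saving is likely to be small next to the rule calls themselves.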

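To speed up execution, there are four main things that can be done:

1. Write a non-path-dependent strategy: this is supported by the code, and simple strategies may be modeled this way. In this model you would write a custom rule function that calls addTxn directly when you think you should get your fills. It could be a vectorized function operating on your indicators/signals, and should be very fast.
2. Preprocess your signals: the fewer places where the state machine needs to evaluate the state of the order book/rules/portfolio to see if it needs to do something, the faster it runs, and the speed increase is nearly linear in the reduction in signals. This is the area most users neglect, writing signal functions that don't really evaluate when action that would modify positions or the order book may be required.
3. Explicitly parallelize parts of your analysis problem: I commonly write explicitly parallel wrappers to separate out different parameter evaluations or symbol evaluations; see applyParameter for an example using foreach.
4. Rewrite the state machine inside applyRules in C/C++: patches welcome, but do see the link Garrett posted for additional details.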
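To illustrate the signal preprocessing point, the sketch below contrasts a "level" signal, which is TRUE on every bar where a condition holds, with a "cross" signal, which is TRUE only on the bar where the condition first becomes true. The price series and threshold are invented, and the code is plain base R rather than quantstrat's signal functions.

```r
# Hypothetical price series and threshold.
price <- c(9, 10, 11, 12, 13, 12, 11, 9, 8, 10, 11, 12)
threshold <- 10.5

# Level signal: TRUE on every bar where price is above the threshold.
level_signal <- price > threshold

# Cross signal: TRUE only where price moves from at-or-below the threshold
# to above it, i.e. where an entry decision is actually needed.
previous_level <- c(FALSE, head(level_signal, -1))
cross_signal <- level_signal & !previous_level

print(sum(level_signal))    # bars the state machine would have to visit
print(which(cross_signal))  # bars it visits after preprocessing
```

Here the number of bars that need path-dependent evaluation drops from 7 to 2, which is the kind of reduction the answer describes as translating almost linearly into run time.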

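I can assure you that most strategies can run in a fraction of a core-minute per symbol per day on tick data, if a little care is taken with the signal generation functions. Running large backtests on a laptop is not recommended.

Ref: quantstrat - applyRules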

