R quantstrat 代码中的 while 循环 - 如何使其更快?
在 quantstrat 包中,我找到了 applyRule 函数缓慢的罪魁祸首之一,并想知道是否有更有效的方法来编写 while 循环。任何反馈都会有帮助。对于任何有将这部分封装到 Parallel R 中的经验的人来说。
作为一个选项 apply 可以工作吗?或者我应该将这部分重写为新函数,例如ruleProc和nextIndex?我也在研究 Rcpp,但这可能有点困难。非常感谢任何帮助和建设性建议?
while (curIndex) {
timestamp = Dates[curIndex]
if (isTRUE(hold) & holdtill < timestamp) {
hold = FALSE
holdtill = NULL
}
types <- sort(factor(names(strategy$rules), levels = c("pre",
"risk", "order", "rebalance", "exit", "enter", "entry",
"post")))
for (type in types) {
switch(type, pre = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules$pre, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, risk = {
if (length(strategy$rules$risk) >= 1) {
ruleProc(strategy$rules$risk, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, order = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr,)
} else {
if (isTRUE(path.dep)) {
timespan <- paste("::", timestamp, sep = "")
} else timespan = NULL
ruleOrderProc(portfolio = portfolio, symbol = symbol,
mktdata = mktdata, timespan = timespan)
}
}, rebalance = , exit = , enter = , entry = {
if (isTRUE(hold)) next()
if (type == "exit") {
if (getPosQty(Portfolio = portfolio, Symbol = symbol,
Date = timestamp) == 0) next()
}
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
symbol = symbol, status = "open", timespan = timestamp,
which.i = TRUE))) {
}
}, post = {
if (length(strategy$rules$post) >= 1) {
ruleProc(strategy$rules$post, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
})
}
if (isTRUE(path.dep))
curIndex <- nextIndex(curIndex)
else curIndex = FALSE
}
In quantstrat package I have located one of the main culprits for slowness of the applyRule function and wonder if there is more efficient to write the while loop. Any feedback would be helpful. For anyone experience wrapping this part into Parallel R.
As an option apply would work instead while? Or should I re-write this part into new function such as ruleProc and nextIndex? I am also dveling on Rcpp but that may be a streach. Any help and constructive advice is much appreciated?
while (curIndex) {
timestamp = Dates[curIndex]
if (isTRUE(hold) & holdtill < timestamp) {
hold = FALSE
holdtill = NULL
}
types <- sort(factor(names(strategy$rules), levels = c("pre",
"risk", "order", "rebalance", "exit", "enter", "entry",
"post")))
for (type in types) {
switch(type, pre = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules$pre, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, risk = {
if (length(strategy$rules$risk) >= 1) {
ruleProc(strategy$rules$risk, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, order = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr,)
} else {
if (isTRUE(path.dep)) {
timespan <- paste("::", timestamp, sep = "")
} else timespan = NULL
ruleOrderProc(portfolio = portfolio, symbol = symbol,
mktdata = mktdata, timespan = timespan)
}
}, rebalance = , exit = , enter = , entry = {
if (isTRUE(hold)) next()
if (type == "exit") {
if (getPosQty(Portfolio = portfolio, Symbol = symbol,
Date = timestamp) == 0) next()
}
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
symbol = symbol, status = "open", timespan = timestamp,
which.i = TRUE))) {
}
}, post = {
if (length(strategy$rules$post) >= 1) {
ruleProc(strategy$rules$post, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
})
}
if (isTRUE(path.dep))
curIndex <- nextIndex(curIndex)
else curIndex = FALSE
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
Garrett 的回答确实指出了 R-SIG-Finance 列表上的最后一次主要讨论,其中讨论了相关问题。
quantstrat 中的 applyRules 函数绝对是花费最多时间的地方。
本问题中复制的 while 循环代码是 applyRules 执行的路径相关部分。我相信所有这些都包含在文档中,但我将简要回顾一下后代。
我们在applyRules内部构造了一个降维索引,这样我们就不必观察每个时间戳并检查它。我们仅记录策略可合理预期对订单簿起作用或订单可合理预期被执行的特定时间点。
这是状态相关和路径相关代码。在这种情况下,空谈“矢量化”没有任何意义。如果我需要了解市场的当前状态、订单簿和我的头寸,并且如果我的订单可能会被其他规则以依赖于时间的方式修改,我不知道如何对这段代码进行矢量化。
从计算机科学的角度来看,这是一个状态机。我能想到的几乎所有语言中的状态机通常都写成 while 循环。这并不是真正可以协商或改变的。
问题询问使用 apply 是否有帮助。 R 中的 apply 语句是作为循环实现的,所以不,它没有帮助。即使是诸如 mclapply 或 foreach 之类的并行应用也无济于事,因为这是在代码的状态相关部分内。不考虑状态来评估不同的时间点是没有任何意义的。您会注意到,Quantstrat 的非状态相关部分会尽可能进行矢量化,并且只占用很少的运行时间。
John 的评论建议删除 ruleProc 中的 for 循环。 for 循环所做的只是检查此时与策略关联的每个规则。该循环中唯一的计算密集型部分是调用规则函数的do.call。 for 循环的其余部分只是简单地查找和匹配这些函数的参数,并且从代码分析来看,根本不需要太多时间。在这里使用并行应用也没有多大意义,因为规则函数是按类型顺序应用的,因此可以在新的进入指令之前应用取消或风险指令。就像数学有一个运算顺序,或者银行有一个存款/取款处理顺序一样,Quantstrat 有一个规则类型评估顺序,如文档中所述。
为了加快执行速度,可以做四件主要的事情:
我可以向您保证,如果对信号生成函数稍加注意,大多数策略可以在每个交易品种每天每个核心的一小部分核心分钟内运行。不建议在笔记本电脑上运行大型回测。
参考:quantstrat - applyRules
Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.
The applyRules function in quantstrat is absolutely where most of the time is spent.
The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review for SO posterity.
We construct a dimension reduction index inside applyRules so that we don't have to observe every timestamp and check it. We only take note of specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.
This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make any sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.
From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.
The question asks if use of apply would help. apply statements in R are implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help because this is inside a state dependent part of the code. Evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.
The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call to call the rule function. The rest of the for loop is simply locating and matching arguments for these functions, and from code profiling, doesn't take much time at all. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.
To speed up execution, there are four main things that can be done:
I can assure you that most strategies can run in a fraction of a core-minute per symbol per day per core on tick data, if a little care is taken to the signal generation functions. Running large backtests on a laptop is not recommended.
Ref: quantstrat - applyRules