Reusing one reference variable vs. using separate variables: the cost of reference variables in .NET
I'm not sure whether this question is silly, but I'd really like some opinions. I'll go straight to an example of what I'm currently working on.
I am using HtmlAgilityPack to parse some HTML pages.
I have a method in my code that receives an HtmlDocument and does the parsing. Currently, the code looks like this:
private void OperateOnDocument(HtmlDocument pageSource)
{
    HtmlNode node;
    node = pageSource.DocumentNode.SelectSingleNode(/* XPath to find a node */);
    // Do some operation on the extracted HtmlNode.
    node = pageSource.DocumentNode.SelectSingleNode(/* XPath to find another node */);
    // Do some operation on the newly acquired node.
    // Likewise, reuse the same reference variable "node" to extract all the nodes and operate on them.
}
Pros: A single reference variable is used to operate on all the nodes.
Cons: You can't tell which node a given parsing step extracts, since the extracted nodes all have different values and purposes (one node may contain a rate, another a city name).
Another way of doing this is:
private void OperateOnDocument(HtmlDocument pageSource)
{
    HtmlNode idNode = pageSource.DocumentNode.SelectSingleNode(/* XPath to find the node containing id */);
    // Parse the text and store it in a string.
    HtmlNode rateNode = pageSource.DocumentNode.SelectSingleNode(/* XPath to find the node containing the rates */);
    // Parse the text into a decimal.
    // Likewise, use separate, meaningful names for the reference variables to extract all the nodes and operate on them.
}
Pros: Each node being operated on gets its own meaningful name.
Cons: At times, the number of operations may mount to 10; hence, 10 different reference variables.
So, my question is: is the second method really worth doing? Or could it prove a bit expensive because of so many reference variables? How expensive are such variables in .NET, or in any other language/framework in general?
Comments (2)
If you're worried about the cost of doing something, then measure it. If you can't measure a difference, then the difference cannot possibly matter (if it mattered, you'd be able to measure it).
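One rough way to take that measurement is a Stopwatch loop over both styles. This is only a sketch under stated assumptions: the class VariableCostDemo and the helper Lookup are hypothetical names, and Lookup stands in for SelectSingleNode on plain strings so the example runs without HtmlAgilityPack. Timings from a loop like this are noisy, so build in Release mode and treat the numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;

static class VariableCostDemo
{
    // Hypothetical stand-in for SelectSingleNode: picks an element by index.
    static string Lookup(string[] data, int index) => data[index % data.Length];

    // Style 1: one variable reused for every lookup.
    public static int ReuseOneVariable(string[] data, int i)
    {
        string node;
        node = Lookup(data, i);
        int total = node.Length;
        node = Lookup(data, i + 1);
        total += node.Length;
        return total;
    }

    // Style 2: a separate, meaningfully named variable per lookup.
    public static int SeparateVariables(string[] data, int i)
    {
        string idNode = Lookup(data, i);
        int total = idNode.Length;
        string rateNode = Lookup(data, i + 1);
        total += rateNode.Length;
        return total;
    }

    // Times one style and prints a checksum so the work cannot be discarded.
    static void Time(string label, Func<string[], int, int> f, string[] data, int iterations)
    {
        long checksum = 0;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += f(data, i);
        }
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.ElapsedMilliseconds} ms (checksum {checksum})");
    }

    static void Main()
    {
        string[] data = { "id-42", "19.99", "Paris" };
        const int iterations = 10_000_000;

        // Warm-up pass so JIT compilation is not included in the timed runs.
        Time("warm-up", ReuseOneVariable, data, 1000);
        Time("warm-up", SeparateVariables, data, 1000);

        Time("reused variable", ReuseOneVariable, data, iterations);
        Time("separate variables", SeparateVariables, data, iterations);
    }
}
```

If the two timings are indistinguishable across several runs, that is the answer to the cost question for this code; a dedicated benchmarking library would give more trustworthy numbers than a hand-rolled loop.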
To answer your more specific question, a variable does not have any intrinsic cost. Variables do not exist in the code that is actually executed on the CPU. They are a source-level construct, and the compiler is not obliged to create some kind of 1-to-1 mapping between source code variables and, say, push instructions executed, or registers used, or anything else. A single variable in your code can correspond to the use of several different registers (the compiler might choose to move it around from time to time), or several different variables might all end up sharing the same register because they're never used at the same time.
One of the many things a compiler does is determine the lifetime of a variable. When is it first used, and when is it last used? It doesn't matter when you declared it, or when it goes out of scope.
A transformation commonly used by compilers to put your code into a form more amenable to optimizations is to turn every assignment into a new, unique variable, basically like in your second example. That's how the compiler prefers to work with the code, because now it can see the dependencies clearly. In the first case, it looks as if the same variable has to exist throughout the function, and be reused for both operations. In the second case, we can easily see that they're separate, independent variables, and we can see that their lifetimes do not overlap (idNode is not used after the declaration of rateNode).
Both versions will likely produce the exact same code, but the second version is closer to what optimizing compilers do to the code internally.
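The renaming described above is commonly known as static single assignment (SSA) form. Here is a hand-written illustration of the idea, not actual compiler output: the class SsaSketch and its methods are hypothetical, and plain integers replace the HTML nodes to keep it self-contained. The second method is what the first looks like once every assignment gets its own name.

```csharp
using System;

static class SsaSketch
{
    // As written in source: the single variable x is assigned twice.
    public static int Reused(int a, int b)
    {
        int x;
        x = a + 1;            // first value held in x
        int first = x * 2;    // last use of the first value
        x = b + 1;            // unrelated second value stored under the same name
        return first + x * 3;
    }

    // After renaming by hand: each assignment defines a fresh variable.
    // The lifetime of x1 ends before x2 is defined, so the two never overlap
    // and could share one register or stack slot.
    public static int Renamed(int a, int b)
    {
        int x1 = a + 1;
        int first = x1 * 2;   // last use of x1
        int x2 = b + 1;       // x2 begins only after x1 is dead
        return first + x2 * 3;
    }

    static void Main()
    {
        // Both forms compute the same result for the same inputs.
        Console.WriteLine(Reused(3, 4) == Renamed(3, 4));
    }
}
```

Written this way, the dependency of each result on exactly one definition is visible at a glance, which is the property the answer says compilers want.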
My understanding is that it doesn't matter very much. The compiler (in release mode!) has a good idea of the usage of a variable. It should be able to detect that idNode is no longer used at the point where you assign rateNode, so it might even reuse that memory location.
See also this question.