表达式树依赖性分析器
我正在为跨数据源 IQueryProvider 构建表达式树依赖性分析器。
也就是说,我有一个 IQueryable,其中包含一些元素,可以针对某个任意提供程序(例如实体框架)在内存中本地执行。 IQueryable 中的一些其他元素与我需要进行远程 WCF 调用的实体相冲突。 WCF 操作采用序列化表达式树,将其反序列化,针对其自己的本地数据存储(也可以说实体框架)执行 LINQ 查询,然后将结果发回给我(尽管此机制可以很容易地成为 WCF 数据服务) DataServiceQuery...但我没有使用它,因为它的功能支持水平是有限的...充其量)。一旦我从 WCF 服务返回结果,我将根据本地执行的 LINQ 查询在内存中执行 LINQ 查询的结果。
那么,这有什么难的呢?好吧,我需要确定表达式树的依赖关系,以便我的本地底层查询提供程序在尝试执行我的 LINQ 查询时不会发生爆炸,该查询具有只能在远程 WCF 服务上执行的组件...反之亦然。
让我们看一个简单的场景:
var result =
(from entityX in new Query<MyEntityX>()
from entityY in new Query<MyEntityY>()
where entityX.SomeProperty == "Hello" &&
entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.XId).ToList();
Query
是一个简单的可查询包装器,带有我自己的提供程序,它有机会解析树并弄清楚在使用不同的查询提供程序交换根之前要做什么。因此,在上述情况下,我需要:
使用本地对象上下文对 MyEntityA 执行查询,并仅应用
myEntityX.SomeProperty == "Hello"
标准。也就是说,在本地运行以下命令:// 假定替换新 Query
的功能与新
// ObjectContext() 已经存在...
var resultX =(来自新查询中的entityX()
其中entityX.SomeProperty == "Hello").ToList().AsQueryable();发送以下序列化内容并让它在我的远程WCF服务上执行,然后返回结果。
// 通过线路发送前面的表达式
// 并获取结果(相信我的话,这已经有效了)
变量结果Y = (来自 new Query() 中的实体Y
其中entityY.SomeOtherProperty == "Hello 2").ToList().AsQueryable();在内存中执行以下内容:
var 最终结果 = (来自 resultX 中的实体X
来自结果Y中的实体Y
其中entityX.SomeProperty == "Hello" &&
entityY.SomeOtherProperty ==“你好2”&&
EntityX.Id ==EntityY.XId).ToList();
请注意,该解决方案必须包含一种累积在投影之外指定的条件的方法...就像
var result =
(from i in
(from entityX in new Query<MyEntityX>()
from entityY in new Query<MyEntityY>()
select new { PropX = entityX, PropY = entityY })
where
i.PropX.SomeProperty == "Hello" && i.PropY.SomeOtherProperty == "Hello 2"
&& i.PropX.Id == i.PropY.XId
select i)
.ToList();
这应该会产生与上面相同的两个单独查询在内存中评估其余部分之前实际发出。在一个不相关的说明中,我想我可能会使用 PLINQ 和/或 DRYAD 来运行内存中的操作,以提高性能。
所以,我有一些想法(比如与访问者一起对树进行一些传递并积累给定实体类型的候选者),但我正在寻找其他人关于如何积累我的表达树的部分的建议针对给定的上下文执行...也就是说,知道一个条件适用于一个底层的新 Query
而另一个条件适用于另一个条件...这样我就可以弄清楚什么我可以对数据存储 1 执行操作、对数据存储 2 执行操作以及需要在内存中执行操作,并相应地执行树的不同部分。它有点像 Funcletizer,但更复杂一点......
感谢您的帮助。
I'm building an expression tree dependency analyzer for a cross data source IQueryProvider.
That is, I have an IQueryable with some elements that can be executed locally in memory against some arbitrary provider (say Entity Framework). Some other elements in the IQueryable go against an entity that I need to make a remote WCF call. The WCF operation takes a serialized expression tree, will deserialize it, execute the LINQ query against its own local data store (lets also say Entity Framework), then send me back the results (though this mechanism could just as easily be a WCF Data Services DataServiceQuery...but I'm not using it because it's level of functional support is limited...at best). Once I get the results back from the WCF service, I will perform the result of the LINQ query in memory against the locally executed LINQ query.
So, what's so hard about that? Well, I need to determine the dependencies of the expression tree, so that my local underlying Query Provider won't explode trying to execute my LINQ query which has components that can only be executed on the remote WCF service...and vice versa.
Let's take the simple scenario:
var result =
(from entityX in new Query<MyEntityX>()
from entityY in new Query<MyEntityY>()
where entityX.SomeProperty == "Hello" &&
entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.XId).ToList();
Query<T>
is a simple queryable wrapper with my own provider which has the chance to parse the tree an figure out what to do before swapping out roots with a different query provider. So, in the above case I need to:
Execute the query against MyEntityA using a local object context and apply only the
myEntityX.SomeProperty == "Hello"
criteria. That is, run the following locally:// assume the functionality for replacing new Query<MyEntityA> with new
// ObjectContext<MyEntityA>() is already there...
var resultX = (from entityX in new Query<MyEntityX>()
where entityX.SomeProperty == "Hello").ToList().AsQueryable();Send over the following serialized and have it execute on my remote WCF service, then get the results back.
// Send the preceeding expression over the over the wire
// and get the results back (just take my word this already works)
var resultY =
(from entityY in new Query<MyEntityY>()
where entityY.SomeOtherProperty == "Hello 2").ToList().AsQueryable();Execute the following in memory:
var finalResult =
(from entityX in resultX
from entityY in resultY
where entityX.SomeProperty == "Hello" &&
entityY.SomeOtherProperty == "Hello 2" &&
entityX.Id == entityY.XId).ToList();
Note that the solution must incorporate a way of accumulating criteria that is specified off of projections too...like
var result =
(from i in
(from entityX in new Query<MyEntityX>()
from entityY in new Query<MyEntityY>()
select new { PropX = entityX, PropY = entityY })
where
i.PropX.SomeProperty == "Hello" && i.PropY.SomeOtherProperty == "Hello 2"
&& i.PropX.Id == i.PropY.XId
select i)
.ToList();
This should result in the same two individual queries above being actually issued before the rest is evaluated in memory. On an unrelated note, I think I will probably used PLINQ and or DRYAD for running the in memory operations with improved performance.
So, I have some ideas (like doing some passes over the tree with a visitor and accumulating candidates for a given entity type), but I'm looking for some other peoples' suggestions about how to accumulate the parts my expression tree that can be executed against a given context...that is, knowing that one where criteria applies to one underlying new Query<T>
and another criteria applies to a different one...so that I can figure out what I can do against data store 1, what I can do against data store 2 and what I need to do in memory, and execute the different parts of the tree accordingly. It's sort of like a Funcletizer, but a bit more complex...
Thanks for any assistance.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这是一个有趣的问题。我会考虑一种由几个阶段组成的方法:
下面给出了每个阶段的更多详细信息。我的答案末尾的“备注”部分提供了一些需要考虑的重要注释。
免责声明:我的回答相当示意性,我确信它没有涵盖与表达式树中允许的各个操作的语义相关的许多方面和情况。我认为必须做出某些妥协才能使实施变得相当简单。
阶段 1:表达式分析和标记
表达式树中的每个节点都可以被认为属于以下类别:
用于遍历和处理表达式树的自下而上的方法似乎适合这种情况。原因是当处理给定节点X时,具有子节点Y_1至Y_n,节点X的类别严重依赖于其子节点Y_1至T_n的类别。
让我们将您提供的示例重写
为相应表达式树的轮廓:
然后该表达式树将被自下而上标记。
R
表示“远程”,L
表示“本地”,N
表示“中性”。假设entityX
是远程的,entityY
是本地的,结果将如下所示:如您所见,对于每个运算符,您的分析器必须根据其类别来决定类别它的子节点。在上面的例子中:
但是,您可以考虑将自下而上的方法与自上而下的优化传递相结合,以获得更好的结果。考虑这个(象征性的):
R == R + L
。您想如何执行相等比较?使用纯粹的自下而上的方法,您可以在本地执行它。然而,在某些情况下,最好在本地预先计算L
,用实际值(即中性)替换子表达式并远程执行相等比较。换句话说,您最终可以实现查询计划优化器。第 2 阶段:识别远程子表达式
在下一阶段,将自上而下处理标记的表达式树,并将每个标记为远程的子表达式从树中取出,并列入为远程数据中的每个项目远程评估的表达式组中放。
从上面可以清楚地看出,某些远程子表达式将封装本地子表达式。因此,本地子表达式可能包含远程子表达式。只有中性节点才应表示在类别方面同质的子表达式。
因此,可能需要通过与远程查询处理器的多次往返来执行给定查询。另一种方法是允许查询处理器之间进行双向通信,以便“远程”处理器可以识别“本地”(从其角度来看实际上是“远程”)子表达式并回调“本地”处理器来执行它。
第 3 阶段:远程查询执行
在第三阶段,远程子表达式列表将被发送到“远程”查询处理器以执行。 (另请参阅上一阶段中的讨论。)
问题还在于,如何识别可用于有效限制远程查询处理器返回的结果数据集的子表达式。为此,必须考虑表达式树中顶级运算符的语义(通常是
&&
和||
)。&&
和||
的短路计算使事情变得有点复杂,因为查询预处理器可能不会重新排序操作数。第 4 阶段:本地查询执行
当执行所有远程子表达式时,它们在原始表达式树中的出现将被收集的结果替换。
备注
您最终可能需要限制“远程”子树中仅允许某些操作,以降低处理复杂性 - 这将是实现查询预处理器所花费的功能和时间之间的权衡。< /p>
要处理数据别名(如您提供的
PropX =entityX … i.PropX.SomeProperty == "Hello"
示例中的情况),您必须执行数据流分析。在这里,您很可能会遇到一组过于复杂而不值得处理的情况。This is an interesting problem. I would consider an approach consisting of several phases:
The following gives more details for each phase. The Remarks section at the end of my answer provides some important notes to consider.
Disclaimer: My answer is rather schematic and I'm sure it doesn't cover a lot of aspects and cases that may happen in respect to the semantics of individual operations allowed in an expression tree. I think certain compromises will have to be made to make the implementation reasonably simple.
Phase 1: Expression analysis and tagging
Each node in the expression tree can be considered to fall within the following categories:
A bottom-up approach for traversing and processing the expression tree seems as appropriate for this case. The reason is when processing a given node X, having subnodes Y_1 to Y_n the category of the node X heavily depends on the categories of its subnodes Y_1 to T_n.
Let's rewrite the sample you provided:
into an outline of the corresponding expression tree:
This expression tree will then be tagged bottom-up.
R
for “remote”,L
for “local”,N
for “neutral”. ProvidingentityX
is remote andentityY
is local the result will look like this:As you can see, for each operator your analyzer will have to decide the category based on the categories of its subnodes. In the example above:
However, you might consider combining the bottom-up approach with a top-down optimization pass to get better results. Consider this (symbolic):
R == R + L
. How do you want to execute the equality comparison? With a pure bottom-up approach you'd execute it locally. However, in some situations it might be better to precalculateL
locally, replace the subexpression with an actual value (i.e. neutral) and execute the equality comparison remotely. In other words, you can end-up implementing a query plan optimizer.Phase 2: Identification of remote subexpressions
In the next phase, the tagged expression tree will be processed top-down and each subexpression marked as remote taken out of the tree, and enlisted among the set of expressions evaluated remotely for each item in the remote data set.
From the above it's clear that certain remote subexpressions will encapsulate local subexpression. And, consequently, local subexpressions may contain remote subexpressions. Only neutral nodes shall represent subexpressions that are homogeneous it terms of category.
Hence it may be necessary to execute a given query with several round-trips to the remote query processor. An alternate approach would be to allow bi-directional communications between the query processors, so that the “remote” processor can identify a “local” (actually “remote” in from its point of view) subexpression and call back the “local” processor to execute it.
Phase 3: Remote query execution
In the third phase the list of remote subexpressions will be sent to the “remote” query processor for execution. (See also discussion in the previous phase.)
The question also is, how to identify subexpressions that can be used to effectively limit the resulting data set returned by the remote query processor. To do this, the semantics of top-level operators in the expression tree (usually
&&
and||
) have to be taken into account. Short-circuit evaluation of&&
and||
complicates the things a bit because the query preprocessor may not reorder operands.Phase 4: Local query execution
When all remote subexpression are executed, their occurrences in the original expression tree are replaced with gathered results.
Remarks
You may end up with the necessity to limit only certain operations to be allowed in “remote” subtrees to reduce processing complexity — it will be a trade-off between capabilities and time spent on implementing the query pre-processor.
To handle data aliasing (like in the
PropX = entityX … i.PropX.SomeProperty == "Hello"
example you provided) you will have to perform data flow analysis. Here you will most likely end-up with a set of cases that will be to complicated to be worth handling.