How to identify the state variables read/written in a given method in C#

Posted on 2024-09-19 00:12:36

What is the simplest way to identify whether a given method reads or writes a member variable or property?
I am writing a tool to assist an RPC system in which access to remote objects is expensive. Being able to detect that a given object is not used in a method would let us avoid serializing its state. Doing this on source code is perfectly reasonable (but being able to do it on compiled code would be amazing).

I think I can either write my own simple parser or try to use one of the existing C# parsers and work with the AST. I am not sure whether it is possible to do this with assemblies using Reflection. Are there any other ways? What would be the simplest?

EDIT: Thanks for all the quick replies. Let me give some more information to make the question clearer. I definitely prefer correct, but the solution shouldn't be extremely complex. What I mean is that we can't go too far in checking for extreme or impossible cases (such as the passed-in delegates that were mentioned, which is a great point). It would be enough to detect those cases, assume everything could be used, and not optimize there. I would assume that those cases are relatively uncommon.
The idea is for this tool to be handed to developers outside of our team, who should not have to be concerned with this optimization. The tool takes their code and generates proxies for our own RPC protocol. (We are using protobuf-net for serialization only, not WCF or .NET Remoting.) For this reason, anything we use has to be free, or licensing issues would prevent us from deploying the tool.

Comments (8)

千柳 2024-09-26 00:12:37

You can have simple or you can have correct - which do you prefer?

The simplest way would be to parse the class and the method body. Then identify the set of tokens which are properties and field names of the class. The subset of those tokens which appears in the method body are the properties and field names you care about.

This trivial analysis of course is not correct. If you had

class C
{
    int Length;
    void M() { int x = "".Length; }
}

Then you would incorrectly conclude that M references C.Length. That's a false positive.
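To make the false positive concrete, here is a minimal sketch of the trivial token-matching analysis. It is only an illustration: `NaiveMemberScan` and `MentionedMembers` are hypothetical names, and a regular expression stands in for a real tokenizer (so it would also match identifiers inside string literals and comments).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveMemberScan
{
    // Returns every name in memberNames that appears as an identifier-like
    // token anywhere in methodBody. No name binding is performed, so the
    // Length in "".Length is indistinguishable from the field C.Length.
    public static ISet<string> MentionedMembers(string methodBody, IEnumerable<string> memberNames)
    {
        var tokens = new HashSet<string>(
            Regex.Matches(methodBody, @"[A-Za-z_][A-Za-z0-9_]*")
                 .Cast<Match>()
                 .Select(m => m.Value));
        tokens.IntersectWith(memberNames);
        return tokens;
    }
}
```

Calling `NaiveMemberScan.MentionedMembers("int x = \"\".Length;", new[] { "Length" })` reports `Length`, even though the method body never touches the field.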

The correct way to do it is to write a full C# compiler, and use the output of its semantic analyzer to answer your question. That's how the IDE implements features like "go to definition".

财迷小姐 2024-09-26 00:12:37

Before attempting to write this kind of logic yourself, I would check to see if you can leverage NDepend to meet your needs.

NDepend is a code dependency analysis tool ... and much more. It implements a sophisticated analyzer for examining relationships between code constructs and should be able to answer that question. It also operates on both source and IL, if I'm not mistaken.

NDepend exposes CQL - Code Query Language - which allows you to write SQL-like queries against the relationships between structures in your code. NDepend has some support for scripting and is capable of being integrated with your build process.

自由如风 2024-09-26 00:12:37

To complete LBushkin's answer on NDepend (disclaimer: I am one of the developers of this tool), NDepend can indeed help you with that. The Code LINQ Query (CQLinq) below matches methods that...

  • shouldn't provoke any RPC calls, but
  • read or write any fields of any RPC types,
  • or read or write any properties of any RPC types.

Notice how we first define the four sets typesRPC, fieldsRPC, propertiesRPC and methodsThatShouldntUseRPC, and then match the methods that violate the rule. Of course this CQLinq rule needs to be adapted to match your own typesRPC and methodsThatShouldntUseRPC:

warnif count > 0

// First define the types whose calls are RPC
let typesRPC = Types.WithNameIn("MyRpcClass1", "MyRpcClass2")

// Define instance fields of RPC types
let fieldsRPC = typesRPC.ChildFields()
                .Where(f => !f.IsStatic).ToHashSet()

// Define instance properties getters and setters of RPC types
let propertiesRPC = typesRPC.ChildMethods()
                    .Where(m => !m.IsStatic && (m.IsPropertyGetter || m.IsPropertySetter))
                    .ToHashSet()


// Define methods that shouldn't provoke RPC calls
let methodsThatShouldntUseRPC = 
          Application.Methods.Where(m => m.NameLike("XYZ"))


// Filter methods that shouldn't make any RPC call
// but that use any RPC fields (reading or writing) or properties
from m in methodsThatShouldntUseRPC.UsingAny(fieldsRPC).Union(
          methodsThatShouldntUseRPC.UsingAny(propertiesRPC))

let fieldsRPCUsed = m.FieldsUsed.Intersect(fieldsRPC )
let propertiesRPCUsed = m.MethodsCalled.Intersect(propertiesRPC)

select new { m, fieldsRPCUsed, propertiesRPCUsed  }

尹雨沫 2024-09-26 00:12:37

My intuition is that detecting which member variables will be accessed is the wrong approach. My first guess at a way to do this would be to just request serialized objects on an as-needed basis (preferably at the beginning of whatever function needs them, not piecemeal). Note that TCP (via Nagle's algorithm) should coalesce these requests if they are small and made in rapid succession.

时光礼记 2024-09-26 00:12:37

Eric has it right: to do this well, you need what amounts to a compiler front end. What he didn't emphasize enough is the need for strong flow analysis capabilities (or a willingness to accept very conservative answers, possibly alleviated by user annotations). Maybe he meant that by the phrase "semantic analysis", although his example of "go to definition" just needs a symbol table, not flow analysis.

A plain C# parser could only be used to get very conservative answers (e.g., if method A in class C contains identifier X, assume it reads class member X; if A contains no calls then you know it can't read member X).

The first step beyond this is having a compiler's symbol table and type information (if method A refers to class member X directly, then assume it reads member X; if A contains no calls and mentions identifier X only in the context of accesses to objects which are not of this class type, then you know it can't read member X). You have to worry about qualified references, too; Q.X may read member X if Q is compatible with C.

The sticking point is calls, which can hide arbitrary actions. An analysis based on just parsing and symbol tables could determine that, if there are calls, the arguments refer only to constants or to objects which are not of the class which A might represent (possibly inherited).

If you find an argument that has a C-compatible class type, you now have to determine whether that argument can be bound to this, which requires control and data flow analysis:

   method A( ) {  Object q=this;
                     ...
                     ...q=that;...
                     ...
                     foo(q);
               }

foo might hide an access to X. So you need two things: flow analysis to determine whether the initial assignment to q can reach the call to foo (it might not; q=that may dominate all calls to foo), and call graph analysis to determine what methods foo might actually invoke, so that you can analyze those for accesses to member X.

You can decide how far you want to go with this, simply making the conservative assumption "A reads X" any time you don't have enough information to prove otherwise. This will give you a "safe" answer (if not a "correct" one, or what I'd prefer to call "precise").
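As a rough sketch of that conservative rule, the decision can be written as a function over whatever facts the front end has managed to establish. The names `Verdict`, `ConservativeRule` and `Decide` are hypothetical, and the two inputs (the set of members the method references directly, and whether it contains calls that could not be analyzed) are assumed to come from a parser and symbol table.

```csharp
using System;
using System.Collections.Generic;

public enum Verdict { DoesNotAccess, MayAccess }

public static class ConservativeRule
{
    // Answers "can this method access the member?" and falls back to
    // MayAccess whenever the available facts cannot prove otherwise.
    public static Verdict Decide(ISet<string> directRefs, bool hasUnanalyzedCalls, string member)
    {
        if (directRefs.Contains(member))
            return Verdict.MayAccess;            // direct reference: assume access
        return hasUnanalyzedCalls
            ? Verdict.MayAccess                  // a call might hide an access
            : Verdict.DoesNotAccess;             // no reference, no calls: provably safe
    }
}
```

Only the `DoesNotAccess` answer licenses skipping serialization; a more precise flow or call graph analysis would simply turn some `MayAccess` answers into `DoesNotAccess`.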

Of frameworks that might be helpful, you might consider Mono, which surely parses and builds symbol tables. I don't know what support it provides for flow analysis or call graph extraction; I would not expect the Mono-to-IL front-end compiler to do a lot of that, as people usually hide that machinery in the JIT part of JIT-based systems. A downside is that Mono may be behind the "modern C#" curve; last time I heard, it handled only C# 2.0 but my information may be stale.

An alternative is our DMS Software Reengineering Toolkit and its C# Front End (not an open source product).

DMS provides general source code parsing, tree building/inspection/analysis, general symbol table support and built-in machinery for implementing control-flow analysis, data flow analysis, points-to analysis (needed for "What does object O point to?"), and call graph construction. This machinery has all been tested by fire with DMS's Java and C front ends, and the symbol table support has been used to implement full C++ name and type resolution, so it's pretty effective. (You don't want to underestimate the work it takes to build all that machinery; we've been working on DMS since 1995.)

The C# Front End provides for full C# 4.0 parsing and full tree building. It presently does not build symbol tables for C# (we're working on this) and that's a shortcoming compared to Mono. With such a symbol table, however, you would have access to all that flow analysis machinery (which has been tested with DMS's Java and C front ends) and that might be a big step up from Mono if it doesn't provide that.

If you want to do this well, you have a considerable amount of work in front of you. If you want to stick with "simple", you'll have to make do with just parsing the tree and accept being very conservative.

You didn't say much about knowing if a method wrote to a member. If you are going to minimize traffic the way you describe, you want to distinguish "read", "write" and "update" cases and optimize messages in both directions. The analysis is obviously pretty similar for the various cases.
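One way to picture the read/write/update distinction is as a per-member classification that decides in which direction state has to travel. This is only a sketch: `Access` and `AccessPlan` are hypothetical names, and it assumes "written" means the callee reports back only the members it actually assigned.

```csharp
using System;

public enum Access { None, Read, Write, Update }

public static class AccessPlan
{
    // Combines the two per-member facts produced by the analysis.
    public static Access Classify(bool isRead, bool isWritten) =>
        isRead && isWritten ? Access.Update
        : isRead ? Access.Read
        : isWritten ? Access.Write
        : Access.None;

    // State has to be sent to the callee only if the method reads it.
    public static bool SendBeforeCall(Access a) => a == Access.Read || a == Access.Update;

    // State has to be sent back afterwards only if the method writes it.
    public static bool ReturnAfterCall(Access a) => a == Access.Write || a == Access.Update;
}
```

A member classified as `Write` then costs one message back but nothing on the way in, and a member classified as `None` is never serialized at all.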

Finally, you might consider processing MSIL directly to get the information you need; you'll still have the flow analysis and conservative analysis issues. You might find the following technical paper interesting; it describes a fully distributed Java object system that has to do the same basic analysis you want to do, and does so, IIRC, by analyzing class files and doing massive byte code rewriting.
Java Orchestra System
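For the MSIL route, a minimal sketch using only System.Reflection might look like the following. `IlMemberScan` and the `Sample` type are hypothetical names introduced for illustration. The scanner only sees accesses made directly in one method body (nothing hidden behind calls, and lambdas compile to separate methods), and it assumes a little-endian host when decoding operands.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

public static class IlMemberScan
{
    // Opcode lookup keyed by OpCode.Value, built from the public fields of OpCodes.
    static readonly Dictionary<short, OpCode> Table = typeof(OpCodes)
        .GetFields(BindingFlags.Public | BindingFlags.Static)
        .Select(f => (OpCode)f.GetValue(null))
        .ToDictionary(o => o.Value);

    // Lists field accesses and property-accessor references made directly by one method.
    public static List<string> Scan(MethodBase method)
    {
        var found = new List<string>();
        byte[] il = method.GetMethodBody()?.GetILAsByteArray() ?? new byte[0];
        Type[] typeArgs = method.DeclaringType != null && method.DeclaringType.IsGenericType
            ? method.DeclaringType.GetGenericArguments() : null;
        Type[] methodArgs = method.IsGenericMethod ? method.GetGenericArguments() : null;

        int i = 0;
        while (i < il.Length)
        {
            // Two-byte opcodes start with the 0xFE prefix byte.
            short key = il[i] == 0xFE ? unchecked((short)(0xFE00 | il[i + 1])) : (short)il[i];
            OpCode op = Table[key];
            i += op.Size;
            int operand = i; // offset of this instruction's operand, if any

            // Skip the operand; its size depends on the operand type.
            switch (op.OperandType)
            {
                case OperandType.InlineNone: break;
                case OperandType.ShortInlineBrTarget:
                case OperandType.ShortInlineI:
                case OperandType.ShortInlineVar: i += 1; break;
                case OperandType.InlineVar: i += 2; break;
                case OperandType.InlineI8:
                case OperandType.InlineR: i += 8; break;
                case OperandType.InlineSwitch: i += 4 + 4 * BitConverter.ToInt32(il, i); break;
                default: i += 4; break; // tokens, 32-bit constants and branch targets
            }

            if (op.OperandType == OperandType.InlineField)
            {
                FieldInfo f = method.Module.ResolveField(BitConverter.ToInt32(il, operand), typeArgs, methodArgs);
                string kind = op == OpCodes.Stfld || op == OpCodes.Stsfld ? "write"
                            : op == OpCodes.Ldfld || op == OpCodes.Ldsfld ? "read"
                            : "address-of"; // ldflda/ldsflda: conservatively read and write
                found.Add($"{kind} {f.DeclaringType.Name}.{f.Name}");
            }
            else if (op.OperandType == OperandType.InlineMethod)
            {
                MethodBase callee = method.Module.ResolveMethod(BitConverter.ToInt32(il, operand), typeArgs, methodArgs);
                if (callee.IsSpecialName && (callee.Name.StartsWith("get_") || callee.Name.StartsWith("set_")))
                    found.Add($"accessor {callee.DeclaringType.Name}.{callee.Name}");
            }
        }
        return found;
    }
}

// Hypothetical sample type used only to exercise the scanner.
public class Sample
{
    public int Length;
    public int Count { get; set; }
    public int ReadsLength() { return Length + 1; }
    public void WritesLength(int v) { Length = v; }
    public int UsesString(string s) { return s.Length; }
    public void SetsCount() { Count = 3; }
}
```

Because the metadata tokens are already bound by the compiler, `UsesString` resolves to `String.get_Length` and not to `Sample.Length`, so the false positive of the token-matching approach does not occur. Calls to other methods still have to be followed, or treated conservatively, exactly as described above.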

笨死的猪 2024-09-26 00:12:37

By RPC do you mean .NET Remoting? Or DCOM? Or WCF?

All of these offer the opportunity to monitor cross process communication and serialization via sinks and other constructs, but they are all platform specific, so you'll need to specify the platform...

最近可好 2024-09-26 00:12:37

You could listen for the event that a property is being read/written to with an interface similar to INotifyPropertyChanged (although you obviously won't know which method effected the read/write.)
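A rough sketch of what such an interface could look like follows. `INotifyPropertyAccessed`, `PropertyAccessedEventArgs` and `Customer` are hypothetical names modelled loosely on INotifyPropertyChanged; nothing like them ships with .NET.

```csharp
using System;

// Hypothetical event payload: which property was touched and how.
public sealed class PropertyAccessedEventArgs : EventArgs
{
    public PropertyAccessedEventArgs(string name, bool isWrite)
    {
        PropertyName = name;
        IsWrite = isWrite;
    }
    public string PropertyName { get; }
    public bool IsWrite { get; }
}

// Hypothetical interface, similar in spirit to INotifyPropertyChanged.
public interface INotifyPropertyAccessed
{
    event EventHandler<PropertyAccessedEventArgs> PropertyAccessed;
}

public class Customer : INotifyPropertyAccessed
{
    private string _name = "";
    public event EventHandler<PropertyAccessedEventArgs> PropertyAccessed;

    public string Name
    {
        get { Raise(nameof(Name), false); return _name; }
        set { Raise(nameof(Name), true); _name = value; }
    }

    // Notifies listeners of a read (isWrite == false) or a write.
    private void Raise(string property, bool isWrite) =>
        PropertyAccessed?.Invoke(this, new PropertyAccessedEventArgs(property, isWrite));
}
```

A proxy could subscribe to `PropertyAccessed` while a method runs and record which properties were touched. This is a runtime observation, so it reports what one execution did, not what the method could do.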

梦亿 2024-09-26 00:12:37

I think the best you can do is explicitly maintain a dirty flag.
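A minimal sketch of an explicit dirty flag, assuming a hypothetical `Account` class whose state only needs to be sent back when it has actually changed:

```csharp
using System;

public class Account
{
    private decimal _balance;

    // True once a tracked member has changed since the last MarkClean call.
    public bool IsDirty { get; private set; }

    public decimal Balance
    {
        get { return _balance; }
        set
        {
            if (_balance == value) return; // no real change, stay clean
            _balance = value;
            IsDirty = true;
        }
    }

    // Call after the state has been serialized and sent.
    public void MarkClean() { IsDirty = false; }
}
```

The RPC layer would check `IsDirty` after the call and skip serialization when it is false. This covers writes only; it says nothing about which members were read.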
