How to identify the state variables read/written in a given method in C#

Posted on 2024-09-19 00:12:36

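What is the simplest way to identify whether a given method is reading or writing a member variable or property? I'm writing a tool to assist an RPC system in which access to remote objects is expensive. Being able to detect that a given object is not used in a method would let us avoid serializing its state. Doing this on source code is perfectly reasonable (but being able to do it on compiled code would be amazing).

I think I could either write my own simple parser, or try to use one of the existing C# parsers and work with the AST. I'm not sure whether it is possible to do this on assemblies using Reflection. Are there any other ways? What would be the simplest?

EDIT: Thanks for all the quick replies. Let me give some more information to make the question clearer. I definitely prefer correct, but it definitely shouldn't be extremely complex. What I mean is that we can't go too far checking for extreme or impossible cases (such as the passed-in delegates that were mentioned, which is a great point). It would be enough to detect those cases and assume everything could be used, and not optimize there. I would assume those cases are relatively uncommon. The idea is to hand this tool to developers outside our team, who should not have to be concerned about this optimization. The tool takes their code and generates proxies for our own RPC protocol. (We're using protobuf-net for serialization only, but neither WCF nor .NET remoting.) For this reason, anything we use has to be free, or we wouldn't be able to deploy the tool because of licensing issues.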

千柳 2024-09-26 00:12:37

You can have simple or you can have correct - which do you prefer?

The simplest way would be to parse the class and the method body. Then identify the set of tokens which are properties and field names of the class. The subset of those tokens which appears in the method body are the properties and field names you care about.

This trivial analysis of course is not correct. If you had

class C
{
    int Length;
    void M() { int x = "".Length; }
}

Then you would incorrectly conclude that M references C.Length. That's a false positive.
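A minimal sketch of that naive token-matching analysis, assuming the method body is already available as a string (NaiveMemberScan and MembersMentioned are hypothetical names, and only ASCII identifiers are recognized). It reproduces the false positive: given the body of M above and the member name Length, it reports Length. It would also pick up identifiers inside string literals and comments.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Naive analysis: report every member name that occurs as a token in the body.
// No types, scopes or receivers are considered, so "".Length counts as Length.
public static class NaiveMemberScan
{
    // ASCII C#-style identifier: letter or underscore, then letters, digits or underscores.
    static readonly Regex Identifier = new Regex(@"[A-Za-z_][A-Za-z0-9_]*");

    public static ISet<string> MembersMentioned(IEnumerable<string> memberNames, string methodBody)
    {
        var tokens = new HashSet<string>();
        foreach (Match m in Identifier.Matches(methodBody))
            tokens.Add(m.Value);

        var result = new HashSet<string>(memberNames);
        result.IntersectWith(tokens); // keep only members whose names appear in the body
        return result;
    }
}
```

For example, MembersMentioned(new[] { "Length" }, "int x = \"\".Length;") returns a set containing Length, even though M never touches C.Length.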

The correct way to do it is to write a full C# compiler, and use the output of its semantic analyzer to answer your question. That's how the IDE implements features like "go to definition".

财迷小姐 2024-09-26 00:12:37

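Before attempting to write this kind of logic yourself, I would check whether you can leverage NDepend to meet your needs.

NDepend is a code dependency analysis tool... and much more. It implements a sophisticated analyzer for examining relationships between code constructs and should be able to answer that question. If I'm not mistaken, it also operates on both source and IL.

NDepend exposes CQL (Code Query Language), which allows you to write SQL-like queries against the relationships between structures in your code. NDepend has some support for scripting and can be integrated with your build process.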

自由如风 2024-09-26 00:12:37

To complete LBushkin's answer on NDepend (disclaimer: I am one of the developers of this tool): NDepend can indeed help you with that. The Code LINQ Query (CQLinq) below matches methods that...

shouldn't provoke any RPC calls, but
are reading/writing any fields of any RPC type,
or are reading/writing any properties of any RPC type.

Notice how we first define the four sets (typesRPC, fieldsRPC, propertiesRPC, methodsThatShouldntUseRPC) and then match the methods that violate the rule. Of course, this CQLinq rule needs to be adapted to match your own typesRPC and methodsThatShouldntUseRPC:

warnif count > 0

// First, define the types whose calls are RPC calls
let typesRPC = Types.WithNameIn("MyRpcClass1", "MyRpcClass2")

// Define the instance fields of RPC types
let fieldsRPC = typesRPC.ChildFields()
                .Where(f => !f.IsStatic).ToHashSet()

// Define the instance property getters and setters of RPC types
let propertiesRPC = typesRPC.ChildMethods()
                    .Where(m => !m.IsStatic && (m.IsPropertyGetter || m.IsPropertySetter))
                    .ToHashSet()

// Define the methods that shouldn't provoke RPC calls
let methodsThatShouldntUseRPC = 
          Application.Methods.Where(m => m.NameLike("XYZ"))

// Match methods that shouldn't make any RPC call
// but that use RPC fields (reading or writing) or RPC properties
from m in methodsThatShouldntUseRPC.UsingAny(fieldsRPC).Union(
          methodsThatShouldntUseRPC.UsingAny(propertiesRPC))

let fieldsRPCUsed = m.FieldsUsed.Intersect(fieldsRPC)
let propertiesRPCUsed = m.MethodsCalled.Intersect(propertiesRPC)

select new { m, fieldsRPCUsed, propertiesRPCUsed }

尹雨沫 2024-09-26 00:12:37

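My intuition is that detecting which member variables will be accessed is the wrong approach. My first guess at a way to do this would be to just request serialized objects on an as-needed basis (preferably at the beginning of whatever function needs them, not piecemeal). Note that TCP/IP (i.e., Nagle's algorithm) should pack these requests together if they are small and made in rapid succession.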
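A minimal sketch of the on-demand idea, assuming a hypothetical fetch delegate that performs the actual RPC round trip and deserialization; RemoteRef is an illustrative name, not part of protobuf-net or any RPC library. The state is fetched at most once, on first access, and reused afterwards, so no static analysis of the method is needed.

```csharp
using System;
using System.Threading;

// Hypothetical client-side handle for a remote object's state. Nothing is
// transferred until Value is first read; after that the cached copy is reused.
public sealed class RemoteRef<T>
{
    private readonly Lazy<T> _state;

    // 'fetch' stands in for whatever your RPC layer uses to retrieve and
    // deserialize the object's state. In this mode Lazy<T> runs it at most
    // once even under concurrent access (and caches any exception it throws).
    public RemoteRef(Func<T> fetch)
    {
        _state = new Lazy<T>(fetch, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public bool IsFetched => _state.IsValueCreated;

    public T Value => _state.Value;
}
```

Reading Value once at the top of each method that needs the object follows the "at the beginning, not piecemeal" advice; methods that never read it never pay for the transfer.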

时光礼记 2024-09-26 00:12:37

Eric has it right: to do this well, you need what amounts to a compiler front end. What he didn't emphasize enough is the need for strong flow analysis capabilities (or a willingness to accept very conservative answers, possibly alleviated by user annotations). Maybe he meant that by the phrase "semantic analysis", although his "go to definition" example only needs a symbol table, not flow analysis.

A plain C# parser could only be used to get very conservative answers (e.g., if method A in class C contains identifier X, assume it reads class member X; if A contains neither identifier X nor any calls, then you know it can't read member X).

The first step beyond this is having a compiler's symbol table and type information (if method A refers to class member X directly, then assume it reads member X; if A contains no calls and mentions identifier X only in the context of accesses to objects that are not of this class type, then you know it can't read member X). You have to worry about qualified references, too; Q.X may read member X if Q is compatible with C.

The sticking points are calls, which can hide arbitrary actions. An analysis based on just parsing and symbol tables could determine, when there are calls, whether the arguments refer only to constants or to objects that are not of the class A might represent (possibly inherited).
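On compiled code, a first-level version of that argument test can be sketched with reflection: given a callee and the class C, answer whether the receiver or any parameter has a static type that could refer to a C instance. ThisEscapeCheck, MayReferTo and CallMayReceive are hypothetical names. The sketch errs toward "yes" for generic parameters and for interfaces when C is not sealed (a subclass might implement them), and it looks inside arrays; it does not see references carried inside generic collections, static fields or captured closures, which is where real flow and points-to analysis is needed.

```csharp
using System;
using System.Reflection;

// Sketch: could a call hand an object of class c (or of a subclass) to the callee?
// Answers "true" whenever unsure. A "false" covers the direct arguments only;
// collections, statics and closures are not tracked.
public static class ThisEscapeCheck
{
    public static bool MayReferTo(Type argType, Type c)
    {
        if (argType.IsByRef) argType = argType.GetElementType();             // ref/out C behaves like C
        if (argType.IsArray) return MayReferTo(argType.GetElementType(), c); // C[] can hold a C
        if (argType.IsGenericParameter) return true;                         // T might be C
        if (argType.IsAssignableFrom(c)) return true;                        // object, base classes, C's interfaces
        if (c.IsAssignableFrom(argType)) return true;                        // subclasses of C
        return argType.IsInterface && !c.IsSealed;                           // a subclass of C might implement it
    }

    public static bool CallMayReceive(MethodBase callee, Type c)
    {
        // The receiver of an instance call counts as an argument too.
        if (!callee.IsStatic && callee.DeclaringType != null && MayReferTo(callee.DeclaringType, c))
            return true;
        foreach (ParameterInfo p in callee.GetParameters())
            if (MayReferTo(p.ParameterType, c)) return true;
        return false;
    }
}
```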

If you find an argument that has a C-compatible class type, you now have to determine whether that argument can be bound to this, which requires control and data flow analysis:

   void A() {
       object q = this;
       // ...
       q = that;   // somewhere in the body, possibly not on every path
       // ...
       foo(q);
   }

foo might hide an access to X. So you need two things: flow analysis to determine whether the initial assignment to q can reach the call to foo (it might not; q=that may dominate all calls to foo), and call graph analysis to determine which methods foo might actually invoke, so that you can analyze those for accesses to member X.

You can decide how far you want to go with this, simply making the conservative assumption "A reads X" any time you don't have enough information to prove otherwise. This will give you a "safe" answer (if not a "correct" one, or what I'd prefer to call a "precise" one).
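The conservative rule combined with call graph propagation can be sketched as a worklist over per-method facts. The MethodFacts shape and ConservativeSummary.TryGetMayRead are hypothetical; the facts are assumed to come from an earlier parse, symbol-table or IL pass. The sketch is flow-insensitive and keys members by name only; any unresolved call (delegate, unknown override, reflection) or any callee with no facts makes it fall back to "may read anything", which matches the fallback the question's EDIT asks for.

```csharp
using System.Collections.Generic;

// Hypothetical per-method facts from an earlier pass (parser, symbol table or IL scan).
public sealed class MethodFacts
{
    public HashSet<string> DirectReads { get; } = new HashSet<string>(); // members read in the body itself
    public HashSet<string> Callees { get; } = new HashSet<string>();     // methods it calls, when resolvable
    public bool HasUnresolvedCall { get; set; }                          // delegate, unknown override, reflection...
}

public static class ConservativeSummary
{
    // Returns true with the transitive may-read set when it can be bounded;
    // returns false when the only safe answer is "the method may read anything".
    public static bool TryGetMayRead(string method,
                                     IReadOnlyDictionary<string, MethodFacts> facts,
                                     out HashSet<string> reads)
    {
        reads = new HashSet<string>();
        var visited = new HashSet<string>();
        var work = new Stack<string>();
        work.Push(method);
        while (work.Count > 0)
        {
            string m = work.Pop();
            if (!visited.Add(m)) continue;           // handles recursion and call-graph cycles
            if (!facts.TryGetValue(m, out MethodFacts f) || f.HasUnresolvedCall)
            {
                reads.Clear();                       // no facts or an opaque call: give up safely
                return false;
            }
            reads.UnionWith(f.DirectReads);
            foreach (string callee in f.Callees) work.Push(callee);
        }
        return true;
    }
}
```

When TryGetMayRead returns false, the proxy generator would simply serialize the whole object, as it would without any analysis.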

Of the frameworks that might be helpful, you might consider Mono, which surely parses and builds symbol tables. I don't know what support it provides for flow analysis or call graph extraction; I would not expect Mono's C#-to-IL front-end compiler to do a lot of that, as people usually hide that machinery in the JIT part of JIT-based systems. A downside is that Mono may be behind the "modern C#" curve; last time I heard, it handled only C# 2.0, but my information may be stale.

An alternative is our DMS Software Reengineering Toolkit and its C# Front End (not an open-source product).

DMS provides general source code parsing, tree building/inspection/analysis, general symbol table support, and built-in machinery for implementing control flow analysis, data flow analysis, points-to analysis (needed for "What does object O point to?"), and call graph construction. This machinery has all been tested by fire with DMS's Java and C front ends, and the symbol table support has been used to implement full C++ name and type resolution, so it's pretty effective. (You don't want to underestimate the work it takes to build all that machinery; we've been working on DMS since 1995.)

The C# Front End provides for full C# 4.0 parsing and full tree building. It presently does not build symbol tables for C# (we're working on this) and that's a shortcoming compared to Mono. With such a symbol table, however, you would have access to all that flow analysis machinery (which has been tested with DMS's Java and C front ends) and that might be a big step up from Mono if it doesn't provide that.

If you want to do this well, you have a considerable amount of work in front of you. If you want to stick with "simple", you'll have to make do with just parsing the tree and accept very conservative answers.

You didn't say much about knowing if a method wrote to a member. If you are going to minimize traffic the way you describe, you want to distinguish "read", "write" and "update" cases and optimize messages in both directions. The analysis is obviously pretty similar for the various cases.
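One way to act on the read/write/update distinction, sketched under the assumption that some analysis has already produced a conservative access kind per member: Read members are sent to wherever the method runs, Write members are sent back afterwards, and Update members travel both ways. MemberAccess and TransferPlan are hypothetical names. A write that only partially overwrites a member (one element of an array field, say) should be classified as Update, since the rest of the value must still be present.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum MemberAccess
{
    None = 0,
    Read = 1,             // current value is needed where the method runs
    Write = 2,            // new value must be shipped back afterwards
    Update = Read | Write // both directions
}

public static class TransferPlan
{
    // Splits members into those to send before the call and those to send back after it.
    public static (List<string> Before, List<string> After) Plan(IReadOnlyDictionary<string, MemberAccess> access)
    {
        var before = new List<string>();
        var after = new List<string>();
        foreach (KeyValuePair<string, MemberAccess> entry in access)
        {
            if ((entry.Value & MemberAccess.Read) != 0) before.Add(entry.Key);
            if ((entry.Value & MemberAccess.Write) != 0) after.Add(entry.Key);
        }
        return (before, after);
    }
}
```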

Finally, you might consider processing MSIL directly to get the information you need; you'll still have the flow analysis and conservative analysis issues. You might find the following technical paper interesting; it describes a fully distributed Java object system that has to do the same basic analysis you want to do, and does so, IIRC, by analyzing class files and doing massive bytecode rewriting.
Java Orchestra System
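Since the question also wondered whether this can be done on compiled assemblies with Reflection, here is a rough sketch of the "process MSIL directly" route using only the base class library (MethodBody.GetILAsByteArray, Module.ResolveField, Module.ResolveMethod and the OpCodes table); it is not taken from the Orchestra paper or any existing tool. IlFieldScanner, Scan and the demo SampleCounter type are hypothetical names. It reports direct field loads and stores plus the methods called (C# property accessors show up as calls to get_X/set_X); it does not follow those calls or track which object a field belongs to, so the conservative flow and call graph issues above still apply.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Sketch: list the fields a compiled method reads/writes and the methods it
// calls, by decoding its IL with plain reflection (no third-party library).
public static class IlFieldScanner
{
    // OpCode descriptors keyed by their 1- or 2-byte value, taken from OpCodes' static fields.
    static readonly Dictionary<short, OpCode> ByValue = BuildTable();

    static Dictionary<short, OpCode> BuildTable()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo f in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
            if (f.GetValue(null) is OpCode op) table[op.Value] = op;
        return table;
    }

    // IL operands are little-endian.
    static int ReadInt32(byte[] il, int pos) =>
        il[pos] | (il[pos + 1] << 8) | (il[pos + 2] << 16) | (il[pos + 3] << 24);

    public static (HashSet<FieldInfo> Reads, HashSet<FieldInfo> Writes, HashSet<MethodBase> Calls) Scan(MethodBase method)
    {
        var reads = new HashSet<FieldInfo>();
        var writes = new HashSet<FieldInfo>();
        var calls = new HashSet<MethodBase>();
        byte[] il = method.GetMethodBody()?.GetILAsByteArray() ?? Array.Empty<byte>(); // abstract/extern: no body
        Type[] typeArgs = method.DeclaringType != null && method.DeclaringType.IsGenericType
            ? method.DeclaringType.GetGenericArguments() : null;
        Type[] methodArgs = method.IsGenericMethod ? method.GetGenericArguments() : null;

        int pos = 0;
        while (pos < il.Length)
        {
            short value = il[pos++];
            if (value == 0xFE) value = unchecked((short)(0xFE00 | il[pos++])); // two-byte opcode
            OpCode op = ByValue[value];

            if (op.OperandType == OperandType.InlineField)
            {
                FieldInfo field = method.Module.ResolveField(ReadInt32(il, pos), typeArgs, methodArgs);
                if (op == OpCodes.Stfld || op == OpCodes.Stsfld) writes.Add(field);
                else if (op == OpCodes.Ldfld || op == OpCodes.Ldsfld) reads.Add(field);
                else { reads.Add(field); writes.Add(field); } // ldflda/ldsflda: address taken, assume both
            }
            else if (op.OperandType == OperandType.InlineMethod)
            {
                // call, callvirt, newobj, ldftn, ldvirtftn, jmp
                calls.Add(method.Module.ResolveMethod(ReadInt32(il, pos), typeArgs, methodArgs));
            }
            pos += OperandSize(op.OperandType, il, pos);
        }
        return (reads, writes, calls);
    }

    static int OperandSize(OperandType t, byte[] il, int pos)
    {
        switch (t)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch: return 4 + 4 * ReadInt32(il, pos); // count, then one int32 per target
            default: return 4; // InlineBrTarget/Field/I/Method/Sig/String/Tok/Type, ShortInlineR
        }
    }
}

// Small demo type used to exercise the scanner.
public class SampleCounter
{
    private int _count;
    public int Count => _count;
    public void Increment() { _count++; }
    public int Twice() => Count * 2;
}
```

Scanning SampleCounter.Increment reports _count as both read and written, while Twice reports no direct field access, only a call to get_Count; feeding such per-method results into a call graph is what turns them into a conservative whole-method answer.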
