Parsing C# code (as a string) and inserting additional methods

Posted on 2024-10-17 18:49:59


I have a C# app I'm working on that loads its code remotely and then runs it (for the sake of argument, you can assume the app is secure).

The code is C#, but it is sent as an XML document, parsed out as a string, and then compiled and executed.

Now, what I'd like to do (and am having more difficulty with than I expected) is parse the entire document and, before compiling, insert additional commands after every statement that executes.

For example, consider the code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            i++;

            Log(i);
        }
    }
}

What I'd like after parsing is something more like:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            MyAdditionalMethod();
            i++;
            MyAdditionalMethod();

            Log(i);
            MyAdditionalMethod();
        }
    }
}

Keep in mind the obvious pitfalls: I can't just put it after every semicolon, because that would not work in a getter/setter. For example:

Converting:

public string MyString { get; set; }

To:

public string MyString { get; MyAdditionalMethod(); set; MyAdditionalMethod(); }

would fail. The same goes for class-level declarations, using statements, etc. There are also a number of cases where I could add MyAdditionalMethod() after curly braces, such as in delegates, immediately after if statements, or in method declarations.
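To make the pitfall concrete, here is the naive "append a call after every semicolon" approach written out as a minimal sketch. `NaiveInstrumenter` and `InsertAfterEverySemicolon` are hypothetical names used only for this illustration; the code treats the source as plain characters and knows nothing about C# syntax, which is exactly why it fails.

```csharp
using System;
using System.Text;

// Hypothetical, deliberately naive instrumenter: it appends a call after every ';'
// in the raw source text without understanding any C# syntax.
static class NaiveInstrumenter
{
    public static string InsertAfterEverySemicolon(string source, string call)
    {
        var sb = new StringBuilder();
        foreach (char c in source)
        {
            sb.Append(c);
            if (c == ';')
                sb.Append(' ').Append(call).Append(';');
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        const string call = "MyAdditionalMethod()";
        // Inside a method body the output happens to be valid C#.
        Console.WriteLine(NaiveInstrumenter.InsertAfterEverySemicolon("int i = 0;", call));
        // In an auto-property it produces the invalid property shown above.
        Console.WriteLine(NaiveInstrumenter.InsertAfterEverySemicolon("public string MyString { get; set; }", call));
        // A for-loop header breaks the same way.
        Console.WriteLine(NaiveInstrumenter.InsertAfterEverySemicolon("for (int i = 0; i < n; i++) Log(i);", call));
    }
}
```

A semicolon inside a string literal or a comment would be mangled as well, so any reliable approach has to work on a parsed syntax tree rather than on raw text.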

So, I've been looking into CodeDOM, and it looks like it could be a solution, but it's tough to figure out where to start. Otherwise, I'm trying to parse the entire thing myself and build a tree I can walk, though that's a little tough, considering the number of cases I need to handle.

Does anyone know any other solutions that are out there?


ι不睡觉的鱼゛ 2024-10-24 18:49:59


There are a few C# parsers out there; I'd recommend using one from Mono or SharpDevelop, as they should be up to date. I had a go with NRefactory from SharpDevelop. If you download the SharpDevelop source, there is a demo and some unit tests that are a good introduction to its usage.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ICSharpCode.NRefactory;
using System.IO;
using ICSharpCode.NRefactory.Ast;
using ICSharpCode.NRefactory.Visitors;
using ICSharpCode.NRefactory.PrettyPrinter;

namespace Parse
{
    class Program
    {
        static void Main(string[] args)
        {
            string code = @"using System;
            using System.Collections.Generic;
            using System.Linq;

            namespace MyCode
            {
                static class MyProg
                {
                    static void Run()
                    {
                        int i = 0;
                        i++;

                        Log(i);
                    }
                }
            }
            ";

            IParser p = ParserFactory.CreateParser(SupportedLanguage.CSharp, new StringReader(code));
            p.Parse();

            //Output Original
            CSharpOutputVisitor output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);
            Console.Write(output.Text);

            //Add custom method calls
            AddMethodVisitor v = new AddMethodVisitor();
            v.VisitCompilationUnit(p.CompilationUnit, null);
            v.AddMethodCalls();
            output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);

            //Output result
            Console.Write(output.Text);
            Console.ReadLine();
        }


    }

    // The visitor only records where method calls should go (in a dictionary) while it walks
    // the tree; AddMethodCalls() inserts them afterwards, so the tree isn't modified mid-walk.
    public class AddMethodVisitor : ConvertVisitorBase
    {
        private Dictionary<INode, INode> expressions = new Dictionary<INode, INode>();

        private void AddNode(INode original)
        {
            // Build a fresh IdentifierExpression per call so the inserted statements don't all share one node.
            var call = new InvocationExpression(new IdentifierExpression("MyAdditionalMethod"));
            expressions.Add(original, new ExpressionStatement(call));
        }

        public override object VisitExpressionStatement(ExpressionStatement expressionStatement, object data)
        {
            AddNode(expressionStatement);
            return base.VisitExpressionStatement(expressionStatement, data);
        }

        public override object VisitLocalVariableDeclaration(LocalVariableDeclaration localVariableDeclaration, object data)
        {
            AddNode(localVariableDeclaration);
            return base.VisitLocalVariableDeclaration(localVariableDeclaration, data);
        }

        public void AddMethodCalls()
        {
            foreach (var e in expressions)
            {
                InsertAfterSibling(e.Key, e.Value);
            }
        }

    }
}

You will need to improve the visitor to handle more cases, but it's a good start.
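A plausible reason the visitor above stores (original, new) pairs and only calls InsertAfterSibling once VisitCompilationUnit has returned is that .NET collections generally can't be modified while they are being enumerated. The sketch below illustrates that with a plain `List<string>` standing in for a block's statements; `DeferredInsert` and its methods are hypothetical names, not part of NRefactory, and the toy ignores nested blocks.

```csharp
using System;
using System.Collections.Generic;

// Toy model: a block's statements as plain strings (not NRefactory's AST types).
static class DeferredInsert
{
    // Inserting into the list from inside foreach invalidates the enumerator,
    // so the next MoveNext() throws InvalidOperationException.
    public static bool TryInsertWhileIterating(List<string> statements, string call)
    {
        try
        {
            foreach (var s in statements)
                statements.Insert(statements.IndexOf(s) + 1, call);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    // Two-phase version: record targets during the walk, insert after it finishes.
    public static List<string> InsertAfterWalk(List<string> statements, string call, Func<string, bool> shouldInstrument)
    {
        var targets = new List<int>();
        for (int i = 0; i < statements.Count; i++)
            if (shouldInstrument(statements[i]))
                targets.Add(i);

        var result = new List<string>(statements);
        // Work from the last target backwards so earlier indices stay valid.
        for (int k = targets.Count - 1; k >= 0; k--)
            result.Insert(targets[k] + 1, call);
        return result;
    }
}

class Program
{
    static void Main()
    {
        var stmts = new List<string> { "int i = 0;", "i++;", "Log(i);" };
        // Prints False: the in-place attempt fails on the second iteration.
        Console.WriteLine(DeferredInsert.TryInsertWhileIterating(new List<string>(stmts), "MyAdditionalMethod();"));
        Console.WriteLine(string.Join(" ", DeferredInsert.InsertAfterWalk(stmts, "MyAdditionalMethod();", s => true)));
    }
}
```

The predicate passed to `InsertAfterWalk` plays the role of the visitor's choice of which node types to instrument (expression statements and local variable declarations in the answer's code).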

Alternatively, you could compile the original and do some IL manipulation using Cecil, or try an AOP library like PostSharp. Finally, you could look into the .NET Profiling API.


鹿港巷口少年归 2024-10-24 18:49:59


You could use a source-to-source program transformation system. Such a tool parses the code, builds ASTs, lets you apply transformations, and then regenerates text from the ASTs. What makes a source-to-source system nice is that you can write transformations in terms of the source language syntax rather than the fractal detail of the AST, which makes them far easier to write and to understand later.

What you want to do would be modelled by a pretty simple program transformation using our DMS Software Reengineering Toolkit:

rule insert_post_statement_call(s: stmt): stmt -> stmt =
   " \s " -> " { \s ; MyAdditionalMethod();   }";

This rule isn't a "text" substitution; rather, it is parsed by the parser that processes the target code, so it actually represents two ASTs, a left-hand side and a right-hand side (separated by the "->" syntax). The quotes aren't string quotes; they are quotes around the target language syntax that distinguish it from the syntax of the rule language itself. Inside the quotes is target-language (e.g., C#) text with escapes like \s, which represent entire language elements (in this case, a stmt according to the target language's grammar). The left-hand side says "match any statement s", because s is defined to be a stmt in the grammar. The right-hand side says "replace the statement with a block containing the original statement \s and the new code you want inserted". This is all done on syntax trees, using the grammar as a guide; the transform can't be applied to anything that isn't a statement. (The reason for rewriting the statement as a block is that the right-hand side is then valid wherever a statement is valid; go check your grammar.)
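To show why wrapping in a block matters, here is a toy analogue of that rule in plain C#. It is not DMS and not any real parser's API: `Stmt`, `ExprStmt`, `IfStmt`, `Block`, and `Instrument` are hypothetical types for illustration. The rule is applied once, bottom-up, and blocks themselves are not wrapped. Because the replacement is itself a single statement, it stays correct in the body of an unbraced `if`, where text insertion (`if (x) Log(i); MyAdditionalMethod();`) would move the call outside the condition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy statement AST; illustrative only.
abstract class Stmt { }
sealed class ExprStmt : Stmt { public readonly string Text; public ExprStmt(string text) { Text = text; } }
sealed class IfStmt : Stmt { public readonly string Cond; public readonly Stmt Then; public IfStmt(string cond, Stmt then) { Cond = cond; Then = then; } }
sealed class Block : Stmt { public readonly List<Stmt> Body; public Block(IEnumerable<Stmt> body) { Body = body.ToList(); } }

static class Instrument
{
    static Stmt Call() => new ExprStmt("MyAdditionalMethod();");

    // Analogue of  \s  ->  { \s ; MyAdditionalMethod(); }  applied once, bottom-up.
    public static Stmt Rewrite(Stmt s)
    {
        switch (s)
        {
            case Block b: return new Block(b.Body.Select(x => Rewrite(x)));
            case IfStmt i: return new Block(new Stmt[] { new IfStmt(i.Cond, Rewrite(i.Then)), Call() });
            default: return new Block(new[] { s, Call() });
        }
    }

    // Minimal pretty-printer that regenerates text from the tree.
    public static string Print(Stmt s)
    {
        switch (s)
        {
            case ExprStmt e: return e.Text;
            case IfStmt i: return "if (" + i.Cond + ") " + Print(i.Then);
            case Block b: return "{ " + string.Join(" ", b.Body.Select(x => Print(x))) + " }";
            default: throw new ArgumentException("unknown statement");
        }
    }
}

class Program
{
    static void Main()
    {
        var original = new Block(new Stmt[] { new ExprStmt("i++;"), new IfStmt("x", new ExprStmt("Log(i);")) });
        Console.WriteLine(Instrument.Print(original));
        Console.WriteLine(Instrument.Print(Instrument.Rewrite(original)));
    }
}
```

One case this toy skips: wrapping a local declaration such as `int i = 0;` in its own block would end the variable's scope before `i++`, so declarations are presumably among the special cases that need rules of their own.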

As a practical matter, you'll need to write rules to handle other special cases, but this is mostly a matter of writing more rules. You also need to package the parser, transformer, and prettyprinter as a bundle, which requires some procedural glue. This is still far easier than trying to write code that reliably climbs up and down the tree, matching nodes and then smashing them to get what you want. Better still, when your grammar (invariably) has to be adjusted, the rewrite rules are reparsed according to the revised grammar and still work; whatever procedural tree climbing you might be doing is almost certain to break.

As you write more and more transformations, this capability becomes more and more valuable. And when you are successful with a small number of transformations, adding more becomes attractive quickly.

See this technical paper for a more thorough discussion of how DMS works, and how it is used to apply instrumentation transformations, like you want to do, in real tools. This paper describes the basic ideas behind the test coverage tools sold by Semantic Designs.
