Converting C source code to C++
How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?
The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu of private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.
I have in mind a general strategy:
- Compile everything in C++'s C subset and get that working.
- Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
- Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
- Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
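A minimal sketch of steps 2 and 3, assuming a hypothetical module `symtab.c` whose state lives in file-static globals. All names (`SymTabStatic`, `SymTab`, `Parser`) are made up for illustration. Step 2 only scopes the names under a class and keeps the single shared storage; step 3 moves the state into instances that dependent code receives explicitly.

```cpp
#include <cassert>
#include <cstring>

// Original C, for reference (hypothetical module symtab.c):
//   static int sym_count;                /* internal linkage: "private" */
//   static const char *sym_names[64];
//   int symtab_add(const char *name);    /* external linkage: "public" */

// Step 2: same storage and behaviour, but every name is scoped by the class.
class SymTabStatic {
public:
    static int add(const char *name) {
        if (count_ >= kMax) return -1;   // table full
        names_[count_] = name;
        return count_++;
    }
    static int lookup(const char *name) {
        for (int i = 0; i < count_; ++i)
            if (std::strcmp(names_[i], name) == 0) return i;
        return -1;                       // not found
    }
private:
    static constexpr int kMax = 64;
    static int count_;
    static const char *names_[kMax];
};
int SymTabStatic::count_ = 0;
const char *SymTabStatic::names_[SymTabStatic::kMax];

// Step 3: the state is per-instance; code that used the globals is handed
// the object it depends on (here through a constructor).
class SymTab {
public:
    int add(const char *name) {
        if (count_ >= kMax) return -1;
        names_[count_] = name;
        return count_++;
    }
    int lookup(const char *name) const {
        for (int i = 0; i < count_; ++i)
            if (std::strcmp(names_[i], name) == 0) return i;
        return -1;
    }
private:
    static constexpr int kMax = 64;
    int count_ = 0;
    const char *names_[kMax] = {};
};

// Hypothetical client module that used to read symtab's globals directly.
class Parser {
public:
    explicit Parser(SymTab &syms) : syms_(syms) {}
    int declare(const char *name) { return syms_.add(name); }
private:
    SymTab &syms_;
};
```

Because two `SymTab` objects no longer share state after step 3, a module can be exercised in isolation, for example by constructing a fresh `SymTab` for each unit test.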
Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?
Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.
Note 2: the source is nearly 20 years old, and has perhaps 30% code churn (lines modified + added / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase maintainability.
[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]
Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-) As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start; once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.
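A minimal sketch of such a facade, assuming a hypothetical C module exposing `lexer_open`, `lexer_next` and `lexer_close`. To keep the example self-contained, the C module is defined inline inside `extern "C"`; in a real code base it would stay in its own file behind its header. The rest of the program uses only the RAII class `Lexer`, so the C internals behind it can later be rewritten without touching callers.

```cpp
#include <cassert>
#include <new>
#include <stdlib.h>
#include <string>

// Stand-in for the existing C module (normally a separate .c file).
extern "C" {
typedef struct lexer {
    const char *src;
    size_t pos;
} lexer;

lexer *lexer_open(const char *src) {
    lexer *lx = (lexer *)malloc(sizeof *lx);
    if (lx) { lx->src = src; lx->pos = 0; }
    return lx;
}

/* Returns the next character, or -1 at end of input. */
int lexer_next(lexer *lx) {
    char c = lx->src[lx->pos];
    if (c == '\0') return -1;
    lx->pos++;
    return (unsigned char)c;
}

void lexer_close(lexer *lx) { free(lx); }
}

// C++ facade: the only interface the rest of the program uses.
class Lexer {
public:
    explicit Lexer(const std::string &src)
        : src_(src), lx_(lexer_open(src_.c_str())) {
        if (!lx_) throw std::bad_alloc();
    }
    ~Lexer() { lexer_close(lx_); }
    Lexer(const Lexer &) = delete;             // one owner per C handle
    Lexer &operator=(const Lexer &) = delete;

    // Stores the next character in `out`; returns false at end of input.
    bool next(char &out) {
        int c = lexer_next(lx_);
        if (c < 0) return false;
        out = static_cast<char>(c);
        return true;
    }

private:
    std::string src_;  // owns the text the C module points into
    lexer *lx_;        // declared after src_, so initialized after it
};
```

Deleting the copy operations is a design choice for the sketch: it keeps exactly one owner per C handle, so `lexer_close` runs exactly once.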
We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.
The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)
What about:
Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.
Your application has lots of folks working on it, and it must not be broken.
If you are serious about large-scale conversion to an OO style, what you need is massive transformation tools to automate the work. The basic idea is to designate groups of data as classes, and then get the tool to refactor the code to move that data into classes, move functions on just that data into those classes, and revise all accesses to that data into calls on the classes. You can do an automated preanalysis to form statistical clusters to get some ideas, but you'll still need an application-aware engineer to decide what data elements should be grouped.
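The preanalysis step can be approximated with a very simple heuristic, sketched below under that assumption: group globals by the exact set of functions that access them, so each group becomes a candidate class together with its accessor functions. This is an illustration only, not how DMS does it; the `(function, global)` pairs and all names are hypothetical input that would in practice come from a cross-reference or flow analysis.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One observed access: (function, global). Hypothetical input; in practice it
// would come from a cross-reference or flow-analysis tool.
using Access = std::pair<std::string, std::string>;

// Groups globals whose sets of accessing functions are identical. Each group
// is a candidate class: that data plus the functions that access it.
std::vector<std::set<std::string>>
cluster_globals(const std::vector<Access> &accesses) {
    std::map<std::string, std::set<std::string>> users;  // global -> functions
    for (const auto &[fn, global] : accesses)
        users[global].insert(fn);

    std::map<std::set<std::string>, std::set<std::string>> groups;  // functions -> globals
    for (const auto &[global, fns] : users)
        groups[fns].insert(global);

    std::vector<std::set<std::string>> result;
    for (const auto &entry : groups)
        result.push_back(entry.second);
    return result;
}
```

Exact-match grouping is deliberately crude and will split data that belongs together, which is why the final grouping still needs an engineer who knows the application.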
A tool that is capable of doing this task is our DMS Software Reengineering Toolkit. DMS has strong C parsers for reading your code, captures the C code as compiler abstract syntax trees, and (unlike a conventional compiler) can compute flow analyses across your entire 300K SLOC. DMS has a C++ front end that can be used as the "back" end; one writes transformations that map C syntax to C++ syntax.
A major C++ reengineering task on a large avionics system gives some idea of what using DMS for this kind of activity is like. See the technical papers at www.semdesigns.com/Products/DMS/DMSToolkit.html, specifically "Re-engineering C++ Component Models Via Automatic Program Transformation".
This process is not for the faint of heart. But then, anybody who would consider manually refactoring a large application is already not afraid of hard work.
Yes, I'm associated with the company, being its chief architect.
I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.
Once you have your C++ interface up, it is then a trivial task to copy and paste the code into your classes. As you mentioned, unit testing is vital during this step.
GCC is currently in mid-transition from C to C++. They started, obviously, by moving everything into the common subset of C and C++. As they did so, they added warnings to GCC for everything they found, under -Wc++-compat. That should get you through the first part of your journey.
For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc. that are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO: you'll likely find benefits where you are already using a C OO idiom (like struct inheritance), and where C++ will afford greater clarity and better type checking on your code.
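As an illustration of the struct-inheritance case, the sketch below assumes a hypothetical C AST in which `struct binop` embeds `struct node` as its first member and code casts between the two, and shows one possible C++ form: the compiler now checks the base/derived relationship, and a virtual function replaces a hand-written `switch` on a `kind` field. The type names are made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// C idiom, for reference (hypothetical AST):
//   struct node  { int kind; };
//   struct binop { struct node base; struct node *lhs, *rhs; };
//   struct node *n = (struct node *)b;     /* upcast by unchecked cast */
//   switch (n->kind) { case K_ADD: ... }   /* dispatch by hand */

// C++ form: the base/derived relation is known to the compiler.
struct Node {
    virtual ~Node() = default;
    virtual int eval() const = 0;   // replaces the switch on `kind`
};

struct Const : Node {
    explicit Const(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

struct Add : Node {
    Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    int eval() const override { return lhs->eval() + rhs->eval(); }
    std::unique_ptr<Node> lhs, rhs;
};
```

An unrelated pointer can no longer be passed where a `Node` is expected without an explicit cast, and forgetting to handle a node type becomes a compile error (a missing `eval` override) rather than a fall-through in a `switch`.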
Your list looks okay, except that I would suggest reviewing the test suite first and trying to get it as tight as possible before doing any coding.
Let's throw another stupid idea:
Probably two things to consider besides how you want to start are on what you want to focus, and where you want to stop.
You state that there is a lot of code churn; this may be the key to focusing your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed. The mature/stable parts are apparently working well enough, so it is better to leave them as they are, except perhaps for some window dressing with facades etc.
Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.
The software I work on is a huge, old code base which has been 'converted' from C to C++ years ago now. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes never has really taken off, I think for the following reasons:
NB. I suppose you know the book "Working Effectively with Legacy Code"?
You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".
You might want to take a look at maketea. It provides pattern matching for ASTs, as well as AST definitions generated from an abstract grammar, plus visitors, transformers, etc.
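The sketch below does not use maketea, which generates its own AST classes. It only illustrates, in plain C++17 with `std::variant` and `std::get_if`, what "pattern matching, not just type matching" means: a rewrite rule that matches on the shape of a subtree (`x + 0`), not only on the node's type. The AST types `Num`, `Var` and `Plus` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <variant>

// Hypothetical expression AST; Plus is held by pointer so Expr can be recursive.
struct Num { int value; };
struct Var { char name; };
struct Plus;
using Expr = std::variant<Num, Var, std::shared_ptr<Plus>>;
struct Plus { Expr lhs, rhs; };

// True if `e` matches the pattern Num(0).
bool is_zero(const Expr &e) {
    const Num *n = std::get_if<Num>(&e);
    return n != nullptr && n->value == 0;
}

// Rewrites Plus(x, Num(0)) and Plus(Num(0), x) to x, bottom-up.
// The match is on the shape of the subtree, not just the node type.
Expr simplify(const Expr &e) {
    if (const auto *p = std::get_if<std::shared_ptr<Plus>>(&e)) {
        Expr l = simplify((*p)->lhs);
        Expr r = simplify((*p)->rhs);
        if (is_zero(r)) return l;
        if (is_zero(l)) return r;
        return std::make_shared<Plus>(Plus{l, r});
    }
    return e;  // leaves are already in normal form
}
```

Writing such rules by hand gets verbose as patterns nest deeper, which is the kind of boilerplate a generator like maketea aims to remove.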
If you have a small or academic project (say, less than 10,000 lines), a rewrite is probably your best option. You can factor it however you want, and it won't take too much time.
If you have a real-world application, I'd suggest getting it to compile as C++ (which usually means primarily fixing up function prototypes and the like), then work on refactoring and OO wrapping. Of course, I don't subscribe to the philosophy that code needs to be OO structured in order to be acceptable C++ code. I'd do a piece-by-piece conversion, rewriting and refactoring as you need to (for functionality or for incorporating unit testing).
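A few typical fix-ups when a C file is first compiled as C++, shown in their C++-valid form with the original C in comments. The function, struct and enum names are hypothetical; GCC's -Wc++-compat is intended to flag constructs of this kind while the code is still being built as C.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// C: int *make_table(n) int n; { ... }     (K&R definition: not valid C++)
// C: int *t = malloc(n * sizeof *t);       (implicit void* conversion: not C++)
int *make_table(std::size_t n) {
    int *t = static_cast<int *>(std::malloc(n * sizeof *t));
    if (t) std::memset(t, 0, n * sizeof *t);
    return t;
}

// C: int count_syms();   /* "unspecified arguments" in C, "no arguments" in C++ */
int count_syms(void);     // spell out (void) so both languages agree

// C: struct sym { const char *new; int class; };   (C++ keywords as names)
struct sym {
    const char *new_name;
    int sym_class;
};

// C: enum color c = 0; c++;   (int -> enum and ++ on an enum: not C++)
enum color { RED, GREEN, BLUE };
color next_color(color c) {
    return static_cast<color>((c + 1) % 3);
}
```

None of these changes alters behaviour, which is what makes this first step safe to land while the integration test suite is the only safety net.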
Here's what I would do: