Converting C source code to C++

Posted on 2024-07-07 08:59:45

How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?

The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu of private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.

I have in mind a general strategy:

  1. Compile everything in C++'s C subset and get that working.
  2. Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
  3. Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
  4. Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
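To make steps 2 and 3 concrete, here is a minimal sketch of what one module might look like at each stage. Everything in it is hypothetical: the module name (`symtab`), the class names, and the use of `std::unordered_map` as a stand-in for whatever table the real C code uses.

```cpp
#include <string>
#include <unordered_map>

// Step 2: a former C module (hypothetical "symtab.c") becomes one big class.
// File-scope statics turn into private static members and extern functions
// into public static members, so every cross-reference is class-qualified.
class SymtabModule {
public:
    static void define(const std::string& name, int value) { table_[name] = value; }
    static int lookup(const std::string& name) {
        auto it = table_.find(name);
        return it == table_.end() ? -1 : it->second;  // -1 means "not found" in this sketch
    }
private:
    static std::unordered_map<std::string, int> table_;  // was a static global in the .c file
};
std::unordered_map<std::string, int> SymtabModule::table_;

// Step 3: the same module as an instance; the state is now per-object.
class Symtab {
public:
    void define(const std::string& name, int value) { table_[name] = value; }
    int lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? -1 : it->second;
    }
private:
    std::unordered_map<std::string, int> table_;
};

// A dependent module receives its cross-reference through the constructor
// and replaces the static access with an indirect one.
class Parser {
public:
    explicit Parser(Symtab& symtab) : symtab_(symtab) {}
    int resolve(const std::string& name) const { return symtab_.lookup(name); }
private:
    Symtab& symtab_;
};
```

The useful property of the intermediate static-member stage is that call sites only change spelling (`lookup(...)` becomes `SymtabModule::lookup(...)`), so the integration suite should keep passing before any ownership decisions are made.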

Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?

Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.

Note 2: the source is nearly 20 years old, and has perhaps 30% code churn ((lines modified + lines added) / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase maintainability.

[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]

Comments (11)

阳光下的泡沫是彩色的 2024-07-14 08:59:45

Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)

As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.

We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.

The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)

翻身的咸鱼 2024-07-14 08:59:45

What about:

  1. Compiling everything in C++'s C subset and getting that working, and
  2. Implementing a set of facades, leaving the C code unaltered?

Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.
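As a rough illustration of the facade idea, the sketch below wraps a small C module in a C++ class without touching the C functions. The C module (`c_buffer_*`) is an invented stand-in defined inline so the example is self-contained; in a real code base it would live in its own unaltered `.c`/`.h` files.

```cpp
#include <stdlib.h>
#include <string.h>
#include <new>
#include <string>

// Stand-in for a legacy C module (hypothetical names), kept in the common
// subset of C and C++ and otherwise left alone.
extern "C" {
struct c_buffer { char *data; size_t len; };

struct c_buffer *c_buffer_create(void) {
    struct c_buffer *b = (struct c_buffer *)malloc(sizeof *b);
    if (b) { b->data = NULL; b->len = 0; }
    return b;
}

int c_buffer_append(struct c_buffer *b, const char *s) {
    size_t n = strlen(s);
    char *p = (char *)realloc(b->data, b->len + n + 1);
    if (!p) return -1;
    memcpy(p + b->len, s, n + 1);  /* copies the terminating NUL too */
    b->data = p;
    b->len += n;
    return 0;
}

void c_buffer_destroy(struct c_buffer *b) {
    if (b) { free(b->data); free(b); }
}
}

// The facade: owns the C object, ties its lifetime to scope (RAII), and
// translates C error codes into exceptions.
class Buffer {
public:
    Buffer() : impl_(c_buffer_create()) { if (!impl_) throw std::bad_alloc(); }
    ~Buffer() { c_buffer_destroy(impl_); }
    Buffer(const Buffer&) = delete;             // the C object has no copy operation
    Buffer& operator=(const Buffer&) = delete;

    void append(const std::string& s) {
        if (c_buffer_append(impl_, s.c_str()) != 0) throw std::bad_alloc();
    }
    std::string str() const {
        return impl_->data ? std::string(impl_->data, impl_->len) : std::string();
    }
private:
    c_buffer* impl_;
};
```

New C++ code talks only to `Buffer`; the C implementation behind it can later be replaced or left as it is without callers noticing.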

私藏温柔 2024-07-14 08:59:45

Your application has lots of folks working on it, and a need to not-be-broken.
If you are serious about large scale conversion to an OO style, what
you need is massive transformation tools to automate the work.

The basic idea is to designate groups of data as classes, and then
get the tool to refactor the code to move that data into classes,
move functions on just that data into those classes,
and revise all accesses to that data to calls on the classes.

You can do an automated preanalysis to form statistical clusters to get some ideas,
but you'll still need an application-aware engineer to decide what
data elements should be grouped.
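Whatever tool (or manual process) performs it, the transformation described above has roughly the following shape. This is only a hand-written sketch of the before/after; the globals, the function and the `Diagnostics` class are invented for illustration and say nothing about what DMS actually emits.

```cpp
#include <string>
#include <vector>

// Before (C style, shown as comments):
//   int error_count;
//   int warning_count;
//   void report_error(const char *msg) { error_count++; /* ... */ }
//   ...
//   if (error_count > 0) { /* abort compilation */ }

// After: the data group becomes a class, the functions that touch only that
// data become members, and direct accesses become calls.
class Diagnostics {
public:
    void report_error(const std::string& msg) {
        ++error_count_;
        messages_.push_back("error: " + msg);
    }
    void report_warning(const std::string& msg) {
        ++warning_count_;
        messages_.push_back("warning: " + msg);
    }
    int error_count() const { return error_count_; }
    int warning_count() const { return warning_count_; }
    bool ok() const { return error_count_ == 0; }  // replaces `error_count > 0` checks
    const std::vector<std::string>& messages() const { return messages_; }
private:
    int error_count_ = 0;
    int warning_count_ = 0;
    std::vector<std::string> messages_;
};
```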

A tool that is capable of doing this task is our DMS Software Reengineering
Toolkit.
DMS has strong C parsers for reading your code, captures the C code
as compiler abstract syntax trees, and (unlike a conventional compiler)
can compute flow analyses across your entire 300K SLOC.
DMS has a C++ front end that can be used as the "back" end;
one writes transformations that map C syntax to C++ syntax.

A major C++ reengineering task on a large avionics system gives
some idea of what using DMS for this kind of activity is like.
See technical papers at
www.semdesigns.com/Products/DMS/DMSToolkit.html,
specifically
Re-engineering C++ Component Models Via Automatic Program Transformation

This process is not for the faint of heart. But then anybody
who would consider manual refactoring of a large application
is already not afraid of hard work.

Yes, I'm associated with the company, being its chief architect.

云淡月浅 2024-07-14 08:59:45

I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.

Once you have your C++ interface up, copying and pasting the code into your classes is a trivial task. As you mentioned, it is vital to do unit testing during this step.

半边脸i 2024-07-14 08:59:45

GCC is currently in mid-transition from C to C++. They started by moving everything into the common subset of C and C++, obviously. As they did so, they added warnings to GCC for everything they found; these are available under -Wc++-compat. That should get you through the first part of your journey.
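For a feel of what moving into the common subset involves, here is a small sketch of C code after such a cleanup. The struct and function names are made up, and the comments describe the kinds of constructs I would expect -Wc++-compat to flag; check the GCC documentation for the exact list.

```cpp
#include <stdlib.h>
#include <string.h>

enum token_kind { TOK_IDENT, TOK_NUMBER };

struct token {
    enum token_kind kind;
    char *text;  /* was `char *new;` in this made-up example; `new` is a C++ keyword */
};

struct token *token_make(int raw_kind, const char *text) {
    /* C converts void* to struct token* implicitly; C++ needs the cast. */
    struct token *t = (struct token *)malloc(sizeof *t);
    if (t == NULL) return NULL;

    /* C converts int to an enum implicitly; C++ needs the cast. */
    t->kind = (enum token_kind)raw_kind;

    size_t n = strlen(text) + 1;
    t->text = (char *)malloc(n);
    if (t->text == NULL) { free(t); return NULL; }
    memcpy(t->text, text, n);
    return t;
}

void token_free(struct token *t) {
    if (t) { free(t->text); free(t); }
}
```

The result still compiles as C, which is the point: the code base can stay buildable with both compilers while the cleanup is in progress.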

For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc., which are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO, you'll likely find benefits where you are already using a C OO idiom (like struct inheritance), and where C++ will afford greater clarity and better type checking on your code.
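As a sketch of the struct-inheritance case: a C compiler often has a base struct with a kind tag embedded as the first member of each "derived" struct, plus a `switch` on the tag. The hypothetical expression hierarchy below shows the C++ counterpart, with a standard container replacing a macro-defined list. None of these names come from GCC.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Before (C idiom, shown as comments):
//   struct expr     { int kind; };
//   struct num_expr { struct expr base; int value; };
//   struct add_expr { struct expr base; struct expr *lhs, *rhs; };
//   int eval(struct expr *e) { switch (e->kind) { /* cast and dispatch */ } }

class Expr {
public:
    virtual ~Expr() = default;
    virtual int eval() const = 0;  // replaces the switch on `kind`
};

class Num : public Expr {
public:
    explicit Num(int value) : value_(value) {}
    int eval() const override { return value_; }
private:
    int value_;
};

class Add : public Expr {
public:
    Add(std::unique_ptr<Expr> lhs, std::unique_ptr<Expr> rhs)
        : lhs_(std::move(lhs)), rhs_(std::move(rhs)) {}
    int eval() const override { return lhs_->eval() + rhs_->eval(); }
private:
    std::unique_ptr<Expr> lhs_;
    std::unique_ptr<Expr> rhs_;
};

// A std::vector of owning pointers stands in for a macro-generated list type.
int sum_all(const std::vector<std::unique_ptr<Expr>>& exprs) {
    int total = 0;
    for (const auto& e : exprs) total += e->eval();
    return total;
}
```

The compiler now checks the downcasts that the C version performed by hand, and ownership of child nodes is explicit in the types.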

命硬 2024-07-14 08:59:45

Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.

羁〃客ぐ 2024-07-14 08:59:45

Let's throw out another stupid idea:

  1. Compile everything in C++'s C subset and get that working.
  2. Start with a module, convert it into a huge class, then into an instance, and build a C interface (identical to the one you started from) out of that instance. Let the remaining C code work with that C interface.
  3. Refactor as needed, growing the OO subsystem out of C code one module at a time, and drop parts of the C interface when they become useless.
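Step 2 might look something like the sketch below: the module has already become an ordinary C++ class, and a thin `extern "C"` layer with the original signatures forwards to a single instance, so the remaining C code keeps linking unchanged. The string-interning module and its function names are invented for the example.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// The converted module: a plain C++ class with no global state of its own.
class StringPool {
public:
    int intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(strings_.size());
        strings_.push_back(s);  // deque::push_back keeps earlier elements in place
        ids_.emplace(s, id);
        return id;
    }
    std::size_t size() const { return strings_.size(); }
    const std::string& text(int id) const { return strings_[static_cast<std::size_t>(id)]; }
private:
    std::deque<std::string> strings_;
    std::unordered_map<std::string, int> ids_;
};

namespace {
// The one instance that backs the legacy interface.
StringPool& pool() {
    static StringPool instance;
    return instance;
}
}  // namespace

// The C interface the rest of the program already calls (hypothetical
// signatures, assumed identical to the pre-conversion header).
extern "C" int intern_string(const char *s) {
    return pool().intern(s);
}

extern "C" const char *interned_text(int id) {
    if (id < 0 || static_cast<std::size_t>(id) >= pool().size()) return nullptr;
    return pool().text(id).c_str();
}
```

As more modules are converted, callers can switch from `intern_string` to holding a `StringPool&` directly, and the forwarding functions can be deleted once nothing references them.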
∞琼窗梦回ˉ 2024-07-14 08:59:45

Besides how you want to start, there are probably two more things to consider: what you want to focus on, and where you want to stop.

You state that there is a lot of code churn; this may be the key to focusing your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed. The mature/stable parts are apparently working well enough, so it is better to leave them as they are, except perhaps for some window dressing with facades etc.

Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.

The software I work on is a huge, old code base which was 'converted' from C to C++ years ago. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes, has never really taken off, I think for the following reasons:

  1. There is no need to change code that is working and that does not need to be enhanced. Doing so introduces new bugs without adding functionality, and end users don't appreciate that;
  2. It is very, very hard to refactor reliably. Many pieces of code are so large and also so vital that people hardly dare touch them. We have a fairly extensive suite of functional tests, but sufficient code coverage information is hard to get. As a result, it is difficult to establish whether there are already sufficient tests in place to detect problems during refactoring;
  3. The ROI is difficult to establish. The end user will not benefit from refactoring, so the payoff must come from reduced maintenance cost, which will increase initially because refactoring introduces new bugs into mature, i.e. fairly bug-free, code. And the refactoring itself will be costly as well ...

NB. I suppose you know the book "Working Effectively with Legacy Code"?

旧梦荧光笔 2024-07-14 08:59:45

You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".

You might want to take a look at maketea. It provides pattern matching for ASTs, as well as the AST definition from an abstract grammar, and visitors, transformers, etc.

°如果伤别离去 2024-07-14 08:59:45

If you have a small or academic project (say, less than 10,000 lines), a rewrite is probably your best option. You can factor it however you want, and it won't take too much time.

If you have a real-world application, I'd suggest getting it to compile as C++ (which usually means primarily fixing up function prototypes and the like), then work on refactoring and OO wrapping. Of course, I don't subscribe to the philosophy that code needs to be OO structured in order to be acceptable C++ code. I'd do a piece-by-piece conversion, rewriting and refactoring as you need to (for functionality or for incorporating unit testing).

段念尘 2024-07-14 08:59:45

Here's what I would do:

  • Since the code is 20 years old, scrap the parser/syntax analyzer and replace it with C++ code based on one of the newer lex/yacc/bison (or similar) tools, which is much more maintainable and easier to understand. It is faster to develop, too, if you have a BNF handy.
  • Once this is retrofitted to the old code, start wrapping modules into classes. Replace global/shared variables with interfaces.
  • What you have now will be a compiler in C++ (though not quite).
  • Draw a class diagram of all the classes in your system, and see how they are communicating.
  • Draw another one using the same classes and see how they ought to communicate.
  • Refactor the code to transform the first diagram to the second. (this might be messy and tricky)
  • Remember to use C++ code for all new code added.
  • If you have some time left, try replacing data structures one by one to use the more standardized STL or Boost.
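For the "replace global/shared variables with interfaces" bullet, one possible shape is sketched below. A global that every module used to read directly is hidden behind a small abstract interface that is passed in, which also makes the consumer testable in isolation. The `opt_level` global, the classes and the inlining threshold are all hypothetical.

```cpp
// Before (C style, shown as comments):
//   extern int opt_level;   /* read directly by many modules */
//   int should_inline(int callee_size) { return callee_size <= 10 * opt_level; }

// The interface that replaces direct access to the global.
class Options {
public:
    virtual ~Options() = default;
    virtual int opt_level() const = 0;
};

// One implementation; the real one might be filled in from the command line.
// In unit tests it doubles as a trivial fake.
class FixedOptions : public Options {
public:
    explicit FixedOptions(int level) : level_(level) {}
    int opt_level() const override { return level_; }
private:
    int level_;
};

// A consumer that depends on the interface instead of the global.
class Inliner {
public:
    explicit Inliner(const Options& options) : options_(options) {}
    bool should_inline(int callee_size) const {
        // Made-up heuristic, kept identical to the "before" version above.
        return callee_size <= 10 * options_.opt_level();
    }
private:
    const Options& options_;
};
```

With this in place, a unit test can construct an `Inliner` with any option values it likes, without having to set and restore process-wide state.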