What is the best way to familiarize yourself with a large codebase?

Posted on 2024-07-07 03:56:07

Joining an existing team with a large codebase already in place can be daunting. What's the best approach?

  • Broad; try to get a general overview of how everything links together, from the code
  • Narrow; focus on small sections of code at a time, understanding how they work fully
  • Pick a feature to develop and learn as you go along
  • Try to gain insight from class diagrams and UML, if available (and up to date)
  • Something else entirely?

I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?

--
Summary of answers:

  • Step through code in debug mode to see how it works
  • Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
  • Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
  • Go through existing unit tests for functional code, in a similar fashion to above
  • Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
  • Make small edits or bug fixes, then gradually build up
  • Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.

This post is a partial duplicate of the-best-way-to-familiarize-yourself-with-an-inherited-codebase

Comments (19)

深海蓝天 2024-07-14 03:56:07

Start with a small task if possible, and debug the code around your problem.
Stepping through code in debug mode is the easiest way to learn how something works.

莫多说 2024-07-14 03:56:07

Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
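
As a rough illustration of this assertion-first approach, here is a minimal C++ sketch. The `split` function is a hypothetical stand-in for whatever routine you are studying (in a real code base you would include its header rather than define it here), and the three beliefs are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function found in the inherited code base.
// In practice you would #include the real header instead of defining it.
std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : text) {
        if (c == delimiter) {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);
    return parts;
}

// Each assertion records one belief about how the code behaves.
// A failing assertion is the "puzzle to solve" mentioned above.
void test_split_beliefs() {
    // Belief 1: a simple input splits into its fields.
    assert((split("a,b,c", ',') == std::vector<std::string>{"a", "b", "c"}));
    // Belief 2: an empty input yields one empty field, not an empty list.
    assert(split("", ',').size() == 1);
    // Belief 3: a trailing delimiter produces a trailing empty field.
    assert(split("a,", ',').size() == 2);
}
```

Calling `test_split_beliefs()` from a small test driver is enough to start with; a test framework can come later, once you know which beliefs are worth keeping as regression tests.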

绝影如岚 2024-07-14 03:56:07

One thing that I usually suggest to people that has not yet been mentioned is that it is important to become a competent user of the existing code base before you can be a developer. When new developers come into our large software project, I suggest that they spend time becoming expert users before diving in trying to work on the code.

Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.

为人所爱 2024-07-14 03:56:07

This is quite dependent on what sort of learner and what sort of programmer you are, but:

  • Broad first - you need an idea of scope and size. This might include skimming docs/UML if they're good. If it's a long-term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
  • Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
  • Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
  • Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.

阳光①夏 2024-07-14 03:56:07

I would suggest running Doxygen on it to get an up-to-date class diagram, then going broad for a while. This gives you a quick big picture that you can use as you get up close and dirty with the code.
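
For anyone who has not set Doxygen up before, a configuration fragment along these lines is one plausible starting point. It is only a sketch: the `src` input directory is an assumption about the project layout, and the graph options rely on Graphviz being installed. A template Doxyfile can be generated with `doxygen -g` and then edited.

```
# Doxyfile fragment: document everything, even code without doc comments
# "src" is a hypothetical source directory
INPUT       = src
RECURSIVE   = YES
EXTRACT_ALL = YES

# Class diagrams drawn with Graphviz dot, in a UML-like style
HAVE_DOT    = YES
CLASS_GRAPH = YES
UML_LOOK    = YES
```

Running `doxygen Doxyfile` in the project root then produces browsable HTML with the diagrams included.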

情归归情 2024-07-14 03:56:07

Pairing with strict rotation.

If possible, while going through the documentation/codebase, try to employ pairing with strict rotation. That is, two of you sit together for a fixed period of time (say, a 2-hour session), then you switch pairs: one person continues working on that task while the other moves to another task with another partner.

In pairs you'll both pick up a piece of knowledge, which can then be passed to other members of the team when the rotation occurs. Another benefit is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can summarise and explain the concepts in a more easily understood way. As time progresses everyone should reach a similar level of understanding, and hopefully you avoid the "Oh, only John knows that bit of the code" syndrome.

From what I can tell about your scenario, you have a good number for this (3 pairs). However, if you're distributed, or not working to the same timescale, it's unlikely to be possible.

美人如玉 2024-07-14 03:56:07

I agree that it depends entirely on what type of learner you are. Having said that, I've been at two companies which had very large code-bases to begin with. Typically, I work like this:

If possible, before looking at any of the functional code, I go through unit tests that are already written. These can generally help out quite a lot. If they aren't available, then I do the following.

First, I largely ignore the implementation and look only at header files, or just the class interfaces. I try to get an idea of what the purpose of each class is. Second, I go one level deep into the implementation, starting with what seems to be the most important area. This is hard to gauge, so occasionally I just start at the top and work my way down the file list. I call this breadth-first learning. After this initial step, I generally go depth-wise through the rest of the code. The initial breadth-first look helps to solidify or correct any ideas I got from the interface level, and the depth-wise look then shows me the patterns that have been used to implement the system, as well as the different design ideas. By depth-wise, I mean you basically step through the program using the debugger, stepping into each function to see how it works, and so on. This obviously isn't possible with really large systems, but 20k LOC is not that much. :)

小镇女孩 2024-07-14 03:56:07

Work with another programmer who is more familiar with the system to develop a new feature or to fix a bug. This is the method that I've seen work out the best.

情场扛把子 2024-07-14 03:56:07

I had a similar situation. I'd say you go like this:

  • If it's a database-driven application, start from the database and try to make sense of each table, its fields, and its relations to the other tables.
  • Once you're comfortable with the underlying store, move up to the ORM layer. Those tables must have some kind of representation in code.
  • Once done with that, move on to how these objects arrive and where they come from. An interface? Which interface? Are there any validations? What preprocessing takes place on them before they go to the datastore?

This will familiarize you better with the system. Remember that writing or understanding unit tests is only possible when you know very well what is being tested and why it needs to be tested in exactly that way.

For a large application that is not database-driven, I'd recommend another approach:

  • What is the main goal of the system?
  • What are the major components of the system that solve this problem?
  • What interactions do the components have with each other? Make a graph that depicts component dependencies. Ask someone already working on it. These components must be exchanging something with each other, so try to figure that out as well (for example, IO might be returning a File object to the GUI, and the like).
  • Once comfortable with this, dive into the component that depends least on the others. Study how that component is further divided into classes and how they interact with each other. This way you get the hang of a single component in full.
  • Move to the next least dependent component.
  • At the very end, move to the core component, which typically depends on many of the other components you've already tackled.
  • While looking at the core component, you might find yourself referring back to the components you examined earlier, so don't worry, keep working hard!
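
The "least dependent first" ordering above is close to a topological sort of the component dependency graph. The sketch below is one way it could be computed in C++; `study_order` and `DependencyMap` are made-up names, and it visits dependencies before dependents rather than ranking components by dependency count.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Maps each component to the set of components it depends on.
using DependencyMap = std::map<std::string, std::set<std::string>>;

// Returns the components ordered so that each one appears after everything
// it depends on. If the graph contains a cycle, the components involved in
// (or depending on) the cycle are left out of the result.
std::vector<std::string> study_order(const DependencyMap& deps) {
    DependencyMap remaining = deps;
    // Components that are only mentioned as a dependency get an empty entry.
    for (const auto& entry : deps) {
        for (const auto& dependency : entry.second) {
            remaining.emplace(dependency, std::set<std::string>{});
        }
    }

    std::vector<std::string> order;
    while (!remaining.empty()) {
        // Pick a component whose dependencies have all been studied already.
        auto ready = std::find_if(
            remaining.begin(), remaining.end(),
            [](const auto& entry) { return entry.second.empty(); });
        if (ready == remaining.end()) {
            break;  // Only mutually dependent components are left.
        }
        const std::string name = ready->first;
        order.push_back(name);
        remaining.erase(ready);
        // The studied component no longer blocks anything else.
        for (auto& entry : remaining) {
            entry.second.erase(name);
        }
    }
    return order;
}
```

As a usage example with an invented graph, where Page depends on IO, UI depends on IO and Page, and Core depends on all three, `study_order` returns IO, Page, UI, Core. The dependency edges themselves have to come from reading the code or asking the team.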

For the first strategy:
Take this stackoverflow site as an example. Examine the datastore: what is being stored, how it is stored, what representations those items have in the code, and how and where they are presented in the UI. Where do they come from, and what processing takes place on them on their way back to the datastore?

For the second one:
Take a word processor as an example. What components are there? IO, UI, Page and the like. How do they interact with each other? Move along as you learn more.

Be relaxed. Written code is someone's mindset, frozen logic and thinking style, and it takes time to read that mind.

墨落成白 2024-07-14 03:56:07

I think you need to tie this to a particular task. When you have time on your hands, go for whichever approach you are in the mood for.

When you have something that needs to get done, give yourself a narrow focus and get it done.

时常饿 2024-07-14 03:56:07

Get the team to put you on bug fixing for two weeks (if you have two weeks). They'll be happy to get someone to take responsibility for that, and by the end of the period you will have spent so much time problem-solving with the library that you'll probably know it pretty well.

泼猴你往哪里跑 2024-07-14 03:56:07

If it has unit tests (I'm betting it doesn't), start small and make sure the unit tests don't fail. If you stare at the entire codebase at once, your eyes will glaze over and you will feel overwhelmed.

If there are no unit tests, you need to focus on the feature that you want. Run the app and look at the results of the things your feature should affect. Then start looking through the code, trying to figure out how the app creates the things you want to change. Finally, change it and check that the results come out the way you want.

You mentioned it is an app and a library. Change the app first and stick to using the library as a user. Once you have learned the library that way, it will be easier to change.

From a top-down perspective, the app probably has a main loop or a main GUI that controls all the action. It is worth reading the code to get a broad overview of the application's main control flow. If it is a GUI app, sketch on paper which screens there are and how to get from one screen to another. If it is a command-line app, work out how the processing is done.

Even in companies this approach is not unusual. Often no one fully understands how an application works, and people don't have time to show you around. They prefer specific questions about specific things, so you have to dig in and experiment on your own. Then, once you have a specific question, you can try to identify who knows that piece of the application and ask them.

戒ㄋ 2024-07-14 03:56:07

Start by understanding the 'problem domain' (is it a payroll system? inventory? real time control or whatever). If you don't understand the jargon the users use, you'll never understand the code.

Then look at the object model; there might already be a diagram, or you might have to reverse engineer one (either manually or using a tool, as suggested by Doug). At this stage you could also investigate the database (if any). It should follow the object model, but it may not, and that's important to know.

Have a look at the change history or bug database. If there's an area that comes up a lot, look into that bit first. This doesn't mean that it's badly written, but that it's the bit everyone uses.

Lastly, keep some notes (I prefer a wiki).

  • The existing guys can use it to sanity check your assumptions and help you out.
  • You will need to refer back to it later.
  • The next new guy on the team will really thank you.

梦归所梦 2024-07-14 03:56:07

First, if you have team members available who have experience with the code you should arrange for them to do an overview of the code with you. Each team member should provide you with information on their area of expertise. It is usually valuable to get multiple people explaining things, because some will be better at explaining than others and some will have a better understanding than others.

Then, you need to start reading the code for a while without any pressure (a couple of days or a week if your boss will provide that). It often helps to compile/build the project yourself and be able to run the project in debug mode so you can step through the code. Then, start getting your feet wet, fixing small bugs and making small enhancements. You will hopefully soon be ready for a medium-sized project, and later, a big project. Continue to lean on your team-mates as you go - often you can find one in particular who is willing to mentor you.

Don't be too hard on yourself if you struggle - that's normal. It can take a long time, maybe years, to understand a large code base. In fact, even after years there are often parts of the code that remain a bit scary and opaque. When you get downtime between projects you can dig into those areas, and you'll often find that after a few tries you can figure even those parts out.

Good luck!

深海夜未眠 2024-07-14 03:56:07

You may want to consider looking at source code reverse engineering tools. There are two tools that I know of:

  • SWAG Kit (Linux only)
  • Bauhaus (academic and commercial versions; commercial: axivion.com/index-en.html)

Both tools offer similar feature sets, including static analysis that produces graphs of the relations between modules in the software.

This mostly consists of call graphs and type/class dependencies. Viewing this information should give you a good picture of how the parts of the code relate to one another. Using this information, you can dig into the actual source for the parts that you are most interested in and that you need to understand/modify first.

姜生凉生 2024-07-14 03:56:07

I find that just jumping into the code can be a bit overwhelming. Try to read as much documentation on the design as possible. This will hopefully explain the purpose and structure of each component. It's best if an existing developer can take you through it, but that isn't always possible.

Once you are comfortable with the high-level structure of the code, try to fix a bug or two. This will help you get to grips with the actual code.

烟燃烟灭 2024-07-14 03:56:07

I like all the answers that say you should use a tool like Doxygen to get a class diagram, and first try to understand the big picture. I totally agree with this.

That said, this largely depends on how well factored the code is to begin with. If it's a gigantic mess, it's going to be hard to learn. If it's clean and organized properly, it shouldn't be that bad.

月依秋水 2024-07-14 03:56:07

See this answer on how to use test coverage tools to locate the code for a feature of interest, without knowing anything about where that feature is, or how it is spread across many modules.
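
The linked answer is not reproduced here, so the following is only a guess at how such a comparison could work: run the program once while exercising the feature and once without it, collect the set of functions each run executed from your coverage tool, and look at the difference. The function name `feature_only` and the idea of representing coverage as a set of function names are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the functions that were executed in the run that exercised the
// feature but not in the baseline run. Both inputs are std::set, so they
// are already sorted as std::set_difference requires.
std::vector<std::string> feature_only(const std::set<std::string>& with_feature,
                                      const std::set<std::string>& without_feature) {
    std::vector<std::string> result;
    std::set_difference(with_feature.begin(), with_feature.end(),
                        without_feature.begin(), without_feature.end(),
                        std::back_inserter(result));
    return result;
}
```

The functions that remain are candidates for where the feature lives, even when it is spread across several modules. How the executed-function sets are extracted depends entirely on the coverage tool in use.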

江湖正好 2024-07-14 03:56:07

(shameless marketing ahead)

You should check out nWire. It is an Eclipse plugin for navigating and visualizing large codebases. Many of our customers use it to break in new developers by printing out visualizations of the major flows.
