返回介绍

Exercise 27: Creative And Defensive Programming

发布于 2025-03-08 19:42:07 字数 24792 浏览 0 评论 0 收藏 0

You have now learned most of the basics of C programming and are ready to start becoming a serious programmer. This is where you go from beginner to expert, both with C and hopefully with core computer science concepts. I will be teaching you a few of the core data structures and algorithms that every programmer should know, and then a few very interesting ones I've used in real software for years.

Before I can do that I have to teach you some basic skills and ideas that will help you make better software. Exercises 27 through 31 will teach you advanced concepts and feature more talking than code, but after those you'll apply what you learn to making a core library of useful data structures.

The first step in getting better at writing C code (and really any language) is to learn a new mindset called "defensive programming". Defensive programming assumes that you are going to make many mistakes and then attempts to prevent them at every possible step. In this exercise I'm going to teach you how to think about programming defensively.

The Creative Programmer Mindset

It's not possible to tell you how to be creative in a short exercise like this, but I will tell you that creativity involves taking risks and being open minded. Fear will quickly kill creativity, so the mindset I adopt, and many programmers adopt on, accident is designed to make me unafraid of taking chances and looking like an idiot:

  • I can't make a mistake.
  • It doesn't matter what people think.
  • Whatever my brain comes up with is going to be a great idea.

I only adopt this mindset temporarily, and even have little tricks to turn it on. By doing this I can come up with ideas, find creative solutions, open my thoughts to odd connections, and just generally invent weirdness without fear. In this mindset I will typically write a horrible first version of something just to get the idea out.

However, when I've finished my creative prototype I will throw it out and get serious about making it solid. Where other people make a mistake is carrying the creative mindset into their implementation phase. This then leads to a very different destructive mindset that is the dark side of the creative mindset:

  • It is possible to write perfect software.
  • My brain tells me the truth, and it can't find any errors, therefore I have written perfect software.
  • My code is who I am and people who criticize its perfection are criticizing me.

These are lies. You will frequently run into programmers who feel intense pride about what they've created, which is natural, but this pride gets in the way of their ability to objectively improve their craft. Because of pride and attachment to what they've written, they can continue to believe that what they write is perfect. As long as they ignore other people's criticism of their code they can protect their fragile ego and never improve.

The trick to being creative and making solid software is to also be able to adopt a defensive programming mindset.

The Defensive Programmer Mindset

After you have a working creative prototype and you're feeling good about the idea, it's time to switch to being a defensive programmer. The defensive programmer basically hates your code and believes these things:

  • Software has errors.
  • You are not your software, yet you are are responsible for the errors.
  • You can never remove the errors, only reduce their probability.

This mindset lets you be honest about your work and critically analyze it for improvements. Notice that it doesn't say you are full of errors? It says your code is full of errors. This is a significant thing to understand because it gives you the power of objectivity for the next implementation.

Just like the creative mindset, the defensive programming mindset has a dark side as well. The defensive programmer is a paranoid who is afraid of everything, and this fear prevents them from possibly being wrong or making mistakes. That's great when you are trying to be ruthlessly consistent and correct, but it is murder on creative energy and concentration.

The Eight Defensive Programmer Strategies

Once you've adopted this mindset, you can then rewrite your prototype and follow a set of eight strategies I use to make my code as solid as I can. While I work on the "real" version I ruthlessly follow these strategies and try to remove as many errors as I can, thinking like someone who wants to break the software.

Never Trust Input

Never trust the data you are given and always validate it.

Prevent Errors

If an error is possible, no matter how probable, try to prevent it.

Fail Early And Openly

Fail early, cleanly, and openly, stating what happened, where and how to fix it.

Document Assumptions

Clearly state the pre-conditions, post-conditions, and invariants.

Prevention Over Documentation

Do not do with documentation, that which can be done with code or avoided completely.

Automate Everything

Automate everything, especially testing.

Simplify And Clarify

Always simplify the code to the smallest, cleanest form that works without sacrificing safety.

Question Authority

Do not blindly follow or reject rules.

These aren't the only ones, but they're the core things I feel programmers have to focus on when trying to make good solid code. Notice that I don't really say exactly how to do these. I'll go into each of these in more detail, and some of the exercises actually cover them extensively.

Applying The Eight Strategies

These ideas are all great pop-psychology platitudes, but how do you actually apply them to working code? I'm now going to give you a set of things to always do in this book's code that demonstrate each one with a concrete example. The ideas aren't limited to these examples, and you should use these as a guide to making your own code tougher.

Never Trust Input

Let's look at an example of bad design and "better" design. I won't say good design because this could be done even better. Take a look at two functions that both copy a string and a simple main to test out the better one.

#undef NDEBUG
#include "dbg.h"
#include <stdio.h>
#include <assert.h>

/*
 * Naive copy that assumes all inputs are always valid
 * taken from K&R C and cleaned up a bit.
 */
void copy(char to[], char from[])
{
    int i = 0;

    // while loop will not end if from isn't '\0' terminated
    while((to[i] = from[i]) != '\0') {
        ++i;
    }
}

/*
 * A safer version that checks for many common errors using the
 * length of each string to control the loops and termination.
 */
int safercopy(int from_len, char *from, int to_len, char *to)
{
    assert(from != NULL && to != NULL && "from and to can't be NULL");
    int i = 0;
    int max = from_len > to_len - 1 ? to_len - 1 : from_len;

    // to_len must have at least 1 byte
    if(from_len < 0 || to_len <= 0) return -1;

    for(i = 0; i < max; i++) {
        to[i] = from[i];
    }

    to[to_len - 1] = '\0';

    return i;
}

int main(int argc, char *argv[])
{
    // careful to understand why we can get these sizes
    char from[] = "0123456789";
    int from_len = sizeof(from);

    // notice that it's 7 chars + \0
    char to[] = "0123456";
    int to_len = sizeof(to);

    debug("Copying '%s':%d to '%s':%d", from, from_len, to, to_len);

    int rc = safercopy(from_len, from, to_len, to);
    check(rc > 0, "Failed to safercopy.");
    check(to[to_len - 1] == '\0', "String not terminated.");

    debug("Result is: '%s':%d", to, to_len);

    // now try to break it
    rc = safercopy(from_len * -1, from, to_len, to);
    check(rc == -1, "safercopy should fail #1");
    check(to[to_len - 1] == '\0', "String not terminated.");

    rc = safercopy(from_len, from, 0, to);
    check(rc == -1, "safercopy should fail #2");
    check(to[to_len - 1] == '\0', "String not terminated.");

    return 0;

error:
    return 1;
}

The copy function is typical C code and it's the source of a huge number of buffer overflows. It is flawed because it assumes that it will always receive a validly terminated C string (with '\0' ) and just uses a while-loop to process it. Problem is, ensuring that is incredibly difficult, and if not handled right it causes the while-loop to loop infinitely. A cornerstone of writing solid code is never writing loops that can possibly loop forever.

The safercopy function tries to solve this by requiring the caller to give the lengths of the two strings it must deal with. By doing this it can make certain checks about these strings that the copy function can't. It can check the lengths are right, that the to string has enough space, and it will always terminate. It's impossible for this function to run on forever like the copy function.

This is the idea behind never trusting the inputs you receive. If you assume that your function is going to get a string that's not terminated (which is common) then you design your function to not rely on that to function properly. If you need the arguments to never be NULL then you should check for that too. If the sizes should be within sane levels, then check that. You simply assume that whoever is calling you got it wrong and try to make it difficult for them to give you bad state.

This then extends out to software you write that gets input from the external universe. The famous last words of the programmer are, "Nobody's going to do that." I've seen them say that and then the next day someone does exactly that, crashing or hacking their application. If you say nobody is going to do that, just throw in the code to make sure they simply can't hack your application. You'll be glad you did.

There is a diminishing returns on this, but here's a list of things I try to do with all of my functions I write in C:

  • For each parameter identify what its preconditions are, and whether the precondition should cause a failure or return an error. If you are writing a library, favor errors over failures.
  • Add assert calls at the beginning that checks for each failure precondition using assert(test && "message"); This little hack does the test, and when it fails the OS will typically print the assert line for you, which then includes that message. Very helpful when you're trying to figure out why that assert is there.
  • For the other preconditions, return the error code or use my check macro to do that and give an error message. I didn't use check in this example since it would confuse the comparison.
  • Document why these preconditions exist so that when a programmer hits the error they can figure out if they are really necessary or not.
  • If you are modifying the inputs, make sure that they are correctly formed when the function exits, or abort if they aren't.
  • Always check the error codes of functions you use. For example, people frequently forget to check the return codes from fopen or fread which causes them to use the resources they give despite the error. This causes your program to crash or gives an avenue for an attack.
  • You also need to be returning consistent error codes so that you can do this for all of your functions too. Once you get in this habit you will then understand why my check macros work the way they do.

Just doing these simple things will improve your resource handling and prevent quite a few errors.

Prevent Errors

In the previous example you may hear people say, "Well it's not very likely someone will use copy wrong." Despite the mountain of attacks made against this very kind of function they still believe that the probability of this error is very low. Probability is a funny thing because people are incredibly bad at guessing the probability of any event. People are however much better at determining if something is possible . They may say the error in copy is not probably , but they can't deny that it's possible .

The key reason is that for something to be probable, it first has to be possible. Determining the possibility is easy, since we can all imagine something happening. What's not so easy is determining its possibility after that. Is the chance that someone might use copy wrong 20%, 10%, or 1%? Who knows, and to determine that you'd need to gather evidence, look at rates of failure in many software packages, and probably survey real programmers and how they use the function.

This means, if you're going to prevent errors then you need to try to prevent what is possible, but focus your energies on what's most probable first. It may not be feasible to handle all the possible ways your software can be broken, but you have to attempt it. But, at the same time, if you don't constrain your efforts to the most probably events with the least effort then you'll be wasting time on irrelevant attacks.

Here's a process for determining what to prevent in your software:

  • List all the possible errors that can happen, no matter how probable. Within reason of course. No point listing aliens sucking your memories out to steal your passwords.
  • Give each one a probability that's a percentage of operations that can be vulnerable. If you are handling requests from the internet, then it's the percentage of requests that can cause the error. If it's function calls, then it's what percentage of function calls can cause it.
  • Give each one an effort in number of hours or amount of code to prevent it. You could also just give an easy or hard metric. Any metric that prevents you from working on the impossible when there's easier things to fix still on the list.
  • Rank them by effort (lowest to highest), and probability (highest to lowest). This is now your task list.
  • Prevent all the errors you can in this list, aiming for removing the possibility, then reducing the probability if you can't make it impossible.
  • If there are errors you can't fix, then document them so someone else can fix it.

This little process will give you a nice list of things to do, but more importantly keep you from working on useless things when there's other more important things to work on. You can also be more or less formal with this process. If you're doing a full security audit this will be better done with a whole team and a nice spreadsheet. If you're just writing a function then simply reviewing the code and scratching out these into some comments is good enough. What's important is you stop assuming that errors don't happen, and you work on removing them when you can without wasting effort.

Fail Early And Openly

If you encounter an error in C you have two choices:

  • Return an error code.
  • Abort the process.

This is just how it is, so what you need to do is make sure the failures happen quickly, are clearly documented, give an error message, and are easy for the programmer to avoid. This is why the check macros I've given you work the way they do. For every error you find it prints a message, the file and line number where it happened, and force a return code. If you just use my macros you'll end up doing the right thing anyway.

I tend to prefer returning error code to aborting the program. If it's catastrophic then I will, but very few errors are truly catastrophic. A good example of when I'll abort a program is if I'm given an invalid pointer, as I did in safercopy . Instead of having the programmer experience a segmentation fault explosion "somewhere", I catch it right away and abort. However, if it's common to pass in a NULL then I'll probably change that to a check instead so that the caller can adapt and keep running.

In libraries however, I try my hardest to never abort. The software using my library can decide if it should abort, and typically I'll only abort if the library is very badly used.

Finally, a big part of being "open" about errors is not using the same message or error code for more than one possible error. You typically see this with errors on external resources. A library will receive an error on a socket, and then simply report "bad socket". What they should do is return exactly what the error was on the socket so it can be debugged properly and fixed. When designing your error reporting, make sure you give a different error message for the different possible errors.

Document Assumptions

If you're following along and doing this advice then what you'll be doing is building a "contract" of how your functions expect the world to be. You've created preconditions for each argument, you've handled possible errors, and you're failing elegantly. The next step is to complete the contract and add "invariants" and "postconditions".

An invariant is some condition that must be held true in some state while the function runs. This isn't very common in simple functions, but when you're dealing with complex structures it becomes more necessary. A good example of an invariant is that a structure is always initialized properly while it's being used. Another would be that a sorted data structure is always sorted during processing.

A postcondition is a guarantee on the exit value or result of a function running. This can blend together with invariants, but this is something as simple as "function always returns 0 or -1 on error". Usually these are documented, but if your function returns an allocated resource, you can add a postcondition that checks to make sure it's returning something and not NULL. Or, you can use NULL to indicate an error, so in that case your postcondition is now checking the resource is deallocated on any errors.

In C programming invariants and postconditions are usually more documentation than actual code and assertions. The best way to handle them is add assert calls for the ones you can, then document the rest. If you do that then when people hit an error they can see what assumptions you made when writing the function.

Prevention Over Documentation

A common problem when programmers write code is they will document a common bug rather than simply fix it. My favorite is when the Ruby on Rails system simply assumed that all months had 30 days. Calendars are hard, so rather than fix it they threw a tiny little comment somewhere that said this was on purpose, and then they refused to fix it for years. Every time someone would complain they would then bluster and yell, "But it's documented!"

Documentation doesn't matter if you can actually fix the problem, and if the function has a fatal flaw then simply don't include it until you can fix it. In the case of Ruby on Rails, not having date functions would have been better than including purposefully broken ones that nobody could use.

As you go through your defensive programming cleanups, try to fix everything you can. If you find yourself documenting more and more problems you can't fix, then consider redesigning the feature or simply removing it. If you really have to keep this horribly broken feature, then I suggest you write it, document it and find a new job before you are blamed for it.

Automate Everything

You are a programmer, and that means your job is putting other people out of jobs with automation. The pinnacle of this is putting yourself out of a job with your own automation. Obviously you won't completely remove what you do, but if are spending your whole day rerunning manual tests in your terminal, then your job is not programming. You are doing QA, and you should automate yourself out of this QA job you probably don't really want anyway.

The easiest way to do this is to write automated tests, or unit tests. In this book I'm going to get into how to do this easily, and I'll avoid most of the dogma of when you should write tests. I'll focus on how to write them, what to test, and how to be efficient at the testing.

Common things programmers fail to automate but they should:

  • Testing and validation.
  • Build processes.
  • Deployment of software.
  • System administration.
  • Error reporting.

Try to devote some of your time to automating this and you'll have more time to work on the fun stuff. Or, if this is fun to you, then maybe you should work on software that makes automating these things easier.

Simplify And Clarify

The concept of "simplicity" is a slippery one to many people, especially smart people. They generally confuse "comprehension" with "simplicity". If they understand it well, clearly it's simple. The actual test of simplicity is by comparison with something else that could be simpler. But, you'll see people who write code go running to the most complex obtuse structures possible because they think the simpler version of the same thing is "dirty". A love affair with complexity is a programming sickness.

You can fight this disease by first telling yourself, "Simple and clear is not dirty, no matter what everyone else is doing." If everyone else is writing insane visitor patterns involving 19 classes over 12 interfaces and you can do it with two string operations, then you win. They are wrong, no matter how "elegant" they think their complex monstrosity is.

The simplest test of which function to use is:

  • Make sure both functions have no errors. It doesn't matter how fast or simple a function is if it has errors.
  • If you can't fix one, then pick the other.
  • Do they produce the same result? If not then pick the one that has the result you need.
  • If they produce the same result, then pick the one that either has fewer features, fewer branches, or you just think is simpler.
  • Make sure you're not just picking the one that is most impressive. Simple and dirty beats complex and clean any day.

You'll notice that I mostly give up at the end and tell you to use your judgment. Simplicity is ironically a very complex thing, so using your tastes as a guide is the best way to go. Just make sure you adjust your view of what's "good" as you grow and gain more experience.

Question Authority

The final strategy is the most important because it breaks you out of the defensive programming mindset and lets you transition into the creative mindset. Defensive programming is authoritarian and it can be cruel. The job of this mindset is to make you follow rules because without them you'll miss something or get distracted.

This authoritarian attitude has the disadvantage of disabling independent creative thought. Rules are necessary for getting things done, but being a slave to them will kill your creativity.

This final strategy means you should question the rules you follow periodically and assume that they could be wrong, just like the software you are reviewing. What I will typically do is, after a session of defensive programming, I'll go take a non-programming break and let the rules go. Then I'll be ready to do some creative work or do more defensive coding if need to.

Order Is Not Important

The final thing I'll say on this philosophy is that I'm not telling you to do this in a strict order of "CREATE! DEFEND! CREATE! DEFEND!" At first you may want to do that, but I will actually do either in varying amounts depend on what I want to do, and I may even meld them together with no defined boundary.

I also don't think one mindset is better than another, or that there are strict separation between them. You need both creativity and strictness to do programming well, so work on both if you want to improve.

Extra Credit

  • The code in the book up to this point (and for the rest of it) potentially violates these rules. Go back through and apply what you've learned to one exercise to see if you can improve it or find bugs.
  • Find an open source project and give some of the files a similar code review. Submit a patch that fixes a bug if you find it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文