Code Reformat Finished

At the end of January we completed the OpenSSL code reformat as previously mentioned here and here. This post is intended to give you a bit more insight into exactly what we’ve done.

The historic OpenSSL coding style was very unusual and idiosyncratic. Not only that, but it was inconsistently applied and was not formally defined anywhere. This made reading the code and maintaining it more difficult than it needed to be. As part of our roadmap document we decided to change that. Recently we published for the first time our new Coding Style. This is very much inspired by the Linux Kernel style and we used their Coding Style document as a starting point for developing ours.

Problems encountered

Having developed our Coding Style, the next question was, how are we going to implement it? After looking around at different options we chose the GNU indent tool for performing the task. For those interested, the specific indent options being used are:

-bap -bbo -br -brs -c33 -cd33 -ce -ci4 -cli0 -cp33 -d0 -di1 -hnl -i4 -il1 -ip0
-l78 -lp -nbad -nbc -ncdb -ncs -nfc1 -nfca -npcs -nprs -npsl -nsc -ppi1  -saf
-sai -saw -sob -ss -ts0 -fc1 -fca -cdb -sc

However, it was a little more complicated than we first hoped. There were a number of problems that needed to be overcome.

Firstly, indent needs to know about all types that are used within the code. Without this you get lines like this:

int BIO_read(BIO * b, void *out, int outl)

instead of:

int BIO_read(BIO *b, void *out, int outl)

Note the extra space between the ‘*’ and the ‘b’. indent seems to treat the ‘*’ as a multiplication operator without knowledge of the types. Fortunately indent has a mechanism for supplying type information - all the openssl types are listed in the indent.pro file in the util directory. This isn’t a complete solution however because indent has problems with certain types that OpenSSL uses, e.g. STACK_OF(X509). In order to workaround problems like this we have introduced a script, openssl-format-source which is in the util directory. This performs some up front manipulation of source files to get them into a form that indent can handle; runs indent; and then backs out the changes it initially made. For example it will convert all instances of STACK_OF(X509) into STACK_OF_X509_ (which indent can deal with), and then converts them back afterwards. The script is invoked as follows:

util/openssl-format-source -v -c .

The above command assumes you are in the top level directory, and will apply the formatting rules to all source files. The ‘-v’ argument provides verbose output, and ‘-c’ applies comment formatting rules.

The next problem was that we wanted indent to reformat our comments. However, there are some comments in the code which have some internal formatting which we would not want to lose. For example take this comment from “bio.h”:

/* Buffers are setup like this:
*
* <---------------------- size ----------------------->
* +---------------------------------------------------+
* | consumed | remaining          | free space        |
* +---------------------------------------------------+
* <-- off --><------- len ------->
*/

Running indent on the above, with our selected options, this is what you get:

/*
* Buffers are setup like this: <---------------------- size
* ----------------------->
* +---------------------------------------------------+ | consumed |
* remaining | free space |
* +---------------------------------------------------+ <-- off
* --><------- len ------->
*/

Not quite what we had in mind! The solution that we have applied is to add a ‘-’ symbol at the beginning of the comment, as shown below:

/*-
* Buffers are setup like this:
*
* <---------------------- size ----------------------->
* +---------------------------------------------------+
* | consumed | remaining          | free space        |
* +---------------------------------------------------+
* <-- off --><------- len ------->
*/

This instructs indent not to reformat the comment, and it is left untouched in the final source. That solution in itself causes other problems though:

  • All comments throughout the whole code have to be manually inspected to check whether we should preserve formatting or not
  • When I said above that the ‘-’ symbol leaves a comment untouched - I mean completely untouched, so if the indenting of the surrounding source has changed then the comment will remain at its original indentation

The solution to the second problem above was to go through the entire codebase after the automated formatting had been run, and manually reindent the comments.

It seems in general indent is not very good at handling comments. In particular comments that are embedded in a statement/expression like this:

point_add(nq[0], nq[1], nq[2],
    nq[0], nq[1], nq[2],
    1 /* mixed */, tmp[0], tmp[1], tmp[2]);

After reformatting this line looked like:

point_add(nq[0], nq[1], nq[2], nq[0], nq[1], nq[2], 1 /* mixed
                                                       */ , tmp[0], tmp[1], tmp[2]);

Similar problems can occur with comments on the right hand side of statements. The solution to most of these problems was, on the whole, to move the comment to somewhere else so that indent didn’t get confused.

Sometimes rules are made to be broken. In a small number of instances adhering to the strict letter of the style doesn’t make sense. Take crypto/aes/aes_core.c as an example. This has lots of lines that look similar to:

t0 =
    Te0[(s0 >> 24)       ] ^
    Te1[(s1 >> 16) & 0xff] ^
    Te2[(s2 >>  8) & 0xff] ^
    Te3[(s3      ) & 0xff] ^
    rk[4];

Applying the strict formatting rules, this turns into:

t0 = Te0[(s0 >> 24)] ^
    Te1[(s1 >> 16) & 0xff] ^
    Te2[(s2 >> 8) & 0xff] ^ Te3[(s3) & 0xff] ^ rk[4];

Whilst strictly correct, this is a lot less readable. In the case of this file, and a small number of others, the answer was to manually apply the formatting rules, and to skip it completely from auto-formatting. The openssl-format-source script has a list of files to skip and so does not apply the formatting rules to them.

Tags

The formatting work has been applied to all of our active branches, i.e. in git:

  • master
  • OpenSSL_1_0_2-stable
  • OpenSSL_1_0_1-stable
  • OpenSSL_1_0_0_stable
  • OpenSSL_0_9_8_stable

This has been done to ease mainteance work, and enable cherry-picking of changes between the different branches. In order to minimise the impact for everyone we have tried to keep all of the formatting commits together so that they can be easily identified. For the branches that were released at the time the work was being performed (i.e. versions 1.0.1, 1.0.0 and 0.9.8) we have timed the work so that the commits appeared immediately after a release. Version 1.0.2 was not released at the time (although it has now been). There was a period of about a week between the last releases of 1.0.1, 1.0.0 and 0.9.8 and the formatting commits being applied. During that time the public repository was frozen and no other non-formatting related commits were applied.

In order to help people identify the commits related to reformatting a number of tags have been applied to the repository.

On “master”, the last commit before the change freeze was this one: commit 4b618848f9beb8271f24883694e097caa70013c0

This has been tagged as: master-pre-reformat

The last commit before the openssl-format-source script was run (but after manual prep work had been completed) was: commit 22b52164aaed31d6e93dbd2d397ace041360e6aa

This has been tagged as: master-pre-auto-reformat

The last commit after the automated script was run was: commit 739a5eee619fc8c03736140828891b369f8690f4

This has been tagged as: master-post-auto-reformat

The last commit at the end of the change freeze was this one: commit 35a1cc90bc1795e8893c11e442790ee7f659fffb

This has been tagged as: master-post-reformat

It should be noted that on master only there were a couple of commits that went in some time in advance of the change freeze related to comment changes. This was done before we had decided on the strategy that we finally went with. The commits in question are:

commit 3a83462dfea67566ba9bcedee266dc93d2e911e2 and commit 1d97c8435171a7af575f73c526d79e1ef0ee5960

These two commits will therefore fall outside the two tags mentioned above for the master branch only.

For the other branches tags have been similarly created for immediately before the reformatting work, and for the last commit at the end of the reformatting work. These tags are as follows:

For 1.0.2: OpenSSL_1_0_2-pre-reformat OpenSSL_1_0_2-pre-auto-reformat OpenSSL_1_0_2-post-auto-reformat OpenSSL_1_0_2-post-reformat

For 1.0.1: OpenSSL_1_0_1-pre-reformat OpenSSL_1_0_1-pre-auto-reformat OpenSSL_1_0_1-post-auto-reformat OpenSSL_1_0_1-post-reformat

For 1.0.0: OpenSSL_1_0_0-pre-reformat OpenSSL_1_0_0-pre-auto-reformat OpenSSL_1_0_0-post-auto-reformat OpenSSL_1_0_0-post-reformat

For 0.9.8: OpenSSL_0_9_8-pre-reformat OpenSSL_0_9_8-pre-auto-reformat OpenSSL_0_9_8-post-auto-reformat OpenSSL_0_9_8-post-reformat

There are a couple of “gotchas” about these tags to be aware of:

Firstly I stated above that the reformat work was the first thing done after the last releases for 1.0.1, 1.0.0 and 0.9.8 (i.e. releases 1.0.1l, 1.0.0q and 0.9.8ze). Technically speaking for 1.0.1 the 1.0.1l release is tagged at this point:

commit b83ceba7d51e846cf24433aa3c417bfd62b3ffa5
Author:     Matt Caswell <matt@openssl.org>
AuthorDate: Thu Jan 15 14:45:15 2015 +0000

    Prepare for 1.0.1l release

    Reviewed-by: Stephen Henson <steve@openssl.org>

However, the release process uses an automated script which generated one additional commit after the above:

commit 3a9a0321638ae13957b66baae6d4955597fc128d
Author:     Matt Caswell <matt@openssl.org>
AuthorDate: Thu Jan 15 14:49:54 2015 +0000

    Prepare for 1.0.1m-dev

    Reviewed-by: Stephen Henson <steve@openssl.org>

So that means OpenSSL_1_0_1-pre-reformat is not the same tag as OpenSSL_1_0_1l…it’s the commit after it, i.e. there is one commit after the release but before the reformat. The situation is similar for 1.0.0 and 0.9.8.

Another gotcha is that immediately after the change freeze was lifted “business as usual” began and normal commits started to be applied. Fairly early on it was noticed that, due to some reformatting errors, we were failing to build across all branches on windows. Therefore some additional commits were applied across all branches. These commits are:

0.9.8: commit 6844c129682c525af278bac75cb5d0696b85fa10 commit ead95e760c04bb9445793f5f57eec073b507f891

1.0.0: commit 192e148154dc02a3d867cc2f45d33eb94436f9a6 commit 1804f782987adbdc51e8dec8b8ddb22cece8b66

1.0.1: commit 925bfca5d347d10f1a2e172be001090ae7ebafc2 commit 90a5adffc7c12135821ee22f9cc482df3a4bc035

1.0.2: commit d3b7cac41b957704932a0cdbc74d4d48ed507cd0 commit fdc3ced983bb3efbb2ff4da52fdaef147f7f7ed2 commit 65d6fdaa21d03a1640d04ba48c0f8047873921e6

master: commit d2a0d72f33e2cd81a5c81b29b05d6fdb2cc67ac2 commit a8fe430a0d1ece596e66f27e09dc63ca19a2e2d6

The tags I have identified above mark the end of the change freeze and therefore do not include the above commits (which occurred later). On the 1.0.1, 1.0.2 and master branches there are some intervening “business as usual” non reformat related commits between the above and the OpenSSL_1_0_1-post-reformat, OpenSSL_1_0_2-post-reformat and OpenSSL_master-reformat tags.

It is not unreasonable to expect that there may be further fixes we need to apply, due to reformatting errors that we have not yet encountered.

Reformatting patches

If you maintain your own patches for OpenSSL releases then you may want to consider reformatting those patches. In the following instructions I will outline how this can be done. It’s a bit fiddly unfortunately, and gets more fiddly if you have a large number of commits to reformat.

I assume a few things. If any are false then you will have to adjust the instructions accordingly:

  • You are using a branch in git to maintain your patches, and that you want to rebase your branch commits against the latest state of master in git
  • These instructions are for reformatting one commit only. If you have multiple commits then you will have to repeat for each one.
  • If your patches are against a released version (as opposed to master) then you will have to adjust accordingly.
  • You are on a Linux/Unix system

Assuming you are on the branch with your commits that need reformatting, then first of all you need to rebase against the master-pre-auto-reformat tag:

git fetch
git rebase master-pre-auto-reformat

Fix any conflicts that may occur in the normal way. Get a list of all the commit hashes that need to be reformatted:

git log --format=oneline master-pre-auto-reformat..HEAD

Create some temporary branches to work on:

git checkout master-post-auto-reformat
git checkout -b tmp-dest
git checkout master-pre-auto-reformat
git checkout -b tmp-src

Now you are on tmp-src you need to cherry-pick the first commit onto it (replace firsthash with the hash value of the first commit to be reformatted):

git cherry-pick firsthash

You will want to preserve the author and commit message details, so store them away:

git log -1 --format="%an <%ae>" >/tmp/tmp.author
git log -1 --format="%B" >/tmp/tmp.commitmsg

Now reformat the code (you will need GNU indent installed for this). This needs to be done twice (for some reason indent takes a little while to settle down):

util/openssl-format-source -v -c .
util/openssl-format-source -v -c .

Now create a patch file for your first commit:

git diff tmp-dest --stdout >/tmp/tmp.patch

Reset the source branch to remove the formatting:

git reset --hard

Move to the target branch:

git checkout tmp-dest

Apply the patch:

patch -p1 </tmp/tmp.patch

Commit the patch:

git add .
git commit --file=/tmp/tmp.commitmsg --author="`cat /tmp/tmp.author`"

If you have other commits to reformat then checkout the tmp-src branch again:

git checkout tmp-src

Now repeat the process - go back to the cherry-pick instruction above and repeat for the second commit.

Finally, once all commits have been done:

git checkout tmp-dest
git rebase master

Fix any conflicts. You’re done.

Conclusion

The reformat work is now completed, aside from any support issues which occur as a result. We are very pleased with the results that we have achieved. I hope that this will make the OpenSSL source code much easier to work with in the future.