Failing to build VirtualBox on Windows

In 2017, it’s still way too hard to build software on Windows. I wanted to see how difficult it’d be to make a few changes to VirtualBox, for which I’d have to rebuild it from source. The version of VirtualBox you download as a binary from virtualbox.org isn’t the same one you get if you try to build from source, and I think this is where the problems start.

The first challenge is getting the source. I gave up after an hour of waiting for the source to download from http://www.virtualbox.org/svn/vbox/trunk. As I was going to be working from a Git repo anyway, I used BitBucket’s Subversion import to create a repo at https://bitbucket.org/voltagex/virtualbox-mirror.

The first thing you’ll notice about the VirtualBox build system is that the configure script for Windows is in VBScript, which is an unusual choice but it technically means you could get up and running on most Windows installs without too much effort. Unfortunately, this is where the easy part ends. I very quickly worked out that I wanted to use AppVeyor, mainly because their VM images are very well configured. I did try building on my own Server 2016 VM, but I couldn’t get the right version of the Windows SDK / DDK installed, or at least not in the paths that the VirtualBox build system expected.

This leads me to the next reason why building VirtualBox on Windows absolutely sucks. It requires both an old version of MinGW and the Visual Studio 2010 compiler. Visual Studio 2010 was released on 12 April 2010, while MinGW 4.9.3 was released sometime in 2015, I think. SourceForge’s file browser is very hard to navigate.

I suspect choosing a 64-bit toolchain was my first mistake here, but we’ll get to that later.

I created an appveyor branch to work from, mainly so I could easily diff changes that I made to configure.vbs. When I started working on this, each build took about 20 minutes on AppVeyor, depending on where the build broke. You can see all the builds at https://ci.appveyor.com/project/voltagex/virtualbox-mirror/history, along with my terrible commit messages. Future Adam – please write better commit / build messages so that writing these blog posts will be easier. If anyone has any suggestions for technical note-taking, I’m all ears – but I think OneNote will be hard to beat.

AppVeyor’s build history shows 63-odd builds. I know I did a few more on AWS and various virtual machines, but I gave those up quickly – I think most of the issues were around not being able to find WinDDK even though it was installed.

Apparently it took me 4 builds even to get a sensible error message out of the configure script:

Checking for MinGW32 GCC v3.3.x + Binutils + Runtime + W32API...
warning: Can't locate a suitable MinGW32 installation, ignoring since we're targeting AMD64 and won't need it.
Checking for MinGW-w64 GCC (unprefixed)...
error: Can't locate a suitable MinGW-w64 installation. Try specify the path with the --with-MinGW-w64= argument. If still no luck, consult the configure.log and the build requirements.

The hint there is that the error complains about a suitable MinGW installation. At that point it’s actually found some of the required files, but buried in the configure.log is the real reason it’s failing:

trying: strPathMinGWw64=C:\tools\mingw64
Testing 'C:\tools\mingw64': lib64/libgcc_s.a not found
trying: strPathMinGWw64=C:\tools\mingw64
Testing 'C:\tools\mingw64': lib64/libgcc_s.a not found

I can’t remember whether at this point I realised that the version of MinGW I had installed was too new, but it definitely wasn’t the only problem. Skipping ahead a few builds shows I was copying files into the lib64 directory with xcopy, betting that the build system was looking in an old or obsolete path.

I don’t know why the configure script hides most of the useful information in configure.log. For example, this line only turned up there: MinGW-w64 version '5.3.0' is not supported (or configure.vbs failed to parse it correctly).
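Since the useful lines are buried, a small filter over configure.log saves some scrolling. This is only a sketch: the keyword list is my guess based on the messages quoted in this post, not something configure.vbs documents, and interesting_lines is a hypothetical helper name.

```python
import re

# Assumption: these phrases are taken from the log lines quoted in this post.
# configure.vbs may well use other wording for other failures.
PATTERN = re.compile(r"not found|not supported|failed to parse", re.IGNORECASE)


def interesting_lines(log_text):
    """Return matching log lines in order, with verbatim repeats dropped."""
    seen = set()
    result = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line and PATTERN.search(line) and line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

Feeding it the contents of configure.log (read with whatever encoding the file happens to use) gives a short list of candidate causes instead of the whole probe trace.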

Anyway, after dropping the MinGW version back I was able to progress past this point. I still needed to copy files from lib to lib64.
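I did the copy with xcopy on the build machine, but the idea is roughly the following. It is a sketch under assumptions: it treats lib and lib64 as direct children of the MinGW root (for example C:\tools\mingw64), copies only top-level files, and never overwrites anything already in lib64. mirror_lib_to_lib64 is a made-up name.

```python
import shutil
from pathlib import Path


def mirror_lib_to_lib64(mingw_root):
    """Copy files from <root>/lib into <root>/lib64 when they are missing there.

    Returns the list of files that were actually copied.
    """
    root = Path(mingw_root)
    src = root / "lib"
    dst = root / "lib64"
    dst.mkdir(exist_ok=True)

    copied = []
    for item in sorted(src.iterdir()):
        target = dst / item.name
        # Only plain files, and never clobber something already in lib64.
        if item.is_file() and not target.exists():
            shutil.copy2(item, target)
            copied.append(target)
    return copied
```

Running it twice is harmless, because the second pass finds everything already in place and copies nothing.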

For the next little while it was as simple as adding dependencies in and rebuilding. The builds only took 5 or so minutes to fail so it wasn’t too bad, although it was very frustrating to wait and then find out I’d messed up a 7z command line, like in https://ci.appveyor.com/project/voltagex/virtualbox-mirror/build/1.0.15.

Things get interesting when your software depends on OpenSSL on Windows. I’ve never actually built it from source myself and I’m afraid of the amount of whisky I’ll need when I eventually try. Security implications be damned, by 15 builds in I’d ‘cheated’ and downloaded some pre-built libraries from https://www.npcglib.org/~stathis/blog/precompiled-openssl/ and included them in the build. Unfortunately this involved renaming the DLLs to the ‘old’ OpenSSL names – apparently sometime in 2016 the OpenSSL project changed the names, breaking decades of assumptions.

curl is another story. Despite libcurl being one of the most commonly used libraries anywhere (it’s probably in your phone, your car and maybe even your lightbulb), there were no precompiled binaries for Windows that I could find.

A slight diversion to work out how to build curl, then I guess. This means (in theory) linking to OpenSSL again, too. Luckily someone else had done it for me and I could lean on https://github.com/blackrosezy/build-libcurl-windows. This has some pretty neat batch scripting in it and soon I had https://github.com/baxterworks-build/build-libcurl-appveyor for myself. I should have really learned how to use GitHub Releases or BinTray but I think at this point this silly project had consumed enough of my evening and I threw a zip up on my NeoCities site and carried on.

20 builds later, I’d passed the trials of configure.vbs, and in theory I was ready to build VirtualBox.

Execute env.bat once before you start to build VBox:
env.bat
kmk

This in itself proved a challenge because I couldn’t work out how to set the path to kmk so that AppVeyor’s build system would find it. I believe that AppVeyor is running everything in a single PowerShell instance by default and every command runs in a ‘child’ process so I couldn’t work out how to get the variables where they needed to be.
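The underlying problem is easy to reproduce: whatever a child process does to its environment disappears when it exits. Here is a small sketch of that, plus one common workaround, which is to run the batch file and `set` inside the same cmd.exe and parse what comes out. The function names and the VBOX_DEMO_FROM_CHILD variable are made up for illustration, the last function is Windows-only, and I haven't checked what env.bat actually exports.

```python
import os
import subprocess
import sys


def child_env_change_is_visible():
    """Set a variable inside a child process, then look for it in the parent."""
    os.environ.pop("VBOX_DEMO_FROM_CHILD", None)
    subprocess.run(
        [sys.executable, "-c", "import os; os.environ['VBOX_DEMO_FROM_CHILD'] = '1'"],
        check=True,
    )
    # The child's environment died with the child, so this is False.
    return "VBOX_DEMO_FROM_CHILD" in os.environ


def parse_set_output(text):
    """Turn the KEY=VALUE lines printed by cmd's `set` into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # cmd also prints odd entries that start with '='; skip those.
        if sep and key:
            env[key] = value
    return env


def capture_env_after(batch_file):
    """Windows only: run a batch file, then dump the environment it left behind.

    Assumes the path to the batch file contains no spaces.
    """
    completed = subprocess.run(
        ["cmd", "/c", f"call {batch_file} && set"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_set_output(completed.stdout)
```

The dict that comes back can be passed as env= to a later subprocess.run call for kmk, so the variables set by env.bat reach the process that needs them.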

Sidenote: kmk is part of kBuild, which is not Kbuild, the Linux Kernel build system. Have a look at https://trac.netlabs.org/kbuild/wiki/kBuild and then just fucking use cmake like everyone else. Anyone building on Windows will thank you for it.

A couple (9) more failed builds later I’d hacked the batch files enough to start actually building VirtualBox! Is almost 3 days to set up a build system some kind of record? I’m sure even a new Microsoft employee could kick off a Windows build faster than this. Come on.

I didn’t expect it to work first time but I definitely didn’t expect this error:

build.bat
Config.kmk:2773: C:/projects/virtualbox-mirror/out/win.amd64/release/DynamicConfig.kmk: No such file or directory
Config.kmk:3503: *** You need to enable code signing for a hardened windows build to work.. Stop.
Command exited with code 2

I think on Linux there’s a `--disable-hardening` option for configure, but this didn’t exist for the Windows build – there’s another good reason to use a single build system for all builds. Looking at this commit you’d think it’s as easy as flipping a switch, but apparently not. I could not disable the code signing or hardening options no matter what I tried. The real solution, of course, is to disable the error message itself.

Onwards and… upwards? The next error to sort out was a missing python.exe, which turned out to be pretty boring – someone had coded it to only ever look in /bin/, which won’t help you much on Windows. With that fixed, it should be smooth sailing, right? Yeah, nah.

35 builds in, I got completely stuck. For some reason openssl.h wasn’t being found, even though it wasn’t a problem before. Luckily, AppVeyor lets you RDP into a build machine during the build itself to inspect the state of the machine. This is pretty amazing – and I’m still on a free account!

By adding iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1')) into AppVeyor you’ll get an IP and credentials printed out in the build log. On a free account, I think the build logs are public so be careful with this.

From memory I was trying to use SysInternals Procmon (which every dev should learn to use) to work out where kBuild was expecting to find openssl.h. This turned out to be much too slow and created huge trace log files – 600MB+. I’m very glad I’m no longer on DSL-based broadband. I think I tried a few different filters before giving up and looking for a way to run procmon non-interactively. Luckily that’s already been thought of. If you run

procmon /Quiet /Minimized /BackingFile virtualbox-build.pml

you can start procmon automagically. I did have to ask for help for this one as it seemed to ‘block’ the build. AppVeyor support is awesome and came through with a fix, even opening a pull request on my repo (tl;dr start commands in PowerShell jobs and they won’t interrupt anything – similar to a bash subshell).
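The actual fix in my repo used PowerShell jobs. The same "start it and don't wait" idea looks roughly like this with Python's subprocess.Popen, which I'm swapping in purely for illustration. The only procmon flags used are the ones from the command line above; procmon_command and start_in_background are hypothetical names, and it assumes procmon is on the PATH.

```python
import subprocess


def procmon_command(backing_file, exe="procmon"):
    """Build the non-interactive procmon command line quoted in this post."""
    return [exe, "/Quiet", "/Minimized", "/BackingFile", backing_file]


def start_in_background(command):
    """Launch a command and return straight away instead of waiting for it.

    subprocess.run would block until the tool exits, which is what stalled
    the build. Popen hands back a handle and lets the script carry on.
    """
    return subprocess.Popen(command)
```

On the build machine that would be start_in_background(procmon_command("virtualbox-build.pml")), followed by the build commands as normal.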

After fiddling with paths and learning about all the different hooks AppVeyor has (I needed to be able to retrieve files from the build but the artifacts section normally isn’t run on a ‘failed’ build), I managed to get 628MB of 7zipped PML files off the build host. If you ever need to debug to this level, I highly recommend doing something like this. This was build number 44, clocking in at over 56 minutes, which is a bit of a problem as AppVeyor’s free plan has a 60 minute limit. Over the next little while I tried different paths for OpenSSL and talked to one of the VirtualBox developers on IRC.

This developer told me that internally (for the ‘commercial’ build of VirtualBox) they use a different build process which unfortunately couldn’t be shared due to licensing concerns. Sigh. At least a few of the things in the build system made a bit more sense (it looks like an internal checkout of the source contains most of the build tools).

It seems like the include path just got too long and the build system doesn’t see all of it, or something along those lines – the fix was to move openssl.h to virtualbox\include. After an evening and a half (?) I was off and moving again.

You should definitely include any references you’ve used in troubleshooting in code comments and commit messages. I don’t think I would have got much further without https://forums.virtualbox.org/viewtopic.php?f=10&t=61510, which described the exact issue I was having at this point and a solution – copying a file with a specific name. I’m not sure whether I caused more issues here by mixing 32 and 64 bit libraries, but the build continued.

By build number 60, with a commit message of ‘Sigh’, I’d definitely hit the 60 minute limit of the free AppVeyor plan. Surprisingly, AppVeyor staff increased my time limit to 90 minutes when I asked. Thanks, Ilya!

At this point I was tracking down the cause of [00:16:37] kmk_builtin_redirect: _spawnvpe(/bin/nm.exe) failed: No such file or directory, which I probably should have recognised from the Python failure earlier – but which nm.exe did it want? Visual Studio? MinGW? 32 or 64 bit? It took multiple gigs of ProcMon logs to work this out, and I’m still not sure I chose the right one.

It was at build 62 that I found myself completely defeated – VirtualBox appears to build some SSL certificates into a binary for some unknown reason, and whatever was generating the byte arrays generated bad code.

[00:19:41] TrustAnchorsAndCerts.cpp
[00:19:42] C:\projects\virtualbox-mirror\out\win.amd64\release\obj\SUPR3\TrustAnchorsAndCerts.cpp(496) : error C2070: 'const unsigned char []': illegal sizeof operand
[00:19:42] C:\projects\virtualbox-mirror\out\win.amd64\release\obj\SUPR3\TrustAnchorsAndCerts.cpp(540) : error C2070: 'const unsigned char []': illegal sizeof operand
[00:19:42] kmk: *** [C:/projects/virtualbox-mirror/out/win.amd64/release/obj/SUPR3/gen/TrustAnchorsAndCerts.obj] Error 2 (0x2)

I don’t know enough about C++ to fix this, and I don’t know enough about VirtualBox to fix whatever’s generating this. Maybe another day, with another bottle of Monkey Shoulder.

I only wanted to play with a modified VirtualBox Web Service, why is it this hard?
