Compacting VirtualBox disks

For my development tasks I often use VirtualBox, mostly for testing purposes. Sadly though, installing multiple operating systems consumes quite a lot of disk space, so I need some way to keep the virtual disks small.

Because VirtualBox supports dynamically expanding virtual disks, you might think they also shrink automatically when you delete content, but this is not true. If you have a 40GB virtual disk with 30GB in use, your VDI file will be about 30GB; if you then delete 20GB of data from within the guest, the VDI file will still be 30GB.

That space can be reclaimed, though, with a command line tool called vboxmanage, whose modifyhd command has a --compact option.

In other words you can execute something like

vboxmanage modifyhd your/virtual/hard/disk/file.vdi --compact

and shrink your VDI file to its real content size… if you managed to wipe your virtual disk free space with zeroes!

Do not underestimate that last condition: vboxmanage only removes from your VDI file the blocks that contain nothing but zeroes, and when you delete a file the space it occupied is usually not zeroed, just unlinked!
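To see how much space a compaction actually reclaims, the file size can be compared before and after. The following is only a sketch to run on the host: compact_vdi and the VBOXMANAGE override variable are names made up for this example, not part of VirtualBox, and the virtual machine is assumed to be powered off.

```shell
#!/bin/sh
# Hypothetical helper (not part of VirtualBox): compact a VDI and report
# the size of the file before and after.
# VBOXMANAGE is an assumed override hook; it defaults to vboxmanage on the PATH.
compact_vdi() {
    vdi=$1
    if [ ! -f "$vdi" ]; then
        echo "no such file: $vdi" >&2
        return 1
    fi
    # $(( ... )) strips the padding some wc implementations add.
    before=$(( $(wc -c < "$vdi") ))
    "${VBOXMANAGE:-vboxmanage}" modifyhd "$vdi" --compact || return 1
    after=$(( $(wc -c < "$vdi") ))
    echo "before: $before bytes, after: $after bytes"
}

# Usage (with the virtual machine powered off):
#   compact_vdi your/virtual/hard/disk/file.vdi
```

If the two numbers are identical, the free space inside the guest has most likely not been zeroed yet.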

Luckily there are tools to help with this task, which has to be executed from within the virtualized machine (aka the guest machine). Which tool to use depends on the guest OS you are running.


On a Mac OS X guest, open a terminal window and run the following command, then wait and ignore the warnings you get.

diskutil secureErase freespace 0 /


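On a Windows guest, defrag your disk, download SDelete from the Windows Technet web site (it is part of the Sysinternals tools) and run it with the option that zeroes free space. In SDelete 1.6 and later that option is -z; in older versions it was -c.

sdelete.exe -z C: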


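On a Linux guest, run this command and wait for its completion. It writes a file of zeroes until the disk is full (cat then stops with a "No space left on device" error, which is expected) and afterwards deletes that file. Note though that this will expand your virtual disk to its maximum capacity before you can shrink it.

cat /dev/zero > /tmp/junk; sync; rm /tmp/junk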
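On some distributions /tmp lives on a separate (or in-memory) filesystem, in which case the junk file has to be created on the filesystem you actually want to shrink. A minimal sketch of that idea follows; zero_free_space and its optional size limit are assumptions made for this example, the limit being there only to allow a quick trial run.

```shell
#!/bin/sh
# Hypothetical helper: fill the free space of the filesystem holding DIR
# with zeroes, then delete the filler file.
#   $1  directory on the filesystem to wipe (defaults to /)
#   $2  optional limit in MiB, handy for a trial run
zero_free_space() {
    dir=${1:-/}
    limit_mb=$2
    junk="$dir/zero-fill.$$"
    if [ -n "$limit_mb" ]; then
        dd if=/dev/zero of="$junk" bs=1048576 count="$limit_mb" 2>/dev/null
    else
        # Without a limit dd stops with an error once the disk is full,
        # which is the expected outcome here.
        dd if=/dev/zero of="$junk" bs=1048576 2>/dev/null
    fi
    # Flush the zeroes to the virtual disk before removing the file.
    sync
    rm -f "$junk"
}

# Usage inside the guest, as root:
#   zero_free_space /
```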


Subversion: protocol and beyond

Subversion is a wonderful open source VCS (Version Control System), but it’s rarely used at its best.

Looking at server configuration alone, one of its nice features is the ability to tunnel the custom svn protocol through ssh, which adds security and a lot more.

Such a configuration exposes URLs of the svn+ssh:// type, is quite simple to set up and lets you:

  • use LDAP or RADIUS for user account storage through the PAM modules used by SSH;
  • adopt fine grained access control through file system permissions;
  • secure the communication channel;
  • get up and running more quickly and easily than with the widely adopted http/https protocols, which require integration with an Apache Web Server.

Setting up a decent svn+ssh configuration is a very easy task if you follow a few simple rules that guarantee a flexible and user friendly set up.

Client SSH host keys

The SVN client is unable to cache SSH credentials because it’s actually the SSH client that is in charge of performing authentication. If we want to avoid being constantly prompted for credentials, we have to set up key-based authentication.

Let’s start by opening (or creating, if it doesn’t exist yet) the svn_user_home/.ssh/authorized_keys file (where svn_user_home refers to the home folder of the user running svnserve) and adding one row per client in the following format:

command="", TYPE KEY

You obviously have to replace xxx with the username you want to grant access to, while TYPE and KEY refer to the type and value of the public key generated for that user. Wait a minute: we haven’t generated any key yet! Let’s do it now by following the instructions in this mini guide.

Obviously in the authorized_keys file we’ll have to put the public key only while the private one has to be given to the user that is going to connect to Subversion. To avoid the hassle and to improve security it’s better if those keys are generated by the users and they provide their public ones to the Subversion administrator.
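As an illustration of what such a row can look like, here is a sketch that builds one from a username and a public key file. It assumes the command string runs svnserve in tunnel mode (svnserve -t --tunnel-user=NAME) and adds the usual sshd restrictions; svn_key_entry is a name invented for this example, and your own command string may well point at a wrapper script instead.

```shell
#!/bin/sh
# Hypothetical helper: print an authorized_keys row that restricts a key
# to running svnserve in tunnel mode on behalf of the given user.
#   $1  username recorded as the author of the commits
#   $2  file holding the user's public key ("TYPE KEY [comment]")
svn_key_entry() {
    user=$1
    pubkey_file=$2
    printf 'command="svnserve -t --tunnel-user=%s",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty %s\n' \
        "$user" "$(cat "$pubkey_file")"
}

# Usage, as the user running svnserve:
#   svn_key_entry alice alice.pub >> ~/.ssh/authorized_keys
```

With a row like this the key can only be used to talk to Subversion: whatever command the client asks for, sshd runs the one in the command string.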

Now you only need to create a script which will set the most appropriate umask for SVN (I suggest 002).
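One way to write such a script is sketched below: it creates a small wrapper that sets the umask and then hands over to svnserve with whatever arguments it received. The wrapper location (WRAPPER_PATH) and the SVNSERVE_BIN override are assumptions made for this example; the idea is that the command string in authorized_keys would call the wrapper instead of svnserve directly.

```shell
#!/bin/sh
# Hypothetical install location for the wrapper; adjust to taste.
wrapper=${WRAPPER_PATH:-./svnserve-wrapper}

# Write the wrapper script. The quoted 'EOF' keeps $@ and friends literal.
cat > "$wrapper" <<'EOF'
#!/bin/sh
# 002 leaves new files group-writable, so every member of the project
# group can keep committing to the same repository.
umask 002
# SVNSERVE_BIN is an assumed override; it defaults to svnserve on the PATH.
exec "${SVNSERVE_BIN:-svnserve}" "$@"
EOF
chmod +x "$wrapper"
```

The row in authorized_keys would then use something like command="/path/to/svnserve-wrapper -t --tunnel-user=xxx", where the path is whatever location you chose.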

This will solve most problems related to authentication, so we are left with authorization. I found that Unix groups work quite well: my choice is to use at least one group for each project hosted on my repository, so that I can grant permissions to developers on a per project basis by adding users to groups (this is the reason for the umask setting).
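A possible way to hand a repository over to a project group is sketched here. share_repo is a name invented for this example, and the group name and repository path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make a repository directory usable by a Unix group.
#   $1  group that should own the repository
#   $2  path of the repository directory
share_repo() {
    group=$1
    repo=$2
    chgrp -R "$group" "$repo" || return 1
    # g+rwX: X adds execute only to directories (and to files that are
    # already executable), so plain files do not become executable.
    chmod -R g+rwX "$repo" || return 1
    # setgid on directories makes new files inherit the project group.
    find "$repo" -type d -exec chmod g+s {} +
}

# Usage, as root (names are examples only):
#   groupadd prj-devels
#   usermod -a -G prj-devels alice
#   share_repo prj-devels /srv/svn/prj
```

Together with the 002 umask, this keeps files created by one developer writable by the rest of the group.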

If you need more fine grained control you can create additional groups, like:

  • prj-devels (read/write on trunk and branches)
  • prj-testers (read only on everything)
  • prj-admins (read/write on everything)