>> How to re-build a gosmore .pak file on Amazon EC2

Published by on 2012-02-03 11:18:17

(in English because this will mostly be interesting to an international readership 8))

A while ago, I decided that I needed an up-to-date gosmore routing description file for my own little routing server. As it takes some time to build, and needs a lot of memory and processing power, I decided to just let "the cloud" do the dirty work for me.

As I'm confronted with Amazon's cloud services daily at work, I chose Amazon's EC2/S3 services. They can be quite cost-efficient if you know what you are doing, but tend to get very expensive if you don't. In my experience, it should be possible to rebuild a worldwide .pak file for well under $5.

I'm not going to explain in detail how EC2 and S3 work, and I will use several terms that might be puzzling if you have never used Amazon's services before; look them up in the Amazon FAQs or documentation. I've taken care to put the really crazy stuff in italic.

Continue below for the named crazy stuff ;)

Choosing the right machine type
This was the most difficult part. First of all, you need a 64-bit system with plenty of memory, so the so-called "small" instances are ruled out: they are 32-bit only. Going for a micro instance (which, strangely enough, is 64-bit, as opposed to the bigger "small" ones!) is not a good idea either, as micro instances only run on EBS block devices (expensive and slow) and only have about 600MB of memory. You can create a swapfile, but recreating the .pak file then takes several days, and the I/O costs more than you save on the instance price (micro instances are close to free).

As gosmore only uses a single core, it does not make sense to pick a very big machine to save on computing time. In a nutshell, what I found to be perfect is an "m1.large" instance, which has 7.5GB of main memory and four ECUs (EC2 Compute Units) spread over two cores. If you hunt down a spot instance in a cheap region, you might get it for as little as $0.14 per hour.
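
To get a feeling for the numbers, here is a minimal sketch of the cost arithmetic. It assumes the $0.14 spot price mentioned above and a build time of 5 hours (the upper end of the 4-5 hours estimated further down); the function name estimate_cost is made up for this example, and storage or traffic charges are not included.

```shell
#!/bin/sh
# Rough cost estimate: hourly price times number of hours.
# $1 = price per hour in USD, $2 = hours the instance runs
estimate_cost() {
    awk -v price="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", price * hours }'
}

# Assumed inputs: $0.14 per hour spot price, 5 hours of build time.
estimate_cost 0.14 5    # prints 0.70
```

Even if the build takes several times longer than expected, this stays within the "well under $5" range.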

AMI/operating system
I needed the machine's volatile instance storage mounted properly. For some reason the official Amazon unix AMIs did not do that correctly, and they all NEED an EBS volume as a boot volume, which would have been a waste in my case. I looked around for something that is purely S3-backed and found ami-a33b06d7 to be quite usable; it originates from Amazon, so I considered it to be quite safe.

Storage
As I expected from the beginning that I would have to try out different setups, images and machines, I created a 50GB EBS volume (block storage that is non-volatile and can be attached to and detached from an instance) to store the original OpenStreetMap planet file, the compiled gosmore source tree and so on. I have deleted it by now and pushed all necessary files into S3. If you follow this guide, it should not be necessary to use any block storage at all, which saves money. Downloading a 10GB OSM planet file takes some time though, so if you want to be on the safe side, store it on an EBS volume that you can re-use until you are done with everything. S3 is an option too, but copying from S3 takes a lot longer than mounting an EBS volume.

Gosmore
Use the latest version from the code repository, as opposed to the latest release. Nic Roets of Gosmore applied some minor fixes to the code and settings during my trial-and-error phase that are important for building gosmore and for rebuilding big .pak files correctly.

Setting up the EC2 instance
First of all, create an EC2 instance in your preferred region, possibly a spot instance to save some money; use the AMI named above. Once you have shell access, install the following packages to make gosmore happy:

# prepare the instance (missing packages)
sudo yum install screen libxml2-devel gtk2-devel gcc-c++ make subversion libcurl-devel gpsd-devel

Next, you need to grab gosmore from SVN. Install it on the ephemeral block device, which has enough space to store all temp files!


cd /media/ephemeral0/
svn checkout http://svn.openstreetmap.org/applications/rendering/gosmore/
cd gosmore
./configure

#build gosmore for ONLY routing and headless usage:
make CFLAGS='-O2 -DRES_DIR=\"/usr/share/gosmore/\" -DHEADLESS -DONLY_ROUTING'

Download the OSM planet file (or extract) you want to build into a .pak file, for example from any OSM mirror. I used a file covering the whole of Europe, but the whole world should also work.

The following commands expect the file europe.osm.bz2 to exist in the same directory as gosmore.

#rebuild gosmore .pak file
screen
bzcat europe.osm.bz2 | ./gosmore rebuild

After 4-5 hours, the instance should be done building the file; for Europe, 7.5GB of memory is sufficient without a swapfile. I ended up using only a micro instance, which took roughly 5 days because micro instances have drastic CPU and bandwidth limitations and mine needed to swap heavily.
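
Before starting a rebuild that runs for hours, it may be worth checking how much memory the instance really has. The following is only a sketch for Linux: it reads /proc/meminfo, and the 7000 MB threshold is an assumption (a little below the 7.5GB of an m1.large, since the kernel reports slightly less than the nominal size). The function names are made up for this example.

```shell
#!/bin/sh
# Sum of physical memory and swap in MB, read from /proc/meminfo (Linux only).
mem_plus_swap_mb() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { printf "%d\n", kb / 1024 }' /proc/meminfo
}

# Succeeds if memory plus swap reaches the threshold given in MB.
enough_memory() {
    [ "$(mem_plus_swap_mb)" -ge "$1" ]
}

# 7000 MB is an assumed threshold for a Europe-sized build.
if enough_memory 7000; then
    echo "memory looks sufficient"
else
    echo "consider adding a swapfile before running the rebuild"
fi
```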

In the end, I used this tool to push the final file (and my gosmore build directory for later reference) into S3:
http://www.beaconhill.com/solutions/opensource/s3cp.html
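
Since the resulting file is large, it can help to record a checksum before uploading so that a later download can be verified. This is a sketch using sha256sum from coreutils and is independent of the upload tool; the function names are made up for this example, and the demonstration runs on a scratch file rather than a real .pak file.

```shell
#!/bin/sh
# Write "<file>.sha256" next to the given file.
write_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Succeeds if the file still matches its recorded checksum.
verify_checksum() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Demonstration on a scratch file; on the instance, pass the path of the
# finished .pak file instead.
scratch=$(mktemp)
echo "example data" > "$scratch"
write_checksum "$scratch"
verify_checksum "$scratch" && echo "checksum OK"
rm -f "$scratch" "$scratch.sha256"
```

Upload the .sha256 file together with the .pak file and run the verification on the machine that downloads it.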

Using EC2 as a routing backend
Of course you can also run gosmore on EC2. Considering the costs, it might be cheaper to go for any other web hosting service, but if you need serious load balancing and scaling under high load, Amazon has what you need.

Straightforward setup guide for an m1.large instance:

# install additional packages
sudo yum install screen libxml2-devel gtk2-devel gcc-c++ make subversion libcurl-devel gpsd-devel

# install gosmore
cd /media/ephemeral0/
svn checkout http://svn.openstreetmap.org/applications/rendering/gosmore/
cd gosmore
./configure

#build gosmore for ONLY routing and headless usage:
make CFLAGS='-O2 -DRES_DIR=\"/usr/share/gosmore/\" -DHEADLESS -DONLY_ROUTING'

Now place a .pak file in the gosmore directory and it should be ready to use. Note that this setup also works for building a new file.

Some additional hints

Easy setup for your fstab to mount the volume (here a block storage volume attached as /dev/sdf):
sudo bash
mkdir -p /mnt/data
echo "" >> /etc/fstab
echo "/dev/sdf /mnt/data ext3 defaults 0 0" >> /etc/fstab
mount /mnt/data
exit
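
The echo lines above append to /etc/fstab every time they are run, so repeating the setup leaves duplicate entries. A minimal sketch of an idempotent variant follows; the helper name add_fstab_entry is made up for this example, and the demonstration writes to a scratch file instead of the real /etc/fstab.

```shell
#!/bin/sh
# Append an fstab entry only if the device is not already listed.
# $1 = fstab file, $2 = device, $3 = mount point, $4 = filesystem type
add_fstab_entry() {
    if awk -v dev="$2" '$1 == dev { found = 1 } END { exit !found }' "$1" 2>/dev/null; then
        echo "entry for $2 already present in $1"
    else
        printf '%s %s %s defaults 0 0\n' "$2" "$3" "$4" >> "$1"
    fi
}

# Demonstration on a scratch file.
scratch=$(mktemp)
add_fstab_entry "$scratch" /dev/sdf /mnt/data ext3
add_fstab_entry "$scratch" /dev/sdf /mnt/data ext3   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

On the instance, the call would be add_fstab_entry /etc/fstab /dev/sdf /mnt/data ext3, run as root.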

In case you need block storage, create the volume, attach it to the instance, and format it once before the first mount (this erases everything on the volume):
sudo mkfs.ext3 /dev/sdf

In case you want to experiment with micro instances, you need a swapfile. It can be created as follows (8GB example, with /mnt/data being the volume that stores it):
sudo dd if=/dev/zero of=/mnt/data/swapfile bs=1M count=8192
sudo chmod 600 /mnt/data/swapfile
sudo mkswap /mnt/data/swapfile
sudo swapon /mnt/data/swapfile
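
For other swapfile sizes, the count passed to dd is simply the size in GiB times 1024 when bs=1M is used. The sketch below computes that value and checks /proc/swaps to see whether a swapfile is active; the function names are made up for this example, and /mnt/data/swapfile is the path used in the commands above.

```shell
#!/bin/sh
# Number of 1 MiB blocks (dd with bs=1M) for a swapfile of the given size in GiB.
swap_blocks() {
    echo $(( $1 * 1024 ))
}

# Succeeds if the given file is listed as active swap in /proc/swaps (Linux only).
swap_is_active() {
    awk -v f="$1" 'NR > 1 && $1 == f { found = 1 } END { exit !found }' /proc/swaps 2>/dev/null
}

swap_blocks 8    # prints 8192

if swap_is_active /mnt/data/swapfile; then
    echo "swapfile is active"
else
    echo "swapfile is not active"
fi
```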

I hope this works out for you. Questions can be asked in the comments anytime.

Categorised as: !bloated, Island, OSM, Southwest USA

4 Comments

  • Comment from Nic Roets on 2012-02-04 01:26

    A few notes:
    1. Increase MAX_NODES in libgosm.cpp so that MAX_NODES * NGROUPS is larger than the largest node ID.
    2. Node IDs larger than or equal to 2^31 are untested. That point will be reached within a few months.
    3. Load balancing systems will typically look at CPU idle time, system load and / or response time. These measures are really not applicable to Gosmore routing, where overloading manifests itself as running out of memory. Gosmore includes code that will allow a machine to fail gracefully by returning incomplete routes, but this code is not well tested because it is actually quite rare for a properly configured 16GB machine to become overloaded.

  • Comment from RouteXL on 2012-02-07 15:55

    Thanks for the great post. I am running Gosmore on two dedicated servers as routing backend, but using Amazon would be a great next step. This post will help a lot when doing so.

  • Comment from Jens Gassmann on 2013-07-22 16:31

    Is there any restriction on sharing the pak files? Maybe we could share our pak files with other people? Planet, Europe or Germany may be very common.

  • Comment from lgw on 2013-07-22 18:41

    I don't think there are legal restrictions (as anybody could create the same files easily, so it just wouldn't make sense), but there is the technical restriction that the machines running the files need to be binary compatible (same endian format, and I think some restrictions regarding byte alignment and bit count for data formats like integer). Also, 32-bit and 64-bit systems seem to need different files.

    It would be great if somebody would provide a central source (on S3 or anywhere else) for current .pak files for EU and world, but I don't see a community movement to crowd-source the cost right now. Maybe if more services were to use gosmore as a backend?