Handling extant resources in Terraform

Terraform is a Hashicorp tool which embraces the Infrastructure as Code model to manage a variety of platforms and services in today’s modern, cloud-based Internet.  It’s still in development, but it already provides a wealth of useful functionality, notably with regards to Amazon and Digital Ocean interactions.  The one thing it doesn’t do, however, is manage pre-existing infrastructure very well.  In this blog post we’ll explore a way to integrate extant infra into a basic Terraform instance.

Note that this post is current as of Terraform v0.3.6.  Hashicorp has hinted that future versions of Terraform will handle this problem in a more graceful way, so be sure to check those changelogs regularly. 🙂


A full example and walk-through will follow; however, for those familiar with Terraform and just looking for the tl;dr, I got you covered.

  • Declare a new, temporary resource in your Terraform plan that is nearly identical to the extant resource.
  • Apply the plan, thus instantiating the temporary “twinned” resource and building a state file.
  • Alter the appropriate id fields to be the same as the extant resource in both the state and config files.
  • Perform a refresh which will populate the state file with the correct data for the declared extant resource.
  • Remove the temporary resource from AWS manually.
  • Voilà.

faster and more dangerous, please.

Walking through the process and meticulously checking every step? Ain’t nobody got time for that!

  • Edit the state file and insert the resource directly – it’s just JSON, after all.


In the examples below, the notation [...] is used to indicate truncated output or data.

Also note that the AWS cli tool is assumed to be configured and functional.


The extant resource in this case is an S3 bucket called phrawzty-tftest-1422290325. This resource is unknown to Terraform.

$ aws s3 ls | grep tftest
2015-01-26 17:39:07 phrawzty-tftest-1422290325

Declare the temporary twin in the Terraform config:

resource "aws_s3_bucket" "phrawzty-tftest" {
    bucket = "phrawzty-tftest-1422353583"

Verify and prepare the plan:

$ terraform plan -out=terratest.plan
Path: terratest.plan

+ aws_s3_bucket.phrawzty-tftest
    acl:    "" => "private"
    bucket: "" => "phrawzty-tftest-1422353583"

Apply the plan (this will create the twin):

$ terraform apply ./terratest.plan
aws_s3_bucket.phrawzty-tftest: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
State path: terraform.tfstate

Verify that the both the extant and temporary resources exist:

$ aws s3 ls | grep phrawzty-tftest
2015-01-26 17:39:07 phrawzty-tftest-1422290325
2015-01-27 11:14:09 phrawzty-tftest-1422353583

Verify that Terraform is aware of the temporary resource:

$ terraform show
  id = phrawzty-tftest-1422353583
  acl = private
  bucket = phrawzty-tftest-1422353583

Alter the config file:

  • Insert the name of the extant resource in place of the temporary.
  • Strictly speaking this is not necessary, but it helps to keep things tidy.
resource "aws_s3_bucket" "phrawzty-tftest" {
    bucket = "phrawzty-tftest-1422290325"

Alter the state file:

  • Insert the name (id) of the extant resource in place of the temporary.
            "resources": {
                "aws_s3_bucket.phrawzty-tftest": {
                    "type": "aws_s3_bucket",
                    "primary": {
                        "id": "phrawzty-tftest-1422290325",
                        "attributes": {
                            "acl": "private",
                            "bucket": "phrawzty-tftest-1422290325",
                            "id": "phrawzty-tftest-1422290325"

Refresh the Terraform state (note the ID):

$ terraform refresh
aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)

Verify that Terraform is satisfied with the state:

terraform plan
Refreshing Terraform state prior to plan...

aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)

No changes. Infrastructure is up-to-date. This means that Terraform
could not detect any differences between your configuration and
the real physical resources that exist. As a result, Terraform
doesn't need to do anything.

Remove the temporary resource:

$ aws s3 rb s3://phrawzty-tftest-1422353583/
remove_bucket: s3://phrawzty-tftest-1422353583/

S3, faster.

For the sake of this example, the state file already contains an S3 resource called phrawzty-tftest-blah.

Add the “extant” resource directly to the state file.

            "resources": {
                "aws_s3_bucket.phrawzty-tftest": {
                    "type": "aws_s3_bucket",
                    "primary": {
                        "id": "phrawzty-tftest-1422290325",
                        "attributes": {
                            "acl": "private",
                            "bucket": "phrawzty-tftest-1422290325",
                            "id": "phrawzty-tftest-1422290325"


$ terraform refresh
aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)
aws_s3_bucket.phrawzty-tftest-blah: Refreshing state... (ID: phrawzty-tftest-blah)


$ terraform show
  id = phrawzty-tftest-1422290325
  acl = private
  bucket = phrawzty-tftest-1422290325
  id = phrawzty-tftest-blah
  acl = private
  bucket = phrawzty-tftest-blah

That’s that.

HAProxy Puppet module (phrawzty remix)

As part of a big Logstash project at Mozilla (more on that to come), I was looking for an HAProxy module for Puppet, stumbling across the official Puppetlabs module in the process.  I’m told that this module works fairly well, with the caveat that it sometimes outputs poorly-formatted configuration files (due to a manifestly buggy implementation of concat).  Furthermore, the module more or less requires storeconfigs, which we do not use in our primary Puppet system.

Long story short, while I never ended up using HAProxy as part of the project, I did remix the official module to solve both of the aforementioned issues.  From the README :

This module is based on Puppetlabs’ official HAProxy module; however, it has been “remixed” for use at Mozilla. There are two major areas where the original module has been changed :

  • Storeconfigs, while present, are no longer required.
  • The “listen” stanza format has been abandoned in favour of a frontend / backend style configuration.

A very simple configuration to proxy unrelated Redis nodes :

  class { 'haproxy': }

  haproxy::frontend { 'in_redis':
    ipaddress       => $::ipaddress,
    ports           => '6379',
    default_backend => 'out_redis',
    options         => { 'balance' => 'roundrobin' }

  haproxy::backend { 'out_redis':
    listening_service => 'redis',
    server_names      => ['node01', 'node02'],
    ipaddresses       => ['node01.redis.server.foo', 'node02.redis.server.foo'],
    ports             => '6379',
    options           => 'check'

If that sounds interesting to you, the module is available on my puppetlabs-haproxy repo on Github. Pull requests welcome !

How to use Puppet like an Adult

Hello friends,

Recently, Ben Kero (a fellow Mozillian) and I were invited to present a talk at Linux Conf Australia.  To say that we were excited about presenting at one of the best Libre / Open Source conferences in the world is an understatement.  We knew that we’d have to bring our A-game, and in all modesty, I like to think that we did.  If you were there in person, I’d like to personally thank you for coming out, and if you couldn’t make it, that’s ok – the organisers have made many videos from the 2013 LCA available online, including ours, entitled “How to use Puppet like an Adult“.

We cover a variety of topics, including parametrisation, how to select good pre-built modules, and how you can build eco-systems around Puppet itself.  Please feel free to drop us a line, either on Twitter or here on the blog.  Thanks !

quickly generate an encrypted password

Hi everybody !  Here’s a quick method for generating encrypted passwords that are suitable for things like /etc/passwd .  I realise that this isn’t terribly complex, but honestly, I always forget how to do this until I actually need to do it – so here’s a reminder for all of us. 🙂


if [ "x$1" == 'x' ]; then
echo "USAGE: $0 'password'"
exit 1

# Get an md5sum of the password string; this is used for the SHA seed.
md5=$( echo $1 | md5sum )

# Calculate the SHA hash of the password string using the extracted seed.
mkpasswd -m SHA-512 "$1" "$extract"
exit $?

Nagios plugin to parse JSON from an HTTP response

Update 2015-10-07: This plugin has evolved – please check the latest README for up to date details.


Hello all !  I wrote a plugin for Nagios that will parse JSON from an HTTP response.  If that sounds interesting to you, feel free to check out my check_http_json repo on Github.  The plugin has been tested with Ruby 1.8.7 and 1.9.3.  Pull requests welcome !

Usage: ./check_http_json.rb -u  -e  -w  -c 
-h, --help                       Help info.
-v, --verbose                    Additional human output.
-u, --uri URI                    Target URI. Incompatible with -f.
    --user USERNAME              HTTP basic authentication username.
    --pass PASSWORD              HTTP basic authentication password.
-f, --file PATH                  Target file. Incompatible with -u.
-e, --element ELEMENT            Desired element (ex. foo=>bar=>ish is foo.bar.ish).
-E, --element_regex REGEX        Desired element expressed as regular expression.
-d, --delimiter CHARACTER        Element delimiter (default is period).
-w, --warn VALUE                 Warning threshold (integer).
-c, --crit VALUE                 Critical threshold (integer).
-r, --result STRING              Expected string result. No need for -w or -c.
-R, --result_regex REGEX         Expected string result expressed as regular expression. No need for -w or -c.
-W, --result_warn STRING         Warning if element is [string]. -C is required.
-C, --result_crit STRING         Critical if element is [string]. -W is required.
-t, --timeout SECONDS            Wait before HTTP timeout.

The --warn and --crit arguments conform to the Nagios threshold format guidelines.

If a simple result of either string or regular expression (-r or -R) is specified :

  • A match is OK and anything else is CRIT.
  • The warn / crit thresholds will be ignored.

If the warn and crit results (-W and -C) are specified :

  • A match is WARN or CRIT and anything else is OK.
  • The warn / crit thresholds will be ignored.

Note that (-r or -R) and (-W and -C) are mutually exclusive.

Note also that the response must be pure JSON. Bad things happen if this isn’t the case.

How you choose to implement the plugin is, of course, up to you.  Here’s one suggestion:

# check json from http
define command{
 command_name    check_http_json-string
 command_line    /etc/nagios3/plugins/check_http_json.rb -u 'http://$HOSTNAME$:$ARG1$/$ARG2$' -e '$ARG3$' -r '$ARG4$'
define command{
 command_name    check_http_json-int
 command_line    /etc/nagios3/plugins/check_http_json.rb -u 'http://$HOSTNAME$:$ARG1$/$ARG2$' -e '$ARG3$' -w '$ARG4$' -c '$ARG5$'

# make use of http json check
define service{
 service_description     elasticsearch-cluster-status
 check_command           check_http_json-string!9200!_cluster/health!status!green
define service{
 service_description     elasticsearch-cluster-nodes
 check_command           check_http_json-int!9200!_cluster/health!number_of_nodes!4:!3:

RabbitMQ plugin for Collectd

Hello all,

I wrote a rudimentary RabbitMQ plugin for Collectd.  If that sounds interesting to you, feel free to take a look at my GitHub.  The plugin itself is written in Python and makes use of the Python plugin for Collectd.

It will accept four options from the Collectd plugin configuration :

Locations of binaries :

RmqcBin = /usr/sbin/rabbitmqctl
PmapBin = /usr/bin/pmap
PidofBin = /bin/pidof

Logging :

Verbose = false

It will attempt to gather the following information :

From « rabbitmqctl list_queues » :


From « pmap » of « beam.smp » :

memory mapped
memory writeable/private (used)
memory shared

Props to Garret Heaton for inspiration and conceptual guidance from his « redis-collectd-plugin ».

CPAN RPMs in RHEL / CentOS : generation, conflict, and solutions

Hello all !  Today we’re going to take a look at a somewhat obscure problem that – once encountered – can cause nothing but headaches for a system administrator.  The problem relates to conflicts in CPAN RPM packages, and what can be done to work around the issue.  If you’ve made it this far, i’m going to assume a couple of things : you’re comfortable with RPMs and repositories, have worked with a .spec file before, and you know what Perl modules are.  Good ?  Ok, let’s go.

Edit : About a week after i posted this article, the pastebin i uploaded the examples to disappeared.  Maybe it will come back – i don’t know – but if not, sorry for the broken links…

CPAN is an enormous collection of Perl modules.  If you’ve ever written a Perl script, there’s a good chance you’ve used a module that – at one point or another – came from this archive.  One of the really neat features of CPAN is the interactive manner in which modules can be downloaded and installed from the archive using Perl right from the command line (frankly, if you’re reading this post, there’s a good chance you’ve used this feature, too).  This is a fairly common way to install new modules and add functionality to your system, especially if you’re coding for local use (i.e. on your personal box).

It’s useful, but it’s not perfect, and one of the key areas where it starts to fail is scalability : if you’ve got a bunch of machines, and you need to SSH into each one to interactively install a CPAN module or two, it’s going to be a hassle.  Likewise, CPAN doesn’t often find its way into the hearts and minds of enterprise Red Hat or CentOS environments, where the official policy is often to install software via RPM only (for support, administration, and sanity reasons, this is often the case).

Luckily, some of the most commonly used CPAN modules exist as RPMs in the default repositories.  Some, but not all (and not even « many ») – for this, there are other repositories available.  Some examples :

That last one – Magnum – is particularly interesting given the subject of our post today.  From their info page :

At Magnum we have a firm rule that all CPAN modules on our machines are installed from RPMs. The Fedora and Centos projects build RPMs for many CPAN modules, but there are always ones missing and the ones that are available often lag behind the most up to date versions.  For that reason, we build a lot of RPMs of CPAN modules. And we don’t want to keep that work to ourselves, so on these pages we make them available for anyone to download.

Their RPMs are generated automagically using a great tool called « cpanspec », which does exactly what you think it does : given a CPAN tarball, it will generate a .spec file suitable for building an installable RPM.  It is available in the standard repositories, and can be installed easily via YUM as normal, so go ahead and do that now.  Ok, example time : say you needed HTML::Laundry, but after a quick peek through your repositories, it becomes readily apparent that an RPM is not available.  Thanks to cpanspec, all is not lost :

[build@host-119 ~]$ wget http://search.cpan.org/CPAN/authors/id/S/ST/STEVECOOK/HTML-Laundry-0.0103.tar.gz
[build@host-119 ~]$ cpanspec --packager "build <build@domain.ext>" HTML-Laundry-0.0103.tar.gz

We just downloaded the tarball right from the CPAN website, and ran cpanspec against it.  The « –packager » argument simple defines the person who’s generating the .spec, and doesn’t necessarily have to be anything accurate.  Go ahead and try it for yourself.  Now take a look at the resulting .spec file (or on the a pastebin here).  As you can see, it fills in all the fields, including the critical (and often tricky-to-determine) « BuildRequires » and « Requires » items.  Frankly, it’s solid gold, and it has made the lives of CentOS / RHEL admins all over the world much easier.

That said, it’s not perfect, and there are times when you might run into problems.  Actually, you may run into two problems in particular.  The first is conflicts over ownership, which arises when multiple RPMs claim to be responsible for the same file (or files, or directories, or features, or whatever).  The second is more nefarious : an RPM that writes files to the system without declaring ownership for them – a condition often referred to as « clobbering ».  The former is irritating, but at least it’s not destructive, unlike the latter, which can cause all manner of headaches.  To illustrate these two problems, let’s take a look at another example (this one being decidedly more real-world than that of Laundry above) : CGI.pm.

The .spec file that is generated from this tarball is functional and correct, and we can build an installable RPM out of it, so at first all appears well.  Again, go ahead and try for yourself – i’ll wait.  You may wish to capture the build output for review – otherwise, check the pastebin.  I’d like to draw your attention to the « Installing » lines.  By trimming the « Installing /var/tmp/perl-CGI.pm.3.49-1-root-root » element from each of those lines, we can see the actual paths and files that this RPM will install to.  Examples :


At first glance this looks perfectly acceptable.  But look what happens when we try to install the resulting RPM (clipped for brevity) :

[root@host-119 build]# rpm -iv /usr/src/redhat/RPMS/noarch/perl-CGI.pm-3.49-1.noarch.rpm
Preparing packages for installation...
file /usr/share/man/man3/CGI.3pm.gz from install of perl-CGI.pm-3.49-1.noarch conflicts with file from package perl-5.8.8-27.el5.x86_64
file /usr/share/man/man3/CGI::Cookie.3pm.gz from install of perl-CGI.pm-3.49-1.noarch conflicts with file from package perl-5.8.8-27.el5.x86_64
file /usr/share/man/man3/CGI::Pretty.3pm.gz from install of perl-CGI.pm-3.49-1.noarch conflicts with file from package perl-5.8.8-27.el5.x86_64

As it turns out, the Perl package that comes with RHEL / CentOS already contains CGI.pm.  This is normal, since it’s so popular, and is included as a convenience.  Thus, RPM – in an attempt to preserve the coherence of the package management system – refuses to install overtop of the existing owned files.  This is a fine illustration of the first of the two problems previously noted : conflicts over ownership.  As i mentioned above, it’s aggravating, but it’s not a bug – it’s a feature, and it’s doing exactly what it’s designed to do.  Irritating, but not ultimately dire.

If you look carefully, though, it’s also an illustration of the second problem.  Note the list of files that are conflicting.  Look back to the list of files that the package contains – notice anything missing from the conflicts list ?  That’s right – the actual module files (*.pm) are not showing conflicts, which means they’d get overwritten without complaint by RPM.  You might be thinking « who cares ? that’s what i want » right now, but trust me, it’s not what you want.  Imagine this CGI package, with this version of CGI.pm gets installed, and then later you upgrade the Perl package – your CGI.pm files will get overwritten by the Perl package, because as far as RPM is concerned, Perl owns those files.  All of a sudden, things break because you had scripts that relied on your particular version, but since you just upgraded Perl, you think (quite naturally) that the problem could be anywhere – where do you even start looking ?

Imagine the headache if there are multiple administrators, multiple servers, multiple data centres, and multiple clients paying multiple dollars.  No fun at all.

So how can we upgrade CGI.pm, using an RPM, without running into these problems ?  As is often the case, the answer is deceptively simple, but not immediately obvious.  Ultimately what we want to accomplish is twofold :

  • Avoid the man conflicts.
  • Ensure that the existing owned module files are not clobbered by our new package.

Concerning the man pages – and i’m going to be perfectly blunt here – the solution is to simply not install them, since, of course, they’re already there.  As for avoiding a clobbering condition, this requires a little bit of investigation into how Perl modules and libraries are stored on an RHEL / CentOS machine.  Consider the following output :

[root@host-119 ~]# ls -d /usr/lib64/perl5/*
/usr/lib64/perl5/5.8.8  /usr/lib64/perl5/site_perl  /usr/lib64/perl5/vendor_perl

What’s it all mean ?  Well, the « 5.8.8 » directory is the default directory as defined by the Perl architecture, and is system and platform-agnostic, which is to say that it’s (supposed to be) the same on every system.  The « vendor_perl » directory contains everything that specific to RHEL / CentOS (the « vendor » of the distribution).  As you may recall from the rpmbuild output above, this is where the RPM wants to install the modules (thus creating the clobbering condition).

There’s a third directory there, promisingly named « site_perl » ; as the name implies, this is where site-specific files are stored, which is to say items that are neither part of the default Perl architecture, nor part of the RHEL / CentOS distribution.  As you’ve no doubt guessed by now, site_perl is where we’re going to put our new modules.

Luckily for us, the only thing that needs to be changed is the .spec file – and we even get a headstart, since cpanspec does most of the heavy lifting for us.  Examining the .spec file once more, we see the following lines of note (again, cut for brevity) :

%{__perl} Makefile.PL INSTALLDIRS=vendor

These indicate that the target installation directory is that of the vendor, which is normally the case, and thus the default setting.  Since we want to install to the site directory, we make the following changes :

%{__perl} Makefile.PL INSTALLDIRS=site

That solves our clobbering problem quite nicely, but what about the man files ?  As i mentioned above, the idea is to simply avoid installing them altogether, but since they’re generated automatically during the build process, how can we exclude them ?  What i’m about to present is a bit of a hack, but it’s absolutely effective, and ultimately quite clean : we delete them after they’ve been generated, and then don’t declare them in the file list.  Some items are already being potentially deleted by default, so let’s go ahead and add our own line into the mix :

find $RPM_BUILD_ROOT -depth -type d -exec rmdir {} 2>/dev/null ;
# destroy manified man, man.
find $RPM_BUILD_ROOT -type f -name '*.3pm' -exec rm -f {} ;

This will look for all of the « manified » man files and just remove from the build tree.  All that’s left now is to remove them from the file list.  This is as simple as deleting (or commenting out) their sole declaration :


Another option is to simply install use the « –excludedocs » argument when installing the RPM.  I opted to remove the docs altogether in order to ensure that the package can be installed without errors by anyone else without needed to know about the argument requirement ahead of time (and to facilitate automated rollouts).

What you’ll end up with is a .spec file that looks like this.  Go ahead and build your RPM – it’ll install without conflicts and without danger.  This is a technique that can be used for other CPAN packages as well, so go ahead and install everything you’ve always wanted.

how to deal with broken time zones during a CentOS 5.3 kickstart

Hello again fair readers !  Today’s quick tip concerns the problem with missing time zones when deploying CentOS 5.3 (and some of the more recent Fedoras) in a kickstart environment.  It’s a known problem, and unfortunately, since the source of the problem (an incomplete time zone data file) lies deep in the heart of the kickstart environment, fixing it directly is a distinct pain in the buttock region.

There is, however, a workaround – and it’s not even that messy !  The first step is to use a region that does exist, such as « Europe/Paris », which will satisfy the installer – then set the time zone to what you actually want after the fact in the « %post » section.  So, in the top section of the kickstart file, we’ll put :

# set temporarily to avoid time zone bug during install
timezone --utc Europe/Paris

The « –utc » switch simply states that the system clock is in UTC, which is pretty standard these days, but ultimately optional.  Next, in the %post section towards the end, we’ll shoe horn our little hack fix into place :

# fix faulty time zone setting
mv /etc/sysconfig/clock /etc/sysconfig/clock.BAD
sed 's@^ZONE="Europe/Paris"@ZONE="Etc/UTC"@' /etc/sysconfig/clock.BAD > /etc/sysconfig/clock

So, what’s going on there ?  Let’s break it down :

  • In the first line, we’re just backing up the original configuration file, to use in the next line…
  • The second line is the important one – this is the actual manipulation which will fix the faulty time zone, setting it to whatever we want.  In this example « Etc/UTC » is used, but you can pick whatever is appropriate.
    • The tool being used here is « sed », a non-interactive editor which dates back to the 1970’s, and which is still used by system administrators around the world every day.
    • The command we’re issuing to sed is between the single quotes – astute readers will notice that it’s a regular expression, but with @’s instead of the more usual /’s.  In it, we simply state that the instance of « ZONE=”Europe/Paris” » is to be replaced with « ZONE=”Etc/UTC” ».
    • This change is to be made against the backup file, and outputted to the actual config.
  • Finally, we run « tzdata-update » which, as you’ve no doubt guessed, updates the time zone data system-wide, based (in part) on the newly-corrected clock config.

And that, as they say, is that.  Happy kickstarting, friends, and i’ll see you next time !

where to specify ethtool options in Fedora

Hi everybody – here’s a super-quick update for you concerning « ethtool », and how to use it to set options in Fedora properly.  Ethtool is a great little tool that can be used to configure all manner of network interface related settings – notably the speed and duplex of a card – on the fly and in real time.  One of the most common situations where ethtool would be used is at boot time, especially for cards which are finnicky, or have buggy drivers, or poor software support, or.. well, you get the idea.

Times were that if you needed to use ethtool to configure a NIC setting at boot time, you’d just stick the given command line into « rc.local », or perhaps another runlevel script, and forget about it.  The problem with this approach is (at least) twofold :

  • Frankly, it’s easy to forget about something like this, which makes future support / debugging of network issues more of a pain.
  • Anything that automatically modifies the runlevel script (such as updates to the parent package) may destroy your local edits.

In order to deal with these issues, and to standardise the implementation of the ethtool-at-boot technique, the Red Hat (and, thus, Fedora) maintainers introduced an option for defining ethtool parameters on a per-interface basis via the standard « sysconfig » directory system.  Now, this actually happened a number of years ago, but the implementation was poorly announced (and poorly documented at the time), and thus, even today a lot of users and administrators don’t seem to know about it.

Now, there’s a very good chance that you already know this, but just to refresh your memory : in the sysconfig directory, there is another directory called « network-scripts », which in turn contains a series of files named « ifcfg-eth? », where « ? » is a device number.  Each network device has a configuration file associated with it ; for example, ifcfg-eth1 is the configuration file for the « eth1 » device.

In order to specify the ethtool options for a given network interface, simply edit the associated configuration file, and add a « ETHTOOL_OPTS » line.  For example :

ETHTOOL_OPTS="autoneg off speed 100 duplex full"

Now, whenever the network service initialises that interface, ethtool will be run with the specified options.  Simple, easy, and best of all, standardised.  What could be better ?

pohmelfs pt. 2, return of pohmelfs !

Hello again fair readers.  Today i’m going to re-visit POHMELFS, which i introduced in an earlier blog post.  I received a comment on that post which basically asked for more information on some of the more interesting (read : advanced) features of POHMELFS, such as distributed storage, and the like.  Well, today is the day !  If you need a refresher, be sure to skim over my previous post, as we’re going to dive in now right where i left off last time.

patch for the win

One of the reasons that there was a bit of a delay between my last POHMELFS post and this one was because i hit a bug.  Given that we’re working with staging-level code here, that’s to be expected – luckily, thanks to some quick work by Evgeniy Polyakov on the POHMLEFS mailing list, there is still hope – hope in the form of a tasty little patch.

diff --git a/drivers/staging/pohmelfs/trans.c b/drivers/staging/pohmelfs/trans.c
index eab7868..bf7b09a 100644
--- a/drivers/staging/pohmelfs/trans.c
+++ b/drivers/staging/pohmelfs/trans.c
@@ -467,7 +467,8 @@ int netfs_trans_finish_send(struct netfs_trans *t, struct pohmelfs_sb *psb)

-		if (psb->active_state && (psb->active_state->state.ctl.prio >= st->ctl.prio))
+		if (psb->active_state && (psb->active_state->state.ctl.prio >= st->ctl.prio) &&
+				(t->flags & NETFS_TRANS_SINGLE_DST))
 			st = &psb->active_state->state;

 		err = netfs_trans_push(t, st);

Basically, this patch fixes a minor, but ultimately crippling bug related to writing to multiple servers.  The details are not important – what’s important is that we apply the patch and keep the dream alive.  First, you’ll need to copy and paste that block of code into a text file on one of the systems (in « ~/pohmel.diff, for example »).  Then, in order to apply the patch, we’ll need to use a standard tool called (appropriately) « patch » :

[root@host_75 ~]# cd /usr/src/linux
[root@host_75 ~]# patch -p 1 < ~/pohmel.diff
patching file drivers/staging/pohmelfs/trans.c

Now, just as we did last time, we must play the kernel and module compilation and installation game (fun!).  If you need a refresher on how to do this, just go back to my previous post.  Note that this time around, the whole process will be much faster, since only the POHMELFS components need to be recompiled – everything else will stay the same.  As a result, you can skip the part where you archive the entire kernel tree and copy it over – instead, just patch and recompile on each server and the client.  It’s your call.

Once that’s out of the way we’ll reboot, and then it’s off to the races.

a new challenger appears !

It’s now time to add a third machine into the mix (« host_147 » in this case).  Using this new box, we’ll create a simple sort of setup which is, in fact, quite representative of how things might work in the real world : two storage servers and a client.  As you no doubt recall, one of the neat features of POHMELFS is that it can be employed in a parallel fashion, meaning that a file which appears to the client to be in one place, is actaully located in more than one storage medium.  A general way of describing these ideas is by using the terms « logical » and « physical » ; the logical medium is the filesystem that the client sees, and the physical medium is the actual hard drive upon which the data is stored.

In this case, host_75 and host_166 will be the servers, each containing one copy of the data on their respective physical mediums (i.e. hard drives), and host_147 will be our client, which will access the data via the logical medium (i.e. the POHMELFS export).  The new machine was set up in the same way as host_166 was, so we’ll skip over that, and get right to the good stuff.

A new directory should be created on each of the machines : « /opt/pohtest ».  This will serve as the export directory on the servers, and the mount directory on the client – don’t put any data in it yet, though.

server config

On the servers, we’ll initiate the server daemon.  Unlike our first test, where we just let the defaults ride, this time around we’ll configure things a bit more intelligently :

[root@host_75 (and host_166) ~]# fserver -r /opt/pohtest -l /var/log/pohmelfs.log -d

In the above example, « -r » defines the directory to export, « -l » is where to output the logs to, and « -d » puts the process into the background, instead of on our console as before.  This is normally how things would work, so it’s good to get used to it now.  Now, we can follow the log files on each machine by using « tail » :

[root@host_75 (and host_166) ~]# tail -f /var/log/pohmelfs.log
Server is now listening at

client config

With the servers up and ready to go, we can now turn our attention on the client.  Don’t forget to load the pohmelfs module first !

[root@host_147 ~]# modprobe pohmelfs
[root@host_147 ~]# cfg -A add -a -p 1025 -i 1

Now we mount.  It’s important that we mount before we attempt to add the second server into the mix – trying to do it ahead of time will only result in terrible, crippling failure.

[root@host_147 ~]# mount -t pohmel -o idx=1 none /opt/pohtest/

No output means it worked (as usual), so let’s verify :

[root@host_147 ~]# df | grep poh
none                 154590376  10018492 144571884   7% /opt/pohtest

Great, now let’s add the other server :

[root@host_147 opt]# cfg -A add -a -p 1025 -i 1

Now we must wait at least 5 seconds for the synchronisation to occur.  In reality it’s shorter than that, but 5 seconds is an easy number to remember, and it’s safe.  So far this looks exactly the same as before, but there’s a bit of a conceptual twist – as you can see, both of those new add statements have the same index (as denoted by the -i).  This means that they’re grouped together as part of the same logical medium.  We can check on this by using the « show » action :

[root@host_147 ~]# cfg -A show -i 1
Config Index = 1
Family    Server IP                                            Port     
AF_INET                                         1025
AF_INET                                        1025

Everything seems on the up and up so far, so we can go ahead and try our first mount.  A series of options will be passed to the mount line, notably « idx=1 », which means index 1 (as seen above) – this is very important to specify, as without it, POHMELFS won’t be able to determine which logical group you’re talking about.

And if we take a look at the log output on the servers, we’ll see that the client connection has been accepted.  Both of the logs should show the accepted line, but with different port numbers (the trailing digits at the end) :

Accepted client

There are other diagnostics we can run to take a look at what we’ve got running.  At this stage they won’t tell us anything we don’t already know, but it will give us some practice with the tools and data, so that when the time comes to debug problems down the road, we’ll be ready.

For example, POHMELFS will write some handy information to « mountstats », which is exactly what it sounds like :

[root@host_147 ~]# cat /proc/1/mountstats
device none mounted on /opt/pohtest with fstype pohmel
idx addr(:port) socket_type protocol active priority permissions
1 1 6 1 0 3
1 1 6 1 0 3

It’s not lined up very nicely, but the interesting column right now is « active », which lists « 1 » in both cases, meaning the connections are open.  The « permissions » column lists « 3 » for both nodes which, in this case, means that they’re both available for reading and writing (as opposed to being read or write-only, which are also valid options).

but will it blend ?

Accepting the connection is one thing – successfully reading and writing files is entirely another.  Let’s do some tests ; first we’ll use the client to create an empty file in mount :

[root@host_147 ~]# cd /opt/pohtest/
[root@host_147 pohtest]# touch FILE
[root@host_147 pohtest]# ls

Great, now let’s take a look at our servers :

[root@host_166 pohtest]# ls -l
total 0
-rw-r--r-- 1 root root 0 2009-07-06 16:58 FILE
[root@host_166 ~]#

And the other :

[root@host_75 ~]# ls -l /opt/pohtest/
total 0
-rw-r--r-- 1 root root 0 2009-07-06 16:46 FILE
[root@host_75 ~]#

Now, during my limited tests, i noticed a small lag time between my manipulations on the client, and when those actions were reflected on the servers.  At this stage of the game i’m not sure whether that’s normal or not, or exactly what’s causing it – so don’t be alarmed if you see a small lag as well.  I’ll be sure to post further updates on this point once i’ve got more information.

Update : As per Evgeniy on the mailing list :

This delay is not a bug, but feature - POHMELFS has local cache on
clients and  data written on client is stored in that cache first and
then flushed to the server when client is under memory pressure or when
another one requests updated but not yet flushed data.

To force client to flush the data one can 'sync' on client or use
'flush' utility on the server. The latter will invalidate data on the
client (which implies it to be flushed to the server first), so server
update will become visible next time client reads that data.

how not to do it

Let’s do another little test.  On one of the servers, we’ll perform a manipulation in the POHMELFS export directory :

[root@host_75 ~]# touch /opt/pohtest/host75file
[root@host_75 ~]# ls -l /opt/pohtest/
total 4
-rw-r--r-- 1 root root 5 2009-07-06 16:46 FILE
-rw-r--r-- 1 root root 0 2009-07-06 16:57 host75file
[root@host_75 ~]#

Great, but if we take a look at the other server :

[root@host_166 ~]# ls -l /opt/pohtest/
total 4
-rw-r--r-- 1 root root 5 2009-07-06 16:59 FILE

And the client :

[root@host_147 ~]# ls -l /opt/pohtest/
total 0
-rw-r--r-- 1 root root 5 2009-07-06 20:47 FILE

We notice that it’s not there.  Why ?  Unfortunately, like so much bureaucracy, we didn’t go through the proper channels.  Recall that our client has certain software running on it that allows it to speak to both servers, and that the mountpoint uses that software to ensure consistency between across the shared filesystem.  In the example above, we wrote directly to the underlying filesystem of the server – completely avoiding said software – and thus POHMELFS had no way of knowing that a manipulation had occured.

In short – if you want to keep things consistent, you must interact via a client.  But what if we want our servers to be able to interact with the data as well ?  Well, there’s nothing stopping us from setting up client processes on our servers, too.  This, however, will have to wait for the next instalment.

See you on the intertubes !