HAProxy Puppet module (phrawzty remix)

As part of a big Logstash project at Mozilla (more on that to come), I was looking for an HAProxy module for Puppet, stumbling across the official Puppetlabs module in the process.  I’m told that this module works fairly well, with the caveat that it sometimes outputs poorly-formatted configuration files (due to a manifestly buggy implementation of concat).  Furthermore, the module more or less requires storeconfigs, which we do not use in our primary Puppet system.

Long story short, while I never ended up using HAProxy as part of the project, I did remix the official module to solve both of the aforementioned issues.  From the README:

This module is based on Puppetlabs’ official HAProxy module; however, it has been “remixed” for use at Mozilla. There are two major areas where the original module has been changed:

  • Storeconfigs, while present, are no longer required.
  • The “listen” stanza format has been abandoned in favour of a frontend/backend style configuration.

A very simple configuration to proxy unrelated Redis nodes:

  class { 'haproxy': }

  haproxy::frontend { 'in_redis':
    ipaddress       => $::ipaddress,
    ports           => '6379',
    default_backend => 'out_redis',
    options         => { 'balance' => 'roundrobin' }
  }

  haproxy::backend { 'out_redis':
    listening_service => 'redis',
    server_names      => ['node01', 'node02'],
    ipaddresses       => ['node01.redis.server.foo', 'node02.redis.server.foo'],
    ports             => '6379',
    options           => 'check'
  }

If that sounds interesting to you, the module is available in my puppetlabs-haproxy repo on GitHub. Pull requests welcome!

LCA, the event that was

Linux Conf Australia 2013 (alias linux.conf.au, alias LCA) was an amazing experience, and it’s difficult to summarise it succinctly, but there are definitely some highlights and important takeaways that I’d like to make special note of.

My fundamental reason for attending was that Ben Kero and I were invited to do a talk regarding Puppet (an important infrastructural tool that we use at Mozilla).  The talk, like all of the presentations given by Mozillians at LCA this year (and there were more than a few), was very well received and delivered to a packed auditorium – in fact, people had to be turned away at the door beforehand!

Since our talk bumped up against the lunch period, we had occasion to stay in the space and launch an informal panel discussion that included other Mozillians as well as representatives from a number of other Open Source companies and organisations.  We engaged on a variety of topics, ranging from IT-centric concerns to questions about the future of the web and the importance of open standards and market competition.  It was only when the organisers forced us out that the auditorium was finally cleared.

Interestingly, from the moment I arrived in Australia, and through until the very last day, I found myself acting as an ambassador for Mozilla.

For example, a day before the conference even started, I was part of an informal debate concerning the future of mobile and the importance of FirefoxOS within it.  Other participants included an Ubuntu employee and a number of hardware hackers who – as it turned out – were already trying to port FirefoxOS to other types of phones.

Another example occurred at a local cricket match.  I was wearing my Firefox T-shirt (which usually generates interest), and ended up doing some impromptu demos of Firefox for Android to a couple of different groups – the highly satisfying result was that a handful of people downloaded and installed it on the spot!

These, and more, are all opportunities to engage people about Mozilla and, in many cases, to introduce our values and mission to entirely new audiences.  During this and other conferences, I’ve found that even long-time users of Firefox and Thunderbird (for example) are often not aware of who and what Mozilla is, and what we’re about.  I try my best to be a good ambassador, and I’m proud to represent the organisation wherever and whenever I can.

From a personal perspective, the conference had two major benefits: it was an excellent learning experience, and a fantastic opportunity to spend time in-person with a number of my co-workers in the IT/Ops team.

Of note, I had the pleasure of attending a number of talks and presentations about technology that we use in our environment, which have already had a direct impact on my workflow.  Beyond the directly applicable benefits, I was particularly inspired by a presentation about programming for embedded devices (shout-out to Bunnie Huang) which, due to peculiar technical constraints, requires a very precise and measured approach to development.  These ideals could – and should – be carried over to development outside of the embedded world.

In summary, LCA 2013 was an excellent opportunity to learn, teach, and engage the public about technology specifically, Mozilla in particular, and open source in general.  I’m already looking forward to next year!

How to use Puppet like an Adult

Hello friends,

Recently, Ben Kero (a fellow Mozillian) and I were invited to present a talk at Linux Conf Australia.  To say that we were excited about presenting at one of the best Libre/Open Source conferences in the world is an understatement.  We knew that we’d have to bring our A-game, and in all modesty, I like to think that we did.  If you were there in person, I’d like to personally thank you for coming out, and if you couldn’t make it, that’s OK – the organisers have made many videos from LCA 2013 available online, including ours, entitled “How to use Puppet like an Adult”.

We cover a variety of topics, including parametrisation, how to select good pre-built modules, and how you can build ecosystems around Puppet itself.  Please feel free to drop us a line, either on Twitter or here on the blog.  Thanks!

Puppet at Mozilla, the podcast

Hello everybody!

If you’re interested in learning a little about Puppet at Mozilla, I highly recommend that you check out Puppet Podcast #7, where Brandon Burton (@solarce) and I (@phrawzty) talk with Puppet Labs’ Mike Stahnke (@stahnma) about just that topic.  It was our first podcast together, and frankly, it was more difficult than I thought it’d be.  That said, it was a great experience, and I hope to do it again sometime.  Hope you enjoy it!

quickly generate an encrypted password

Hi everybody!  Here’s a quick method for generating encrypted passwords that are suitable for things like /etc/passwd.  I realise that this isn’t terribly complex, but honestly, I always forget how to do this until I actually need to do it – so here’s a reminder for all of us. 🙂

#!/bin/bash
# Generate a SHA-512 crypt hash suitable for /etc/passwd or /etc/shadow.
# Requires mkpasswd (typically provided by the "whois" package on Debian-based systems).

if [ -z "$1" ]; then
    echo "USAGE: $0 'password'"
    exit 1
fi

# Get an md5sum of the password string; a slice of it is used as the salt.
md5=$( echo "$1" | md5sum )
extract="${md5:2:8}"

# Calculate the SHA-512 crypt hash of the password string using the extracted salt.
mkpasswd -m SHA-512 "$1" "$extract"
exit $?
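
By way of a quick usage sketch (the file name genpass.sh and the user name someuser are just placeholders for illustration), the resulting hash can be fed straight to a tool like usermod:

# Hypothetical invocation; use whatever file name you saved the script under.
$ ./genpass.sh 'correct horse battery staple'
$6$<salt>$<hash>

# The resulting $6$... string goes in the encrypted password field, e.g. via usermod (as root).
$ usermod -p "$(./genpass.sh 'correct horse battery staple')" someuser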

Elasticsearch backup strategies

Update: This is an old blog post and is no longer relevant as of version 1.x of Elasticsearch. Now we can just use the snapshot feature.

Hello again! Today we’re going to talk about backup strategies for Elasticsearch. One popular way to make backups of ES requires the use of a separate ES node, while another relies entirely on the underlying file system of a given set of ES nodes.

The ES-based approach:

  • Bring up an independent (receiving) ES node on a machine that has network access to the actual ES cluster (a minimal sketch of this step follows after this list).
  • Trigger a script to perform a full index import from the ES cluster to the receiving node.
  • Since the receiving node is a single, standalone node, every shard will be represented on said node.
  • Shut down the receiving node.
  • Preserve the /data/ directory from the receiving node.
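
As a rough sketch of that first step (the paths, the node name, and the cluster name below are assumptions, and the flags shown apply to the pre-1.x releases this post was written against), the important detail is giving the receiving node its own cluster name so that it cannot accidentally join the production cluster:

# Start a standalone "receiving" node in the foreground with its own cluster
# name and data directory; adjust paths and names to suit your environment.
/usr/share/elasticsearch/bin/elasticsearch -f \
    -Des.cluster.name=backup_receiver \
    -Des.node.name=receiver01 \
    -Des.path.data=/data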

The file system-based approach:

  • Identify a quorum of nodes in the ES cluster.
  • Quorum is necessary in order to ensure that all of the shards are represented.
  • Trigger a script that will preserve the /data/ directory of each selected node (see the sketch below).
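
As a rough illustration of the file system-based approach, here is a minimal sketch, assuming rsync over SSH; the node names, the /data/ path, and the destination are all placeholders:

#!/bin/bash
# Minimal sketch: preserve the /data/ directory of a quorum of ES nodes.
# Node names, paths, and the destination below are placeholders.

NODES="es-node01 es-node02 es-node03"              # a quorum of the cluster
DEST="/srv/backups/elasticsearch/$(date +%Y%m%d)"  # where the copies land

mkdir -p "$DEST" || exit 1

for node in $NODES; do
    # Copy each selected node's data directory to the backup destination.
    rsync -az "${node}:/data/" "${DEST}/${node}/" || exit 1
done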

At first glance the file system-based approach appears simpler – and it is – but it comes with some drawbacks, notably the fact that coherency is impossible to guarantee due to the amount of time required to preserve /data/ on each node. In other words, if data changes on a node between the start and end times of the preservation mechanism, those changes may or may not be backed up. Furthermore, from an operational perspective, restoring nodes from individual shards may be problematic.

The ES-based approach does not have the coherency problem; however, beyond the fact that it is more complex to implement and maintain, it is also more costly in terms of service delivery. The actual import process requires a large number of requests to be made to the cluster, and the resulting resource consumption on both the cluster nodes and the receiving node is non-trivial. On the other hand, having a single, coherent representation of every shard in one place may pay dividends during a restoration scenario.

As is often the case, there is no one solution that is going to work for everybody all of the time – different environments have different needs, which call for different answers.  That said, if your primary goal is a consistent, coherent, and complete backup that can be easily restored when necessary (and overhead be damned!), then the ES-based approach is clearly the superior of the two.

import it!

Regarding the ES-based approach, it may be helpful to take a look at a simple import script as an example.  How about a quick and dirty Perl script (straight from the docs)?

use ElasticSearch;

# The standalone "receiving" node that will hold the imported copy.
my $local = ElasticSearch->new(
    servers => 'localhost:9200'
);

# A member of the production ES cluster to read from.
my $remote = ElasticSearch->new(
    servers    => 'cluster_member:9200',
    no_refresh => 1
);

# Perform a scrolled scan search over the source index on the cluster.
my $source = $remote->scrolled_search(
    index       => 'content',
    search_type => 'scan',
    scroll      => '5m'
);

# Re-index every document returned by the scroll into the receiving node.
$local->reindex( source => $source );

You’ll want to replace the relevant elements with something sane for your environment, of course.

As for preserving the resulting /data/ directory (in either method), beyond a simple sketch like the one above, I will leave that as an exercise for the reader, since there are simply too many equally relevant ways to go about it.  It’s worth noting that the import method doesn’t need to be complex at all – in fact, it really shouldn’t be, since complex backup schemes tend to introduce more opportunities for failure than necessary.

Happy indexing!

Send your logs to the cloud; Loggly vs. Papertrail

N.B. This post is from 2011 – the landscape has changed since then…

Centralised cloud-based logging.  It sounds tasty – and it is – but who should you go with?  Well, Loggly and Papertrail are the only games in town when it comes to the aforementioned service; the only other competitor in this space is Splunk Storm, but their offering – well-pedigreed though it may be – is strictly in private beta at this time, and therefore cannot really be considered a valid option.

The fact of the matter is that Loggly and Papertrail are, at a high level, functionally identical. They offer more or less the same bouquet of functionality, including alert triggers, aggregate visualisation, and even map-reduce tools for data mining and reporting. Loggly has been around longer and has a better track record for open-source involvement, meaning that the ecosystem around their service is more mature; however, that doesn’t mean that they are necessarily superior to Papertrail in terms of the actual service.

My suggestion: If you’re in a hurry, flip a coin and go with one or the other. If you have the time, you should go ahead and try both out for a bit; Papertrail has a 7-day free trial programme, and Loggly is free (in perpetuity) for sufficiently small amounts of data and retention (which is no problem if you’re just poking around).

I’m very interested in hearing about actual user experiences with either or both, so please don’t hesitate to add a comment or drop me a line directly via the contact form.

Edit: From @pyr: « you can also consider @datadoghq which has a different take on the issue but might fit the bill. »

Edit 2: From the comments, there’s also Logentries, which I don’t personally have any experience with, but which appears to be a reasonably comprehensive offering as well.