Observability as code: dashboards

I recently received a great question from a Datadog customer that I’d like to share. It concerns Observability as Code—specifically managing dashboards via Terraform in this case, but the question and answer apply more broadly.

While we use Terraform for 100% of our Datadog monitors, we’ve found [that] developers are less likely to update dashboards when they see a mistake in the UI, because they need to open a branch, run a pipeline and merge it in. This is a lot of overhead for a quick dashboard fix. Do you have any thoughts on this?

SRE Team Lead

First, the standard disclaimer: There’s no one true way to do anything, and every recommendation or “best practice” out there needs to be weighed against the reality of the environment and organisation that you’re a part of. 🙂

That said, there are a few ways to look at this. If you’re part of a small shop with only a handful of resources, then it might be entirely reasonable to manage everything by hand—not just the dashboards. Scale, therefore, is a big part of the equation—and that notion extends to both computers and people. If you’re part of a small team that works either in isolation, or on things which are so specific as to be unusable to others, then here again, it’s entirely reasonable to manage those dashboards by hand. Still, even here one could make an argument for treating them as code: it’s self-documenting (for values of documentation), and it serves as a backup mechanism for when one of those manual edits inevitably goes awry.

As scale increases, so does the value of managing the dashboards as code. A prime example is repeatability. Many organisations end up deploying multiple iterations of very similar stacks. Creating the dashboards for these stacks by hand becomes tiresome very quickly. Furthermore, as the stack itself changes, having to go through each dashboard set to update them can be a nightmare (not to mention making sure that future stacks get the correct dashboard updates as well).

Some organisations may have more tightly controlled environments than others, where traceability for modifications is important. Even in environments that aren’t particularly controlled, there may be a healthy peer review culture that, far from being a burden, is a huge asset, ensuring that even small changes get more than one set of eyes for completeness and suitability. In these cases, having tight feedback loops and rapid CI/CD mechanisms is paramount—if engineers feel like they’re fighting heavyweight tooling for “small changes”, then that’s a problem worth addressing beyond the issue of dashboards specifically.

If you want to learn more about Datadog, Hashicorp Terraform, and Observability as Code, I invite you to check out our upcoming joint webinar on the topic on 23 September at 12:00 UTC. Hope to see you there!

DevOpsDays Berlin 2018: recap

The sixth edition of DevOpsDays Berlin was held over the 12th and 13th of September 2018, and despite the fact that it’s one of the most consistent, longest-running DevOpsDays events on the calendar, this year was my first opportunity to attend. The tl;dr is that this was a very fine event, and the local organising team clearly has their act together (though they are actively recruiting new volunteers).

The talks were not recorded, so for the benefit of those who couldn’t make it, I’ve refined my notes from the talks on Day 2 (unfortunately I had to miss Day 1).

Dirk, on trust

The day started with a presentation from Dirk Lehmann of SAP entitled “Trust as Foundation of DevOps”. Dirk has been a DevOps practitioner and booster for years, and I’ve had the pleasure of seeing him speak a handful of times before – as usual, he didn’t disappoint. The central thesis of this talk is right there in the title: that trust, that fickle, difficult, human emotional thing, is the fundamental element of DevOps. He proposed a few ways to think about trust beyond a philosophical concept, and introduced a way of measuring the level of trust within an organisation: by measuring speed. Fast decision making, execution, and turnaround are hallmarks of a trust-based organisation. In terms of fostering a trusting environment, he proposed three key elements: managing team size, cultivating diversity, and embracing “positive failure culture”.

Regarding team size, he looked to the seminal Mythical Man Month, as well as the work of evolutionary psychologist Robin Dunbar (see: Dunbar’s Number). Dirk’s conclusion? 15 people is the largest a team can get before it breaks – and often it should be smaller. Concerning diversity, the take-away was that teams should organise around projects, products, or goals, instead of the traditional “business unit” (or work-type silos) that have been commonplace for much of the post-industrial era. Finally, citing Google’s research that investing in psychological safety – more than anything else – was critical to making a team work, Dirk argued that embracing failure as a learning experience is paramount.

Yuwei, on mentoring

The next talk was by Yuwei Ren, a Production Engineer at Facebook, entitled “Mentorship: From the receiving end”. Despite the title, most of her talk was about mentorship from the perspective of the mentor, and it was chock full of good tips for anybody engaging in a mentoring relationship.

For example, as a mentor, how do you get off to a good start with your mentee? Start by establishing career goals, and then ask “How can I help get you there?”. Ensure that the expectations are set, and that outcomes are clearly defined by both parties. This is done by establishing a plan to achieve those goals, and setting up milestones along the way. During the process, schedule regular check-ins, and stick to them! Set the bar high, and along with those lofty expectations, make sure that both parties are accountable to each other. The mentor/mentee relationship is a big responsibility, and it must be treated with respect and commitment.

Both parties must acknowledge that disagreements and misunderstanding are inevitable, so empathy is key. Assuming good intentions is important, but that’s only the first part of the equation – both parties need to actively listen and respect each other both professionally and as human beings. Finally, the mentee needs to take responsibility for their career, actively seeking and providing feedback, and acting on the advice of the mentor.

Ken, on DevOps transformations

Next up was Ken Mugrage of Thoughtworks (and fellow member of the DevOpsDays global core) with a talk entitled “Want to successfully adopt DevOps? Change Everything”. His talk was very dense, and to be honest I didn’t quite capture all of it. His four key points to successful DevOps adoption were: redefine words, change your organisation, change your architecture, and use CD to safely deploy more often.

Within a team specifically, and an organisation in general, agreeing on specific definitions is critical. In terms of defining the word DevOps, Ken argued that DevOps should not define a toolset, nor a role, nor a team. Instead, DevOps is verbs: developING and operatING. If you need a noun, why not CAMS or CALMS (about which much has already been written)? Ken then offered his interesting description of DevOps:

A culture where people, regardless of title or background, work together to imagine, develop, deploy, and operate a system.

Words are important, but action is critical, and the best way to implement CA(L)MS is to change how your organisation is, well, organised. Ken invoked Conway’s Law and, by way of illustration, explained how just renaming the Ops team to the DevOps team isn’t going to help (nor is any other action that just leaves silos in place). Teams need to be organised differently, which means that the whole organisation might need to be organised differently as well. Furthermore, the success of those teams must be measured by how they deliver business value (not just whether their software functions or not).

Ken then moved into continuous delivery versus continuous deployment, and this is where things got fuzzy for me.


Three sponsors had an opportunity to say a few words on stage: Chef, Google Cloud, and Quest. I mention this because it’s important to acknowledge the important role that sponsors play in the DevOpsDays series – without them, the events simply wouldn’t be able to happen. Thanks, sponsors!

Tiago, on Microsoft

The final full presentation of the day was from Tiago Pascoal from the Azure DevOps Customer Advisory Team, speaking about how a unit within the Microsoft empire experienced its own DevOps transformation. This one started slow, but once Tiago got going, it ended up being very interesting – so I’ll skip right to the good stuff.

Within the Azure DevOps unit, teams are assembled into 10 to 12 person groups, the composition of which is heterogeneous and organised around products (see: Dirk’s argument above). These teams are self-forming, so everybody decides which team and/or manager they want to work with. Apparently, while 100% of the group gets a choice, typically less than 20% choose to change from their existing situation. (To me, this is wild, and I’d be very interested to try something like this out at some point in my career.)

Tiago invoked Daniel Pink’s popular book Drive, noting that teams need three things: autonomy, mastery, and purpose. He added another layer, which he referred to as “Alignment vs Autonomy”. Autonomy is everything related to planning and execution, whereas alignment is the organisation, the team structure, and most importantly, the cadence of work. For example, they work in 3-week sprints, and at the end of the sprint, whatever is in the master branch gets deployed – so be ready.

At an organisational level they track metrics; but NOT the classics like burndown, velocity, original estimate, completed hours, team capacity, or number of bugs found. Individual teams can track those if they wish, but at a high level there’s only one metric that matters: impact. Are they delivering value? Speaking of measuring, in terms of code quality, Tiago explained the “Bug Cap”: x = n * 5, where n is the number of engineers and x is the maximum number of open bugs. If the bug count exceeds the bug cap at any time, the team stops working on new features and moves immediately to stabilise. Neat!
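The bug-cap rule is simple enough to sketch in a few lines of Python (the function names here are mine; the multiplier of five is the one Tiago described):

```python
def bug_cap(engineers: int, multiplier: int = 5) -> int:
    """Maximum number of open bugs the team tolerates."""
    return engineers * multiplier

def should_stabilise(open_bugs: int, engineers: int) -> bool:
    """True when the team should halt feature work and stabilise."""
    return open_bugs > bug_cap(engineers)
```

A team of eight engineers, for example, has a cap of 40 open bugs; at 41, feature work stops.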


After lunch, it was time for everybody’s favourite DevOpsDays tradition: the ignites! I was preparing for my own ignite so I missed the first one (sorry).

Daiany, on relaxing at speed

This was an excellently delivered lightning talk that told the (true?) story of two teams and their relative stress levels heading into the GDPR cut-off date. It was fun, light-hearted, and carried a strong message: when you’re doing the DevOps at high velocities, even massive EU legislation can’t stress you out! Well done, Daiany.

Daniel, on serverless

I did a lightning talk on serverless. It was fun. 🙂


First off, thanks to the local DevOpsDays Berlin volunteer group. Also, thanks (again) to the sponsors. Finally, thank you to my employer, Datadog, for sending me out there to speak. There are so many moving parts involved in making even a small conference happen.

I hope to make it out next year!

So long, and thanks for all the fish.

I joined Mozilla nearly five years ago, which was an interesting period culturally for the company – we were transitioning from being a scrappy, disorganised group of hackers into a proper corporation with a global reach.  One of my first bugs was to audit our load balancers – another was to help fix the tap in the office beer keg.  It was an interesting time, fraught with mistakes, but heady with the promise of opportunity.

For the first few years, I was part of the WebOps team, and as such, got to work with a wide range of projects and people around the world.  Every day was a trial by fire, and while it was often times difficult or frustrating, the lens of hindsight now allows me to understand how valuable it was as a learning experience.

From there, I moved into Services Engineering, and had the opportunity to work on the Crash Stats project – a humbling experience if ever there was.  I was exposed to some brilliant technologists and learned more about resilient systems design than I ever dreamed possible.  I was also exposed to some great managers, and will remain eternally grateful for the mentorship I received both directly and indirectly.

Finally, I landed with the Cloud Services and Operations teams, and even though I’d been with the company for a long time, it was like my first days all over again: new and complex systems, new and complex people.

Speaking of people, I’d be remiss if I didn’t say this: Mozilla is full of some of the finest human beings I’ve ever had the honour and pleasure of working with.  The spirit, drive, and desire to do good is something that is more than rare in the Valley (or anywhere), and the importance of this aspect cannot be overstated.

Mozilla has treated me well.  I’ve travelled the world, honed my skills, and learned so much, both professionally and personally.  It’s been a wild, interesting, life-changing ride.

Today, more than ever, the Internet needs Mozilla.  Even though my personal journey is taking me in a different direction, the mission and manifesto are more important than ever before.  The show must go on.
Much love to all, and thanks for everything.

Genesis: Terraforming a New Home for Firefox Crash Reporter

Last year, my esteemed colleague JP Schneider and I were invited to keynote at a couple of conferences. We gave two variations on the same talk, entitled “Genesis: Terraforming a New Home for Firefox Crash Reporter”, at Hashiconf 2015 and Velocity EU 2015 respectively.

The blurb for these talks is as follows:

Everyone loves talking about different stacks used in migrating applications to DevOps methodologies, but the most important and valuable change is not a technology stack; instead, it is the human stack.

In this keynote address, we will cover the intersection of technology and humans, and walk the audience through a real life example of the move of Firefox crash reporter. The three engineers tasked with this had to build the plane while it was in the air, all without landing or crashing.

As with many projects, hard deadlines and requirements made the team work through a lot of tough decisions and compromises while simultaneously training devs, product managers, managers, and other engineers in the new world of DevOps and Continuous Delivery.

The talks were a lot of fun and were well received by both audiences.  We kept things light (including images and quotes from Shakespeare to Mobb Deep) and we kept them honest (covering both our successes and our failures).

The Velocity talk, which at 25 minutes in length is the shorter of the two, is aimed at a more general audience; the Hashiconf talk is longer, and includes a lot more detail about the Hashicorp tools that we used to reach our goals. I hope you enjoy either or both of them. 🙂

Handling extant resources in Terraform

Terraform is a Hashicorp tool which embraces the Infrastructure as Code model to manage a variety of platforms and services in today’s cloud-based Internet.  It’s still in development, but it already provides a wealth of useful functionality, notably with regard to Amazon Web Services and DigitalOcean interactions.  The one thing it doesn’t do well, however, is manage pre-existing infrastructure.  In this blog post we’ll explore a way to integrate extant infra into a basic Terraform instance.

Note that this post is current as of Terraform v0.3.6.  Hashicorp has hinted that future versions of Terraform will handle this problem in a more graceful way, so be sure to check those changelogs regularly. 🙂


A full example and walk-through will follow; however, for those familiar with Terraform and just looking for the tl;dr, I got you covered.

  • Declare a new, temporary resource in your Terraform plan that is nearly identical to the extant resource.
  • Apply the plan, thus instantiating the temporary “twinned” resource and building a state file.
  • Alter the appropriate id fields to be the same as the extant resource in both the state and config files.
  • Perform a refresh which will populate the state file with the correct data for the declared extant resource.
  • Remove the temporary resource from AWS manually.
  • Voilà.

faster and more dangerous, please.

Walking through the process and meticulously checking every step? Ain’t nobody got time for that!

  • Edit the state file and insert the resource directly – it’s just JSON, after all.
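If hand-editing a live state file makes you nervous, the same insertion can be scripted. Here’s a minimal sketch in Python; the helper name is mine, and it assumes a flat “resources” mapping shaped like the state-file excerpts in this post (real state files carry additional top-level fields, such as the version and serial, which must be preserved):

```python
def add_bucket_to_state(state: dict, name: str, bucket_id: str) -> dict:
    """Insert an aws_s3_bucket resource entry into a Terraform state dict."""
    state.setdefault("resources", {})["aws_s3_bucket." + name] = {
        "type": "aws_s3_bucket",
        "primary": {
            "id": bucket_id,
            "attributes": {
                "acl": "private",
                "bucket": bucket_id,
                "id": bucket_id,
            },
        },
    }
    return state

# Typical use: rewrite terraform.tfstate in place, then run `terraform refresh`:
#
#   import json
#   with open("terraform.tfstate") as f:
#       state = json.load(f)
#   add_bucket_to_state(state, "phrawzty-tftest", "phrawzty-tftest-1422290325")
#   with open("terraform.tfstate", "w") as f:
#       json.dump(state, f, indent=4)
```

As always when editing state directly, keep a backup copy of the file, and run terraform plan afterwards to confirm that Terraform agrees with reality.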


In the examples below, the notation [...] is used to indicate truncated output or data.

Also note that the AWS CLI tool is assumed to be configured and functional.


The extant resource in this case is an S3 bucket called phrawzty-tftest-1422290325. This resource is unknown to Terraform.

$ aws s3 ls | grep tftest
2015-01-26 17:39:07 phrawzty-tftest-1422290325

Declare the temporary twin in the Terraform config:

resource "aws_s3_bucket" "phrawzty-tftest" {
    bucket = "phrawzty-tftest-1422353583"
}

Verify and prepare the plan:

$ terraform plan -out=terratest.plan
Path: terratest.plan

+ aws_s3_bucket.phrawzty-tftest
    acl:    "" => "private"
    bucket: "" => "phrawzty-tftest-1422353583"

Apply the plan (this will create the twin):

$ terraform apply ./terratest.plan
aws_s3_bucket.phrawzty-tftest: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
State path: terraform.tfstate

Verify that both the extant and temporary resources exist:

$ aws s3 ls | grep phrawzty-tftest
2015-01-26 17:39:07 phrawzty-tftest-1422290325
2015-01-27 11:14:09 phrawzty-tftest-1422353583

Verify that Terraform is aware of the temporary resource:

$ terraform show
  id = phrawzty-tftest-1422353583
  acl = private
  bucket = phrawzty-tftest-1422353583

Alter the config file:

  • Insert the name of the extant resource in place of the temporary.
  • Strictly speaking this is not necessary, but it helps to keep things tidy.
resource "aws_s3_bucket" "phrawzty-tftest" {
    bucket = "phrawzty-tftest-1422290325"
}

Alter the state file:

  • Insert the name (id) of the extant resource in place of the temporary.
            "resources": {
                "aws_s3_bucket.phrawzty-tftest": {
                    "type": "aws_s3_bucket",
                    "primary": {
                        "id": "phrawzty-tftest-1422290325",
                        "attributes": {
                            "acl": "private",
                            "bucket": "phrawzty-tftest-1422290325",
                            "id": "phrawzty-tftest-1422290325"

Refresh the Terraform state (note the ID):

$ terraform refresh
aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)

Verify that Terraform is satisfied with the state:

$ terraform plan
Refreshing Terraform state prior to plan...

aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)

No changes. Infrastructure is up-to-date. This means that Terraform
could not detect any differences between your configuration and
the real physical resources that exist. As a result, Terraform
doesn't need to do anything.

Remove the temporary resource:

$ aws s3 rb s3://phrawzty-tftest-1422353583/
remove_bucket: s3://phrawzty-tftest-1422353583/

S3, faster.

For the sake of this example, the state file already contains an S3 resource called phrawzty-tftest-blah.

Add the “extant” resource directly to the state file.

            "resources": {
                "aws_s3_bucket.phrawzty-tftest": {
                    "type": "aws_s3_bucket",
                    "primary": {
                        "id": "phrawzty-tftest-1422290325",
                        "attributes": {
                            "acl": "private",
                            "bucket": "phrawzty-tftest-1422290325",
                            "id": "phrawzty-tftest-1422290325"


$ terraform refresh
aws_s3_bucket.phrawzty-tftest: Refreshing state... (ID: phrawzty-tftest-1422290325)
aws_s3_bucket.phrawzty-tftest-blah: Refreshing state... (ID: phrawzty-tftest-blah)


$ terraform show
  id = phrawzty-tftest-1422290325
  acl = private
  bucket = phrawzty-tftest-1422290325
  id = phrawzty-tftest-blah
  acl = private
  bucket = phrawzty-tftest-blah

That’s that.

Sysadvent 2014: What Sysadmins can learn from Community Management

The annual Sysadvent series of blog posts is well under-way and this year features, for the first time ever, an entry authored collaboratively by myself and Jonathan Clarke, entitled “What System Administrators Can Learn From Community Management”.  This post was borne from notes collected over numerous Devops Paris meet-ups and other local community get-togethers, as well as from the personal experiences of both Jon and me in these respective domains.  While the article is certainly valuable, it is by no means definitive, and my feeling is that it represents the beginning of the discussion – not the end.  I’d love to hear from others who are involved in one or both of these areas, as there is so much more to explore concerning the relationship and parallels between these two interesting domains.  Feel free to comment here, or better yet, hit me up on the Twitters.

For the sake of posterity, I will re-post the article here.


At first blush it might not seem like there are any parallels between system administration and community management, but the two fields have much more overlap than you might think! Before we begin, let’s define community management briefly, and to do so, I’m going to lean on this excellent post by Deb Ng. It’s not marketing, nor social media, nor tech support, though these things can be involved. A community manager is a strategist, a content creator, and above all, a communicator. In this article, we will explore a handful of key principles from the community management world and see how they apply to system administration.

It’s important to realise that both fields are concerned with systems, which is to say, “an assemblage or combination of things or parts forming a complex or unitary whole” (source). In the case of sysadmins, one normally thinks of the relationship between nodes in a cluster, or clusters in a farm, or farms in a data centre, or – well, you get the idea. For community managers, one normally thinks of the relationship between members of a forum, or forums in a site, or sites in a community, or – well, here again, you get the idea. In both cases, the relationships between the individual elements of the whole can be complex, and often require specialised tools and years of experience to get a handle on.

Managing groups at scale

Scale is an important concept in community management. What works for managing a group of two people might not work for a group of 20 or 200 people. It’s generally easier to discuss a topic in a small group and achieve consensus when there are mutually-shared interests. Small groups can function directly together since the number of individual voices is low and the space needed for them to work is commensurate with their size. Logistically speaking, a small group can make use of pretty much any communication platform that happens to be convenient – a common list of people to CC when emailing and a Google document might well be enough.

Contrast that with managing a community of, say, 50 people – that’s a big list of people to keep tabs on every time you need to write an email (not to mention the amount of noise that might get generated with an errant reply-all), and as anybody who’s ever tried to play a fast-paced game of multiplayer text editing knows, that’s a lot of cooks in the kitchen when it comes to a shared document. Even at this modest scale, some sort of more purpose-built tooling is required, such as a proper mailing list or forum site. This necessity is borne from two important effects of larger groups which, for our purposes, we can refer to as quantity and quality.

Quantity is straightforward: more people means more conversations! A handful of emails per week is trivial for any one person to deal with, but multiply that by a factor of ten or 100, and the trickle becomes a raging torrent. The more information that’s being generated, the more difficult it becomes for any one member of the group to process it effectively.

Quality refers to the nature of those conversations, and in particular, their level of diversity. Each person in the community will bring their own particular histories, viewpoints, cultural perspectives, and so forth to the group. Good tooling can make all the difference when trying to manage such a heterogeneous system. A straightforward example of this is mature forum software that has conversational threads, thematic areas, and a robust back-end for gathering statistics and metrics about usage.

From here it shouldn’t be difficult to make the mental leap to system administration. Two servers sitting on your desk are pretty straightforward to deal with – even if they’re totally different, there’s only two of them, so you probably don’t need any special infrastructure to manage them. Scale that up to a dozen servers, however, and the situation becomes much more complex. The dual effects of quantity and quality are instantly applicable. At even relatively small scales the importance of a coherent configuration management tool becomes evident: something that allows the different nodes in the system to be organised as operational or thematic groups, and with a robust back-end for gathering statistics and metrics about usage. Sound familiar?

We’re going to come back to scale later, but for now, let’s move on to something a little bit more esoteric.

Writing a mission statement

I know what you’re thinking: “mission statement? That’s just corporate-speak nonsense!” And yes, that can certainly be the case, but it doesn’t have to be. In fact, a good mission statement is essential for any community, and explicitly stating it can help to crystallise the community in powerful ways.

Most organic communities assemble around an initial idea or principle. While this shared element is enough to form the community, it’s rarely enough to maintain it, and almost certainly insufficient for real growth – the secret sustainability sauce is made up of the values that original germ represents. In other words, people come for the brilliant spark, but they stay for the ideas generated from the spark. A mission statement is a way to codify and present both.

Sound a little high level? Consider that at some point in your life you’ve probably come across an open source project that was little more than a few hundred lines of code and a README. But something about this project made it different from the others. The documentation, ersatz though it might have been, clearly elucidated the problem and how this project acted as a (potential) solution. This simple addition to the repository made all the difference in the world because it helped you to understand how the project could fit your particular situation. Even without a deeper analysis of the code in question, your interest was piqued and your desire to learn more was set into motion. What happened there? Believe it or not, what captured your attention was, fundamentally, a mission statement.

Most established companies have mission statements. These are organisation-wide screeds that may or may not have any direct relationship with the work that any one contributor effects within that organisation. Is it possible to bring the granularity level down a notch or two, and if so, what might the benefit be? Could it be useful to a system administration team?

Imagine you’re just starting on a new project: a simple but powerful collection of scripts to manage virtual machines that’s going to make everyone’s day-to-day work much easier. Right now, even before you’ve begun writing any code, is the perfect time to write a mission statement. Start with something simple like “I’m fed up of typing in a dozen long and complicated commands every time I need to migrate a VM in our infrastructure, especially in the middle of the night, when on call. I intend this to be a small wrapper tool that can do normal migrations for us by typing in just one command. This should reduce the amount of time we waste on manual migrations and help to eliminate typing errors”. This brief, simple statement encapsulates both the problem and the solution, clearly defines the scope of the project, and describes both the initial idea germ and – critically – the parameters for success.

There are numerous benefits to drafting your mission statement at the earliest possible stage. Crucially, acting as a sort of combination compass and mould, it keeps the contributors focused (even if you’re the only one), which helps to nip things like feature creep in the bud straight away. Keep it short and don’t overthink it. Much like setting up some new test infrastructure, or working on a new program, the best thing is to start small and iterate over time as necessary. Finally, writing a mission statement should be easy; if you’re embarking on a project, then you know best what problem you’re trying to solve, and those early stages are when you’ll have the most clarity about it. If for no one else, write the mission statement for your future self – a few weeks down the line you may have gone so deep that you’ve completely forgotten the original problem you were trying to solve!

A good mission statement can also be used to generate excitement which, as we’ll see below, can be a powerful tool for getting things done.

Encouraging involvement

A key principle of community management is actively encouraging contribution and involvement from the community. A good community manager will generate excitement and foster enthusiasm. Cultivating a positive emotional connection to the community will make members more loyal and encourage them to participate even more – a virtuous cycle that in turn makes the community even stronger. That said, while a general feeling of excitement is, like that initial idea germ, enough to create a spark, it’s insufficient to maintain the community. The excitement needs to be backed up by concrete measures to encourage active involvement.

There is a fairly well-known philosophy in the open source world called “Release early, release often” that was popularised by Eric S. Raymond in the late ’90s. The essence of the philosophy is that sharing versions of a software project early and often will ultimately improve the overall quality of the software. This happens because beta-testers and early adopters can have fun playing with new features while contributors can learn the codebase by fixing obvious bugs – that early feedback will help influence development in useful ways. In fact, as long as they’re not fatal, it can be helpful to leave some bugs in there deliberately! Providing low-hanging fruit can enable an enthusiastic contributor to jump into the project by giving them the chance to fix that bug themselves.

Looking at this from a systems perspective, this is basically a process with a feedback loop, and by releasing often the loop is accelerated. In other words, excitement is generated by rapid development, and the rapid development is encouraged by excitement – it’s almost too easy!

As a system administrator, while you may certainly be leading a project for which the rapid release model is applicable, that won’t always be the case. That’s no reason not to adopt this principle, however; by actively encouraging your co-workers to get involved, you help to build a trust relationship that will pay dividends down the line. Consider the development vs. operations silo, for example – how nice would it be if the two groups actually worked together on projects right from the start?

So how do you generate excitement? The easiest thing is to talk to your co-workers. Write a mission statement – or, better yet, draft one in collaboration with some potentially interested parties. Ask questions, discuss ideas, and continuously act on that feedback. Share design documents, proofs of concept, and initial implementations. Don’t be shy about unfinished releases or weird bugs, and don’t let perfection be the enemy of good – the important thing here is that both the features and the bugs are clearly explained.

Whatever the scope of your next project, try sharing the idea with your co-workers (or whomever) before you get started and see how they react. Chances are, you’ll get their support a lot more easily than by showing them hundreds of lines of script that don’t yet solve the whole problem – and they probably won’t nitpick about your chosen scripting language or variable naming while reading the mission statement either.

Of course, for others to be able to fix bugs in your code, they need to know how to get that bug fix back into the code. Which brings us to our next topic…

Establishing community guidelines

A simple search with the term “community guidelines” will reveal hundreds of thousands of examples from across the web. They encompass, either implicitly or explicitly, communities of just about every conceivable size on more or less every topic there is. It’s difficult to find a legitimate site that doesn’t have one – but, what are they? Simply stated, they are the basic tenets that frame the social contract between all of the members of the community, and they serve as ground rules to help that community to interact well. For example, an increasingly popular type of guideline is the conference code of conduct, which most conferences now have to help ensure that respectful and inclusive behaviour prevails.

Responsible community management is, in many respects, predicated on the existence of community guidelines. It’s one of the most important tools that leaders can use to act decisively and consistently.

When technical folk collaborate, be it in writing code or in administering a server, we all tend to do things slightly differently from one another. How so? A simple question: where do you put custom binaries on a server?


Now, chances are you’re thinking “I don’t need to write out guidelines – we’re a tiny group”. You may well be right – your group probably can function just fine, for now; however, consider what happens when a new person joins your team. They will need to understand “how to do stuff around here”. Thinking further, the future could include a larger expansion, perhaps open sourcing a tool you wrote, or your company being acquired or merging with a larger one. In these cases, having planned ahead and written down the key points on how to work together could be a life saver – for both yourself and the new arrivals.

As with everything, the key is to start simply: put a minimum viable set of guidelines together and then iterate as necessary. Some good starting points might include things like “all code must live in a given version control system.” That’s a simple, effective guideline that is both easy to follow and hard to misinterpret. Stipulate the conventions up front: paths on file systems, tabs vs. spaces for indentation, and so forth. That’s a great start.
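Some of these conventions don’t even have to live in prose – they can be captured in machine-readable form so that editors enforce them automatically. As a sketch, an `.editorconfig` file (the widely supported EditorConfig standard) could encode an indentation convention like this; the globs and values here are illustrative, not prescriptive:

```ini
# .editorconfig – codifies the team's whitespace conventions at the repo root
root = true

[*]
charset = utf-8
insert_final_newline = true

[*.py]
indent_style = space
indent_size = 4

[Makefile]
indent_style = tab
```

A file like this settles the tabs-vs-spaces question once, mechanically, instead of re-litigating it in every code review.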

Next, explain how to contribute; this could be by sending a pull request on GitHub, a patch by email, or whatever. The important thing is to set up a consistent framework. Elucidate expectations for responses to contributions, too – it’s better to say something like “I only review patches on Mondays, so please be patient” than to leave an enthusiastic first-time contributor hanging for six days. Examples of such standards documents (that have open source licenses) include Google’s style guide, or, for a simpler example, Rudder’s technical guidelines.
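To make the patch-by-email path concrete, here is a minimal sketch using plain git; the branch name, commit message, and mailing-list address are all illustrative, and the project’s primary branch is assumed to be called “main”:

```shell
# Do the work on a topic branch rather than directly on main
git checkout -b fix-readme-typo

# ...edit files, then record the change...
git commit -am "docs: fix typo in README"

# Turn the branch's commits into a mailable patch file
git format-patch main --stdout > fix-readme-typo.patch

# Send it via git send-email (address is hypothetical), or attach it by hand:
# git send-email --to=project-list@example.org fix-readme-typo.patch
```

Whichever channel you settle on – pull request, emailed patch, or something else – the value is in writing the steps down once so that every contributor follows the same path.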

These contribution guidelines obviously tie into lowering the barrier to entry to your project too (see previous section). The underlying goal is to facilitate collaboration by avoiding pointless debates about which indentation style to use, when the project could be accepting contributions and moving forwards. There are no hard and fast rules governing the content or structure here, but as a general rule the idea is to be concise – use clear, brief language that expresses the core principles accurately.

Similar guidelines can, and should, be drawn up for social interaction. Don’t fall into the trap of defining complex rules that are hard to follow and, most importantly, assume good intentions by default. Great relationships are built on trust, and the best way to get trust is to give it in the first place. There is really no one-size-fits-all set of rules and each community will grow around the shared values that have particular meaning to them; however, it is very important to point out what is not OK. Again, exact details here depend on who you are as a community but make sure that as a bare minimum you explicitly forbid any kind of discriminatory behaviour or harassment.

Last but not least, remember that as one of your community’s leaders – however big or small that community is – you should be setting the example through action. Respect your own conventions. Welcome newcomers. Communicate clearly, consistently, and often. Be mindful that your words carry weight, and choose them deliberately and carefully when you write every email, blog post, or README.


The web is full of resources about community management – if you’re curious to learn more, some good starting points (for system administrators) include the Producing Open Source Software book, Eric S. Raymond’s The Cathedral and The Bazaar, and The Mozilla Manifesto. Dawn Foster’s blog also has some great insight on managing a community from a sysadmin perspective.

Devopsdays Belgium 2014: recap

Devopsdays Belgium 2014 was held over 27 and 28 October in the charming town of Ghent (or Gent, or Gand), Belgium.  This event marked the five-year anniversary of the Devopsdays series of conferences – and with such an important milestone in play, expectations were running high.  I’m happy to report that those expectations were met.

First off, the basic details:

  • The Devopsdays format is half-day presentations and half-day open spaces (the programme is available here).
  • All of the presentations were recorded by BMC and are available on Ustream for your viewing pleasure.
  • Perusing #devopsdays on Twitter is probably the fastest way to tap into the over-mind of the conference attendees, both in real-time and in retrospect.
  • The popular Hangops show/podcast did a recap of the event featuring speakers, attendees, and even Patrick Debois himself (if briefly).

Looking at the programme, it’s interesting to note that there were no strictly-technical talks (the same was true, more or less, for the ignites as well).  Broadly speaking, the topics ranged from management skills, to brain chemistry, and as far as a post-modern analysis of the Devops movement itself.  Indeed, in terms of content, this was one of the most human-centric conferences I’ve ever attended.  Five years ago this wouldn’t have been the case and it is an indicator of just how far we’ve come as a community – in other words, huzzah, we’ve finally managed to stop talking about tools constantly.

Concerning the presentations specifically, much has already been written, and I don’t feel like I can add much more to the existing discourse.  While I would recommend watching the videos themselves, for those looking for an executive summary of the two mornings, you could do worse than Carlos Sanchez’s posts at InfoQ here, here, and here.

The best part of almost any conference, for me, is the so-called hallway track: those serendipitous encounters near the coffee machine or spontaneous introductions around a lunch table.  In many ways, the open space format is a formalisation of the hallway track, and one that works for everybody (not just people who are comfortable approaching strangers).  In either case, when combined with the highly-local nature of Devopsdays, it made for a great opportunity to meet and speak with people whom I’d normally never encounter – the perfect environment for challenging discussions and generating fresh ideas.

A great example of this environment came courtesy of Bridget Kromhout, who (bravely) challenged the conference at large to use more inclusive language, especially when referring to groups within our industry (i.e. say “people” instead of “guys”).  The subsequent conversation was civil, reasoned, thought-provoking, and – notably – itself inclusive, not only in terms of gender, but culture as well.  This was interesting since it reminded everybody that, hey, different languages have different ways of dealing with this, and that while the problem exists everywhere, specific characteristics can vary wildly from region to region.

I also had the pleasure of meeting Dave Mangot who turned me on to something called the “1-1-1 model” (as pioneered by Salesforce).  Basically the idea is to donate 1% of profits, product, and time (respectively) to charitable ends.  This got us thinking about how we could apply the model to Devopsdays, and by extension, tech conferences in general.  Put another way, is it possible to leverage our industry’s many social gatherings as a force for social good?  I would submit that the answer is a resounding yes.  For example, following the model, a given conference could:

  • Donate 1% (or more, there’s no upper limit!) of the profits to charity.
  • Set aside a pool of tickets for people who otherwise wouldn’t be able to afford to attend – or better yet, put together a grant programme to help with the cost of transport, accommodation, etc.
  • Facilitate an open space where attendees can volunteer to assemble care packages, sort clothing or food donations, or other types of manual activities along those lines.

Personally, I’d love to see Devopsdays adopt a model such as this one.  Given the local, grassroots nature of the conference series, it seems like a perfect fit.  Perhaps even the upcoming Devopsdays Paris, scheduled for April of 2015?  Something to think about.

As always, if you want to chat, feel free to leave a comment, or hit me up on the twitters.  Salut!

DevOpsDays Paris 2013, the event that was

I recently helped organise the inaugural Paris edition of the popular DevOpsDays series of conferences.  It was a great event, and I’d like to share some of my thoughts and observations here.

The DevOpsDays series of events is, as the name implies, centred around the “devops” movement, and is intended as a way to introduce people to this style of IT workflow, project, and people management.  Since this is not related to a programming language, this conference falls outside of the normal sorts of events that Mozilla generally finds itself involved in.  I believe this to be a very good thing, and I am proud to have represented Mozilla both as a sponsor of the event, and as a European devops community member.

The audience was mostly French in composition with a not-insignificant number of attendees from both Francophone and non-Francophone European countries.  Said audience was a healthy mix of developers, IT operations, and managers across a wide spectrum of company sizes, types, and even industries; frankly, I was impressed at how broad the composition was, and it was refreshing to see interest in the devops movement from such a wide group.

Though the event was held in Paris, all of the talks (with the exception of some of the ignites) were done in English.  This was by design – an internally contentious decision that, in my opinion, ultimately proved itself to be the correct one.  The open spaces during the afternoon were in a mix of English and French in order to ensure that everybody could participate equally.  Concerning the open spaces, we weren’t sure if the format would work here in France, but they were a smash success!  Everybody seemed to really enjoy the format as a platform for discussion, debate, and idea-generation.  I’d wager that for many of the attendees, it was the first time they’d ever been exposed to such a thing, and my hope is that they can bring the format to others in the future.

Since devops is so new to France, the majority of the presentations themselves were entry-level, and thus not particularly interesting to me directly.  That said, there were two presentations that really stood out (and would have held their own even at a more “advanced” event): “CustomerOps” by Alexis Lê-Quôc, and “Map & Territory” by Pierre-Yves Ritschard.

Alexis’ presentation on “CustomerOps” centred around the concept of providing customer support using engineering principles – and, indeed, delivered by engineers themselves.  This really hit home for me because in Mozilla IT/Ops, we’re not only the people who build and provide technical infrastructure, but are also the people who provide direct support to the consumers of that infrastructure – a situation that is absolutely not a given in many other companies (i.e. the admins and the customer reps are not the same people).  Alexis illustrated the importance of communication, and how to measure success (read: customer satisfaction) in meaningful ways.

Pierre-Yves’ presentation was based on a very interesting philosophical conjecture: that our mental model of the world is not the same as the reality said model attempts to describe.  Put another way, a map isn’t actually land, it’s a representation of the territory it describes (hence the title).  Therefore, the most valuable models are the ones that can describe reality in useful ways, and it’s in defining “usefulness” that the real effort must be made.  In a more applicable sense his thesis was simple: identify your “key metrics” – the numbers which literally describe the success or failure of your business – and make sure you are collecting, analysing, and modelling them above all.  Every other metric is either secondary or potentially uninteresting in the first place.

Personally, I spent a lot of time mingling with the attendees, talking about Mozilla, our projects, and our mission.  Generally speaking, the first question was, “Can I have one of those Firefox stickers?”, but the second question was, “When can I get my hands on a FirefoxOS phone?”  As usual, everybody wanted to see one, and (unfortunately), as usual, I didn’t have one to show them.  The more events that I attend on behalf of Mozilla, the more I realise that this continues to be a wasted opportunity to promote our most exciting new project.  I’ll have to work on this going forward.

Of course, since this was a devops-related event, people were also very curious about if and how Mozilla is implementing devops internally.  The overarching theme of devops is communication, so this event was an excellent opportunity to talk about IT at Mozilla, and to promote not only our successes, but dig into our failures as well.  This sort of interaction is vital in order to avoid stagnation.

In summary, it was a fine showing for our first Parisian event, and I am looking forward to the next edition.  Hopefully I’ll see you there!

hangops, european edition

Today roidrage and I hosted the inaugural European edition of the popular Hangops podcast / hangout sessions.  Hangops.eu (as we’ve taken to calling it) is functionally the same as its North American counterpart – with the exception that “our” 11:00 is that of Western Europe, not California. 😉

If you’re not familiar with the Hangops, now’s the time for you to get with the programme.  The format is simple: get a group of talkative Ops people together on Google Hangouts, and see what happens!  Sometimes there are moderated talks, other times there are special guests, but the basic idea never changes – it’s fundamentally a chance for nerds to present ideas, debate topics, learn a little something, and have a good time.

The sessions are simulcast to YouTube, so if you don’t want to join the hangout itself, you’re invited to listen and participate in the IRC channel (freenode/#hangops) at your leisure.

Hope to see you out next time!