Puppet at Mozilla, the podcast

Hello everybody !

If you’re interested in learning a little about Puppet at Mozilla, I highly recommend that you check out Puppet Podcast #7, where Brandon Burton (@solarce) and I (@phrawzty) talk with Puppet Labs’ Mike Stahnke (@stahnma) about just that topic.  It was our first podcast together, and frankly, it was more difficult than I thought it’d be.  That said, it was a great experience, and I hope to do it again sometime.  Hope you enjoy it !

Send your logs to the cloud; Loggly vs. Papertrail

N.B. This post is from 2011 – the landscape has changed since then…

Centralised cloud-based logging.  It sounds tasty – and it is – but who should you go with?  Well, Loggly and Papertrail are effectively the only games in town for this sort of service; the only other competitor in the space, Splunk Storm – well-pedigreed though it may be – is strictly in private beta at this time, and therefore cannot really be considered a valid option.

The fact of the matter is that Loggly and Papertrail are, at a high level, functionally identical. They offer more or less the same bouquet of functionality, including alert triggers, aggregate visualisation, and even map-reduce tools for data mining and reporting. Loggly has been around longer, and has a better track record for open-source involvement, meaning that the ecosystem around their service is more mature; however, that doesn’t mean that they are necessarily superior to Papertrail in terms of the actual service.

My suggestion: If you’re in a hurry, flip a coin and go with one or the other. If you have the time, you should go ahead and try both out for a bit; Papertrail has a 7-day free trial programme, and Loggly is free (in perpetuity) for sufficiently small amounts of data and retention (which is no problem if you’re just poking around).

I’m very interested in hearing about actual user experiences with either or both, so please don’t hesitate to add a comment or drop me a line directly via the contact form.

Edit: From @pyr : « you  can also consider @datadoghq which has a different take on the issue but might fit the bill. »

Edit 2: From the comments, there’s also Logentries, which I don’t personally have any experience with, but which appears to offer a reasonably comprehensive offering as well.

Heavyweight tilt : GitHub vs. Bitbucket

When it comes to code hosting on The Internets today, GitHub is absolutely the hottest, trendiest service going – but it’s not alone. Right now, the primary direct competitor to GitHub is Bitbucket, and choosing the best service for you or your company can be a less than obvious scenario – so let’s break it down, shall we?

GitHub is generally considered to be the most popular code hosting and collaboration site out there today. They have an excellent track record for innovation and evolution of their service, and they put their money where their mouth is, notably by promoting and releasing their own internal tools into the open source community.  Their site offers a buffet of ever-improving facilities for collaborative activity, notably including an integrated issue tracker and excellent code comparison tools, among others. To be fair, not every feature has had the same level of care and attention paid to it, and as a result, some elements feel quite a bit more mature than others; however, again, they never stop trying to make things better.

Bitbucket looks a lot like GitHub.  That’s a fact.  I don’t honestly know which one came first, but it’s clear that today they’re bouncing off of each other in terms of design, features, and functionality.  You can more or less transpose your user experience between the two sites without missing too much of a beat, so for a casual user looking to contribute here and there, one learning curve effectively covers both services (nice).  Bitbucket’s pace of evolution is (perhaps) less blistering, but they too are capable of rolling out new and improved toys over time.

let’s get down to brass tacks

Both services offer the same basic functionality, which is the ability to create an account, and associate that account with any number of publicly-accessible repositories; however, if you want a private repository, GitHub will make you pay for it, whereas Bitbucket offers it gratis.  There, as it is said, lies the rub.  More on this later.

One of the big differences between the two services lies in their respective origins: GitHub remains an independent start-up, whereas Bitbucket (although once independent) was acquired by – and is now strongly associated with – Atlassian (of JIRA fame). It is my opinion that this affects the cultural make-up of Bitbucket in subtle ways, leading to a more corporate take on development, deployment, and importantly, community relations and involvement.  Take a look at their respective blogs (go ahead, I’ll wait).

A quick scan of the past few months from each blog will reveal some important differences:

  • GitHub’s release schedule is more aggressive, with improvements and new features coming more regularly, whereas Bitbucket places greater emphasis on their tight integration with JIRA, Jenkins, and other industry tools.
  • Bitbucket advertises paid services and software on their blog, whereas GitHub advertises open source projects.
  • Bitbucket’s blog has one recent author, whereas GitHub’s blog has many recent authors.
  • GitHub hosts more community events (notably drinkups, heh) over a greater geographic area than Bitbucket (and their posts have more community response overall).

Also, check out GitHub’s “about us” page – brogrammers abound!  I’d compare the group to Bitbucket, but as it so happens, they don’t have an analogous page.

Previously I mentioned that GitHub would like you to pay for private repositories.  This is obviously part of their revenue scheme (and who can blame them for wanting to get that cheese?), but it also has the side-effect of nudging people towards hosting their projects publicly.  This has ended up creating a (very) large community of active participants representing a variety of languages and interests, which in turn results in more projects, and so on and so forth.  This feedback loop is interesting since it is self-reinforcing: the more people use it, the more people will use it.

These observations are, in no way, objective statements of the superiority of one platform over the other – they are, however, indicative of cultural differences between the two companies.  This is (or, at least, should be) a non-trivial element when deciding which service is right for you or your organisation.  For example, I’m a beer-drinking open source veteran that works in start-ups and small companies, so culturally my preferences are different than those of a suit-wearing system architect, working for a thousand-person consulting firm.  One isn’t necessarily better than the other – they’re just not the same (and that’s OK).

but wait, there’s more

Alright, here comes the shocker: for paid services (i.e. private repositories), GitHub is much more expensive than Bitbucket.  As in nowhere near the same price.  At all.  How can this be?  Well, I’m not privy to the financials of either company (if I were, I doubt I’d have written this post), but hey, the money for all those great open source projects, drinkups, and (bluntly) salaries has to come from somewhere – and while Bitbucket has Atlassian’s pockets backing them, GitHub has to stand on their own successes, and live with their own failures.

The two services are not dissimilar technically speaking, so it’s really up to you to decide which culture is better suited for your project.  Do you just need a spot for a private project that you hack on alone, isolated from the greater Internet?  Bitbucket.  Do you have a public project that you’d like other people to discover, hack on together, and build a community around?  GitHub.  As for paid services, well, I suppose that comes down to whether you want to pay extra to support what GitHub is doing or not.

Now, let’s be fair, for a lot of companies, “culture” is an irrelevant factor in their purchasing department – cost is the only concern.  Fair enough.  But let’s say you’ve got a team of developers, all of whom already have their own projects on GitHub, are familiar with the tools and processes, and have a network of fellow hackers built-in and ready to go.  In that case, perhaps culture is worth something after all.

Improvements in Cassandra 1.0, briefly stated

DataStax recently announced the availability of Cassandra 1.0 (stable), and along with that announcement, they made a series of blog posts (1, 2, 3, 4, 5) about many of the great new features and improvements that the current version brings to the table.

For those of you looking for an executive summary of those posts, you’re in luck, because I’ve got your back on this one.

  • New multi-layer approach to compression that provides improvements to both write and (especially) read operations.
  • Said compression strategy also yields potentially significant disk space savings.
  • Leverages the JNA library to provide off-heap caching; by keeping cached data out of the Java heap, this results in more efficient garbage collection and a smaller overall footprint.
  • A much improved compaction strategy results in less costly compaction runs, improving overall performance on each node.
  • Fewer requests are made over the network, and said requests are smaller in size, improving overall performance across the cluster.

In short, 1.0 is a very significant, very important upgrade over 0.8 (et al.), and one which will likely bring Cassandra to the forefront of the hardcore big data / NoSQL scene at large.

pohmelfs update

Hello again ! You may be wondering when the next update in the POHMELFS series is coming – well, rest assured that I’m working on it even as you read this, and that it will be worth the wait.

Remember that we’re working with Staging-level code, and that sometimes things don’t always go as well as one might hope – in this case, there are some discrepancies between the code that the POHMELFS devs are using, and that which was released in the 2.6.30 code (according to the devs, at least).

I’ve recently received a nice new patch from one of the devs, and once I’ve got that squared away, we’ll continue with our exercise.

how to be properly lazy, with Perl !

One of the wonderful things about Perl is that it enables the busy System Administrator to be lazy – and that’s a good thing ! Of course, I don’t mean lazy as in unmotivated, or possessed of a poor work ethic; I mean it in the sense that Perl lets us do as little work as possible in a wide variety of situations. Let’s examine this idea, shall we ?

In the computer world, one often finds oneself doing the same sorts of things over and over again, such as adding a new user to the network, or verifying that the backups executed properly. Usually, these are relatively simple processes which are less about problem solving, and more about following the same set of steps until the desired goal is attained. It is in these situations that the (properly) lazy admin identifies a way to automate these processes as much as possible, so that he or she can get back to more brain-intensive work (this has the net effect of improving overall efficiency and value – see how laziness pays off in the end ? 🙂 )

There are, of course, as many scripting and programming languages as there are grains of sand on a beach, but despite the many competitors and alternatives out there, Perl remains the language of choice for many Linux admins around the world. This is in no small part due to Perl’s ability to manipulate data in a rapid, logical, and easily deployable manner – the most obvious example of this being the vaunted « Perl One-Liner ».

example !

There comes a time in every admin’s life when they must take a bunch of text files, and systematically swap some of the text within with new data – commonly known as searching and replacing.  You could certainly do this by hand using an editor or by using a relatively straightforward C program if you were so inclined.  But there is another way – a better, smarter, lazier way : the Perl search & replace one-liner !  Let’s take a look at the code, then break down each component.

$ perl -p -i -e 's/oldandbusted/newhotness/' *.txt

That’s it, you’re done – take a lap and hit the showers.  So, what exactly just happened there ?  We employed a classic and very common usage method in command-line Perl which can easily be remembered as « pie » :

  • « -p » : In a nutshell, this tells Perl to loop through each line of input, then perform the desired action (in this case, the search & replace) against each of those lines.
  • « -i » : This instructs Perl to actually edit the input files directly (or « in place »), instead of just displaying the changes on the screen.
  • « -e » : This tells Perl to execute the one line of code that follows – in this case, the search and replace regular expression…
  • « ‘s/old/new/’ » : This is the regular expression (or « regex ») which Perl will use to perform the search & replace.  (What’s a regex ?  Wikipedia has the answers you seek !)
  • « *.txt » : The target filenames – a simple shell glob matching every .txt file in the current directory.  (What’s a glob ?  Wikipedia has the answer !)

The key to this whole operation was the fourth bullet point – the regex.  Don’t worry if your regex-fu is not yet strong – this is just an example, and it could have been anything; the point is that Perl can be used to rapidly execute regular expressions on data in simple, easy-to-execute ways, such as the search & replace one-liner above.  This sort of thing comes in handy on a daily basis, and thus, the Perl one-liner is a powerful tool in the System Administrator’s toolbox.
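For the curious, what « pie » does under the hood can be sketched in another language too.  Here’s a rough, illustrative analogue in Python – the function name and structure are my own invention, purely for demonstration, and this is not how Perl itself implements those flags :

```python
import glob
import re

def search_and_replace(pattern, replacement, file_glob):
    """Illustrative analogue of: perl -p -i -e 's/pattern/replacement/' files"""
    for path in glob.glob(file_glob):      # *.txt : expand the glob to each file
        with open(path) as f:
            lines = f.readlines()          # -p : loop over every input line
        with open(path, "w") as f:         # -i : edit the file « in place »
            for line in lines:
                f.write(re.sub(pattern, replacement, line))  # the s/// regex
```

Same idea, a lot more typing – which is rather the point of the one-liner.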

For more one-liners, use the Google : http://www.google.fr/search?q=perl+one-liners