Pete Cheslock

DevOps, RelEng, DevTools, Automation, Randomness

Why Your Company Should Have Internal HackDays

Recently the Sonian DevOps team (Yes, we call them our DevOps team - they write/deploy code and manage systems) took part in an internal hackday. Internally we call it a hackday, but if you are going to float the idea to your engineering management, calling it a “codefest” might be an easier sell.

About a month ago my Jira board started filling up with stories asking for new monitors, bug fixes and additional metrics for Sensu. We'd had a few large projects start recently which came with tight deadlines and large resource needs, so I didn't expect to complete these stories for at least a few months. Looking at the schedule, I figured that if everyone spent one day working on some of these stories/tickets, the larger projects wouldn't be delayed.

We have engineering-wide codefests during our company meetups three times a year. These are fantastic opportunities for team members across all parts of our engineering teams (devs and non-devs alike) to work on and present new solutions and ideas to the entire company. I didn't want to use up a codefest on our Sensu monitors and metrics, so I needed a separate day just to hack on that one specific project.

I presented the idea to the head of engineering, who signed off on our experiment as long as we had some clearly defined goals we were working towards. I wanted specific tasks defined before we started so that we wouldn't waste time planning when we should be hacking. So, I started planning for our hackday about 3 weeks in advance. I coordinated with the other team leads to make sure they were aware and would send any requests directly to me for the day. I would then act as the buffer for any third-level support or other assistance our DevOps team provides.

Over the next 3 weeks, everyone on the DevOps team added ideas to a wiki document: things that were keeping them up at night, code they wanted to refactor, etc. Two days before our hackday, we reviewed the list of items and assigned ownership of all the tasks (pairing was encouraged). This allowed the geographically diverse team to hit the ground running when they logged on for the day, since everyone starts working at different times. On the day of our hackday, everyone had their task list and began working on the various fixes we had lined up. We had a quick standup midway through to check on status and blockers, and to make sure we were making progress.

So - what were the results?

Comparing the diff for our branch on GitHub, we saw the following: 47 changed files with 689 additions and 478 deletions.

Now - what did we really get done?

  • We created 7 new monitors for things we had not tracked in the past
  • We converted the last few of our Nagios checks over to use the sensu-plugin. Huzzah! - no more Nagios anything!
  • We consolidated 5 monitors down to one, which simplifies our codebase while still providing full insight into our systems.
  • We cleaned up our metrics - removing nearly 20 separate ones that we found did not provide us with actionable information. And since we use Librato - this is real $$$ saved. But instead of saving that money - we decided to reinvest it and run our metric checks more frequently - so our metrics could capture more data more often.
  • We added new metrics - creating about 12 new metrics for our indexing clusters - everything from memory and load to application-level data, health, etc.
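
For anyone who hasn't written a monitor like this before, here is a rough idea of the shape of one. This is only a sketch, not one of our actual checks (those use the Ruby sensu-plugin gem). It relies on the Nagios plugin exit-code convention that Sensu checks also follow: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. The disk-usage target, the thresholds, the output format and the function names are all assumptions made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical disk-usage check using Nagios-style exit codes."""
import shutil
import sys

# Exit codes from the Nagios plugin convention, which Sensu checks also use.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_fraction, warn=0.80, crit=0.90):
    """Map a usage fraction (0.0 to 1.0) to a status code.

    The default thresholds are illustrative assumptions.
    """
    if used_fraction >= crit:
        return CRITICAL
    if used_fraction >= warn:
        return WARNING
    return OK


def check_disk(path="/"):
    """Return (status, message) for the filesystem that contains path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"could not read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"{path} reports zero total size"
    used_fraction = usage.used / usage.total
    return evaluate(used_fraction), f"{path} is {used_fraction:.0%} full"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    status, message = check_disk(target)
    # One line of output, then the exit code tells the monitoring system the result.
    print(f"DISK {LABELS[status]}: {message}")
    sys.exit(status)
```

Keeping the threshold logic in its own function is what makes it easy to fold several near-identical monitors into one: the same script can be pointed at different paths with different thresholds.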

Overall - the team enjoyed taking a break from our big projects to fix things that keep us up in the middle of the night (literally). And I enjoyed a much smaller backlog of work, as well as personally fixing and adding new Sensu checks.

We were able to complete about a week's worth of work in a single day. We added in a few hours of testing and validation, and deployed the code out to production the next day (Oh, the joys of DevOps). The success of this project has already sparked the interest of other engineering teams, who are considering doing similar hackdays. My team has already begun to discuss doing this again every couple of months, with a different focus each time.

Sonian is continually committed to Sensu as an open source project - which is why we share many of our checks with the rest of the community - you can check out many of the new and refactored checks (the result of our hackday) by heading over to Sensu’s community plugins page.