Hi, I'm Pete 👋

One year in DevOps: a retrospective

May 11, 2012

A little over one year ago, I was chosen to take a new role within Sonian to lead our Development Operations team. Previously at Sonian I had a role that changed constantly, from being the technical lead with our sales and business development teams to handling third-level support issues and working with our OEM partners on custom API integration options. The goal for this new role was to bring some structure to the team, attract new talent and retain existing talent, and ensure the overall health and efficiency of our application.

Needless to say, I had a lot on my plate when I first started. Here are just some of the things that I started working on.

  1. Hiring two new team members
  2. Creating structure around how issues get escalated to us
  3. Revamping and replacing our task tracking system
  4. Increasing our level of monitoring and automation
  5. Keeping our application optimized and highly available

I went down many paths over the course of last year. Some things worked for us, many things did not. It was important that I stayed flexible and able to adapt to change. As I look back on the past year, and the successes we have achieved as a team, I find there are a few specific things that helped us along the way.

If you want to build the best team, you need to be willing to hire people from anywhere. It is nearly impossible to build a top-tier team unless you can be flexible on hiring remote staff. Additionally, look internally for people who may be a good fit. You may find some hidden gems in your organization that you can invest in to build your team.

My entire team works remotely. We keep in contact with IRC, Skype and Google Hangouts. We do standups every day to review the status of our projects, blockers, etc. Most of the Sonian engineering team is remote, and we bring them out to the HQ in Newton, MA a couple of times a year to meet, plan and spend time working together in person. Managing remote teams works if you make it work. It’s not easy, but it’s one of the only ways to build the best team possible.

Every organization needs process. The hard part is creating just the right amount of it without burdening your team with the overhead. When I started managing the DevOps team, our ticket management system had no policy around how new projects got put on the board. This resulted in a Kanban board with thousands of tasks that people had placed there over many months. Some were simple configuration changes that would take minutes to make, and in many cases they had already been solved in deployed code.
I declared a “ticket bankruptcy” for our old system, and helped move our team’s tasks over to Jira. The rule was simple: if we weren’t going to complete a task in 30 days, it died in the old system. After the move I took a hard stance on the stories that could get put on the board, and I wouldn’t let anyone add a story without talking to me first. This helped us create clearer and more concise stories for the DevOps team, and what we put on our board had more value and could be prioritized appropriately. Over time our processes evolved and this requirement was no longer needed to ensure quality stories were getting created.

Additionally, we stopped putting small configuration changes on our board as stories. Instead, if someone needed a change, they could reach out to us directly and we would do our best to solve the problem right there (often I would make these changes myself, to keep the team from being distracted). This helped unblock our development teams and increased their story velocity. One thing we strive for in DevOps is to help our engineering teams with the simple changes they need. Instead of having them dump a story on the board where we would “get to it later”, we would help them out on the spot.

I think every team should look at the tools and technologies they work with and ask if they can be provided to the open-source community. When we reached our pain point with Nagios, we came up with the idea of Sensu. Before even starting the project, I knew this would only work if we built it with the intention of open-sourcing it. Luckily, my VP of Eng and CEO both understood the value of open-source applications and the benefit they can provide to the community.

We tasked two of our team members to spend the next few months building out, testing and deploying Sensu to all the nodes in our organization. We continue to make that investment in Sensu by devoting 20% of one of our engineers' time to maintaining the project and leading the community. Over the past year we have also made contributions to Fog, adding support for IBM SmartCloud, as well as updates to the AWS VPC code. We hope to make other tools and Chef cookbooks available to the open-source community.

Ownership of tasks is a big deal for me. Since we write the code that automates our systems as well as deploying all other code, it’s imperative that we own the deploys. Over the past year, we have built a culture around deploying our code and changes quickly (we operate in a 3-week develop/deploy sprint) and methodically. Since we started this initiative about 8 months ago, we have deployed new software to our various stacks every 3 weeks without unplanned or unintended results.

The solution in the end was quite simple: we created a detailed deployment checklist that required signoff from each team that had code going out the door. But in the end, it was the team of DevOps engineers that was in charge of that checklist and gave final approval (or rejection). When it came time for the deploy, they simply “followed the checklist”. Over time, as we increased our automation, we required fewer and fewer items in our checklists. Changes and optimizations in our deployment procedures helped cut our deploy times in half. We got to spend more time writing code and building apps and less time deploying, which is always a plus.
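
To make the signoff idea concrete, here is a rough Python sketch of how such a checklist could be modeled. This is not our actual tooling: the `DeployChecklist` class, its method names and the team names are all made up for illustration. The only things taken from the process above are that every team shipping code must sign off, and that DevOps gives the final approval or rejection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeployChecklist:
    """Hypothetical deploy checklist; names here are illustrative only."""

    teams: List[str]  # teams with code going out in this deploy
    signoffs: Dict[str, bool] = field(default_factory=dict)

    def sign_off(self, team: str) -> None:
        """Record a signoff from a team that has code in this deploy."""
        if team not in self.teams:
            raise ValueError(f"{team} has no code in this deploy")
        self.signoffs[team] = True

    def missing_signoffs(self) -> List[str]:
        """Return the teams that have not signed off yet."""
        return [t for t in self.teams if not self.signoffs.get(t, False)]

    def final_approval(self, devops_approves: bool) -> bool:
        """DevOps has the final say, but only once every team has signed off."""
        return devops_approves and not self.missing_signoffs()


# Assumed example teams, purely for illustration.
checklist = DeployChecklist(teams=["api", "search"])
checklist.sign_off("api")
blocked = checklist.final_approval(devops_approves=True)   # "search" is missing
checklist.sign_off("search")
approved = checklist.final_approval(devops_approves=True)  # everyone signed off
```

The point of keeping `final_approval` separate from the team signoffs is that a complete set of signoffs is necessary but not sufficient; the DevOps team can still reject the deploy.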

There is a fantastic book by Robert Greenleaf about Servant Leadership, and I try to incorporate his principles into my management style. I’ve detailed below a few parts of the book that I have adapted to my needs, and so far, it works quite well.

  1. Listening - It’s important to listen to your team and support them in their decision-making process. Your team has the best chance of growing when they can identify the problem at hand as well as the solution required. Simply telling them what the problem is and how to fix it only makes them more dependent on you as a manager.

  2. Persuasion - I do my best to help convince my team of the best course of action. I don’t believe that my ideas are always the right ones and welcome anyone on my team to tell me that I’m wrong (I often am). But at the same time, if I feel that a particular path needs to be taken, I do my best to convince them and bring them to my side.

  3. Commitment to the growth of the team - I give my team a significant amount of trust and support to solve problems and complete tasks. They are given the freedom to go and make mistakes and learn new ways of managing our systems and applications. I often urge the team to pair while coding as well to promote learning and knowledge transfer.

Finally, I act as the “buffer” for the team. They are most successful when they are able to work without unnecessary distractions. If there are tasks that need to be worked on (such as hotfixes, systems/infrastructure updates or support, etc.), I try to complete these tasks without pulling them away from their projects.
We do this to keep support, engineering and our customers happy, which in the end is the most important thing we do as a DevOps team.

Interested in joining us? We’re always looking for new members to add to our team. Here is a general idea of what we do.