I’ve worked at a good number of startups in my career (eight of them by last count). And every single time that I go and join a startup, I spend countless hours trying to learn more about what kind of information I need to get to understand my stock options grant. Startup fundraising and how stock works at a startup can be a challenging game to win since none of the participants are willing to share any information that might weaken their position.
While scrolling through Twitter the other day, I ran across this tweet, which talked about how it’s a privilege to be able to leave a job when you disagree with a decision or a course of action. “Hi friends, just a friendly reminder that not everyone has the privilege to leave a job when they disagree with a decision. There are many reasons that folks stay at a job, and it does not mean they agree with what's happening there.”
Back in 2009, I joined a startup that wanted to solve a problem around email archiving. At that time Amazon Web Services had only been publicly available for a few years – with SQS, EC2, and S3 being a few of their very early services. This tiny company wanted to use the economies of the cloud to solve the problem of storing data for long periods of time. Being able to scale compute and storage as the company grew was a huge reason that they were one of the earliest and largest users of S3 and EBS.
TL;DR - You are not going to get rich at your startup. Well, you might, but the odds are about as good as winning the lottery. (So don’t make THAT the reason you go work at a startup) Edit: If you want to skip this post and learn more about how startup equity works - check out this awesome GitHub repo: https://github.com/jlevy/og-equity-compensation This whole thing started when I saw a blog post titled “Advice to Grads: Join A Winning Startup”.
This was originally posted on the Threat Stack blog - added here for continuity. DevOps is a term that has absolutely blown up in the last 5 years. As someone who’s been involved with that community from the earlier days, it’s been interesting to watch the conversations around DevOps evolve over time. Many people had an immediate adverse reaction towards Yet Another Buzzword – especially when the core concepts that people described as being “DevOps” were things that many people had already been doing for years.
This was originally posted on the Threat Stack blog - added here for continuity. Early on at Threat Stack, we focused on giving engineers the tools and ownership over their applications that would empower them to deploy and manage their applications in a safe way without causing customer downtime or other issues. For a small but rapidly growing company, this is necessary for survival. For most of the last four years, Threat Stack has only had a two- to three-person operations team.
This was originally posted on the Threat Stack blog - added here for continuity. One of the most important things that any company can do to benefit from DevOps is define and implement useful, actionable metrics for visibility into business operations. This is already standard practice in most areas of the average organization. KPIs drive sales and marketing teams, finance groups, and even HR. Yet, at many companies, having metrics for the application that brings in the money is an afterthought — or is not prioritized at all.
This was originally posted on the Threat Stack blog - added here for continuity. Many organizations struggle with how and when to deploy software. I’ve worked at some companies where we had a “deploy week.” This was at least a week (or sometimes even longer) that was completely devoted to deploying huge amounts of software. The changes were so large and complex that deploying them would cause massive amounts of pain and suffering.
This was originally posted on the Threat Stack blog - added here for continuity. As Senior Director of Operations at Threat Stack, I am repeatedly asked one question by our customers: “How does Threat Stack ‘do’ DevOps?” One of my long-time pet peeves has been the abuse of the term “DevOps.” You can be a DevOps engineer, you can be a Director of DevOps, you can buy DevOps tools. But when people ask me “How does Threat Stack ‘do’ DevOps?
There is immense value in taking time off. It’s not something that can be measured. But there is something genuinely amazing that happens when you stop working. The most vacation I’ve ever taken while employed was 3 weeks. And I was only able to get that because I negotiated it in as part of my employment package due to an upcoming wedding and honeymoon. But even then, taking time off was much, much different than my most recent experience with Funemployment.
Friday, May 16th was my last day working for Dyn.
Monday (today), May 19th is the first day I’ve been unemployed in nearly a decade.
No plans. No job. No idea what my next step is going to be.
I used to be Director of DevOps… Twice. Both times I changed my title later. I even ran a DevOps team - although the team was already called that when I took over.
I have fallen victim to the abuse of the DevOps title. And I see it all the time: in the devops Twitter hashtag, and from people contacting me about DevOps job opportunities. It’s got to be a real thing, right? I mean, we totally have people that are doing DevOps these days, right?
I recently led an Open Space session on this very topic at DevOpsDays Austin. We talked for about 40 minutes, with about 35 people in attendance, and I wasn’t able to find anyone with a dissenting opinion. Now that doesn’t mean that I’m right, or even that everyone in attendance is right. But there is one very important reason why you might want to think twice before using “DevOps” as part of your job title.
There was an interesting topic that came up in last Friday's HangOps session that I wanted to expand on in a blog post. We were talking about Gene Kim’s new book The Phoenix Project, and I mentioned how I believe an interesting point in the book was how they stopped work in order to get a handle on their work in progress (WIP). I had some unique experience in doing something similar while I was running the Ops team at Sonian.
I grew up in Michigan. The city was like most Midwest cities: if you drove 20-30 minutes outside in any direction, you would likely hit farmland. I had a few friends over the years who grew up on farms, or simply large plots of land where they were able to keep animals like chickens, cows, goats, deer, etc… One of my friends learned at an early age to not name the animals.
One of the most important parts of any organization is the need for accountability. It’s not just for the Sales teams to be accountable for the amount of revenue they are generating for the company or the Marketing team to be accountable for the number of leads they are bringing to the organization. It’s paramount that everyone feels that they are accountable for their actions. Many TechOps teams manage complex environments, and with more people using configuration management (such as Chef, Puppet or CFEngine), there can be disastrous consequences for even the smallest of changes.
One year ago today - Sonian released Sensu into the wild and into the arms of the open source community. It has been amazing to see how far along this project has come from the early days of Aug 2011 when we started. I still remember the conversation I had on standup that day. I was laying down my normal daily diatribe about my hatred for all things Nagios and posed a simple question to the team, what should we do now?
When running a full physical infrastructure, the idea of cost is one that comes up during the procurement process: calculating the needs of the business and the expected growth, and purchasing systems with enough advance notice that you will be able to meet those expected demands. I’ve been out of the physical infrastructure world for over 3 years now, but in the past we would purchase hardware and depreciate it over a 3-year period. If you were a particularly analytical person you could easily determine the per-month, per-day, and per-hour cost of your infrastructure, but largely it would not matter much as the money has already been spent. When it comes time to add new services or applications, you only need to determine whether the existing systems have the available capacity (memory, CPU, disk I/O) to support that application. If your existing systems can handle the load from the new app, then the cost of that application is essentially near zero (not counting admin overhead).
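To make the per-month, per-day, and per-hour arithmetic concrete, here is a minimal sketch. It assumes simple straight-line depreciation with no salvage value and a 365-day year; the `amortized_cost` helper and the $36,000 purchase price are hypothetical, chosen only for illustration.

```python
def amortized_cost(purchase_price, years=3):
    """Spread a hardware purchase evenly over its depreciation period.

    Assumes straight-line depreciation, no salvage value, 365-day years.
    """
    months = years * 12
    days = years * 365
    hours = days * 24
    return {
        "per_month": purchase_price / months,
        "per_day": purchase_price / days,
        "per_hour": purchase_price / hours,
    }


# Hypothetical example: a $36,000 server depreciated over 3 years.
costs = amortized_cost(36_000)
for period, cost in costs.items():
    print(f"{period}: ${cost:,.2f}")
```

The numbers only describe money that has already been spent, which is why the marginal cost of one more application on existing hardware looks close to zero.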
I was very excited when I heard the recent news regarding Google’s entry into the public cloud provider space. Right now there are only a handful of true public clouds out there (AWS, Rackspace, IBM SmartCloud, Microsoft Azure and a couple others), with Amazon’s AWS platform holding a significant advantage over any of its current competitors. When I started looking into their pricing and features I was a bit disappointed (I understand it’s still in Beta).
A little over one year ago - I was chosen to take a new role within Sonian to lead our Development Operations team. Previously at Sonian I had a role that changed constantly, from being the technical lead with our sales and business development teams to taking 3rd level support issues and working with our OEM partners on custom API integration options. The goal for this new role was to bring some structure to the team, attract new talent and retain existing talent, and ensure the overall health and efficiency of our application.
Recently we were testing with AWS VPC, and a requirement for our project was that we needed to allow nodes within a VPC access to S3 buckets, but deny access from any other IP address. Specifically, this was access to data that was going to be secured using AWS IAM keys. We needed to make sure that even with the AWS access key and secret key, data could only be retrieved while inside the VPC.
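One way to express that requirement is a bucket policy that denies every request whose source address falls outside the VPC's egress range. The sketch below only builds the policy document. The bucket name and CIDR block are hypothetical placeholders, and it assumes the VPC's traffic reaches S3 from a known public address range (for example through a NAT), since the `aws:SourceIp` condition key matches public addresses. Treat it as an illustration of the idea, not necessarily the exact policy used in this project.

```python
import json


def vpc_only_bucket_policy(bucket, allowed_cidrs):
    """Build a bucket policy that denies S3 access from outside allowed_cidrs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRequestsFromOutsideAllowedRange",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in it
                ],
                # Deny applies whenever the caller's IP is NOT in the list.
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


# Hypothetical bucket name and egress range (203.0.113.0/24 is a documentation range).
policy = vpc_only_bucket_policy("example-archive-bucket", ["203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```

Because an explicit Deny overrides any Allow, a request signed with valid keys is still refused when it originates outside the listed range. If the VPC reaches S3 through a VPC endpoint instead, the condition would be written against `aws:SourceVpce` rather than `aws:SourceIp`.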
Recently the Sonian DevOps team (Yes, we call them our DevOps team - they write/deploy code and manage systems) took part in an internal hackday. Internally we call it a hackday, but if you are going to float the idea to your engineering management, calling it a “codefest” might be an easier sell.
About a month ago my Jira board grew with more and more stories asking for new monitors, bug fixes and additional metrics for Sensu. We’ve had a few large projects start recently which came with tight deadlines and large resource needs, so I didn’t expect to complete these stories for at least a few months. Based on the schedule, I thought that if we could have everyone spend one day working on some of these stories/tickets, the larger projects shouldn’t be delayed.
We have engineering-wide codefests during our company meetups three times a year. These are fantastic opportunities for team members across all parts of our engineering teams (devs and non-devs alike) to work on and present new solutions and ideas to the entire company. We needed a day to hack on a specific project (our Sensu monitors and metrics), and I didn’t want to waste codefest on that. I needed a separate day, just to hack on a specific project.
We manage our AWS assets across many different accounts. This helps us keep data and access controls separate depending on the type of data we are controlling. One of our AWS accounts is a non-production account where we spin up and down test systems to support new feature testing and other activities to support development. Our build cluster (which lives in a separate AWS account) needs access to some S3 buckets that live in our non-production account.
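As a rough sketch of what cross-account access like this can look like, the snippet below builds a bucket policy that lets another AWS account read from a bucket. The bucket name and the 12-digit account ID are hypothetical placeholders, and the read-only action list is an assumption about what a build cluster would need. It is not necessarily how the setup described here was implemented.

```python
import json


def cross_account_read_policy(bucket, trusted_account_id):
    """Build a bucket policy granting read access to another AWS account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBuildAccountRead",
                "Effect": "Allow",
                # Delegates to the other account, which then grants its own users/roles.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # needed for s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",    # needed for s3:GetObject
                ],
            }
        ],
    }


# Hypothetical non-production bucket and build-account ID.
policy = cross_account_read_policy("example-nonprod-artifacts", "111122223333")
print(json.dumps(policy, indent=2))
```

The bucket policy is only half of the picture: the users or roles in the build account also need an IAM policy in their own account that allows the same actions on this bucket.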
After quite a long absence from blogging I have decided to return to discussing some of the new and (hopefully) interesting technologies that my team and I work with. I work for Sonian as the Director of Technical Operations, and I manage 5 very skilled individuals who assist me with the operation of our systems in the cloud. We work with lots of cutting-edge technology (such as Chef from Opscode), and where there is no software to do what we need - we create it ourselves (see Sensu for an example).
Amazon recently announced a new tier of storage available within their web services cloud infrastructure. Amazon’s current storage solution, S3, is truly the gold standard for durable cloud-based storage, providing 99.999999999% durability (which, if my math is right, means that for every 100 billion objects stored in S3, Amazon “may” lose a single object every year). Amazon is listening to their customers, and now provides a lower cost (33% cheaper) S3 storage solution called Reduced Redundancy Storage (RRS).
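The durability math can be checked with a few lines of Python. This sketch assumes the durability figure is an annual, per-object survival probability and that losses are independent; the `expected_annual_losses` helper is a hypothetical name used only here. Exact fractions are used so the eleven nines are not lost to floating-point rounding.

```python
from fractions import Fraction


def expected_annual_losses(durability_percent, objects):
    """Expected number of objects lost per year at a given durability.

    durability_percent is passed as a string (e.g. "99.999999999") so it
    can be converted to an exact fraction.
    """
    loss_rate = (100 - Fraction(durability_percent)) / 100
    return loss_rate * objects


# 11 nines of durability across 100 billion stored objects.
print(expected_annual_losses("99.999999999", 100_000_000_000))  # prints 1
```

So the estimate holds: at that durability, the expected loss is one object per year for every 100 billion stored.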
I had started this blog initially as a way to discuss storage and virtualization solutions while working as a technology consultant. But recently a new opportunity presented itself, and I’ve now made the transition out of consulting, and back to the start-up world. This most recent adventure is with a company called Sonian, which provides a cloud-based data archiving and eDiscovery solution. What is so wonderful about this new venture is that we leverage the Amazon Web Services cloud, which gives us the ability to consume storage and computing by the granule. We don’t need to make huge capital outlays in data centers, storage, servers, etc… And since we don’t need to buy and maintain all this hardware (which will eventually be refreshed in 3-5 years), we can keep the costs low and pass on those savings to our customers.
A few months before the vSphere release, VMware showed some amazing stats regarding the increased level of I/O that can be attained in a virtual infrastructure. They posted this info on their blog and the outcome of the testing was impressive. They were able to achieve 350,000 I/O operations per second on a single vSphere host (ESX 4.0) with just 3 virtual machines. Their testing utilized the EMC Enterprise Flash Drives, which have an incredibly high throughput. They talked about how the VMware Paravirtual SCSI (PVSCSI) adapter was able to achieve 12% more throughput with 18% less CPU cost compared to the LSI virtual adapter.
vSphere was just released to general availability today, and one of the best features of this upgrade is the addition of VMware Fault Tolerance. From the VMware site:
VMware Fault Tolerance is leading edge technology that provides continuous availability for applications in the event of server failures, by creating a live shadow instance of a virtual machine that is in virtual lockstep with the primary instance. By allowing instantaneous failover between the two instances in the event of hardware failure, VMware Fault Tolerance eliminates even the smallest of data loss or disruption.
I ran into a very interesting issue today with a client who is using Veeam Backup and Replication to keep their virtual machines replicated to a remote ESX server for disaster recovery. Veeam starts a replication job by taking a snapshot of the virtual machine and then replicating the main VMDK disk file to the remote site. When the job finishes, Veeam tells VMware to remove the snapshot until the next scheduled replication runs. Since we are replicating our VMs across a slow WAN connection (600Kbps optimized with Citrix WANScalers), the replication can often time out or hang. Today I noticed that the replication had not updated since last night, so I needed to stop the replication and restart it. Since the Citrix WANScalers can cache as well as compress, restarting a failed replication job is usually pretty quick, as most of the data was previously cached on the Citrix boxes. Here are the details of what I found, and how I fixed it…
I am frequently asked how to grow a VMware virtual disk (VMDK) and have it be recognized by the operating system. If you are trying to simply extend a non-system volume within Windows (i.e., anything other than the C:\ drive), then the process is pretty simple (refer to MS KB 325590). But when you are trying to grow the C:\ drive in Windows, you need to get around the limitation on extending the system partition. This is just one more instance where VMware shows how powerful and flexible it truly is.
I just saw today that it looks like VMware ESX 3.5 Update 4 was released a couple of days ago. I’m pretty excited about this upgrade as it includes an updated vmxnet adapter.
From the VMware site:
**Expanded Support for Enhanced vmxnet Adapter** - This version of ESX Server includes an updated version of the VMXNET driver (VMXNET enhanced) for the following guest operating systems:
- Microsoft Windows Server 2003, Standard Edition (32-bit)
- Microsoft Windows Server 2003, Standard Edition (64-bit)
- Microsoft Windows Server 2003, Web Edition
- Microsoft Windows Small Business Server 2003
- Microsoft Windows XP Professional (32-bit)