RealGeneKim Blog - Home page of RealGeneKim (Gene Kim): Tripwire founder and CTO, Visible Ops co-author, and more...

Friday, Jun 25, 2010

Talk notes: DevOps Requires Visibility (#DevOps Day)

Friday, June 25, 2010 at 4:06PM

DevOps Day, Santa Clara, CA
June 25, 2010

I'm here to participate on a panel called "DevOps Outside of Web Operations."

DevOps Requires Visibility

Javier Bowles (Appscio)
Gareth Soltero (SpringSource/VMware)
Jyoti Bansal (AppDynamics)
Matt Ray (Zenoss)
Eishay Smith (KaChing)

Moderator: Damon Edwards (DTO Solutions)

Holy cow. All these companies are hiring. That's very interesting anecdotal evidence that successful companies embrace the values of DevOps.

Question: "DevOps requires situational awareness, very much like military."
- Eishay: "We do 92 deployments per day. We can stop the line at any time, rollback to a known good state. When we see a breach, we add a new layer of controls."
- Matt: "Situational awareness requires instrumenting bare metal, etc. Monitor everything."
- Jyoti: "Three layers: business metrics, ..."
- Javier: "You can have all the visibility in the world, but you have to keep the audience/consumer in mind. You can end up with a lot of misaligned interests. You can have complete visibility, but if you can't prove that a certain area ISN'T THE PROBLEM, then what good have you done?" (haha)
- Gareth: "challenge: how do you represent all this data?"
Question: "Data everywhere, but not lots of knowledge. We sit up here and talk about tying up to the business metrics, but as an industry, we seem to not do it. New vendors are coming, but why is it different this time, and what causes the disconnect?"
- Javier: "Web Ops really is the business, so they have a better chance at making good metrics than the traditional IT ops organizations. Culture of instrumenting and manageability was shared equally between Dev and Ops. Dev would be reluctant to deploy code without certain amount of telemetry. That's Dev's job, not the product manager's job: he/she doesn't know how. Only Dev does."
- Jyoti: "late binding of business metrics is sometimes useful."
- Matt: "Ops will often challenge Dev on answering what metrics need to be exposed, so we can tell when we're slow, both at hardware and app level. These people know that they can't deploy without Ops buying in."
- Eishay: "Easy. Hire best engineers. Then give them root passwords. Ops then becomes the team who automates things for Dev. Engineers know best what data needs to go where. Enable engineers to do what they want, and don't get in the way. If you can't trust the engineers, then fire them." (Holy cow.)
Question: "what are some of the best practices for creating a culture of visibility?"
- Eishay: "Culture of quality. When test fails, there are blinking lights everywhere. Penguin shouts 'the build is broken.' Everything revolves around this, knowing we can't release something if it's broken. That's when it's ingrained into the culture."
- Matt: "As open source project, you want to foster sharing. 'Here's how I did it, here's how you can do it.' That's the culture we need to build."
- Jyoti: "Info sharing is a key cultural attribute, as opposed to info hoarding. That inhibits visibility. At Netflix, what impresses me is visibility at top-level. In lobby, you see user-satisfaction scores, where the bottlenecks are, and shared throughout the organization. Avoids the blame game."
- Javier: "Money. Pay people well. When hiring for ops, QA, and engineering, engineering often hoards all the money, because dev is expensive. The truth is, that's a recipe for disaster, because you have dev rockstars that rule the roost. Ideally, incentives and compensation are structured so that site outages and release performance hit people's pocketbooks. One exec should own both dev and ops goals."
Question: "okay panel, you guys make it sound easy. But at my shop, we're going through a KPI measure exercise that is futile. Bonus exercise is just a racket. We'll massage the data in the end to show we got five-nines. We have one guy who was at Netflix who sees where we need to go, but they had a full-time person dedicated to it. Monitoring is a racket, but without customizing event handlers, it's a joke."
Question: "metrics are often misguided. who cares about memory usage. focus on business metrics like transactions."
Question: "Hey, Eishay, how do you get away with developers with root when you're dealing with a financial transactions platform?"
- Eishay: "Lots of reporting. We made the reporting system easy to use." (But how do you assert on the integrity of the system and data?)
- Javier: "there are creative ways to achieve compliance objectives."
Question: "What do you think of one exec owning dev, QA, and ops?"
- Jyoti: "look at effectiveness measures like MTTR"
- Javier: "look for execs who have had this type of responsibility before. As a community, we need to create that skillset and experience set."
Tip: "1) Best time to intro metrics is at feature launch time, so you know whether just launched widget is wanted, and how much CPU is required to ease CapEx budgeting process. 2) We got a sitedown issue last week. Why didn't we see that coming? Sounds like we need a graph for that."

Gene Kim | Comments Off |

DevOps,

talks

Talk notes: Making The Business Case (#DevOps Day)

Friday, June 25, 2010 at 3:21PM

DevOps Day, Santa Clara, CA
June 25, 2010

I'm here to participate on a panel called "DevOps Outside of Web Operations."

Making The Business Case

Jay Lyman, 451 Group (analyst covering enterprise software)
Kurt Milne, IT Process Institute
Jody Mulkey, Shopzilla (CIO, including Bizrate)
Rolf Andrew Russell, ThoughtWorks (lead DevOps engagements)

Moderator: Damon Edwards (DTO Solutions)

What are the anti-patterns?
- Jody: "focus on what is creating the most unplanned work" (it's so cool to hear all these mentions of Visible Ops. It's totally making my day!)
- Rolf: "don't mention DevOps. Mention cycle time, throughput, feature delivery. Language of your audience"
- Kurt: "business executives think in terms of outcomes. IT will think of the process."
- Jody: "IT ops as cost center view is interesting. It's really about time to market and innovation. Those are the drivers and pain points to hit." (great point. Can't believe I didn't say this in my talk.)
What are the business drivers? As opposed to just making pagers go off less?
- Jody: "making freaking graphs to show how it impacts revenue. I call it the 'rabbit feeder', because it connects the entire company to what we do: sending leads to merchants, connecting buyers and sellers. Making data democratic and tying everyone to obvious business outcomes." (haha.)
- Rolf: "the client global CIO needs the 'double double.' The current cycle time is too slow and the work load is projected to double. I'm trying to make that reality."
- Kurt: "DevOps could be pitched as a new way to run IT. In business, it fulfills the intent of a 'low cost probe' to experiment/course-correct. As opposed to traditional model which is like building a satellite. That's a great story."
Question: "IT people feel like they're measuring everything, while business thinks IT is bad at measuring anything. What should IT be measuring?"
- Rolf: "So dependent on the type of organization. Velocity conference talks a lot about tying downtime to lost dollars, especially in media companies. I like to focus on what work you're doing (WIP), how long are your release cycles, and how much stuff can you get out in each release."
- Kurt: "Time to capability: business wants X, how long does it take to get it? Speed is one aspect. Variability is another aspect: how long vs. promised. If you can reduce variance, that builds confidence"
- Jody: "We're really proud of what we've built at Shopzilla, around scaffolding. It takes a lot of work to build it, but then when you track the time to market. You spend a couple of weeks, then a couple of hours. That's a remarkable improvement that's great to show. You first have to be able to measure it and get that telemetry, and then use it as a goal metric for infrastructure work that supports continuous integration. Financial industry: pay yourself first to build capabilities."
Question: "Re: paying everyone on the performance of the entire system. It sounds good, helping business to support its goals. But it's frustrating to be in a position not to affect sales, but performance is based on it."
- Jody: "Shared goals was around performance. It's great when you hit bonus. But when you goose-egg. Then blame-game. It's code problem, no it's infrastructure. The point is, we're in business together. It doesn't matter whose fault, because we have a common goal. You can make the goals more granular, but we've chosen not to."
- Jay: "It could be wrong model, because it could backfire, especially if you feel powerless to engage."
  - "We're ops. We can make trains run on time. We can't over-exceed. We can't make sure the trains are full." (interesting. yet, ops should make sure that ops bottleneck is not wasted!)
  - Damon: "if we're great at our jobs, it's awesome when we can get paid and rewarded on company profitability."
Question: Israel: "too many metrics. Focus on the outcomes: dollars. If you can't focus on dollars, then focus on outputs. Second: quality metrics of what you're doing, otherwise today's success turns into tomorrow's disaster. Third: cost: all you need to do is look at relationship between output/quality and cost. If you can do all three, then you have governance frameworks you need without 18 metrics"
Question: "war story: just ask business what your KPI should be. I sat down as head of development with business, and asked what KPI should be. We were able to quickly get agility/TCO/availability/resourcing on new features vs. maintenance. We achieved it without a lot of psychic pain."
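Going back to Kurt's "time to capability" answer: here is a minimal sketch of how speed and variability (how long vs. promised) might be put into numbers. The delivery data, field layout, and the function name `capability_stats` are all made up for illustration; nobody on the panel showed code.

```python
from statistics import mean, pstdev

# Hypothetical delivery log: (capability, promised_days, actual_days).
deliveries = [
    ("feature A", 10, 14),
    ("feature B", 5, 5),
    ("feature C", 20, 32),
    ("feature D", 8, 9),
]


def capability_stats(deliveries):
    """Summarize speed (mean days) and variability (slip vs. promise)."""
    actual = [a for _, _, a in deliveries]
    slip = [a - p for _, p, a in deliveries]  # positive means late
    return {
        "mean_days": mean(actual),
        "mean_slip_days": mean(slip),
        "slip_stdev_days": pstdev(slip),  # lower spread = more predictable
    }


stats = capability_stats(deliveries)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

The idea is that a shrinking `slip_stdev_days` over time is the "reduced variance" that builds confidence, even before the mean improves.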

Gene Kim | Comments Off |

DevOps,

talks

Talk notes: Ignite Talks #2 (#DevOps Day)

Friday, June 25, 2010 at 2:48PM

DevOps Day, Santa Clara, CA
June 25, 2010

I'm here to participate on a panel called "DevOps Outside of Web Operations."

Ignite Talks #2

5 minutes, all slides advance every 30 seconds

I wasn't able to catch everyone's attributions.

Netomata: like puppet for network equipment
Petascale storage: DIY! (@tlossen: http://openstoragepod.org)
- open source hardware project for scalable storage
- software by training, but find hardware by interest
- 988 exabytes: amount of data that will be produced in 2010
- Cost of a petabyte (see pic)
- Use laptop disks
- Wow, ATX style motherboard, and stack 12TB on top of it in one enclosure
- Only needs 600W
- Put them into pods of racks: 1.2 petabytes (6 KW of power)
- (This is hilarious...)
- Okay, let's scale it down so normal people might find it useful. So you can put it in your living room.
Systems thinking & Value stream mapping:
- talking about tools he brings into large engagements
- MIT sloan beer game: retailer, wholesaler, distributor, factory
  - talking about how overordering happens, resulting in huge backlogs and then huge excess. stable system turns very unstable.
- software production pipeline
- continuous integration, functional testing, UAT, staging, production
- stumbled upon lean value stream mapping: optimize flow (concept into cash)
- model current state: collect metrics: p/t (process time), l/t (lead time), c&a (complete and accurate; time wasted, production issues raised)
  - note that it crosses organizational boundaries
- envision future state
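To make the value stream metrics above concrete, here is a rough sketch that rolls up p/t, l/t, and c&a across the pipeline stages named in the talk. I'm reading c&a as the usual value stream mapping "percent complete and accurate"; that reading, the `Stage` class, and every number below are my assumptions, not anything from the speaker's slides.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    process_time_h: float      # p/t: hands-on work time
    lead_time_h: float         # l/t: elapsed time, including waiting
    complete_accurate: float   # c&a: fraction accepted downstream without rework


def summarize(stages):
    """Roll up the current-state map into a few headline numbers."""
    total_pt = sum(s.process_time_h for s in stages)
    total_lt = sum(s.lead_time_h for s in stages)
    rolled_ca = 1.0
    for s in stages:
        rolled_ca *= s.complete_accurate  # chance work passes every stage cleanly
    return {
        "process_time_h": total_pt,
        "lead_time_h": total_lt,
        "flow_efficiency": total_pt / total_lt,
        "rolled_c_and_a": rolled_ca,
    }


# Hypothetical numbers for the software production pipeline.
pipeline = [
    Stage("continuous integration", 1, 2, 0.9),
    Stage("functional testing", 4, 16, 0.8),
    Stage("UAT", 8, 40, 0.9),
    Stage("staging", 2, 24, 1.0),
    Stage("production", 1, 8, 1.0),
]
summary = summarize(pipeline)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

A low flow efficiency says most of the lead time is waiting, which is usually where the hand-offs across organizational boundaries show up.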
cloud supporting tools: http://libcloud.org
- cloud service providers all have different APIs
- we now provide/support 16 different cloud service providers
- APIs
  - list_nodes(), reboot_node(), destroy_node(), ...
- Add data: location, price-per-hour
- deployment steps: mercury for drupal
- want to go beyond just booting machines
- it's open source: in Apache incubator
- related projects
  - jclouds (author is here)
  - Apache Deltacloud
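The pitch here is one set of calls that works across providers with different APIs. The sketch below illustrates that idea only; it is not libcloud's actual code. It borrows the three method names from the talk, and `Node`, `NodeDriver`, `InMemoryDriver`, and `reboot_all` are hypothetical names invented so the example runs offline.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    name: str
    state: str = "running"


class NodeDriver(ABC):
    """Common interface; each provider would translate these to its own API."""

    @abstractmethod
    def list_nodes(self): ...

    @abstractmethod
    def reboot_node(self, node): ...

    @abstractmethod
    def destroy_node(self, node): ...


class InMemoryDriver(NodeDriver):
    """Hypothetical stand-in for a real provider, with no network calls."""

    def __init__(self, provider_name, names):
        self.provider_name = provider_name
        self._nodes = {}
        for i, name in enumerate(names):
            node_id = f"{provider_name}-{i}"
            self._nodes[node_id] = Node(node_id, name)

    def list_nodes(self):
        return list(self._nodes.values())

    def reboot_node(self, node):
        node.state = "rebooting"
        return True

    def destroy_node(self, node):
        return self._nodes.pop(node.id, None) is not None


def reboot_all(drivers):
    """The same loop works for every provider because the interface is shared."""
    count = 0
    for driver in drivers:
        for node in driver.list_nodes():
            if driver.reboot_node(node):
                count += 1
    return count


drivers = [InMemoryDriver("alpha", ["web1", "web2"]), InMemoryDriver("beta", ["db1"])]
rebooted = reboot_all(drivers)
print(f"rebooted {rebooted} nodes")
```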
A simple story of how one company spent 10M dollars and saved nothing
- called the "zero effect"
- the victim: Gloria (like a movie script)
- the hero: Daryl Zero
- the villain: Gregory Start
- the company: High's (haha)
- the vendor: Rivoli (haha)
- story begins: $8.9M won out of $10M RFP
- the analysis: $20K
  - Windows: "we've already got one" (Sponge Bob)
  - AIX dudes: "too busy, won't call back" (Teenage Ninja Turtles)
  - Oracle guys: "you are not worthy"
  - Security guys: "you want freaking root"
  - Linux folks: "hello, you guys should check out Nagios"
  - HP dudes: "they were interested" ("they liked us because no one cared about them")
  - Bob the CIO: "I want my damned dashboard"
- The mothership steps in and decides to do end-to-end operations analysis
- final cost: $10M
- two years later, they switched to BMC.
- Now that's a tragic story!
Erik Sowa, Lyris: Latent Code: Lessons Learned Implementing Feature Bits (@eriksowa)
- latent code: stuff not activated, stuck in "concept to cash" cycle
- no big bangs: at mercy of lowest performing team
- the deployment pipeline
- Inspiration
  - John Allspaw and Paul Hammond video
  - Jez Humble and David Farley book
- technology allows prod manager to turn it on
  - feature flags from Flickr and Twitter
- lessons learned
  - design pressure is good: need loosely coupled and tightly cohesive
  - manage the lifecycle: retire aggressively
  - maintain production quality: code hidden behind feature bits is subject to same requirements as production code
  - default state: decouple code rolls
  - naming convention matters
  - do not overload: for latent code, not for controlling who has gold/silver feature
  - limit the overhead (dependencies)
  - customer facing releases
  - beta- and split-testing: you can use this to do customer testing, turning on for particular group of users
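A few of the lessons above (default state off, retire aggressively, turn on for a particular group of users) are easy to show in code. This is a minimal sketch under my own assumptions; the `FeatureBits` class, its method names, and the bit name `new_checkout` are hypothetical and not taken from the Lyris, Flickr, or Twitter implementations.

```python
import hashlib


class FeatureBits:
    """Registry of feature bits. Unknown or retired bits are always off."""

    def __init__(self):
        self._bits = {}

    def define(self, name, enabled=False, beta_users=(), percent=0):
        # Default state is off, so latent code can ship without being activated.
        self._bits[name] = {
            "enabled": enabled,
            "beta_users": set(beta_users),
            "percent": percent,
        }

    def retire(self, name):
        # Manage the lifecycle: remove bits once the feature is fully launched.
        self._bits.pop(name, None)

    def is_on(self, name, user_id=None):
        bit = self._bits.get(name)
        if bit is None:
            return False
        if bit["enabled"]:
            return True
        if user_id is None:
            return False
        if user_id in bit["beta_users"]:
            return True
        # Split testing: hash the user into a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < bit["percent"]


bits = FeatureBits()
bits.define("new_checkout", beta_users={"alice"})


def checkout(user_id):
    if bits.is_on("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"


print(checkout("alice"))
print(checkout("bob"))
```

Keeping the check to a single `is_on` call per feature is one way to limit the overhead, and it makes retiring a bit a search-and-delete job.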

Gene Kim | Comments Off |

DevOps,

talks

Talk notes: Your Mileage May Vary (#DevOps Day)

Friday, June 25, 2010 at 12:24PM

DevOps Day, Santa Clara, CA
June 25, 2010

Stefan Apitz (LinkedIn)
Burzin Engineer (Shopzilla)
Ernest Mueller (National Instruments)
Dan Nemec (Silverpop)

Moderator: Andrew Shafer (Cloudscaling)

I'm here to participate on a panel called "DevOps Outside of Web Operations."

Sorry guys. Trying to get mentally prepared for my panel. I'll be back after my talk. :-)

Your Mileage May Vary

Burzin: looking to find other kanban practitioners
Israel: "don't focus on culture. Focus on behaviors. Changing culture will take too long."

Gene Kim | Comments Off |

DevOps,

talks

Talk notes: Ignite Talks (#DevOps Day)

Friday, June 25, 2010 at 11:50AM

DevOps Day, Santa Clara, CA
June 25, 2010

I'm here to participate on a panel called "DevOps Outside of Web Operations."

Ignite Talks

5 minutes, all slides advance every 30 seconds

Adam Rosien (@arosien)
- "undeployed code == wasted warehouse space"
- deploy canaries: self test
- auto rollback, exponential deploys
- commit messages, deploy services
- zookeeper, json, collectd, nagios, hudson, ant, rpm, yum, jcollectd, rrdtool, type-systems, jmx, rabbitmq, esper
- "splunk lite": all exceptions in the last 30 minutes
- we're hiring: jobs@kaching.com
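Here is how I picture "deploy canaries, auto rollback, exponential deploys" fitting together, as a sketch only. The batch sizes (1, 2, 4, ...), the function names, and the fake `deploy`, `self_test`, and `rollback` callbacks are my assumptions; the talk did not show KaChing's actual tooling.

```python
def exponential_batches(hosts):
    """Split hosts into batches of size 1, 2, 4, ...; the first is the canary."""
    batches, start, size = [], 0, 1
    while start < len(hosts):
        batches.append(hosts[start:start + size])
        start += size
        size *= 2
    return batches


def rollout(hosts, deploy, self_test, rollback):
    """Deploy batch by batch; roll back every touched host if a self test fails."""
    touched = []
    for batch in exponential_batches(hosts):
        for host in batch:
            deploy(host)
            touched.append(host)
        if not all(self_test(host) for host in batch):
            for host in reversed(touched):
                rollback(host)  # back to the known good state
            return False, touched
    return True, touched


# Hypothetical fleet where the self test fails on one host in the second batch.
versions = {f"h{i}": "v1" for i in range(1, 8)}


def deploy(host):
    versions[host] = "v2"


def rollback(host):
    versions[host] = "v1"


def self_test(host):
    return host != "h3"


ok, touched = rollout(list(versions), deploy, self_test, rollback)
print(ok, touched)
```

Because the batches double, a bad release is caught while only a handful of hosts have it, yet a good one still reaches the whole fleet in a few rounds.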
Alex Honor, DTO Solutions: deployment tool chains: http://groups.google.com/group/devops-toolchain
- dev: specify packages they need in operations
- operations: perform release and deployment
- dev need: app code, configs, third party packages
- release managers: issue tracking status, QA approval, change control sched, promote the artifacts
- ops need: assess scheduling conflict, decide how to batch updates, deploy packages
- managers need: responsibilities and boundaries, enforce process through auth, audit and trace changes
- need: self-service: each role has self service
- keep it simple: use freely available tools, reflect roles and process, easy to understand
- tool: the "meta" package: coupled set of packages
- tool: yum repository: central storage and index, resolve and install package dependencies
- tool: source is in SCM: developers have commit access
- tool: CI job; devs modify/run job when desired
- tool: runbook jobs: promote by release manager and deploy for ops admin
- process: specify package needs
- audit compliance
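The "meta" package plus a repository that resolves dependencies boils down to walking a dependency graph and installing leaves first. Below is a toy sketch of that resolution step; it is not how yum itself is implemented, and the package names and the `install_order` function are invented for illustration.

```python
# Hypothetical repository index: package -> packages it depends on.
REPO = {
    "shop-release-1.2": ["shop-app", "shop-config", "tomcat"],  # the "meta" package
    "shop-app": ["tomcat", "libfoo"],
    "shop-config": [],
    "tomcat": ["jdk"],
    "libfoo": [],
    "jdk": [],
    "unrelated-tool": ["jdk"],
}


def install_order(repo, target):
    """Return target and its dependencies, ordered so dependencies come first."""
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle at {pkg}")
        if pkg not in repo:
            raise KeyError(f"package not in repository: {pkg}")
        in_progress.add(pkg)
        for dep in repo[pkg]:
            visit(dep)
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)  # appended only after all of its dependencies

    visit(target)
    return order


order = install_order(REPO, "shop-release-1.2")
print(order)
```

The point of the meta package is that ops asks for one name and gets the whole coupled set, while anything unrelated in the repository is left alone.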
The Cloud is a Trendy Mainframe: Erica Brescia, BitNami
- 1956: $23K/month for 4.17MB of storage
- Now: $0.10/month for 1GB
- Backup: then: tape
- Now: Amazon EBS
- I/O: 1.44 MB vs. ship storage device to Amazon.com
- CPU
- Distribution: acoustic modem vs. instant
- Hardware: messy heat/power vs. AWS, Joyent, etc.
- Price: Commodore PET for $2.7K @ 1MHz vs. 1 GHz w/1.7GB @ 9.5 cents/hr
- App deployment: BitNami creates prepackaged builds
Really fast moving software, Clint Byrum, Canonical Ubuntu Server team: http://fewbar.com: clint@ubuntu.com
- API Contracts: they used to be stable, dependencies were few and loosely coupled
- how do we keep our sanity? test coverage: continuous integration, automatic dependency resolution
- Reality check: if you don't have the right culture, you don't have it
- the good: new versions are typically more stable and faster
- the bad: stuff breaks, repeated integration cost
- case study: libmemcached: 0.31 v 0.40 releases
- MongoDB: Ubuntu included a version, rapidly judged "don't use"
- So why do we bother packaging? for predictability, so you know what's on the OS: like Southwest Airlines: one type of jet
- What can authors do: "stop breaking your APIs! You're killing us!"
- What can distributors do? Cry.
- What is better way? Something sysadmins already do
- Not "no more handwaving"
- Leverage core competency
  - Python people don't want distro python
- Rather than build
  - PPA or "personal package archive"
  - launchpad service to easily build packages and then deploy them easily
- Author participation: get involvement early
- Derivatives made easy?
Metrics Simplified: Mark Lin, mlin@admob.com
- hard
- bottlenecks: ops need to do all of this; tough on ops people
- graphite, rabbitmq, graphite local proxy
- path to graph
  - now developers implement metrics before ops even asks (cool)
- graph = post event forensics
- rocksteady, metric as event
- revelation
- beyond simple metric
- what we learned
  - make metric sending simple
  - nice UI to make sense of data
  - real-time processing of metrics rocks
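On "make metric sending simple": Graphite's plaintext protocol is one line per data point (path, value, Unix timestamp), which is about as simple as it gets. The sketch below assumes a carbon listener on localhost port 2003 (the default plaintext port) and uses a made-up metric path; it sends directly rather than through the RabbitMQ and local proxy setup mentioned in the talk, and is not AdMob's code.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point; host and port are assumptions for this sketch."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))


# Formatting works without a server; the metric path here is hypothetical.
print(format_metric("site.checkout.orders", 42, 1277500000), end="")
```

With a helper this small, a developer can call `send_metric` from application code on day one, which is how you get metrics implemented before ops even asks.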

Gene Kim |

DevOps,

talks