About Gene Kim

I've been researching high-performing technology organizations since 1999. I'm a multiple award-winning CTO, Tripwire founder, and co-author of The DevOps Handbook, The Phoenix Project, and Visible Ops. I'm a DevOps Researcher, Theory of Constraints Jonah, a certified IS auditor and a rabid UX fan.

I am passionate about IT operations, security and compliance, and how IT organizations successfully transform from "good to great."

Friday, Dec 30, 2011

Talk Notes: Roundtable On Post-Mortems (Mandi Walls, John Allspaw, Mark Imbriaco, Matt Hackett, Teresa Dietrich): 2011 Velocity Conference

2011 Velocity Conference: 6/14-16, Santa Clara, CA

I'm reviewing some of the awesome talks I missed while I was at the conference. Videos are available at the O'Reilly website.

And I'm using this opportunity to use a kickass iPad app written by @_flynn to do simultaneous notetaking and tweeting. Cool!

This is the Post-Mortem Roundtable, chaired by Mandi Walls, Admeld (@lnxchk). The other panel members are:

  • Mark Imbriaco, Director Prod Ops, Heroku (@markimbriaco: ex-37 Signals, gave amazing talk on Heroku architecture and ops, and also the attempts to run Heroku infrastructure on top of Heroku (!!))
  • Matt Hackett, VP Engr, Tumblr (@mhkt: we serve 21MM blogs, growing 20% per month, 7.5B page views/month, high growth)
  • Teresa Dietrich, Director Tech Ops, WebMD (@teresadg: ex-AOL, "lots of time spent on outage calls")
  • John Allspaw, VP Ops, Etsy (@allspaw: "I'm here for balance". Both @lnxchk and @allspaw on prog committee. Kickass panel!)

  • @lnxchk: "Q: what's required for crises?" "Mark: our target is 20m max betw updates. If we miss, we'll say 'more in 4h'"

    • @markimbriaco: "We assign someone to update, usually in support. Also we have role of incident commander."
    • @allspaw: "We have status blog and Twitter feed as well. We track blip -> degradation -> outage, which escalates"
    • @allspaw: "...as outage grows, can trigger Dev or Ops; for severe issues, community team, who are good at words."
    • @teresadg: "at WebMD, less structured policies, but we do notify upon service down and publish restoration times."
    • @teresadg: "Important is internal notification: 2K employees, countless doctors affected. Notifying/transparency critical"
    • @teresadg: "Getting Ops to be transparent was challenging; but CTO demanded visibility, best info on restoration time, etc"
    • @mhkt: "We're small, but Ops is always 1st stop. When any risk of large impact, we page 24x7 community support who's on call"
    • @mhkt: "During our 24h Tumblr outage, I wish we had Twitter updates. Our lack of transparency was criticized widely"
    • @mhkt: "We don't believe our outage desc should be technical: 'MySQL failed' not 'incorrect setting, cluster failed'"
  • @lnxchk: "Q: @allspaw showed IRC log used during outages: instant documentation, free timestamps: do y'all use IRC?"

    • @markimbriaco: "We use Campfire, but new prob: when we use Skype, we lose our instant record; maybe need to echo notes into Campfire"
    • @teresadg: "We use Microsoft Lync. Don't laugh. It works. Auto-populates, phone/video chat, messaging window, draw whiteboard."
    • @teresadg: "It really works. If you have licenses try it. Goes way beyond Communicator."
    • @mhkt: "We use Hipchat. It's like Campfire, lots of clients; records chats, all company Notices email: Ops/Dev/CEO/Community"
    • @mhkt: "This is the highest level record of outage that I refer back to all the time"
  • @lnxchk: "Q: How do you put knowledge into institutional knowledge to prevent future screwups?"

    • @markimbriaco and @mhkt both say "We use Wiki, and we suck at it" (haha)
    • @allspaw: "Yahoo! did this very well, which I miss. We use Wiki, but start/end/detect times goes into Google spreadsheet"
    • @allspaw: "But all associated media (Skitch screenshots, IRC logs) goes into Pastebin or Gist, and goes into Wiki"
    • @allspaw: "Though I hate Wikis, everyone knows how to use it, it's available."
    • @teresadg: "We formed SRE team: jumps in during stuck releases, outages; SRE assigned to outage; will do data gathering"
    • @teresadg: "Post-outage, they'll dive deep, pull all logs, analyze; what were builds on servers, what changed, time to.."
    • @teresadg: "..detect, fix; then asks 'what do we need to change'; yields request to dev for more monitoring, config chg"
    • @teresadg: "Or procedure change or documentation change: all those reccs driven by SRE, instead of ignored after crisis"
  • @lnxchk: "Q: how do you keep post-mortems from becoming too emotionally charged, people screaming on desks, etc.?"

    • @mhkt: "Since Dev chg often causes issue, they'll drive it, pulling in Ops when necc. Will tell story, no blame, study..."
    • @teresadg: "Allow time for sleep before data gathering/discussion to prev sysadmin from throttling dev who caused failure"
    • @markimbriaco: "Campfire/IRC enables easy data gathering; run post-mortem as chrono review, passed out ahead of time"
    • @markimbriaco: "..this 'world according to mark' helps start conversation; real world example during Amazon EBS failure:"
    • @markimbriaco: "...after 67h outage, no point in further discussion; would have just caused PTSD. sometimes not necc"
    • @allspaw: "+1 for 'no rigid process'; I know I won't get true story and details necc to improve until people feel safe"
    • @allspaw: "State it: I'm not going to fire you, dock pay, or get benched; My boss is CTO and supports my policy"
    • @allspaw: "If you have standing room only for your post mortems, you know you're doing something right; self-improvement"
    • @allspaw: "When Dev pushes their own code, chgs happening all time; we want this to reflect Etsy, we value authorship"
    • @allspaw: "Dev have pride of authorship and confidence and name attached to commits; fingerpointing"
    • @allspaw: "Ideal: here's what i did, here's what I thought would happen, here's what went wrong; then people offer it up" (Nice!)
    • @teresadg: "I've been doing this for 15 yrs; too many people fear getting fired; I've seen really stupid stuff..."
    • @teresadg: "...like tools that if you type in wrong window, sends to all routers; firing only happens when there's malice"
    • @teresadg: "How did we know malice? We could see in logs him testing it, knowing full well the effects of script"
    • @teresadg: "Fear of making mistake, coming into work and thinking 'it'll be anti-Joe day' is real. Safety is needed"
    • @markimbriaco: "I'm constantly worried about issues between Dev and Ops; want Dev to be able to say here's what happened"
    • @mhkt: "Desire to create institutional knowledge and learning; but also has catharsis needs: lv room feeling better"
    • @mhkt: "How to decide in-person or email? Know I'm doing it right when ppl say 'can we do in-person post-mortem?'"
  • @lnxchk: "Frameworks from nuke, chem industry, like Five Why: Which methodologies do you use? Too cold? Useful?"

    • @allspaw: (after no one else says "yes"): "After using various methods, some from high risk industries, like for nuc power.."
    • @allspaw: "..during early days: structured, mathematical. eg. Fault/event tree analysis vs. risk mgmt; 5 Whys came about.."
    • @allspaw: "...because of (Taiichi Ohno I think, John) method of asking why on plant floor. Opp of rigorous fishbone diagram"
    • @allspaw: "For web, growth has been so fast: we choose efficiency vs. rigor; not worth 40h mtgs for couple slow web pages"
    • @allspaw: "I think this decision makes sense; not like 'oh, we amputated left leg instead of right'" (or reactor meltdown)
    • (FWIW, @kevinbehr's fave root cause analysis is Apollo Method)
  • @lnxchk: "What's worst thing you've ever seen happen in root cause meeting?"

    • @allspaw: "I've seen some RCA that's extremely finger-pointy; previous company: defense mechanism up before meeting!"
    • @allspaw: "'...I don't know why I should be there, but I'll go, b/c educational'" <-- shields/defense up upon invite!
    • @markimbriaco: "As young sysadmin at bank, tons of VPs: no one asked me question, even though it was a tech issue!"
    • @teresadg: "During malicious event, lots of staff got computers/disks confiscated due to data hiding; mgmt didn't say why"
    • @teresadg: "Maybe sysadmin didn't intend to cause as much damage, but he hid tracks; caused cascading problems"
    • @mhkt: "Bad post-mortems focus on how we fixed instead of process: eg 'saw this fault, and we fixed it'"
    • @teresadg: "we like time to detect, notification, respond, troubleshoot, repair. Run those #s every time; Find outliers"
    • @teresadg: "Things are always going to go wrong; that's why Ops people will always have jobs" (Nice!)
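
A minimal sketch of @teresadg's "run those #s every time; find outliers" idea, assuming each incident is logged with six timestamps (the kind of start/detect/end times @allspaw says go into a spreadsheet). The field names, the sample incidents, and the "more than 3x the median" outlier rule are all assumptions made up for illustration; none of them came from the panel.

```python
from datetime import datetime
from statistics import median

# Hypothetical timestamp fields, ordered so that consecutive pairs give:
# time to detect, notify, respond, troubleshoot, repair.
PHASES = ["start", "detected", "notified", "responded", "diagnosed", "repaired"]

# Made-up incidents, for illustration only.
INCIDENTS = [
    {"id": "A", "start": "2011-06-14T10:00", "detected": "2011-06-14T10:05",
     "notified": "2011-06-14T10:10", "responded": "2011-06-14T10:15",
     "diagnosed": "2011-06-14T10:45", "repaired": "2011-06-14T11:00"},
    {"id": "B", "start": "2011-06-15T14:00", "detected": "2011-06-15T14:04",
     "notified": "2011-06-15T14:10", "responded": "2011-06-15T14:16",
     "diagnosed": "2011-06-15T14:40", "repaired": "2011-06-15T14:58"},
    {"id": "C", "start": "2011-06-16T02:00", "detected": "2011-06-16T02:50",
     "notified": "2011-06-16T02:55", "responded": "2011-06-16T03:02",
     "diagnosed": "2011-06-16T03:30", "repaired": "2011-06-16T03:45"},
]


def phase_minutes(incident):
    """Minutes spent between each consecutive pair of timestamps."""
    times = [datetime.fromisoformat(incident[p]) for p in PHASES]
    return {
        f"{a}->{b}": (t2 - t1).total_seconds() / 60
        for a, b, t1, t2 in zip(PHASES, PHASES[1:], times, times[1:])
    }


def outliers(incidents, factor=3.0):
    """Flag (id, phase, minutes) where a phase exceeds factor x that phase's median."""
    durations = {inc["id"]: phase_minutes(inc) for inc in incidents}
    flagged = []
    for phase in next(iter(durations.values())):
        med = median(d[phase] for d in durations.values())
        for inc_id, d in durations.items():
            if med > 0 and d[phase] > factor * med:
                flagged.append((inc_id, phase, d[phase]))
    return flagged


print(outliers(INCIDENTS))
```

With the made-up data above, only incident C is flagged: detection took 50 minutes against a median of 5, while its other phases look like everyone else's.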

End of talk! Great job @lnxchk, @teresadg, @mhkt, @markimbriaco, @allspaw! Will publish link when I find it tomorrow!

BTW, I love that O'Reilly makes videos avail to everyone. Awesome conference. Will go again next yr!

