What to do when complex systems fail: A guide to chaos management during a major incident

Aish Dahal · Full Stack Fest 2017

The operational success of software is taken for granted, while failure is frowned upon. As technology gobbles up the world around us, it is important for technologists to understand the human element in dealing with a major outage and to learn operational techniques that make the experience less painful. This talk explains in detail the operational anti-patterns that arise while dealing with major system outages, and provides attendees with a framework, based on the United States National Incident Management System (NIMS), that can be applied when dealing with a major system failure.