Analyzing Software Failures Activity

Overview

The consequences of software failures range from minor inconvenience to loss of life and major financial damage. In this activity, students will analyze historical and contemporary software failures, assess their root causes, and explore ways to mitigate such failures in future software designs.

In addition to independent research, students will use AI tools (e.g., ChatGPT, Copilot, and automated analysis tools) to identify other possible contributing factors, industry insights, and preventive strategies.

Objectives

Identify and classify real-world software failures.
Analyze failures based on their technical, human, and systemic causes.
Use AI tools to enhance analysis, refine findings, and explore alternative perspectives.
Propose software engineering best practices to prevent similar failures.

Activity Steps

Step 1: Initial Research and Failure Selection

Break into teams. Each team will select one significant software failure from the following list:

Ariane 5 Rocket Failure (1996) – Unhandled overflow when converting a 64-bit floating-point value to a 16-bit signed integer in the inertial reference system software.
Therac-25 Radiation Machine (1985-1987) – Race conditions, combined with the removal of hardware safety interlocks, leading to radiation overdoses.
Knight Capital Group (2012) – A faulty deployment reactivated obsolete trading code, leading to a loss of roughly $440M in under an hour.
Toyota Unintended Acceleration (2000s) – Alleged defects in electronic throttle control software, investigated alongside mechanical causes.
Amazon Web Services (AWS) Outage (multiple occasions; pick one) – Single points of failure in cloud infrastructure.
Boeing 737 MAX MCAS System Failure (2018-2019) – Flight control logic that relied on a single angle-of-attack sensor, contributing to two fatal crashes.
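
To make one of the listed defect types concrete before teams begin their research, the following is a minimal Python sketch of a narrowing conversion into a signed 16-bit integer. It is not the Ariane 5 flight code (which was written in Ada, where the out-of-range conversion raised an exception that was not handled). The function names and sample values are hypothetical and chosen only for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(value: float) -> int:
    """Simulate a conversion that silently wraps around (two's complement)."""
    return ((int(value) + 32768) % 65536) - 32768


def checked_to_int16(value: float) -> int:
    """Refuse to convert a value that does not fit in 16 bits."""
    truncated = int(value)  # truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def saturating_to_int16(value: float) -> int:
    """Clamp out-of-range values to the nearest representable limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

With a hypothetical input of 40000.0, the unchecked version yields a large negative number, the checked version raises an error that the caller must handle, and the saturating version returns 32767. Which behavior is appropriate depends on the system, and that trade-off is a useful discussion point for the analysis in Step 2.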

Step 2: Manual Root Cause Analysis

For the chosen failure, analyze the following:

What was the failure?
What was the impact? (e.g., financial, human safety, security)
What were the technical causes? (e.g., race conditions, memory leaks, poor error handling)
What were the human and systemic factors? (e.g., confusing UI/UX, inadequate testing, poor requirements analysis)
What could have been done differently? (Prevention strategies)
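
For teams less familiar with the technical causes named above, the sketch below illustrates a race condition in its simplest form, a lost update, followed by one common mitigation (a lock). It is a generic Python illustration and not a reconstruction of any listed system. The names `lost_update_demo`, `locked_increment`, and `dose_units` are hypothetical, and the interleaving in the first function is written out by hand so that the result is reproducible.

```python
import threading


def lost_update_demo() -> int:
    """Replay, step by step, an interleaving in which one update is lost."""
    shared = {"dose_units": 0}
    a_read = shared["dose_units"]      # task A reads 0
    b_read = shared["dose_units"]      # task B reads 0 before A writes back
    shared["dose_units"] = a_read + 1  # task A writes 1
    shared["dose_units"] = b_read + 1  # task B overwrites with 1, not 2
    return shared["dose_units"]


def locked_increment(n_threads: int = 4, per_thread: int = 1000) -> int:
    """Run real threads, using a lock so each read-modify-write is atomic."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(per_thread):
            with lock:  # only one thread at a time may update total
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Two increments were intended in the first function, but the final value is 1. In the second function the lock guarantees that every increment is counted. When analyzing a real failure, ask which shared state was involved and what, if anything, protected it.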

Step 3: AI-Assisted Analysis

After conducting the manual analysis, students will use AI tools to enhance their findings:

ChatGPT / Copilot for Root Cause Expansion
- Ask the AI tool to suggest additional technical explanations for the failure.
- Example Prompt: “Explain how integer overflow led to the Ariane 5 failure and suggest alternative design approaches.”
AI for Industry Insights & Prevention Strategies
- Use AI to find modern best practices in software engineering that could have prevented the failure.
- Example Prompt: “How do modern aerospace companies prevent software failures in guidance systems?”
AI for Comparative Analysis
- Compare the selected failure to another software failure to find patterns in failure causes.
- Example Prompt: “Compare Therac-25 to other medical device software failures.”

Step 4: Present Findings

Each team will present:

A summary of their selected failure.
Key insights from their manual analysis.
New insights gained from AI-assisted research.
Proposed best practices to prevent similar failures.

Assessment Criteria

Thoroughness of root cause analysis (30%)
Use of AI to expand insights and compare failures (25%)
Quality of proposed solutions and prevention strategies (25%)
Clarity and engagement of presentation (20%)
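
As a worked example of how the weights above combine, the sketch below computes an overall score from four per-criterion scores. The 0-100 scale for each criterion and the key names are assumptions made for illustration. The activity itself does not specify a scoring scale.

```python
from typing import Mapping

# Weights in percent, taken from the assessment criteria above.
WEIGHTS = {
    "root_cause_analysis": 30,
    "ai_use": 25,
    "solutions": 25,
    "presentation": 20,
}


def weighted_score(scores: Mapping[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-100) into one overall score."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the four criteria")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, hypothetical scores of 80, 90, 70, and 100 (in the order listed above) give (30×80 + 25×90 + 25×70 + 20×100) / 100 = 84.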