We look at the system not as theorists, but as engineers responsible for stability.
Will you be the first to know about a crash, or will you have to wait for a support email?
What happens if a failure occurs? Is there a clear recovery path, or is everything hanging by a thread?
Can a single coding error bring down the entire project?
Are there any holes in the system that could lead to a disaster if a random error occurs?
What can a single code fix resolve so that you can forget about late-night phone calls?
How efficiently is the infrastructure budget being spent?