0: Incident / Event Response Overview
What’s in here?
- Troubleshooting
- Restoring operations
- Automating event management and alerting
- Implementing automated healing
- Event-driven automated actions
Which Whitepapers to Study?
Related Services?
- CloudWatch: Detect issues and automate responses to events
- Alarms on monitored metric thresholds trigger those events
- OpsWorks (stacks): Auto-heal failed instances (enabled in the stack's layer settings)
- Auto Scaling: Monitor metrics to scale capacity or replace unhealthy instances
- CloudFormation: Store templates in multiple regions for redundancy
- AWS Health Dashboard: Availability and operational status of AWS services
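
To make the "monitoring thresholds trigger events" idea concrete, here is a minimal sketch of a CloudWatch alarm that asks EC2 to recover an instance when its system status check fails. It only builds the parameters for the PutMetricAlarm API and does not call AWS. The helper names `build_recover_alarm` and `would_alarm`, the alarm name, the instance ID, and the period values are all assumptions made for illustration. `would_alarm` is a simplified local stand-in for how CloudWatch evaluates datapoints, not the service's actual logic.

```python
def build_recover_alarm(instance_id: str, region: str) -> dict:
    """Return keyword arguments shaped for CloudWatch's PutMetricAlarm API.

    Hypothetical helper: the alarm name and timing values are illustrative.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,      # consecutive breaching datapoints (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action: recover the instance when the alarm fires.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def would_alarm(params: dict, datapoints: list) -> bool:
    """Simplified local check: True when the most recent EvaluationPeriods
    datapoints all meet or exceed the threshold."""
    n = params["EvaluationPeriods"]
    recent = datapoints[-n:]
    return len(recent) == n and all(v >= params["Threshold"] for v in recent)


if __name__ == "__main__":
    params = build_recover_alarm("i-0123456789abcdef0", "us-east-1")
    print(params["AlarmActions"])
    print(would_alarm(params, [0, 1, 1]))
    # With boto3 installed and credentials configured, the same dict could be
    # passed on as: boto3.client("cloudwatch").put_metric_alarm(**params)
```

The parameters are kept in a plain dict so the alarm definition can be reviewed or unit-tested before anything is sent to AWS.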