RPO 5 min, RTO 15 min, guaranteed.
Traqo's multi-region Disaster Recovery and Business Continuity Plan keeps freight operations running through catastrophic infrastructure failures, regional outages, and cyberattacks. It combines automated failover, immutable WORM backups, scenario-based runbooks, and quarterly tested recovery procedures aligned to ISO 27001, SOC 2, SEBI CSCRF, and RBI IT frameworks.
Systems in scope
All production services that make up the Traqo platform are in scope. Every component critical to customer operations has a defined recovery strategy, a tested failover procedure, and a validated backup mechanism.
Recovery objective targets
Contractually committed SLAs backed by automated failover and quarterly DR drills.
| Metric | Target | How it is achieved |
|---|---|---|
| Recovery Point Objective (RPO) | 5 minutes | Continuous WAL archiving + streaming replication to DR region |
| Recovery Time Objective (RTO) | 15 minutes | Automated DNS failover, one-click DB promotion, K8s pod scale-up |
| Mean Time to Recovery (MTTR) | 30 minutes | Automated runbooks, PagerDuty, cross-trained SRE team |
| Availability SLA | 99.9% | Multi-AZ active-active; ≤ 8.76 hrs unplanned downtime per year |
| DR Test Frequency | Quarterly | Tabletop + full failover simulation; annual full-site DR drill |
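The 99.9% availability target and the 8.76-hour downtime ceiling in the table are two views of the same number. A minimal sketch of the arithmetic, assuming a 365-day year; the function name and the 30-day month are illustrative choices, not part of Traqo's tooling:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) allowed by an availability target."""
    return period_hours * (1 - availability_pct / 100)


annual_hours = downtime_budget_hours(99.9)
monthly_minutes = downtime_budget_hours(99.9, period_hours=30 * 24) * 60

print(f"Annual budget:  {annual_hours:.2f} hours")
print(f"Monthly budget: {monthly_minutes:.1f} minutes (30-day month)")
```

Seen this way, a single incident that runs to the 30-minute MTTR target would use most of a 30-day month's budget under this SLA.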
Risk scenario assessment matrix
Probability, impact rating, and primary mitigation for each identified scenario. Reviewed quarterly.
| Risk Scenario | Probability | Impact (1–5) | Risk Score | Mitigation Strategy |
|---|---|---|---|---|
| Data center outage (single AZ) | Medium | 3 | Medium | Multi-AZ deployment across 3 AZs; automatic K8s rebalancing; no SPOF |
| Database corruption | Low | 5 | Medium | Continuous WAL archiving; PITR; automated integrity checks; immutable snapshots |
| Network partition | Low | 4 | Medium | Multi-AZ networking; redundant VPN tunnels; auto-failover to alternate paths |
| DDoS attack | High | 3 | High | AWS Shield Advanced; WAF rate limiting; CloudFront edge; auto-scaling |
| Ransomware | Low | 5 | High | Immutable WORM backups; network segmentation; EDR; air-gapped backup copies |
| Cloud provider regional failure | Low | 5 | High | Cross-region DR (ap-south-2); Route 53 auto-failover; continuous replication |
| Third-party API failure (ERP/carrier) | Medium | 3 | Medium | Circuit breakers; exponential backoff; graceful degradation; offline queue |
| Human error (misconfig/deletion) | Medium | 4 | High | IaC + GitOps; RBAC least privilege; automated rollback; backup retention |
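The third-party API row lists exponential backoff as one mitigation. As a rough illustration of the idea only, the sketch below computes a capped retry schedule. The base delay, growth factor, cap, and attempt count are all assumed values chosen for the example, not Traqo's configuration, and `backoff_delays` is a hypothetical helper:

```python
def backoff_delays(base_seconds: float = 0.5,
                   factor: float = 2.0,
                   max_seconds: float = 30.0,
                   attempts: int = 6) -> list[float]:
    """Return the wait before each retry: base * factor**i, capped at max_seconds."""
    return [min(max_seconds, base_seconds * factor ** i) for i in range(attempts)]


# Delays grow geometrically until they reach the cap.
for attempt, delay in enumerate(backoff_delays(attempts=8), start=1):
    print(f"retry {attempt}: wait {delay:.1f}s")
```

Real clients usually add random jitter to each delay so that many callers do not retry in lockstep; it is left out here to keep the schedule deterministic.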
Critical service priority classification
Tier determines restoration order during any failover event.
| Tier | Services | Recovery Target | Justification |
|---|---|---|---|
| Tier 1 — Critical | Tracking, Integration Gateway, API Gateway | 0–15 minutes | Real-time visibility & ERP integration directly impact customer supply chains |
| Tier 2 — High | Orders, Indent, Auction, Settlement | 15–30 minutes | Core transactional workflows; brief delays acceptable but must restore quickly |
| Tier 3 — Medium | Analytics, Reports, AI Engine | 30–60 minutes | Decision-support services; operational workflows continue during recovery |
| Tier 4 — Low | Admin Console, Notifications, Internal Tooling | 1–4 hours | Support functions; no direct impact on freight operations during recovery |
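Because tier determines restoration order, the table can be read as a simple priority queue. The sketch below encodes the tiers as data and derives a service-by-service restoration sequence. The services and recovery windows come from the table; the `Tier` class, its field names, and `restoration_order` are hypothetical and only illustrate the ordering rule:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int                        # 1 = restored first
    name: str
    services: tuple[str, ...]
    target_minutes: tuple[int, int]   # (earliest, latest) recovery window


TIERS = [
    Tier(3, "Medium", ("Analytics", "Reports", "AI Engine"), (30, 60)),
    Tier(1, "Critical", ("Tracking", "Integration Gateway", "API Gateway"), (0, 15)),
    Tier(4, "Low", ("Admin Console", "Notifications", "Internal Tooling"), (60, 240)),
    Tier(2, "High", ("Orders", "Indent", "Auction", "Settlement"), (15, 30)),
]


def restoration_order(tiers: list[Tier]) -> list[tuple[str, int]]:
    """Flatten tiers into (service, tier level) pairs, lowest tier level first."""
    ordered = sorted(tiers, key=lambda t: t.level)
    return [(service, t.level) for t in ordered for service in t.services]


for service, level in restoration_order(TIERS):
    print(f"Tier {level}: {service}")
```

The input list is deliberately out of order to show that the sequence depends only on the tier level, not on how the services happen to be listed.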