TRAC-3036 - RCA – INC168031 - IOT Direct – Data Sessions on Aeris A-LH Services
Event Start Date/Time (UTC): March 11, 2026 03:40 AM
Event End Date/Time (UTC): March 11, 2026 07:14 AM
Date/Time Reported (UTC): March 11, 2026 03:40 AM
Severity: High (Severity 2)
Services Affected: 2G, 3G, and 4G Data Sessions
Case Number: TRAC-3036 - INC168031
Duration: 3 hours 34 minutes
Description of Failure:
On March 11, 2026, beginning at approximately 03:40 AM UTC, Aeris A-LH Services experienced a service interruption affecting data session establishment and overall session stability for 2G, 3G, and 4G services. The interruption was triggered following the implementation of change IOTCHG-7912 (AT&T Production Migration – Larger APNs from VPN to AT&T MPLS, MW-2), when an unexpected AAA authentication failure storm occurred outside the planned maintenance window. The Aeris engineering team declared an incident and initiated an internal troubleshooting bridge, and service was restored after approximately 3 hours and 34 minutes.
Impairment Cause:
The incident occurred during a planned maintenance activity that required terminating all active sessions on the AT&T PGW. This triggered a large‑scale reconnection attempt from approximately 520,000 devices, creating a surge in authentication requests that exceeded the processing capacity of the AAA infrastructure and resulted in temporary authentication congestion and partial service impact.
Impairment Resolution:
The impairment was resolved by rolling back change IOTCHG-7912.
Corrective Action Items:
Action 1:
Pre‑Maintenance Capacity Planning: Future maintenance activities that involve session termination or PDP context flushes will include explicit AAA capacity impact assessments to ensure sufficient resources are available prior to execution.
Action 1 Completion Date:
To be determined
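Illustration for Action 1: the AAA capacity impact assessment could be approximated with a simple calculation like the sketch below. Only the device count (approximately 520,000) comes from this report. The AAA capacity, baseline load, and reconnect window are hypothetical placeholder values, and the function name assess_reconnect_surge is illustrative, not an existing Aeris tool. The sketch also assumes one authentication request per device and ignores retries, which would make a real surge larger.

```python
import math


def assess_reconnect_surge(devices, aaa_capacity_rps, baseline_rps, reconnect_window_s):
    """Compare an expected reconnect surge against spare AAA capacity.

    All rates are in authentication requests per second. Assumes one
    request per device and no retries (a simplifying assumption).
    """
    headroom_rps = aaa_capacity_rps - baseline_rps
    if headroom_rps <= 0:
        raise ValueError("no spare AAA capacity at baseline load")

    # Average request rate if every device reconnects within the window.
    surge_rps = devices / reconnect_window_s

    # Shortest time in which all devices could re-authenticate if
    # reconnects were paced exactly to the available headroom.
    min_drain_s = devices / headroom_rps

    # Number of equal-sized termination batches needed so that each
    # batch's surge stays within the available headroom.
    batches = math.ceil(surge_rps / headroom_rps)

    return {
        "surge_rps": surge_rps,
        "headroom_rps": headroom_rps,
        "fits": surge_rps <= headroom_rps,
        "min_drain_s": min_drain_s,
        "batches": batches,
    }


# Device count from this report; the other three values are hypothetical.
result = assess_reconnect_surge(
    devices=520_000,
    aaa_capacity_rps=2_000,   # hypothetical sustained AAA capacity
    baseline_rps=500,         # hypothetical steady-state load
    reconnect_window_s=60,    # hypothetical reconnect window
)
```

If "fits" is false, the "batches" value gives a rough lower bound on how many stages a session termination would need to be split into, under the stated assumptions.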