Global Micro Solutions
Active Incident

Updated 18 minutes ago

DNS

Operational

Hosted Exchange

Operational

MxVault

Operational

SyncVault

Operational

IDSync

Operational

RecoveryVault Enterprise

Operational

RecoveryVault Express

Operational

Hosted Virtual Servers

Operational

Odin Control Panel

Operational

Support Portal

Operational

Webroot Console

Operational

Incident Status

Operational

Components

Hosted Exchange

Locations

Teraco Data Centre



April 18, 2018 3:12PM SAST
[Identified] ISSUE: Our Hosted Exchange service team has identified an isolated group of mailbox servers which are not responding within normal parameters. IMPACT: Users with mailboxes hosted on the affected servers may be unable to access their mailboxes. Inbound mail will continue to queue until access to the mailboxes is restored. NEXT STEPS: Engineers have been dispatched to the Teraco data centre to troubleshoot the issue.

April 18, 2018 3:40PM SAST
[Identified] Engineers have arrived on site at the Teraco data centre and will commence their investigation within the next 10 minutes.

April 18, 2018 3:57PM SAST
[Identified] The issue is isolated to a single blade enclosure. Engineers are running diagnostics on the enclosure.

April 18, 2018 4:30PM SAST
[Identified] Engineers have confirmed that both the primary and secondary management interconnects in the affected enclosure have failed. Spare management interconnects are available on site and are being requisitioned from stores.

April 18, 2018 5:02PM SAST
[Monitoring] Our engineers have replaced the first management interconnect. TEMPORARY RESOLUTION: The affected mailbox servers are now OPERATIONAL. Queued mail will now be delivered to the mailboxes which were offline.

April 18, 2018 6:15PM SAST
[Monitoring] The root cause of the simultaneous failure of both the primary and secondary management interconnects is still unknown. As a precaution, we have commenced a controlled migration of all the workloads serviced by the affected enclosure. The enclosure will be permanently retired at the end of the process. The migration is expected to be completed within the next 6 hours. CUSTOMER IMPACT: No customer impact is anticipated (transition time under 90 seconds). The fail-over process will momentarily disconnect users from their mailboxes. They will automatically start to re-establish connectivity after 90 seconds.

April 18, 2018 8:51PM SAST
[Monitoring] We have picked up further anomalies in the affected enclosure. They are causing intermittent access problems for the mailboxes that have yet to be migrated off the enclosure. The Exchange engineering team is continuing with the migration of the workloads off the affected enclosure.

April 18, 2018 10:08PM SAST
[Monitoring] We have mitigated the anomalies in the affected enclosure that were causing intermittent access problems for the mailboxes that have yet to be migrated off the enclosure. The Exchange engineering team is continuing with the migration of the workloads off the affected enclosure.

April 19, 2018 12:17AM SAST
[Monitoring] All mailboxes have been successfully migrated from the affected enclosure. We will now continue with the remaining workloads on the affected enclosure.

April 19, 2018 6:55AM SAST
[Monitoring] We have received some reports of users being unable to connect to their mailboxes. This does not appear to be related to yesterday's issue. We are investigating.

April 19, 2018 7:40AM SAST
[Identified] We have confirmed a significant number of CPU spikes. Some of these spikes are large enough to cause virtual machines and their associated mailbox stores to migrate to another host. This is causing the intermittent mailbox access issues.

April 19, 2018 8:07AM SAST
[Monitoring] Update: We are continuing to monitor the CPU spikes, which have diminished significantly. All mailbox stores are mounted and all mailboxes are accessible.

April 19, 2018 8:23AM SAST
[Monitoring] Systems are operating within normal parameters. We have had no reported disconnects for the last 30 minutes. This is consistent with our instrumentation and logs.

April 19, 2018 9:27AM SAST
[Monitoring] Systems are operating within normal parameters.

April 19, 2018 10:59AM SAST
[Monitoring] Systems are operating within normal parameters.

External Services

AWS EC2

AWS S3

CloudFlare

Google Search

Intercom

MS Office365

Status.io

YouTube

Yubico