Corefinity - Notice history

Google Cloud Platform Stacks (GCP) - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Amazon Web Services Stacks (AWS) - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Microsoft Azure Stacks - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Civo Stacks - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Control Panel (manage.corefinity.com) - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Deployment Pipelines - Operational

100% uptime
May 2023 · 100.0% | Jun 2023 · 100.0% | Jul 2023 · 100.0%

Notice history

Jul 2023

No notices reported this month

Jun 2023

Maintenance in europe-west2-c
  • Resolved

    This incident has been resolved and the maintenance window is now complete.

  • Monitoring

    We have found the underlying issue and mitigation efforts are actively in progress.

    GCP maintenance is currently ongoing. This in itself is expected: on the GCP platform, maintenance for sole-tenant environments takes place on average every 4 to 6 weeks. You can read more about this at: https://cloud.google.com/compute/docs/instances/host-maintenance-overview#maintenanceevents

    However, GCP provides a live migration feature that prevents disruption to servers while maintenance is ongoing: servers live migrate to another host during the maintenance with zero impact to service. You can read about this feature and its limitations here: https://cloud.google.com/compute/docs/instances/live-migration-process

    The affected servers are not subject to any of those limitations, so we were at a loss as to why these services were restarting rather than live migrating as expected.

    A few minutes ago, we found the root cause: these node pools have a feature called "Compact Placement" enabled, which we turned on to reduce latency and improve performance. This feature ensures that all nodes in each of our customers' clusters are placed close to each other within the data centre, giving the lowest possible latency when nodes communicate with each other.

    Corefinity enabled this feature for clients last year given its potential benefits and, based on the documentation at the time, its apparent lack of disadvantages.

    However, we are now being told that with this feature enabled, servers will not live migrate during routine Google Cloud maintenance but will simply restart. This limitation is clearly missing from the documentation on the limitations of live migration, and it caused the few minutes of dropouts that our clients in this zone experienced today and earlier in May.

    This note now appears in the documentation on compact placement (https://cloud.google.com/kubernetes-engine/docs/how-to/compact-placement), and Google has promised to add it to the documentation page on the limitations of live migration shortly.

    Corefinity will be performing emergency maintenance on all of its affected infrastructure within the London GCP region (europe-west2) to permanently disable the compact placement feature and ensure this issue does not recur.

    We will send out a notification about the maintenance window shortly. We expect this work to take place in the early hours of the morning, with only a few minutes of disruption to service.

  • Investigating

    Corefinity has been experiencing elevated error rates for its customers hosted on Google Cloud Platform, specifically in the europe-west2-c zone.

    Customers hosted on a multi-zone cluster are not affected.

    We are in direct contact with GCP support and will update this status as soon as possible.

May 2023

Google Cloud Platform Maintenance
  • Resolved

    This incident has been resolved and the GCP maintenance has been completed.

  • Identified

    Google Compute Engine maintenance is currently taking place in London.

    We are observing multiple nodes from different clusters being restarted at the same time, with the following log entry:

    …pe-west2-xx-xxx-xxxx-xxxxx-xxxxxxx-xxxxx system@google.com Instance terminated during Compute Engine maintenance.

    We have escalated a support case with Google to determine how long the maintenance will last.

    Depending on the size of your cluster, most restarts will not cause a dropout thanks to the redundancy in place. However, if multiple nodes in your cluster restart at the same time, this could cause a dropout of a few minutes.

    We apologise for the inconvenience caused. We did not receive any notice of this maintenance and will follow up.

  • Investigating

    We are currently investigating higher than normal error rates on stacks in the following GCP zones: europe-west2-c and europe-west2-b.

May 2023 to Jul 2023