Corefinity - Notice history

Google Cloud Platform Stacks (GCP) - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 99.52%

Amazon Web Services Stacks (AWS) - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 100.0%

Microsoft Azure Stacks - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 100.0%

Civo Stacks - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 100.0%

Control Panel (manage.corefinity.com) - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 100.0%

Deployment Pipelines - Operational

100% - uptime
Dec 2022 · 100.0%, Jan 2023 · 100.0%, Feb 2023 · 100.0%
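
For a sense of scale, the 99.52% figure reported for the GCP stacks in Feb 2023 can be converted into approximate downtime. The sketch below is illustrative only: it assumes the percentage is time-based and covers the full calendar month, neither of which is stated on this page, and the function name `downtime_minutes` is hypothetical.

```python
import calendar


def downtime_minutes(uptime_percent: float, year: int, month: int) -> float:
    """Convert a monthly uptime percentage into minutes of downtime.

    Assumption: the percentage is measured over every minute of the
    calendar month. The status page does not say how it is computed.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# GCP stacks, Feb 2023, as reported above.
minutes = downtime_minutes(99.52, 2023, 2)
print(f"{minutes:.1f} minutes of downtime")
```

Under those assumptions the figure corresponds to a little over three hours across the month.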

Notice history

Feb 2023

Corefinity investigating high loads across a number of GCP environments
  • Resolved

    We have also found a large (3x to 5x) spike in traffic across many of our Magento 2 applications at the same time as this incident. We are conducting a separate investigation into this; however, there is no cause for concern.

    We have implemented a fix and are currently monitoring the result. Everything is healthy and green.

    We will provide further updates.

  • Identified

    We have identified the root cause of the issue. Unfortunately, it is completely unrelated to the recent events we have had, although it has most likely contributed to the high load and increased the impact.

    We've had an immediate response to our P1 request with GCP, and the root cause of the issue has been identified as high usage across the majority of client nodes by a process named "/home/kubernetes/bin/gcfsd". This is a GCP-managed process that provides virtual mounts to the servers.

    The line below shows this process using around 8 cores of CPU (peaking at 20 to 22 cores on other clients).

    2114 root 20 0 23.5g 15.9g 21088 S 757.0 8.5 1962:35 /home/kubernetes/bin/gcfsd --mountpoint=/run/gcfsd/mnt --maxcontentcachesizemb=213 --maxlargefilescachesizemb=213 --layercachedir=/var/lib/containerd/io.containerd.snap+

    We are working on an immediate mitigation action at the moment and another update will be provided within 10 minutes.

  • Investigating

    We are currently investigating high loads across a number of GCP servers and environments.
    We will provide another update within 30 minutes.
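
The "around 8 cores" reading in the Identified update above follows from the %CPU column of the quoted `top` line. The sketch below shows the arithmetic, assuming the line uses the default `top` column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) and that %CPU is reported relative to a single core. The function names are hypothetical, and the sample line is shortened to the first argument because the original is truncated.

```python
def parse_top_line(line: str) -> dict:
    """Split one line of `top` output into a few named fields.

    Assumes the default column order; a customised `top` layout
    would need different indices.
    """
    # Split on whitespace at most 11 times so the command and its
    # arguments stay together in the final field.
    fields = line.split(None, 11)
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "cpu_percent": float(fields[8]),
        "mem_percent": float(fields[9]),
        "command": fields[11],
    }


def cores_in_use(cpu_percent: float) -> float:
    """Convert a per-core %CPU reading into an equivalent core count."""
    return cpu_percent / 100.0


# Shortened copy of the line quoted in the notice.
sample = (
    "2114 root 20 0 23.5g 15.9g 21088 S 757.0 8.5 1962:35 "
    "/home/kubernetes/bin/gcfsd --mountpoint=/run/gcfsd/mnt"
)

info = parse_top_line(sample)
cores = cores_in_use(info["cpu_percent"])
print(f"PID {info['pid']} is using about {cores:.2f} cores")
```

A reading of 757.0 %CPU corresponds to roughly 7.6 cores, which matches the "around 8 cores" stated in the update.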

GCP Incident
  • Resolved

    The GCP incident has been resolved.
    The frontend impact of this incident on users was minimal.
    Please follow https://status.corefinity.com/cldq0aue114067rfnex131443l for updates on the NFS degradation. Once that rollout is complete, events such as these will no longer cause any impact to users.

  • Investigating

    Unfortunately, we are currently dealing with a joint incident on the GCP Stacks, whereby a critical security update released overnight is being pushed to all Kubernetes infrastructure by GCP.

    Corefinity offers a completely redundant and scalable stack, and in normal times these updates are pushed silently with no impact to users at all. Unfortunately, due to the ongoing NFS degradation incident (https://status.corefinity.com/cldq0aue114067rfnex131443l), we are seeing small dropouts exclusively on GCP stacks.

    We are working closely with GCP Cloud Tokyo and GCP Cloud UK in order to pause the rollout of security updates until this incident is over.

    The Corefinity engineering team is now also expediting the rollout of the NFS degradation fix, and we estimate the rollout to all stacks will be complete within 24 hours.

Jan 2023

Investigating reports of degraded performance
  • Resolved

    Corefinity has completed the rollout of the fix and all stacks have now returned to normal. All scheduled tasks/indexing issues should now be resolved and we are seeing significantly better performance across the board than before the incident.

    Corefinity will be releasing a full incident report within 72 hours.

  • Update

    Unfortunately, we are currently dealing with a joint incident on the GCP Stacks, whereby a critical security update released overnight is being pushed to all Kubernetes infrastructure by GCP.

    Corefinity offers a completely redundant and scalable stack, and in normal times these updates are pushed silently with no impact to users at all. Unfortunately, due to the ongoing NFS degradation incident, we are seeing small dropouts at the moment, exclusively on GCP stacks.

    We are working closely with GCP Cloud Tokyo and GCP Cloud UK in order to pause the rollout of security updates until this incident is over.

    The Corefinity engineering team is now also expediting the rollout of the NFS degradation fix, and we estimate the rollout to all stacks will be complete within 24 hours.

  • Update

    The rollout of the patch to all UAT/Staging environments is now complete. Preliminary monitoring results show not only a complete resolution of the performance degradation caused by these events but also much better performance than before.

    We will continue to monitor for another 48 hours and have a full rollout plan for all affected stacks with no expected impact (i.e. we can roll out with the redundancy in place and no downtime).

  • Update

    We have now identified the root cause of the issue as a very specific bug within the 4.0 NFS client that is used on all affected stacks.

    This has allowed us to start to work on a mitigation plan and we expect to be releasing a patch to all Staging and UAT environments within 24 hours of this update.

    We have scaled up all affected stacks at our cost to ensure little to no impact on the frontend of each website. We would therefore like to monitor Staging/UAT environments with the fix for a period of 72 hours and, if no further issues are seen, we will roll out the fix to all affected websites. You will see better performance than before the incident and, as promised, we will be releasing a full incident report following these events. We'd like to thank you for your continued support and patience during this time; we feel extremely lucky and honoured to have such supportive customers and partners during these unprecedented events.

  • Update

    Corefinity has deployed a number of optimisations over the past few days that have reduced the impact of the NFS performance degradation to a minimum.

    The impact on the frontend of websites remains extremely low (if any, in most circumstances), and we continue to respond to all alerts within seconds, ensuring stacks are scaled up at our cost during this time. We continue to explore all open options thoroughly to ensure that, once a permanent fix is deployed, it is deployed with no further impact to clients. We'd like to thank all of our customers and partners, who have been extremely understanding of the events over the past week, and we will continue to provide updates as progress is made. We will release an incident report with all details (technical and non-technical) once everything has stabilised.

  • Update

    Corefinity is still working towards a permanent fix for degraded NFS performance. We have scaled up all affected stacks at our cost to ensure little to no impact on the frontend of each website. We still consider this incident to be open and are working around the clock to return performance to base or better.

  • Update

    Corefinity has deployed its first fix for the degraded NFS performance within its GCP and AWS stacks. We are continuing to monitor the health of all affected websites and will be providing another update and an incident report once we are certain performance has returned to base (or better).

    We'd like to thank you for your patience during this time. This is certainly not the norm for us, but rest assured we are working around the clock to return service to normal.

    There is little to no impact on the frontend of websites. We will continue to monitor all websites, and any frontend alerts will be dealt with as a matter of urgency, as always.

  • Update

    Corefinity continues to monitor all affected stacks. We are making progress in identifying the root cause of the issue and will continue to provide updates in the meantime. There is little to no impact on the frontend of each website. We will continue to monitor all websites, and any frontend alerts will be dealt with as a matter of urgency, as always.

  • Monitoring

    Corefinity has scaled up all stacks to deal as best we can with the degraded performance identified. We will continue to investigate the root cause of the issues and, once a permanent fix is in place, we will release an incident report.

    During this time you may notice degraded performance on your scheduled tasks and we will do our best to return to normal as fast as we can.

  • Identified

    We have identified the cause as reduced performance across all NFS servers that were updated as part of the planned maintenance. We are currently working on mitigation options.

  • Investigating

    Corefinity is currently investigating reports of degraded performance on its scalable stacks hosted on GCP and AWS.

    We will provide another update within 2 hours.

Dec 2022

No notices reported this month

Dec 2022 to Feb 2023
