Describe and differentiate component, service, and CVM failover processes such as Disk Failure, CVM Failure, and Node Failure

Disk Failure

  • Monitored via SMART data
  • Hades responsible for monitoring
  • VM impact:
    • HA Event: NO
    • Failed I/O: NO
    • Latency: NO
  • In the event of a failure, a Curator scan starts immediately
  • The scan checks metadata to find data previously hosted on the failed disk
  • That data is re-replicated (distributed) to nodes throughout the cluster
  • All CVMs participate in the re-replication
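
The re-replication steps above can be sketched as a small planning function. This is an illustrative model only, not Curator's actual implementation: the names plan_rereplication, replica_map, and disk_to_node are hypothetical, and the placement rule (choose the least-loaded healthy disk on a node that does not already hold a copy) is an assumption made for the example.

```python
from collections import defaultdict


def plan_rereplication(replica_map, disk_to_node, failed_disk):
    """Return {extent_group: target_disk} for each group that lost a replica.

    replica_map:  {extent_group: [disk, ...]} replica locations (hypothetical)
    disk_to_node: {disk: node} physical placement of each disk (hypothetical)
    """
    healthy = [d for d in disk_to_node if d != failed_disk]
    load = defaultdict(int)  # new replicas assigned to each disk so far
    plan = {}
    for egroup, disks in sorted(replica_map.items()):
        if failed_disk not in disks:
            continue  # this group kept all of its replicas
        survivors = [d for d in disks if d != failed_disk]
        used_nodes = {disk_to_node[d] for d in survivors}
        # Assumed rule: never place two copies on the same node.
        candidates = [d for d in healthy if disk_to_node[d] not in used_nodes]
        if not candidates:
            continue  # nowhere safe to place a new copy
        # Spread the work so that every node takes part in the rebuild.
        target = min(candidates, key=lambda d: (load[d], d))
        load[target] += 1
        plan[egroup] = target
    return plan
```

Choosing the least-loaded target each time is what spreads the rebuild across the cluster instead of funnelling it to a single node.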

CVM Failure

  • Failure = I/Os are redirected to other CVMs in the cluster
  • VM impact:
    • HA Event: NO
    • Failed I/O: NO
    • Latency: Potentially higher, since I/Os travel over the network (not local)
  • ESXi/Hyper-V handle this via CVM Autopathing, which leverages HA.py (pronounced "happy"): routes are modified to forward traffic from the internal address (192.168.5.2) to the external IP of another CVM.
    • Keeps datastore intact
    • Once the local CVM is back online, the route is removed and the local CVM takes back I/O
  • KVM = iSCSI multipathing is leveraged
    • Primary path = local CVM, other two paths = remote CVMs
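
The path preference described above (local CVM first, a remote CVM only while the local one is down, then fail back) can be sketched as follows. This is a simplified model under stated assumptions, not the logic of HA.py or of the iSCSI multipath driver; choose_storage_path and the CVM names are hypothetical.

```python
def choose_storage_path(local_cvm, remote_cvms, healthy):
    """Pick the CVM that should serve I/O for a host.

    local_cvm:   identifier of the CVM on the same node (hypothetical)
    remote_cvms: ordered list of fallback CVMs on other nodes
    healthy:     set of CVMs currently responding
    """
    # The local CVM is always preferred, so I/O fails back automatically
    # as soon as it reappears in the healthy set.
    if local_cvm in healthy:
        return local_cvm
    # Otherwise use the first reachable remote CVM (I/O now crosses the network).
    for cvm in remote_cvms:
        if cvm in healthy:
            return cvm
    return None  # no path available


# Example: local CVM down, so I/O is redirected to the first remote CVM.
path = choose_storage_path("cvm-a", ["cvm-b", "cvm-c"], {"cvm-b", "cvm-c"})
```

Because the function is re-evaluated against the current health set, no separate fail-back step is needed: once the local CVM is healthy again it wins the selection.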

Node Failure

  • VM impact:
    • HA Event: YES
    • Failed I/O: NO
    • Latency: NO
  • VM HA event will occur, restarting VMs on other nodes
  • Curator will find data previously hosted on node and replicate
  • If the node is down for a prolonged period of time, the downed CVM is removed from the metadata ring.
    • It rejoins the ring once it is back up and stable
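
The metadata ring behaviour above can be modelled as a small decision function. This is a sketch only: ring_action is a hypothetical name, and both thresholds are placeholder parameters, since the notes do not state how long "prolonged" or "stable" actually is.

```python
def ring_action(in_ring, cvm_up, seconds_in_state,
                removal_threshold_s, stable_threshold_s):
    """Return 'remove', 'rejoin', or 'none' for a CVM's ring membership.

    seconds_in_state is how long the CVM has been continuously down
    (if cvm_up is False) or continuously up (if cvm_up is True).
    Both thresholds are assumptions supplied by the caller.
    """
    # Down for a prolonged period while still a ring member: remove it.
    if in_ring and not cvm_up and seconds_in_state >= removal_threshold_s:
        return "remove"
    # Back up and stable for long enough while outside the ring: rejoin.
    if not in_ring and cvm_up and seconds_in_state >= stable_threshold_s:
        return "rejoin"
    return "none"
```

A short outage therefore leaves ring membership untouched; only a prolonged one triggers removal, and rejoining waits until the CVM has stayed up.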

[Figure: Nutanix Failover Process. Image credit: https://nutanixbible.com]
