Determine what capacity optimization method(s) should be used based on a given workload

Nutanix Capacity Optimization Methods

Erasure Coding

  • Similar to RAID, EC encodes a strip of data blocks located on different nodes and calculates parity
  • In event of failure, parity used to calculate missing data blocks (decoding)
  • Each data block in the strip is an extent group; each block lives on a different node and belongs to a different vDisk
  • The number of data and parity blocks in a strip is configurable based on the desired failures to tolerate

EC Strip Size:

  • Ex. RF2 = N+1
    • 3 or 4 data blocks + 1 parity block = 3/1 or 4/1
  • Ex. RF3 = N+2
    • 3 or 4 data blocks + 2 parity blocks = 3/2 or 4/2
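
To put numbers on the strip sizes above, here is a quick Python sketch (not Nutanix code) comparing the storage overhead of plain replication with that of the example EC strips:

```python
def replication_overhead(rf):
    """Full copies: RF2 stores 2x the data, RF3 stores 3x."""
    return float(rf)

def ec_overhead(data_blocks, parity_blocks):
    """EC stores the data once plus parity: (d + p) / d."""
    return (data_blocks + parity_blocks) / data_blocks

# RF2 vs a 4/1 strip (both tolerate a single failure)
print(replication_overhead(2))   # 2.0  -> 100% overhead
print(ec_overhead(4, 1))         # 1.25 -> 25% overhead

# RF3 vs a 4/2 strip (both tolerate two failures)
print(replication_overhead(3))   # 3.0
print(ec_overhead(4, 2))         # 1.5
```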

Overhead:

  • Recommended cluster size is at least 1 node larger than the combined strip size (data + parity)
  • Allows the strip to be rebuilt in the event of a failure
  • Ex. a 4/1 strip would need at least 6 nodes (see the sketch after this list)
  • Encoding is done post-process leveraging Curator MapReduce framework
  • When Curator scan runs, it finds eligible extent groups to be encoded.
  • Must be “write-cold” = not written to for more than 1 hour
  • Tasks are distributed/throttled via Chronos
Image credit: https://nutanixbible.com
  • EC pairs well with Inline Compression
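
A rough sketch of the two sizing/eligibility rules above (one extra node beyond the strip, and the write-cold check); the function and variable names are made up for illustration:

```python
import time

WRITE_COLD_SECONDS = 60 * 60  # eligible only if not written to for > 1 hour

def recommended_cluster_size(data_blocks, parity_blocks):
    # strip (data + parity) plus one spare node for rebuilds
    return data_blocks + parity_blocks + 1

def is_write_cold(last_write_ts, now=None):
    now = now or time.time()
    return (now - last_write_ts) > WRITE_COLD_SECONDS

print(recommended_cluster_size(4, 1))          # 6 nodes for a 4/1 strip
print(is_write_cold(time.time() - 2 * 3600))   # True: last written 2 hours ago
```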

Compression

The Capacity Optimization Engine (COE) performs data transformations to increase data efficiency on disk

Inline

  • Sequential streams of data or large I/Os are compressed in memory before being written to disk
  • Random I/Os are written uncompressed to the OpLog, coalesced, and then compressed in memory before being written to the Extent Store
  • Leverages Google Snappy compression library
  • For Inline Compression, set the Compression Delay to “0” minutes
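
A toy sketch of the inline path; Python's standard-library zlib stands in for the Snappy compressor DSF actually uses, and the large-I/O threshold is an illustrative number, not a Nutanix constant:

```python
import zlib  # stand-in; DSF uses the Google Snappy library

LARGE_IO_BYTES = 64 * 1024   # illustrative threshold only
oplog = []                   # uncompressed random writes awaiting coalescing
extent_store = []            # compressed data at rest

def inline_write(data: bytes):
    if len(data) >= LARGE_IO_BYTES:
        extent_store.append(zlib.compress(data))   # large/sequential: compress inline
    else:
        oplog.append(data)                         # random: stage uncompressed in OpLog

def drain_oplog():
    if oplog:
        coalesced = b"".join(oplog)                # coalesce, then compress in memory
        extent_store.append(zlib.compress(coalesced))
        oplog.clear()

inline_write(b"x" * 128 * 1024)   # large sequential write -> compressed inline
inline_write(b"y" * 4096)         # small random write -> OpLog first
drain_oplog()
```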

Offline

  • New write I/O is written in an uncompressed state following the normal I/O path
  • After the compression delay is met and the data has become cold (migrated down to the HDD tier via ILM), it can be compressed
  • Leverages Curator MapReduce framework
  • All nodes perform compression task
  • Throttled by Chronos
  • For read I/O, data is decompressed in memory and then the I/O is served
  • Heavily accessed data is decompressed in the HDD tier and leverages ILM to move up to the SSD tier and/or cache
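
A minimal sketch of the offline path (again with zlib standing in for the real compressor, and an example 60-minute delay): data lands uncompressed, gets compressed once the delay has passed and it is cold, and is decompressed in memory on read:

```python
import time, zlib

COMPRESSION_DELAY = 60 * 60        # example: 60 minutes, expressed in seconds

class Extent:
    def __init__(self, data: bytes):
        self.data = data
        self.compressed = False
        self.written_at = time.time()

def offline_compress(extent, now=None):
    """Curator-style background pass: compress only cold, still-uncompressed data."""
    now = now or time.time()
    if not extent.compressed and (now - extent.written_at) > COMPRESSION_DELAY:
        extent.data = zlib.compress(extent.data)
        extent.compressed = True

def read(extent) -> bytes:
    """Reads decompress in memory; the stored copy stays compressed."""
    return zlib.decompress(extent.data) if extent.compressed else extent.data

e = Extent(b"cold data" * 1000)
offline_compress(e, now=e.written_at + 2 * COMPRESSION_DELAY)  # simulate the delay passing
print(read(e) == b"cold data" * 1000)                          # True
```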

Elastic Dedupe Engine

  • Allows for dedupe in capacity and performance tiers
  • Streams of data are fingerprinted during ingest using a SHA-1 hash at 16K granularity
  • Fingerprints are stored persistently as part of the written block’s metadata
  • Duplicate data that can be deduplicated isn’t scanned or re-read; duplicate copies are just removed
  • Fingerprint refcounts are monitored to track dedupability
  • Intel acceleration is leveraged for SHA1
  • When not done on ingest, fingerprinting is done as a background process
  • Where duplicates are found, the background process removes them using the DSF MapReduce framework (Curator)
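
A simplified illustration of fingerprint-based dedupe (not the Elastic Dedupe Engine itself): SHA-1 at 16K granularity, with refcounts tracking how many logical copies share each chunk:

```python
import hashlib

CHUNK = 16 * 1024                       # 16K fingerprint granularity
fingerprints = {}                       # sha1 -> {"data": bytes, "refcount": int}

def ingest(stream: bytes):
    """Fingerprint on ingest; duplicate chunks only bump a refcount, no second copy is stored."""
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in fingerprints:
            fingerprints[fp]["refcount"] += 1
        else:
            fingerprints[fp] = {"data": chunk, "refcount": 1}

ingest(b"a" * CHUNK + b"b" * CHUNK)
ingest(b"a" * CHUNK)                     # duplicate chunk, refcount goes to 2
print(fingerprints[hashlib.sha1(b"a" * CHUNK).hexdigest()]["refcount"])  # 2
```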

Global Deduplication

  • DSF can dedupe by just updating metadata pointers
  • Same concept in DR/Replication
  • Before sending data over the wire, DSF queries remote site to check fingerprint(s) on target
  • If the fingerprints aren’t present, data is compressed and sent to the target
  • If the data already exists, nothing is sent; only metadata is updated
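
The same idea over the wire, as a toy sketch: ask the target which fingerprints it already has and ship only the rest (the fingerprint values here are made up):

```python
def replicate(chunks, remote_fingerprints):
    """Global dedupe over the wire: send only chunks the remote site lacks."""
    to_send = []
    for fp, data in chunks.items():
        if fp in remote_fingerprints:
            continue                  # already on the target: metadata update only
        to_send.append(data)          # would be compressed and sent over the wire
    return to_send

local = {"fp-a": b"block A", "fp-b": b"block B"}
print(replicate(local, remote_fingerprints={"fp-a"}))   # only block B is sent
```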

Storage Tiering + Prioritization

  • ILM responsible for triggering data movement events
    • Keeps hot data local to the node running the VM
    • ILM constantly monitors I/O patterns and down/up migrates as necessary
  • Local node SSD = highest priority tier for all I/O
  • When local SSD utilization is high, disk balancing kicks in to move the coldest data on local SSDs to other SSDs in the cluster
  • All CVMs + SSDs are used for remote I/O to eliminate bottlenecks
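
A toy sketch of the migration trigger above; the watermark and the extent-dict fields are illustrative, not Nutanix internals:

```python
SSD_HIGH_WATERMARK = 0.75   # illustrative threshold, not a Nutanix constant

def migrate_coldest(local_ssd, target, ssd_capacity):
    """Move the coldest local-SSD extents to the target (other SSDs in the cluster,
    or the HDD tier) until utilization drops back under the watermark."""
    used = sum(e["size"] for e in local_ssd)
    for extent in sorted(local_ssd, key=lambda e: e["last_access"]):
        if used / ssd_capacity <= SSD_HIGH_WATERMARK:
            break
        local_ssd.remove(extent)
        target.append(extent)
        used -= extent["size"]

ssd = [{"size": 40, "last_access": 1}, {"size": 40, "last_access": 9}]
other = []
migrate_coldest(ssd, other, ssd_capacity=100)       # 80% used -> coldest extent moves
print([e["last_access"] for e in ssd], len(other))  # [9] 1
```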

Data Locality

  • VM data is served locally from the CVM and resides on local disks under the CVM’s control
  • When reading old data (after an HA event, for instance), I/O will be forwarded by the local CVM to the remote CVM
  • DSF will migrate data locally in the background
  • Cache Locality: vDisk data stored in Unified Cache. Extents may be remote.
  • Extent Locality: vDisk extents are on same node as VM.
  • Cache locality determined by vDisk ownership
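
A simplified read path showing the idea: serve locally when possible, otherwise forward to the CVM that holds the data and queue a background migration so the next read is local (the data structures are made up for illustration):

```python
def read_extent(extent_id, local_cvm, remote_cvms, migrate_queue):
    """Serve the read locally if the extent is local; otherwise forward the I/O and
    queue a background migration so future reads are local."""
    if extent_id in local_cvm["extents"]:
        return local_cvm["extents"][extent_id]        # extent locality: local read
    for cvm in remote_cvms:
        if extent_id in cvm["extents"]:
            migrate_queue.append(extent_id)           # DSF localizes in the background
            return cvm["extents"][extent_id]          # forwarded remote read
    raise KeyError(extent_id)

local = {"extents": {}}
remote = [{"extents": {"eg-1": b"data written before the HA event"}}]
queue = []
print(read_extent("eg-1", local, remote, queue), queue)   # remote read, migration queued
```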

Disk Balancing

  • Works on a node’s utilization of local storage
  • Integrated with DSF ILM
  • Leverages Curator
  • Scheduled process
  • With a “storage only” node, the CVM can use the node’s full memory for a much larger read cache
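
A toy sketch of one balancing pass: if utilization across nodes diverges past a threshold, the coldest extents move off the fullest node (the threshold and field names are illustrative):

```python
def balance_pass(nodes, threshold=0.10):
    """One scheduled Curator-style pass (sketch): move the coldest extents from the
    fullest node to the emptiest until utilization is roughly even."""
    def util(n):
        return n["used"] / n["capacity"]
    fullest = max(nodes, key=util)
    emptiest = min(nodes, key=util)
    while util(fullest) - util(emptiest) > threshold and fullest["extents"]:
        coldest = min(fullest["extents"], key=lambda e: e["last_access"])
        fullest["extents"].remove(coldest)
        emptiest["extents"].append(coldest)
        fullest["used"] -= coldest["size"]
        emptiest["used"] += coldest["size"]

nodes = [
    {"used": 80, "capacity": 100, "extents": [{"size": 30, "last_access": 1},
                                              {"size": 50, "last_access": 5}]},
    {"used": 20, "capacity": 100, "extents": [{"size": 20, "last_access": 3}]},
]
balance_pass(nodes)
print([n["used"] for n in nodes])   # [50, 50]: the coldest extent moved over
```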
