NOTE: This is a re-post of http://blog.gluster.org/2016/03/automated-tiering-in-gluster/
This post describes how to run automated tiering in Gluster. Tiering is appropriate for stable workloads where frequently used data fits on small, fast storage, such as SSDs, and rarely used data resides on a slower/cheaper volume, such as spinning disks.
On a tiered volume, files are tracked according to frequency of access. Popular files tend to migrate to faster storage, and unpopular ones to slower storage. The behavior can be influenced with tuning parameters.
To use tiering, take an existing volume and attach a hot tier to it. The existing volume becomes the cold tier. The existing volume may be either erasure coded or distributed-replicated. The hot tier must be distributed-replicated. For example:
gluster volume tier vol1 attach replica 2 gprfs01:/brick1 gprfs02:/brick2 gprfs03:/brick3 gprfs04:/brick4
Once the tier is attached, there may be a delay before migration begins, because the cold tier is first scanned in full. The plan is to remove this delay in the near term.
Promotion is the migration of a file from the cold tier to the hot tier. Demotion is migration in the opposite direction.
Each time a file is migrated, a counter is incremented. To view the counters:
gluster volume tier vol1 status
You can stop tiering to use the hot bricks for some other purpose. To stop tiering, use the detach operation. The steps resemble removing bricks. You initiate the process, then wait for it to complete by monitoring its status. This may take time depending on how much data must be moved off the hot tier. Once completed, the commit command may be used to remove the hot tier. The cold tier then reverts to the original volume.
gluster volume tier vol1 detach start
gluster volume tier vol1 detach status
gluster volume tier vol1 detach commit
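The wait between "detach start" and "detach commit" can be scripted. The following is a minimal sketch, not a tested procedure: the function names `detach_state` and `wait_for_detach` are hypothetical, and it assumes the status output contains the words "in progress", "completed", or "failed" for each node. Check the actual output on your Gluster version before relying on it.

```shell
#!/bin/sh
# Classify the text printed by "detach status".
# ASSUMPTION: the output uses the words "failed", "in progress" and
# "completed"; anything unrecognized is treated as still running.
detach_state() {
    case $1 in
        *failed*)        echo failed ;;
        *"in progress"*) echo running ;;
        *completed*)     echo complete ;;
        *)               echo running ;;
    esac
}

# Poll the detach status of volume $1 every $2 seconds (default 30,
# a hypothetical value) until it completes (returns 0) or fails (returns 1).
wait_for_detach() {
    vol=$1
    interval=${2:-30}
    while :; do
        state=$(detach_state "$(gluster volume tier "$vol" detach status)")
        case $state in
            complete) return 0 ;;
            failed)   return 1 ;;
        esac
        sleep "$interval"
    done
}

# Usage, after "gluster volume tier vol1 detach start":
#   wait_for_detach vol1 && gluster volume tier vol1 detach commit
```

The commit is deliberately left outside the function so that the final, irreversible step stays an explicit decision.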
Hot storage is valuable and should be utilized, else the resource is wasted. To this end, the tiering feature aggressively promotes files to the hot tier until it nears full. That point is governed by the “cluster.watermark-low” tunable and is expressed as a percentage.
Conversely, the hot tier must not be allowed to become completely full. If too much data resides on the hot tier, files are aggressively demoted. That threshold is governed by “cluster.watermark-hi”, also expressed as a percentage.
The system tries to stabilize so that the amount of data on the hot tier stays between the lower and upper watermarks.
gluster volume set vol1 cluster.watermark-hi 90
gluster volume set vol1 cluster.watermark-low 75
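The effect of the two watermarks can be pictured as a three-way decision on how full the hot tier is. This is a simplified model of the behavior described above, not Gluster's actual code. The function name `tier_mode` and the mode labels are hypothetical, the fallback values simply mirror the 75 and 90 used in the commands above, and the handling of the exact boundary values is an assumption.

```shell
#!/bin/sh
# Classify hot-tier fullness (integer percent) against the watermarks.
# $1 = percent used, $2 = low watermark, $3 = high watermark.
tier_mode() {
    used_pct=$1
    low=${2:-75}
    hi=${3:-90}
    if [ "$used_pct" -lt "$low" ]; then
        echo promote   # below the low watermark: promote aggressively
    elif [ "$used_pct" -gt "$hi" ]; then
        echo demote    # above the high watermark: demote aggressively
    else
        echo steady    # between the watermarks: the target range
    fi
}

# Usage: tier_mode 50   (hot tier is 50% full)
```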
The tiering daemon migrates files periodically. The period for promoting files is “cluster.tier-promote-frequency”. Its default value was chosen so that files are promoted quickly, in reaction to I/O. The period for demoting files is “cluster.tier-demote-frequency”. Its default value was chosen so that files are demoted slowly, in the background. Both values are expressed in seconds.
gluster volume set vol1 cluster.tier-promote-frequency 120
gluster volume set vol1 cluster.tier-demote-frequency 3600
It is possible to limit how much data may be migrated within a period. The limit may be expressed as a number of files or in MB.
gluster volume set vol1 cluster.tier-max-mb 4000
gluster volume set vol1 cluster.tier-max-files 10000
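To get a feel for how the two caps interact, the sketch below walks a list of candidate file sizes and stops at the first file that would push a cycle over either limit. It is an illustration only: the function name `plan_cycle` is hypothetical, and stopping before the cap is exceeded (rather than just after) is an assumption, not a statement of how the tiering daemon is implemented.

```shell
#!/bin/sh
# Read one file size (in MB) per line on stdin and report how many files
# and how many MB fit in one migration cycle.
# $1 = cap in MB, $2 = cap in number of files.
plan_cycle() {
    awk -v max_mb="$1" -v max_files="$2" '
        {
            # Stop at the first file that would exceed either cap.
            if (files + 1 > max_files || mb + $1 > max_mb) exit
            files++
            mb += $1
        }
        END { printf "%d files, %d MB\n", files, mb }
    '
}

# Usage: printf '1000\n2000\n1500\n' | plan_cycle 4000 10000
```

With the values from the commands above, whichever cap is reached first ends the cycle, so a few very large files can exhaust the MB budget long before the file count matters.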
By default, files are queued for promotion if they are accessed on the cold tier within a period. This behavior can be changed so that files are promoted only if they are accessed more than a threshold number of times within a period. The threshold may be expressed in terms of reads or writes. This avoids populating the hot tier with files that are accessed only once. The hot tier should store files that are accessed repeatedly.
gluster volume set vol1 cluster.write-freq-threshold 2
gluster volume set vol1 cluster.read-freq-threshold 2
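The threshold check can be thought of as a simple predicate on a file's access counts for the current period. The sketch below is a model of that idea, not Gluster's implementation: the function name `is_promotion_candidate` is hypothetical, and it assumes that a count equal to the threshold qualifies and that meeting either the read or the write threshold is sufficient.

```shell
#!/bin/sh
# Succeed (exit status 0) if a file's access counts for the period make
# it a promotion candidate.
# $1 = reads, $2 = writes, $3 = read threshold, $4 = write threshold.
# ASSUMPTION: ">=" comparison, and either counter alone is enough.
is_promotion_candidate() {
    reads=$1
    writes=$2
    read_thresh=${3:-2}
    write_thresh=${4:-2}
    [ "$reads" -ge "$read_thresh" ] || [ "$writes" -ge "$write_thresh" ]
}

# Usage:
#   if is_promotion_candidate 3 0; then echo "queue for promotion"; fi
```

Under these assumptions and thresholds of 2, a file read once in the period stays on the cold tier, while a file read twice becomes a candidate.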
As of March 2016, measurements have covered cases where ~95% of the I/Os are to files on the hot tier. Those experiments have shown good performance when the cold tier is distributed-replicated. When the cold tier is erasure coded, the feature works well for larger file sizes (greater than 512K) on a typical SSD.
Performance should improve as the code matures, and your mileage may vary. A subsequent post will explore performance.