
About Distributed Block Storage

Distributed block storage is a technique for distributing the storage requirements of your virtual machines across a large number of storage devices.

When using centralised storage, such as a SAN or a NAS, each compute node accesses that SAN or NAS to meet the storage requirements of each VM. When using local storage, the storage for each VM is on a disk in the compute node running that VM. With distributed block storage, however, each VM disk is spread across many smaller storage devices, each of which is a hard disk in a storage node (which may or may not also be a compute node). Each virtual disk is broken up into small chunks, and each chunk is normally stored on at least two storage nodes. Each copy is called a 'replica'. That way, if one copy is lost, the virtual machine's disk will not be corrupted.
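
The splitting and replication described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how Ceph actually places data: the 4 MiB chunk size, the node names and the replica count of two are hypothetical, and placement uses simple rendezvous hashing as a stand-in for Ceph's own CRUSH algorithm. What it does show is that every chunk of a virtual disk is stored on two different storage nodes, so losing any single node leaves at least one replica of each chunk.

```python
import hashlib

# Hypothetical values for illustration only.
CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size: 4 MiB
NODES = ["storage-node-1", "storage-node-2", "storage-node-3", "storage-node-4"]
REPLICAS = 2


def chunk_count(disk_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of fixed-size chunks needed to hold a virtual disk (rounded up)."""
    return -(-disk_size // chunk_size)


def place_chunk(disk_name: str, index: int, nodes=NODES, replicas=REPLICAS) -> list:
    """Pick `replicas` distinct nodes to hold one chunk.

    Simplified rendezvous (highest-random-weight) hashing, NOT Ceph's CRUSH:
    every node gets a pseudo-random score for this chunk, and the
    top-scoring nodes hold the replicas.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    key = f"{disk_name}:{index}"

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{key}:{node}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(nodes, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    size = 10 * 1024**3  # a 10 GiB virtual disk
    print(chunk_count(size), "chunks")
    for i in range(3):
        print("chunk", i, "->", place_chunk("vm-disk-1", i))
```

Because placement is computed from a hash rather than looked up in a central table, any client can work out where a chunk lives; CRUSH is built on the same idea, with much finer control over failure domains.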

Distributed block storage is a newer technology than centralised storage. Arguably, it is more difficult to administer, and it carries the risk of being less well tested in mainstream production environments. Tuning it for performance, in particular, can be more difficult. However, if deployed properly, it offers enhanced scalability, can be very economical, and can perform as well as a centralised storage solution.

About Ceph

Ceph is an open source distributed storage technology which supports:

  • Distributed Object Storage (RADOS)
  • Distributed Block Storage (rbd - or RADOS Block Device)
  • A Distributed Filesystem (cephfs)

Ceph is not a Flexiant product. Ceph is an open source project; you can find out more about it here. Flexiant supports FCO's integration into Ceph, but Flexiant does not support your Ceph cluster. If you want to use Ceph, you should ensure you have the skills to support a Ceph cluster yourself, or that you have purchased a support contract for your Ceph cluster. The leading developers of Ceph are a company called Inktank. Inktank provide commercial consultancy and support for Ceph.

It is your responsibility to organise support for your Ceph cluster or to ensure you can support it yourself. Maintaining distributed storage can be complex. Flexiant are not experts on maintaining Ceph clusters, and this lies outside Flexiant's support package, just as supporting your SAN or NAS lies outside Flexiant's support package. The instructions below are just guidelines to get you started, and are not intended as a full support guide. If you wish to support Ceph yourself, you can find extensive documentation here.

Flexiant Cloud Orchestrator only makes use of the Distributed Block Storage element of Ceph, though you can use the same Ceph cluster to provide other services to your customers. For instance, Ceph provides an S3 gateway onto RADOS.

A single Ceph deployment is called a 'cluster'. A Ceph cluster need not correspond to an FCO cluster: two FCO clusters in the same datacenter could share the same Ceph cluster, and, equally, one FCO cluster could theoretically use two Ceph clusters.

To avoid ambiguity, in this section of the documentation we will use the phrases 'FCO Cluster' and 'Ceph Cluster' to differentiate between the two meanings of the word 'cluster'. Note that in Ceph's documentation, a cluster always means a 'Ceph cluster'.

Ceph clusters are made up of machines running the following types of services:

  • Monitors (MON): these maintain a map of the current state of the Ceph cluster. You will need a small number of them (normally one or three - see the section on Monitors below).
  • Object Storage Devices (OSD): these store the data. You will need lots of these to store your data.
  • Metadata server (MDS): these maintain metadata on the placement of files in the cephfs filing system. Unless you are using cephfs, you do not need any of these.
  • Auxiliary services, such as RADOS gateways to S3 and Swift. FCO does not need any of these.

Thus for a simple setup, you need only monitors and OSDs.

More than one Ceph service can be run on the same machine. See below for details. 

Ceph Monitors

Your Ceph cluster will not operate unless it can contact a quorum of monitors. A quorum means 'more than half'. Therefore, if half (or more) of your monitors are unreachable, your Ceph cluster will be inoperable. It is thus important to ensure you have a suitable number of monitor devices. The requirements for monitors are determined by scale and redundancy.
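
As a worked example of the 'more than half' rule: for n monitors the quorum is n // 2 + 1, and the Ceph cluster survives the loss of n minus that many monitors. The following Python sketch simply evaluates that arithmetic; it does not talk to Ceph, and the function names are illustrative.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that is strictly more than half."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def has_quorum(reachable: int, monitors: int) -> bool:
    """True if the reachable monitors form a quorum."""
    return reachable >= quorum_size(monitors)


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail before the quorum is lost."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} monitor(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that two monitors tolerate no failures, the same as one, while three tolerate one failure and five tolerate two. Adding a monitor to an odd number adds a machine without adding tolerance.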

Ceph does not place a great workload on monitors; a cluster of many tens of machines should work correctly with one or three monitors. If you scale beyond this, you may need to consider using more monitors.

Redundancy is a more important consideration for small clusters. A test cluster may have only one monitor; if this becomes unavailable, your entire Ceph cluster will fail. It is not useful to have two monitors: a quorum requires more than half of the monitors to be available, so if either of the two fails, no quorum is possible. This configuration is therefore more likely to fail than one with a single monitor. We recommend you have at least three monitors for a production service. These should each be on separate machines, if possible in separate racks with separate power.
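
To see why two monitors are worse than one, assume each monitor is independently available with some probability p. This is an idealisation: monitors that share a rack or power supply tend to fail together, which is exactly why separating them matters. The Ceph cluster is available when at least a quorum of monitors is up, which is a binomial sum. The value p = 0.99 below is hypothetical.

```python
from math import comb


def cluster_availability(monitors: int, p: float) -> float:
    """Probability that at least a quorum of monitors is up.

    Assumes each monitor is independently available with probability p.
    """
    quorum = monitors // 2 + 1  # more than half
    return sum(
        comb(monitors, k) * p**k * (1 - p) ** (monitors - k)
        for k in range(quorum, monitors + 1)
    )


if __name__ == "__main__":
    p = 0.99  # hypothetical per-monitor availability
    for n in (1, 2, 3):
        print(n, "monitor(s):", round(cluster_availability(n, p), 6))
```

With p = 0.99 this gives 0.99 for one monitor, 0.9801 for two (both must be up) and 0.999702 for three.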

For a production cluster, ensure you have at least three monitor services, and that these 'share fate' as little as possible. If you do not provide at least three monitors, you risk your Ceph cluster becoming unavailable if any one monitor fails.

If you are using FCO to provide your Ceph Cluster (see below), you can use the cluster controller(s) as monitor(s). You can also use compute nodes as monitors, though we do not recommend this for anything other than test configurations.

Ceph OSDs

Ceph OSDs are used to store data. You should run one OSD process per spindle on which data is stored. For instance, if you have a server with 8 hard drives storing data, you should run 8 OSD processes on that server. Broadly speaking, an OSD's data consists of a journal and other data (including the actual data stored). You may wish to put the journal on a separate high-performance block device, such as an SSD; if you take this option, each OSD will be associated with two physical devices.
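
The one-OSD-per-spindle rule, with an optional SSD journal, can be modelled as a simple plan. This Python sketch only builds a list of pairings; it does not call any Ceph tooling, and the device names (/dev/sdb to /dev/sdi for data, partitions of /dev/sdj for journals) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OsdPlan:
    """One OSD process and the physical device(s) it uses."""
    osd_id: int
    data_device: str
    journal_device: Optional[str]  # None: the journal lives on the data device


def plan_osds(
    data_disks: List[str],
    journal_partitions: Optional[List[str]] = None,
    first_id: int = 0,
) -> List[OsdPlan]:
    """Plan one OSD per data spindle, optionally with a separate SSD journal."""
    if journal_partitions is not None and len(journal_partitions) < len(data_disks):
        raise ValueError("need one journal partition per data disk")
    plans = []
    for i, disk in enumerate(data_disks):
        journal = journal_partitions[i] if journal_partitions is not None else None
        plans.append(OsdPlan(first_id + i, disk, journal))
    return plans


if __name__ == "__main__":
    # Hypothetical server: 8 data drives plus one SSD split into 8 journal partitions.
    data = [f"/dev/sd{letter}" for letter in "bcdefghi"]
    journals = [f"/dev/sdj{n}" for n in range(1, 9)]
    for plan in plan_osds(data, journals):
        print(plan)
```

With journal partitions supplied, each of the 8 OSDs is associated with two physical devices; without them, each OSD keeps its journal on its own data disk.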

If you are using FCO to provide your Ceph Cluster (see below), you can use compute nodes to provide OSDs. Dummy OSDs are also provided on the first cluster controller to provide a small amount of storage until the first node is added.

Integration of Ceph with Flexiant Cloud Orchestrator

Integration with Ceph is provided on KVM only.

For information about setting up and configuring your integration of Ceph with Flexiant Cloud Orchestrator, see Integrating Ceph with Flexiant Cloud Orchestrator.


Example configurations for integrations of Ceph with Flexiant Cloud Orchestrator

If you provide your own Ceph cluster as opposed to using Flexiant Cloud Orchestrator to provide one, the minimum supported version of Ceph is Firefly (0.80).
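
If you provide your own Ceph cluster, you can check it against the Firefly (0.80) minimum. The sketch below assumes that `ceph --version` prints a line beginning 'ceph version X.Y.Z', which is the usual format; check the output of your own installation before relying on it. Only the major and minor version numbers are compared, and the function names are illustrative.

```python
import re
from typing import Tuple

# Firefly (0.80) is the minimum supported version for a Ceph cluster you provide yourself.
MINIMUM_VERSION = (0, 80)


def parse_ceph_version(output: str) -> Tuple[int, int]:
    """Return (major, minor) from text like 'ceph version 0.80.7 (...)'."""
    match = re.search(r"ceph version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version string: {output!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(output: str) -> bool:
    """True if the reported version is Firefly (0.80) or later."""
    return parse_ceph_version(output) >= MINIMUM_VERSION


if __name__ == "__main__":
    # Hypothetical outputs; run `ceph --version` to see your own.
    for line in ("ceph version 0.72.2 (abc123)", "ceph version 0.80.7 (abc123)"):
        print(line, "->", meets_minimum(line))
```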

The following diagrams lay out suggested configurations for proof of concept, beta, and production environments using a Ceph cluster with Flexiant Cloud Orchestrator.

[Diagrams: proof of concept, beta and production configurations]

Maintaining and Troubleshooting your Ceph Cluster

General instructions for maintaining and troubleshooting your Ceph cluster can be found in the Ceph documentation, which is available here.

Flexiant do not provide support for troubleshooting and maintenance of your Ceph cluster. You may wish to consider buying a support contract, e.g. through Inktank.


Adding a Ceph Storage Unit

You cannot add a Ceph Storage Unit until you have a working Ceph Cluster. To check whether your Ceph cluster is working, log into your cluster controller and type 'ceph health'. The Ceph cluster is working if the response is HEALTH_OK. If you get any other response, attend to your Ceph cluster before continuing.
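
The health check above can be scripted. This Python sketch assumes the `ceph` command-line tool is installed and configured on the machine where it runs (for example your cluster controller); the function names and the 30-second timeout are illustrative choices. It treats anything other than HEALTH_OK as the first word of the output, such as HEALTH_WARN or HEALTH_ERR, as 'not ready'.

```python
import subprocess


def is_healthy(health_output: str) -> bool:
    """True only if the first word of `ceph health` output is HEALTH_OK."""
    return health_output.split()[:1] == ["HEALTH_OK"]


def check_ceph_health(timeout: float = 30.0) -> bool:
    """Run `ceph health` locally and report whether the Ceph cluster is working."""
    try:
        result = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The ceph CLI is not installed on this machine.
        return False
    except subprocess.TimeoutExpired:
        # No answer in time, e.g. the monitors could not be reached.
        return False
    if result.returncode != 0:
        # The command itself failed, e.g. a connection or authentication error.
        return False
    return is_healthy(result.stdout)


if __name__ == "__main__":
    if check_ceph_health():
        print("Ceph cluster is working; you can add a Ceph Storage Unit.")
    else:
        print("Attend to your Ceph cluster before continuing.")
```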

For instructions on how to add a Ceph storage unit so that Flexiant Cloud Orchestrator can use it to provision virtual disks, see Adding a storage unit.
