Flexiant Cloud Orchestrator's XVPManager has a built-in service monitoring interface for monitoring the health of the nodes. For more detailed reporting, you can extend this by integrating your existing monitoring system.
The processes listed below run on the Control Plane server, that is, the server on which the FCO software was installed.
Process Monitoring
A process can be checked by confirming that the PID recorded in its PID file belongs to a running process. The PID files are stored in the following locations.
Process | PID File |
---|---|
bind9 | /var/run/named/named.pid |
sshd | /var/run/sshd.pid |
mysql | /var/lib/mysql/extility-*.pid |
postfix | /var/spool/postfix/pid/master.pid |
postgresql-9.1 | /var/run/postgresql/9.1-main.pid |
rsyslogd | /var/run/rsyslogd.pid |
apache2 | /var/run/apache2.pid |
tftpd-hpa | n/a (can use 'pidof' tool instead) |
ntpd | /var/run/ntpd.pid |
extility-open-iscsi | /var/run/iscsid.pid |
extility-vncproxy | /var/run/extility/vncproxy/vncproxy.pid |
extility-dhcpd | /var/run/dhcp-server/dhcpd.pid |
extility-jade | /var/run/extility/jade/extility-jade.pid |
extility-tigerlily | /var/run/extility/tigerlily/extility-tigerlily.pid |
extility-xvpmanager | /var/run/extility/xvpmanager/extility-xvpmanager.pid |
extility-skyline | /var/run/extility/skyline/skylinewatch.pid |
remote-support | /var/run/extility-remote-support.pid* |
*Remote-support is only running if manually enabled.
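The PID file check described above can be sketched in Python as follows. This is a minimal illustration rather than a supported tool: the function names are hypothetical, and it assumes the first whitespace-separated token in each PID file is the PID. Wildcard paths such as the mysql entry are resolved with a glob.

```python
import glob
import os


def pid_from_file(pattern):
    """Return the PID stored in the first readable file matching pattern, or None."""
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                # Assumption: the PID is the first token in the file.
                return int(handle.read().split()[0])
        except (OSError, ValueError, IndexError):
            continue
    return None


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def check_pid_file(pattern):
    """Return True if the PID file exists and names a live process."""
    pid = pid_from_file(pattern)
    return pid is not None and pid_is_running(pid)
```

For example, `check_pid_file("/var/lib/mysql/extility-*.pid")` would return True only if a matching PID file exists and its PID is alive. Note that this does not confirm the PID belongs to the expected program, only that some process holds it.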
In addition, services can be checked using their init scripts via the 'service' command. For example, to check that the DHCP server is running:
service extility-dhcpd status
Status of Extility DHCP server: dhcpd is running.
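A monitoring script can wrap the same command. The sketch below assumes the init scripts follow the LSB convention of exiting with status 0 when the service is running; that is an assumption about the Extility scripts, not something stated here, so verify it before relying on the exit code. The function name and the `runner` parameter are hypothetical and exist only to make the function easy to test.

```python
import subprocess


def service_running(name, runner=subprocess.run):
    """Run 'service <name> status' and return (running, status_text)."""
    result = runner(
        ["service", name, "status"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Assumption: exit status 0 means "running" (LSB init script convention).
    return result.returncode == 0, result.stdout.strip()
```

Calling `service_running("extility-dhcpd")` on the Control Plane server returns the exit-code verdict together with the status line printed by the init script.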
Web Based Services
You may check web-based content by performing GETs to their respective URLs. This table outlines the services and respective URLs. Expected latency during normal operation should be <200ms.
Service | GET URL | Expected Result | Default IP | Default Port | Type | Notes |
---|---|---|---|---|---|---|
jade-admin | /soap/admin/current/?wsdl | 200 / XML | Public* | 443/tcp | HTTPS | Requires basic auth (admin user) |
jade-user | /soap/user/current/?wsdl | 200 / XML | Public* | 443/tcp | HTTPS | Requires basic auth (normal user) |
nodeconfig | /xvp/configs/<MAC address of Node> | 200 / XML | 10.157.128.1 | 80/tcp | HTTP | MAC address of an active node in XVP Admin. Letters in the MAC address must be in lower case. |
metadata | Requires ident then.. /metadata | 200 / XML | 10.157.128.1 | 1080/tcp | HTTP | Faked HTTP request required for ident |
payments | /payment.wsdl | 200 / XML | 10.157.16.10 | 3219/tcp | HTTPS | None |
skyline | / | 200 / HTML | Public* | 443/tcp | HTTPS | None |
*Public means the public IP address assigned to the Flexiant Cloud Orchestrator server on your network.
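A GET check against one of the URLs above might look like the following Python sketch. It uses only the standard library; the function names are hypothetical, and the 0.2 second limit mirrors the <200ms guideline above. It does not handle the basic auth required by jade-admin and jade-user, and the HTTPS services will only pass if the monitoring host trusts the server certificate.

```python
import http.client
import time
import urllib.error
import urllib.request


def evaluate(status, latency, expected_status=200, max_latency=0.2):
    """Return True if the status matches and the latency (seconds) is under the limit."""
    return status == expected_status and latency < max_latency


def check_http(url, timeout=5.0):
    """GET url and return (status, latency_seconds); status is None on failure."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        status = None  # connection refused, timeout, TLS failure and similar
    return status, time.monotonic() - started
```

A check is then `evaluate(*check_http(url))`, where `url` combines the default IP, port and GET URL from the table, for example the nodeconfig path on 10.157.128.1 with the MAC address of one of your active nodes.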
Other Services
Other services you may wish to perform functionality checks on are below. Expected latency during normal operation should be <200ms.
Service | Details | Expected Result | Default IP/Interface | Default Port |
---|---|---|---|---|
tftp | GET nagios.txt | Receive 7 byte file | 10.157.128.1 | 69/udp |
bind9 | DNS A record lookup 'control-mgmt.extility.install' | 'A' record 10.157.16.10 | 10.157.16.10 | 53/udp |
MySQL | SQL Query: 'SELECT count(*) FROM version;' on 'flexicp' DB. Credentials*: API_OPAL_DB_USER & API_OPAL_DB_PASSWORD | Value '1' | 0.0.0.0 | 3306/tcp |
Postfix | Establish TCP connection to port or.. Send an E-Mail via this service and.. Monitor mail queue size (mailq) | Banner begins '220' or.. E-mail arrives and.. <10 queued mails | 0.0.0.0 | 25/tcp |
Postgresql | SQL Query: 'SELECT count(*) FROM version;' on 'tigerlily' DB Credentials*: TL_DB_SUPPORT_USER & TL_DB_SUPPORT_PASSWORD | Value '1' | 0.0.0.0 | 5432/tcp |
DHCPd | Fake DHCP request for MAC address of a live Node. | IP address of given node | eth1 | 67/udp |
SSHd | Establish TCP connection to port | Banner begins 'SSH-2.0' | 0.0.0.0 | 22/tcp |
NTPd | Make NTP request to port | Correct Time (UTC) | 0.0.0.0 | 123/udp |
Syslog | Process check | Process matching 'rsyslogd' | 0.0.0.0 | 514/udp |
*Credentials can be obtained from variables in /etc/extility/config/vars
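The two banner checks in the table above (SSHd and Postfix) can be sketched with a plain TCP connection. The function names below are hypothetical; the expected prefixes are taken from the table. The host argument is whichever Control Plane address your monitoring system can reach.

```python
import socket

# Expected banner prefixes from the table above, keyed by TCP port.
EXPECTED_BANNERS = {22: "SSH-2.0", 25: "220"}


def banner_matches(banner, port):
    """Return True if the banner starts with the prefix expected on this port."""
    return banner.startswith(EXPECTED_BANNERS[port])


def read_banner(host, port, timeout=5.0):
    """Connect to host:port and return the first bytes the service sends."""
    with socket.create_connection((host, port), timeout=timeout) as connection:
        return connection.recv(256).decode("ascii", errors="replace")


def check_banner(host, port, timeout=5.0):
    """Return True if the service answers with the expected banner."""
    try:
        return banner_matches(read_banner(host, port, timeout), port)
    except OSError:
        return False  # refused, unreachable or timed out
```

This only proves that the daemon accepts connections. For Postfix, the table also suggests sending a test e-mail and watching the mail queue, which this sketch does not cover.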
General OS
As well as specific services, the OS of the management plane server itself can be monitored. 3rd party tools are available for this. Here are some examples.
Service | Details | Expected Result | Notes |
---|---|---|---|
Load | System Load Averages | 5min average < (2xNumber of CPU Cores) | Spikes may occur, may wish to monitor 15 minute averages instead |
Disk Space | Root partition capacity '/' | > 250GB & at least 80% free | 'Fetch Disk/Image' feature initially downloads to the control plane server* |
Memory | Overall memory usage | > 4GB free (-/+ buffers/cache) | Low amount of free memory will impact performance of control plane services |
Swap | Swap file usage | = 0 | No swap should ever be used. If swap usage is > 0 the system is low on memory |
*You may wish to allocate additional space for the Fetch Image service. This can be a local or remote disk. Please contact support for further details.
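If you prefer to script these thresholds yourself rather than use a third-party tool, they can be expressed roughly as below. This is a sketch under several assumptions: it reads /proc/meminfo (Linux only), treats "free" memory as MemFree + Buffers + Cached to approximate the '-/+ buffers/cache' line of `free`, and interprets the disk rule as a root partition larger than 250GB with at least 80% of it free. All function names are hypothetical.

```python
import os
import shutil

GB = 1024 ** 3


def load_ok(load_5min, cpu_cores):
    """5 minute load average must be below twice the number of CPU cores."""
    return load_5min < 2 * cpu_cores


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_ok(meminfo, minimum_kb=4 * 1024 * 1024):
    """More than 4GB must be free once buffers and cache are discounted."""
    free_kb = meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]
    return free_kb > minimum_kb


def swap_ok(meminfo):
    """No swap should be in use."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"] == 0


def disk_ok(total_bytes, free_bytes, min_total=250 * GB, min_free_fraction=0.8):
    """Partition must exceed 250GB and have at least 80% of its space free."""
    return total_bytes > min_total and free_bytes >= min_free_fraction * total_bytes


def local_health():
    """Evaluate all four checks on the machine this runs on."""
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    disk = shutil.disk_usage("/")
    return {
        "load": load_ok(os.getloadavg()[1], os.cpu_count() or 1),
        "disk": disk_ok(disk.total, disk.free),
        "memory": memory_ok(meminfo),
        "swap": swap_ok(meminfo),
    }
```

Calling `local_health()` on the server returns a dictionary of pass/fail values. As the table notes, load spikes may occur, so you may prefer `os.getloadavg()[2]` (the 15 minute average) for alerting.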
Nodes
Nodes also run essential processes. These are generally monitored internally by XVPManager but Nodes (Hypervisor and Router) can also be loaded with 3rd party Debian packages using the Node Payload System. Some processes to monitor are:
Service | Details | Expected Result | Notes | Node Type |
---|---|---|---|---|
xvpagent | XVP Node Agent | 2x processes (parent & child) matching '*xvpagent* ' | Process state should not be D (uninterruptible sleep). Child PID may change. | Router/Hypervisor |
evr | Virtual router | 2x processes (parent & child) matching '*evr* ' | Process state should not be D (uninterruptible sleep). Child PID may change. | Router/Hypervisor |
NTP | NTP Time Service | 1x process running matching 'ntpd ' | Essential for VM time to be correct on boot | Router/Hypervisor |
Metaproxy | Metadata Service | 1x process matching 'metaproxy ' | Essential for VM XML meta data | Hypervisor |
Multipath | Disk Mapper | 1x process matching 'multipathd ' (if multipath is configured) | Provides multiple LUN paths for VMs | Hypervisor |
iSCSI | iSCSI Daemon | 1x process matching 'iscsid ' | Provides iSCSI LUNs for VMs | Hypervisor |
Load | OS Load | < (Number of CPU Cores on Xen); < (number of vCPU cores on KVM) | Also monitored by XVPManager | Router/Hypervisor |
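The process checks in the table above combine two conditions: the expected number of matching processes, and none of them in state D. A possible Python sketch, assuming a procps-style `ps` is available on the node and using simple substring matching on the command line (both assumptions, as are the function names):

```python
import subprocess


def list_processes():
    """Return (pid, state, command_line) tuples for every process on this host."""
    output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "args="],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    ).stdout
    return parse_ps(output)


def parse_ps(output):
    """Parse header-less 'pid stat args' lines as printed by ps."""
    processes = []
    for line in output.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0].isdigit():
            processes.append((int(fields[0]), fields[1], fields[2]))
    return processes


def check_process(processes, pattern, expected_count):
    """True if exactly expected_count processes match and none is in state D."""
    matches = [entry for entry in processes if pattern in entry[2]]
    blocked = [entry for entry in matches if entry[1].startswith("D")]
    return len(matches) == expected_count and not blocked
```

On a Hypervisor node, `check_process(list_processes(), "xvpagent", 2)` would cover the xvpagent row. Because the child PID may change, the check counts matches rather than tracking specific PIDs.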
SAN
Your SAN should be carefully monitored for capacity & performance. The method for doing this differs between devices, but at the very least you should monitor & graph the following:
- Current disk usage / Available space
- Number of exported LUNs
- IOPS Utilisation
- Latency & Response Times
- Cache efficiency
- Network Throughput
- Redundancy / Failover viability
- CPU/memory Utilisation
- Spindle Health
General Hardware
All your hardware should be monitored using vendor provided agents. Most major vendors provide Ubuntu-compatible agents which can be hooked into your existing monitoring systems to monitor the general health of your hardware, such as disks, temperature, power & memory. Node hardware (as well as software, covered above) should also be monitored, using the Node Payload System. Your SAN should also be hardware monitored.
For specific details regarding drivers and Ubuntu compatibility with your hardware, please contact your hardware supplier.
Finite Resources
It is also a good idea to monitor finite resources, such as:
- Available RAM on the platform
- Number of available VLANs
- Number of available Subnets
Network Hardware
All network hardware should be monitored & graphed. Specifically:
- Switch ports throughput (packets & data)
- Switch ports link status
- Switch CPU/Memory utilisation
- Router throughput (packets & data)
- Router OSPF/BGP session status
- Router CPU/Memory utilisation
Advanced Monitoring
You may also wish to monitor on a more granular level. Services such as MySQL and Postgres can be checked in great detail with 3rd party tools, such as Nagios plugins. Any monitoring system which supports Ubuntu 14.04 LTS may be used.
You will need to ensure your monitoring application has access through the firewall which runs on the Control Plane server. Custom rules can be added in /etc/extility/iptables.custom. For more information, see Defining Custom iptables Rules.
Other functionality which can be monitored is, for example, the ability to log in to the control panel. You could take this a step further and create a new VM via the API as part of a daily platform check. These are just examples and can be as detailed as you require.
Specific queries can be answered by request from Flexiant Support based upon the configuration you have.