
Flexiant Cloud Orchestrator's XVPManager has a built-in service monitoring interface for monitoring the health of the nodes. For more detailed reporting, you can extend this by integrating your existing monitoring system, using the checks described on this page.

The following checks apply to processes running on the Control Plane server, which is the server on which the FCO software was installed.

Process Monitoring

A process can be checked by confirming that the PID recorded in its PID file belongs to a running process. The PID files are listed below.

 
| Process | PID File |
|---|---|
| bind9 | /var/run/named/named.pid |
| sshd | /var/run/sshd.pid |
| mysql | /var/lib/mysql/extility-*.pid |
| postfix | /var/lib/postgresql/9.1/main/postmaster.pid |
| postgresql-9.1 | /var/run/postgresql/9.1-main.pid |
| rsyslogd | /var/run/rsyslogd.pid |
| apache2 | /var/run/apache2.pid |
| tftpd-hpa | n/a (can use 'pidof' tool instead) |
| ntpd | /var/run/ntpd.pid |
| extility-open-iscsi | /var/run/iscsid.pid |
| extility-vncproxy | /var/run/extility/vncproxy/vncproxy.pid |
| extility-dhcpd | /var/run/dhcp-server/dhcpd.pid |
| extility-jade | /var/run/extility/jade/extility-jade.pid |
| extility-tigerlily | /var/run/extility/tigerlily/extility-tigerlily.pid |
| extility-xvpmanager | /var/run/extility/xvpmanager/extility-xvpmanager.pid |
| extility-skyline | /var/run/extility/skyline/skylinewatch.pid |
| remote-support* | /var/run/extility-remote-support.pid |

*Remote-support is only running if manually enabled.
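
The PID file check can be sketched in Python as follows. This is a minimal illustration, assuming the script runs on the Control Plane server with permission to read the PID files. The function names `read_pid`, `pid_is_alive` and `check_pid_file` are illustrative and not part of FCO. Signal 0 is used because it tests whether a process exists without affecting it.

```python
import glob
import os


def read_pid(path):
    """Return the PID stored on the first line of a PID file, or None."""
    try:
        with open(path) as handle:
            pid = int(handle.readline().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_pid_file(pattern):
    """Return True if any PID file matching the glob pattern names a live process."""
    for path in glob.glob(pattern):
        pid = read_pid(path)
        if pid is not None and pid_is_alive(pid):
            return True
    return False


if __name__ == "__main__":
    # A few of the PID files listed above; extend the list as required.
    for name, pattern in [
        ("sshd", "/var/run/sshd.pid"),
        ("mysql", "/var/lib/mysql/extility-*.pid"),
        ("extility-jade", "/var/run/extility/jade/extility-jade.pid"),
    ]:
        print(name, "OK" if check_pid_file(pattern) else "CRITICAL")
```

A glob pattern is accepted so that wildcard entries such as the mysql PID file can be checked in the same way as fixed paths.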

In addition, services can be checked through their init scripts using the 'service' command. For example, to check that the DHCP server is running:

service extility-dhcpd status
Status of Extility DHCP server: dhcpd is running.
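
The same check can be driven from a script. The sketch below assumes that the init scripts follow the common LSB convention of exiting with status 0 when the service is running; this has not been confirmed for every FCO init script, so verify it against the output shown above. The function name `service_is_running` and its `command` parameter are illustrative.

```python
import subprocess


def service_is_running(name, command=("service",)):
    """Run '<command> <name> status' and return (running, output).

    Assumes the init script exits with 0 when the service is running.
    """
    try:
        result = subprocess.run(
            list(command) + [name, "status"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command is missing or did not return in time.
        return False, str(exc)
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    running, output = service_is_running("extility-dhcpd")
    print("OK" if running else "CRITICAL", "-", output)
```

The `command` parameter exists only so that the function can be exercised without a real init system; in normal use the default is sufficient.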

Web Based Services

Web-based services can be checked by performing HTTP GET requests against their respective URLs, which are listed in the table below. Expected latency during normal operation is under 200 ms.

| Service | GET URL | Expected Result | Default IP | Default Port | Type | Notes |
|---|---|---|---|---|---|---|
| jade-admin | /soap/admin/current/?wsdl | 200 / XML | Public* | 443/tcp | HTTPS | Requires basic auth (admin user) |
| jade-user | /soap/user/current/?wsdl | 200 / XML | Public* | 443/tcp | HTTPS | Requires basic auth (normal user) |
| nodeconfig | /xvp/configs/<MAC address of Node> (letters in MAC addresses must be in lower case) | 200 / XML | 10.157.128.1 | 80/tcp | HTTP | MAC address of an active node in XVP Admin |
| metadata | Requires ident, then /metadata | 200 / XML | 10.157.128.1 | 1080/tcp | HTTP | Faked HTTP request required for ident |
| payments | /payment.wsdl | 200 / XML | 10.157.16.10 | 3219/tcp | HTTPS | None |
| skyline | / | 200 / HTML | Public* | 443/tcp | HTTPS | None |

*Public means the public IP address assigned to the Flexiant Cloud Orchestrator server on your network.
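
A GET check with a latency threshold might look like the sketch below, which uses only the Python standard library. The function name `check_url` and its parameters are illustrative. The `verify_tls` switch is an assumption for installations that use a self-signed certificate; leave it enabled wherever a trusted certificate is installed.

```python
import base64
import ssl
import time
import urllib.error
import urllib.request


def check_url(url, expected_status=200, max_latency=0.2,
              username=None, password=None, verify_tls=True, timeout=5.0):
    """GET a URL and return (ok, status, latency_seconds).

    ok is True when the status matches and the request completed
    within max_latency seconds (0.2 s, the 200 ms stated above).
    """
    request = urllib.request.Request(url)
    if username is not None:
        # HTTP basic auth, as required by the jade-admin and jade-user URLs.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", "Basic " + token)
    context = None
    if url.startswith("https://"):
        context = ssl.create_default_context()
        if not verify_tls:
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout,
                                    context=context) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but not with a 2xx status
    except (urllib.error.URLError, OSError):
        return False, None, time.monotonic() - started
    latency = time.monotonic() - started
    return status == expected_status and latency <= max_latency, status, latency
```

For example, `check_url("http://10.157.128.1/xvp/configs/" + mac.lower())` would test the nodeconfig service, where `mac` is a hypothetical variable holding the MAC address of an active node.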

Other Services

Other services that you may wish to check for correct functioning are listed below. Expected latency during normal operation is under 200 ms.

| Service | Details | Expected Result | Default IP/Interface | Default Port |
|---|---|---|---|---|
| tftp | GET nagios.txt | Receive 7 byte file | 10.157.128.1 | 69/udp |
| bind9 | DNS A record lookup 'control-mgmt.extility.install' | 'A' record 10.157.16.10 | 10.157.16.10 | 53/udp |
| MySQL | SQL Query: 'SELECT count(*) FROM version;' on 'flexicp' DB. Credentials*: API_OPAL_DB_USER & API_OPAL_DB_PASSWORD | Value '1' | 0.0.0.0 | 3306/tcp |
| Postfix | Establish TCP connection to port, or send an e-mail via this service; and monitor mail queue size (mailq) | Banner begins '220', or e-mail arrives; and <10 queued mails | 0.0.0.0 | 25/tcp |
| Postgresql | SQL Query: 'SELECT count(*) FROM version;' on 'tigerlily' DB. Credentials*: TL_DB_SUPPORT_USER & TL_DB_SUPPORT_PASSWORD | Value '1' | 0.0.0.0 | 5432/tcp |
| DHCPd | Fake DHCP request for MAC address of a live Node | IP address of given node | eth1 | 67/udp |
| SSHd | Establish TCP connection to port | Banner begins 'SSH-2.0' | 0.0.0.0 | 22/tcp |
| NTPd | Make NTP request to port | Correct Time (UTC) | 0.0.0.0 | 123/udp |
| Syslog | Process check | Process matching 'rsyslogd' | 0.0.0.0 | 514/udp |

*Credentials can be obtained from variables in /etc/extility/config/vars
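
The two banner checks above (SSHd and Postfix) can be sketched with a plain TCP socket. The function names `read_banner` and `banner_ok` are illustrative. The sketch covers only services that send a greeting line on connect; the SQL, DNS, TFTP, DHCP and NTP checks need protocol-specific clients and are not shown.

```python
import socket


def read_banner(host, port, timeout=5.0):
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = b""
        # Read until the end of the first line, with a small upper bound.
        while b"\n" not in data and len(data) < 1024:
            chunk = conn.recv(256)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].decode("ascii", "replace").strip()


def banner_ok(host, port, expected_prefix, timeout=5.0):
    """Return True if the service banner begins with expected_prefix."""
    try:
        return read_banner(host, port, timeout).startswith(expected_prefix)
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False


if __name__ == "__main__":
    print("sshd", "OK" if banner_ok("127.0.0.1", 22, "SSH-2.0") else "CRITICAL")
    print("postfix", "OK" if banner_ok("127.0.0.1", 25, "220") else "CRITICAL")
```

Because both services listen on 0.0.0.0, the loopback address is sufficient when the script runs on the Control Plane server itself; substitute the server's address when checking remotely.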

General OS

As well as specific services, the OS of the management plane server itself can be monitored. 3rd party tools are available for this. Some example checks are listed below.

| Service | Details | Expected Result | Notes |
|---|---|---|---|
| Load | System Load Averages | 5min average < (2 x Number of CPU Cores) | Spikes may occur, may wish to monitor 15 minute averages instead |
| Disk Space | Root partition capacity '/' | > 250GB & at least 80% free | 'Fetch Disk/Image' feature initially downloads to the control plane server* |
| Memory | Overall memory usage | > 4GB (-/+ buffers/cache) | Low amount of free memory will impact performance of control plane services |
| Swap | Swap file usage | = 0 | No swap should ever be used. If swap usage is > 0 the system is low on memory |

*You may wish to allocate additional space for the Fetch Image service. This can be a local or remote disk. Please contact support for further details.
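
The load, disk and swap thresholds above can be expressed as small functions, sketched below. The function names are illustrative, and treating 1 GB as 2^30 bytes is an assumption. The memory check is omitted because it depends on how your tooling reports the "-/+ buffers/cache" figure.

```python
def load_ok(load_5min, cpu_cores):
    """The 5-minute load average should stay below 2 x the number of CPU cores."""
    return load_5min < 2 * cpu_cores


def disk_ok(total_bytes, free_bytes, min_total_gb=250, min_free_ratio=0.8):
    """The root partition should exceed 250 GB and be at least 80% free."""
    if total_bytes <= min_total_gb * 1024 ** 3:
        return False
    return free_bytes / total_bytes >= min_free_ratio


def swap_used_kb(meminfo_text):
    """Return the swap in use, in kB, from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values.get("SwapTotal", 0) - values.get("SwapFree", 0)
```

On the server itself, the inputs can be gathered with `os.getloadavg()[1]`, `os.cpu_count()`, `shutil.disk_usage("/")` and the contents of `/proc/meminfo`. Any non-zero result from `swap_used_kb` corresponds to the "system is low on memory" condition in the table.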

Nodes

Nodes also run essential processes. These are generally monitored internally by XVPManager but Nodes (Hypervisor and Router) can also be loaded with 3rd party Debian packages using the Node Payload System. Some processes to monitor are:

| Service | Details | Expected Result | Notes | Node Type |
|---|---|---|---|---|
| xvpagent | XVP Node Agent | 2x processes (parent & child) matching '*xvpagent*' | State should != D. Child PID may change. | Router/Hypervisor |
| evr | Virtual router | 2x processes (parent & child) matching '*evr*' | State should != D. Child PID may change. | Router/Hypervisor |
| NTP | NTP Time Service | 1x process running matching 'ntpd' | Essential for VM time to be correct on boot | Router/Hypervisor |
| Metaproxy | Metadata Service | 1x process matching 'metaproxy' | Essential for VM XML meta data | Hypervisor |
| Multipath | Disk Mapper | 1x process matching 'multipathd' (if multipath is configured) | Provides multiple LUN paths for VMs | Hypervisor |
| iSCSI | iSCSI Daemon | 1x process matching 'iscsid' | Provides iSCSI LUNs for VMs | Hypervisor |
| Load | OS Load | < (Number of CPU Cores on Xen); < (number of vCPU cores on KVM) | Also monitored by XVPManager | Router/Hypervisor |
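
A process count and state check of this kind could be sketched as follows, assuming a Linux node where `/proc` is readable by the monitoring script. The function names are illustrative. Matching is done as a substring of either the process name or its command line, which is an interpretation of the '*name*' patterns in the table.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")  # the command may itself contain ')'
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def find_processes(name, proc_root="/proc"):
    """Return (pid, command, state) for each process matching `name`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
            with open(os.path.join(base, "cmdline"), "rb") as handle:
                raw = handle.read().replace(b"\0", b" ")
        except (OSError, ValueError, IndexError):
            continue  # the process exited while being inspected
        cmdline = raw.decode("utf-8", "replace")
        if name in command or name in cmdline:
            matches.append((pid, command, state))
    return matches


def process_check(name, expected_count, proc_root="/proc"):
    """True if exactly expected_count processes match and none is in state D."""
    found = find_processes(name, proc_root)
    healthy = all(state != "D" for _pid, _command, state in found)
    return len(found) == expected_count and healthy
```

For example, `process_check("xvpagent", 2)` corresponds to the first row of the table. Short patterns such as 'evr' can match unrelated processes, so a stricter match may be needed in practice.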

SAN

Your SAN should be carefully monitored for capacity & performance. The method to do this will differ between devices, however, at the very least you need to monitor & graph the following.

  • Current disk usage / Available space
  • Number of exported LUNs
  • IOPS Utilisation
  • Latency & Response Times
  • Cache efficiency
  • Network Throughput
  • Redundancy / Failover viability
  • CPU/memory Utilisation
  • Spindle Health
Your SAN vendor will be able to provide further details on how to achieve this and advise on other metrics to monitor & graph.

General Hardware

All your hardware should be monitored using vendor provided agents. Most major vendors provide Ubuntu-compatible agents which can be hooked into your existing monitoring systems to monitor the general health of your hardware, such as disks, temperature, power & memory. Node hardware (as well as software, covered above) should also be monitored, using the Node Payload System. Your SAN should also be hardware monitored.

For specific details regarding drivers and Ubuntu compatibility with your hardware, please contact your hardware supplier.

Finite Resources

It is also a good idea to monitor finite resources, such as:

  • Available RAM on the platform
  • Number of available VLANs
  • Number of available Subnets
It is also a good idea to monitor other counters, such as the number of VMs and the amount of free RAM on a per-node basis. Please contact support for further details.

Network Hardware

All network hardware should be monitored & graphed. Specifically:

  • Switch ports throughput (packets & data)
  • Switch ports link status
  • Switch CPU/Memory utilisation
  • Router throughput (packets & data)
  • Router OSPF/BGP session status
  • Router CPU/Memory utilisation

Advanced Monitoring

You may also wish to monitor on a more granular level. Services such as MySQL and Postgres can be checked in great detail with 3rd party tools, such as Nagios plugins. Any monitoring system which supports Ubuntu 14.04 LTS may be used.

You will need to ensure your monitoring application has access through the firewall which runs on the Control Plane server. Custom rules can be added in /etc/extility/iptables.custom. For more information, see Defining Custom iptables Rules.

Other functionality which can be monitored is, for example, the ability to log in to the control panel. You could take this a step further and create a new VM via the API as part of a daily platform check. These are just examples and can be as detailed as you require.

Specific queries can be answered by request from Flexiant Support based upon the configuration you have.
