Service Performance - SLA constraints in OpenStack Stacks

3.1.2 Service Performance

Service performance can be described as how fast the service can process the received request. The faster the service responds, the lower is the response time. Within the SLA, the cloud provider can agree with the customer about the maximum time the service has to respond to a request with a particular simultaneous workload. In order to comply with the SLA, the response time must be monitored by the cloud provider.

If the response time is higher than the agreed maximum, the SLA is violated, and the cloud provider must find a way to lower it. To know how to lower the response time, the provider must first know why it rose above the threshold.
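The comparison between measured response times and the agreed maximum can be sketched as follows. This is a minimal illustration, not a monitoring system: the `SlaReport` type, the `check_response_time` function, the sample values, and the 200 ms limit are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SlaReport:
    worst_ms: float    # slowest observed response time
    violations: int    # number of samples above the agreed maximum
    compliant: bool    # True when no sample exceeded the maximum


def check_response_time(samples_ms: Sequence[float], max_ms: float) -> SlaReport:
    """Compare measured response times against the agreed SLA maximum."""
    if not samples_ms:
        raise ValueError("at least one response-time sample is required")
    violations = sum(1 for sample in samples_ms if sample > max_ms)
    return SlaReport(
        worst_ms=max(samples_ms),
        violations=violations,
        compliant=violations == 0,
    )


if __name__ == "__main__":
    # Hypothetical measurements and a hypothetical 200 ms SLA limit.
    print(check_response_time([120.0, 180.0, 250.0, 90.0], max_ms=200.0))
```

A real SLA may be stated per workload level or as a percentile rather than an absolute maximum; the sketch only covers the simple "maximum time" case described above.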

The response time can be influenced by excessive CPU workload, intensive disk usage, lack of RAM, or network latency. We must understand how each of those resources affects the service response time and how it can be improved. Each of those resources has different metrics, and each will be analyzed individually.

In order to be executed on the cloud, the service needs to either run on a single machine or be distributed across several virtual instances to ease future scalability.

Assume that the service is running on a single virtual instance and that other virtual instances run other services on the same hypervisor. All of the hypervisor's resources, namely disk, RAM, CPU, and networking, are then shared between them. Although the quantity of resources can be limited for a specific instance, the hypervisor does not entirely limit the usage intensity. If one VM uses all of the disk performance, other instances will experience higher disk access latency, resulting in lower service performance and higher response time. When using shared storage, all instances from every hypervisor host share the same disks, making the situation even more complicated. Not only the disks can be overloaded, but also the network, if there is no separation between storage traffic and Internet connectivity. To prevent this, disk performance and health should be monitored. When measuring disk performance, there are three essential metrics: latency, IOPS, and queue length.

Disk latency is the amount of time between the request for data and its return. The longer it takes to retrieve the information, the higher the latency and the lower the performance will be. IOPS (input/output operations per second) is the number of read and write operations the disk performs per second. If the disk IOPS limit is reached, the excess requests go into a queue, where they wait for their opportunity to be processed.
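On Linux, the three metrics can be derived from two successive readings of the cumulative counters in /proc/diskstats. The sketch below is an illustration under stated assumptions: the `DiskSample` type, the function names, and the sample counter values are hypothetical, and the field positions follow the kernel's documented /proc/diskstats layout.

```python
from typing import Dict, NamedTuple


class DiskSample(NamedTuple):
    reads: int         # reads completed
    writes: int        # writes completed
    ms_reading: int    # total milliseconds spent reading
    ms_writing: int    # total milliseconds spent writing
    weighted_ms: int   # weighted milliseconds spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskSample]:
    """Parse /proc/diskstats content into per-device cumulative counters."""
    samples = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[2] is the device name; the counters follow in a fixed order.
        samples[f[2]] = DiskSample(int(f[3]), int(f[7]), int(f[6]),
                                   int(f[10]), int(f[13]))
    return samples


def disk_metrics(before: DiskSample, after: DiskSample,
                 interval_s: float) -> Dict[str, float]:
    """Derive IOPS, average latency, and average queue length for an interval."""
    ios = (after.reads - before.reads) + (after.writes - before.writes)
    io_ms = ((after.ms_reading - before.ms_reading)
             + (after.ms_writing - before.ms_writing))
    return {
        "iops": ios / interval_s,
        "latency_ms": io_ms / ios if ios else 0.0,
        "queue_length": (after.weighted_ms - before.weighted_ms)
                        / (interval_s * 1000.0),
    }


if __name__ == "__main__":
    # Hypothetical counter values, one second apart. On a Linux host the
    # text would come from reading /proc/diskstats twice.
    t0 = parse_diskstats("8 0 sda 1000 0 8000 500 2000 0 16000 1500 0 1200 2000")
    t1 = parse_diskstats("8 0 sda 1100 0 8800 700 2300 0 18400 2300 0 1700 4000")
    print(disk_metrics(t0["sda"], t1["sda"], interval_s=1.0))
```

Working from counter differences rather than instantaneous values means short spikes inside the interval are averaged out, so the sampling interval determines how fine-grained the observed latency and queue length are.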

The longer a request waits in the queue, the slower the response will be and the lower the performance the client will experience. Whenever the disk queue is not empty, the disk is receiving more requests than it can serve, and some of them are postponed. The monitoring system may pick up short-term spikes of intensive disk usage that occur only during certain intervals. Those spikes should be carefully analyzed, although they are often caused by a system backup, defragmentation, or a similar maintenance task. However, if the disk usage is always high, the swap memory usage in the virtual instances or on the host should be checked. This can happen when the instance is under-provisioned: the available RAM is not enough, so the instance compensates with swap memory, which uses the disk as additional RAM. Disk access is much slower than RAM access, which slows service delivery, and because swapping writes to disk, it may also slow the response time of other services that access the same disk.

Figure 3.2 shows the output of the htop tool with CPU and RAM statistics of a virtual machine that is currently executing the Nexus Repository OSS service. The green bars represent the used memory pages, blue bars are buffered pages, and yellow bars are the cache pages. Although not all of the RAM is in use, the majority of it is cached. Without available RAM, when the OS asks to allocate memory, the VM will use the swap memory, significantly slowing down the performance.

Figure 3.2: Swap memory usage by Nexus OSS

To actually monitor the disk performance metrics, there must be a tool that provides real-time disk statistics for latency, IOPS, and queue length. There are many open-source and commercial options available. The first example is iostat². It displays real-time information about CPU utilization, read and write speed, disk access latency, and the average queue size. Figure 3.3 shows some of the available metrics, such as:

• %user - percentage of CPU utilization at user level

• %nice - percentage of CPU utilization at user level with nice priority

• %system - percentage of CPU utilization at system level

• %iowait - percentage of time the CPUs were idle while there was an outstanding I/O request

• %steal - percentage of time vCPU waited while the hypervisor was servicing another vCPU

• %idle - percentage of time CPU was idle without outstanding I/O request

• rrqm/s - number of read requests per second merged into the queue

• wrqm/s - number of write requests per second merged into the queue

• r/s - number of read requests issued per second

• w/s - number of write requests issued per second

• rkB/s - number of kilobytes read per second

• wkB/s - number of kilobytes written per second

• avgrq-sz - average size, in sectors, of the issued requests

• avgqu-sz - average queue size of the issued requests

• await - average time in milliseconds for I/O requests to be served

• r_await - average time in milliseconds for read requests to be served

• w_await - average time in milliseconds for write requests to be served

• svctm - average service time in milliseconds of the issued I/O requests

• %util - percentage of elapsed time during which I/O requests were issued to the device (device utilization)

² https://linux.die.net/man/1/iostat/

Disk performance must be monitored not only on the hypervisor host but also in the virtual instances. This helps to identify which instance is consuming most of the disk performance and which one is starved of it.

Figure 3.3: IOSTAT real-time disk statistics

Like disk, RAM is shared between all the virtual instances provisioned on the same hypervisor host. The allocation of RAM should be prudent, because the hypervisor can over-provision, so that the total RAM allocated to the instances is higher than what the hypervisor physically has. This method is widely used since, on average, instances do not use all of their allocated memory. The three main metrics to monitor RAM performance are:

• Memory pages: showing how often the physical disk is used to compensate for the RAM shortage

• Free memory: showing the available RAM to be used by processes at that moment

• Memory pressure: showing the physical memory in use as a percentage of the total amount of memory

In a scenario where a service tries to allocate more memory, the virtual instance will request more memory from the hypervisor host. If the hypervisor denies the request, the instance will start to swap memory pages, using the physical disk as additional memory. In that scenario, the free memory is close to none, and both memory pages and memory pressure are at 100%. Using swap memory lowers the performance since the physical disk is not as fast as the RAM.

Memory monitoring does not require additional tools. On Linux, the file /proc/meminfo can be used to check the RAM, as it contains all the relevant information. Figure 3.5 shows the content of this file on a physical server with 64 GB of RAM. The metrics considered relevant are total memory (MemTotal), free memory (MemFree), total swap memory (SwapTotal), and free swap memory (SwapFree). Another way to obtain RAM statistics is through a native Linux tool named free. This tool shows memory statistics, but with much less detail, as some metrics are missing. Figure 3.4 shows the output of the free tool in human-readable format, where each metric means:

• total - total installed memory

• used - used memory

• shared - memory used by temporary file storage

• buff/cache - sum of buffered and cached memory

• available - estimation of available memory for starting new applications

Figure 3.4: Free memory tool in human-readable output

Figure 3.5: Content of file /proc/meminfo split through four images
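The four /proc/meminfo fields considered relevant above (MemTotal, MemFree, SwapTotal, and SwapFree) can be read programmatically. The following is a minimal sketch: the function names and the sample values are hypothetical, and memory pressure is computed from MemFree alone, which is an assumption that counts buffers and cache as memory in use.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return /proc/meminfo values (in kB where a unit is given), keyed by field."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue
        values[key.strip()] = int(rest.split()[0])
    return values


def memory_summary(info: Dict[str, int]) -> Dict[str, float]:
    """Compute memory pressure and swap usage as percentages."""
    total, free = info["MemTotal"], info["MemFree"]
    swap_total, swap_free = info["SwapTotal"], info["SwapFree"]
    return {
        # Assumption: everything that is not MemFree counts as "in use".
        "memory_pressure_pct": 100.0 * (total - free) / total,
        "swap_used_pct": (100.0 * (swap_total - swap_free) / swap_total
                          if swap_total else 0.0),
    }


if __name__ == "__main__":
    # Hypothetical values; on a Linux host the text would be the content
    # of /proc/meminfo.
    sample = """MemTotal:        8000000 kB
MemFree:         2000000 kB
SwapTotal:       1000000 kB
SwapFree:         750000 kB
"""
    print(memory_summary(parse_meminfo(sample)))
```

A rising swap usage percentage together with memory pressure close to 100% corresponds to the swapping scenario described earlier in this section.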

After covering disk and memory, there is CPU performance, which can also influence service performance and response time. The first obvious metric to check is CPU usage, which shows the percentage of CPU used at that time by the service or virtual instance. A related monitoring metric is CPU demand, which is the amount of CPU workload the instance is requiring. In an ideal scenario, CPU demand and usage should be about the same, but usually the demand is much higher, especially if the hypervisor host is CPU over-provisioned. The hypervisor is responsible for receiving and managing CPU usage requests coming from virtual instances. Upon receiving a CPU usage request, the hypervisor decides which logical CPU will process the request. If the hypervisor's CPU usage is low, or high but spread out over time, all requests are processed on arrival. However, if the CPU workload is intensive and concentrated in the same interval of time, some of the requests will have to wait until the CPU is free. This event can be monitored through the CPU ready time metric. The more requests there are to process and the less CPU is available, the higher the CPU ready time will be.

A more generic metric for CPU performance is the CPU wait time per dispatch. This shows the amount of time the CPU takes to process the request. Although this can show the CPU performance, it is not detailed enough to understand the root cause of low performance. If, even after checking all of the above CPU metrics, the CPU usage is below the limit and the performance is still unacceptable, the cause could be the hypervisor CPU scheduler reaching its limit on the number of requests. This situation can happen when the instance creates a large number of requests with small instructions, which do not saturate the physical CPU but do saturate the hypervisor scheduler, moving the remaining requests into a CPU queue. The CPU queue metric can only be obtained at the OS level of the virtual instance.

Some CPU metrics can only be obtained on a specific hypervisor. CPU demand and CPU ready time are specific to the VMware vSphere hypervisor. The CPU wait time per dispatch metric is specific to the Hyper-V hypervisor. Those metrics can be helpful; however, more generic metrics are sufficient. If our hypervisor does not provide any specific CPU metric, we can use a tool named mpstat to measure CPU statistics in real time. Figure 3.6 shows the terminal output of the mpstat tool on a virtual machine with 46 vCPUs.

This tool shows the CPU metrics available at the OS level. The most relevant are CPU usage (%usr and %sys), CPU idle time while a disk I/O request is outstanding (%iowait), and CPU steal time (%steal), which is the time the vCPU has to wait for the physical CPU while the hypervisor is using it to serve another virtual instance. An interesting and valuable feature of mpstat is the ability to see statistics of individual CPU cores.
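Percentages of this kind can also be derived directly from the cumulative counters in the "cpu" lines of /proc/stat, by taking the difference between two readings. The sketch below is illustrative only: the function names and the sample counter values are hypothetical, and the field order follows the documented /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal).

```python
from typing import Dict, List

# Order of the first eight counters on a "cpu" line of /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> List[int]:
    """Return the first eight counters of a 'cpu' or 'cpuN' line."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line")
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def cpu_percentages(before: str, after: str) -> Dict[str, float]:
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}


if __name__ == "__main__":
    # Hypothetical counter values; on a Linux host each string would be
    # the first line of /proc/stat, read at two different moments.
    t0 = "cpu 1000 0 500 8000 200 0 0 300 0 0"
    t1 = "cpu 1400 0 600 8300 300 0 0 400 0 0"
    print(cpu_percentages(t0, t1))
```

Passing a "cpuN" line instead of the aggregate "cpu" line gives the same breakdown for an individual core, which mirrors the per-core view mentioned above.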

Figure 3.6: Terminal output of mpstat tool of a virtual machine with 46 vCPU

If the customer complains about the performance of the service, but the monitoring systems show that disk, memory and CPU performance are adequate, it is important to check network performance. The service may need to communicate with local or even Internet services, and if the network connectivity is not stable or fast enough, it can delay the service workflow, causing higher response time.

3.1.3 Latency

Network performance is often measured in megabits or gigabits of information per second pushed through the network connection to reach the other end, also called throughput. Network speed is an important aspect, but it is not the only one. Latency must also be taken into account. Network latency can be described as the amount of time, measured in milliseconds, between sending and receiving a packet or a group of packets. Some applications are latency tolerant, and for them the connection speed is what matters most, but the list of latency-sensitive applications has been growing.

Latency can severely affect service usability and the customer experience. The lower the service latency, the more enjoyable the customer experience will be. Latency can be challenging to predict and measure, since each customer may take a different network route to reach the service. Different network routes may involve a different number of router hops, where each hop adds delay. A particular router along the way can be overloaded with requests, adding further delay. Service latency can be affected by many other factors, which can be categorized into types of latency such as network, Internet, virtualization, and interrupt latency.

Network latency is the amount of time it takes between a request for data and the return of the requested data. Three factors contribute to the network latency: propagation, transmission, and router processing time. The propagation time is the amount of time required to send one bit of information from one end of the medium to the other, and it increases proportionally to distance. The transmission time is the amount of time required by a network device to push one packet into the medium. The larger the network packet, the higher the transmission time, and the higher the latency will be. Each time the packet travels through a router or gateway, the latency increases, as devices have to take time to process the packet and possibly change the header. The header change can, for example, decrement the Time-To-Live (TTL) field, which counts the remaining hops.
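The three contributions can be combined in a simple additive model. The sketch below is a simplification under stated assumptions: the function name and example values are hypothetical, the default signal speed of 200,000 km/s is an approximation for optical fiber, the transmission time is counted once rather than on every link, and every router is assumed to add the same processing delay.

```python
from typing import Dict


def latency_breakdown_ms(distance_km: float, packet_bytes: int,
                         bandwidth_bps: float, hops: int, per_hop_ms: float,
                         signal_km_per_s: float = 200_000.0) -> Dict[str, float]:
    """Estimate one-way latency as propagation + transmission + processing."""
    # Propagation: time for a bit to travel the distance through the medium.
    propagation = distance_km / signal_km_per_s * 1000.0
    # Transmission: time to push all bits of the packet onto the medium.
    transmission = packet_bytes * 8 / bandwidth_bps * 1000.0
    # Processing: assumed constant delay at each router or gateway.
    processing = hops * per_hop_ms
    return {
        "propagation_ms": propagation,
        "transmission_ms": transmission,
        "processing_ms": processing,
        "total_ms": propagation + transmission + processing,
    }


if __name__ == "__main__":
    # Hypothetical path: 1000 km, 1500-byte packet, 100 Mbit/s link,
    # 10 hops with 0.05 ms of processing each.
    print(latency_breakdown_ms(1000.0, 1500, 100e6, 10, 0.05))
```

With these example values the propagation term dominates, which illustrates why distance matters more than packet size on fast long-haul links.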

Internet latency is a more specific type of latency that is part of the network latency. As the Internet is a Wide Area Network (WAN), there are many hops, routes, and devices involved, where each hop has a different latency. The same factors of network latency affect Internet latency, but on a broader scale.

Virtualization latency is the time the hypervisor needs to process the request received from the virtual instance and forward it to the physical component. To communicate with the Internet, any given virtual instance must pass through the hypervisor, which then redirects the network request to the network device.

Interrupt latency is the time the computer takes to act upon an interrupt. The interrupt tells the operating system to stop the current task until it can decide what to do in response to the event. This latency type affects the overall latency, but this dissertation will not cover it because interrupts are a low-level concern.

All of those types of latency are difficult to track and monitor without a specific tool, except network latency, which can be monitored using standard Linux tools. The simplest way of measuring the network latency is to calculate the time spent from the moment a packet is sent to the moment it returns, also called the Round Trip Time (RTT). The tool traceroute can measure the latency and show the path the packet took.
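As a rough illustration of the RTT idea, the time needed to complete a TCP handshake can be used as an approximation. This is a swapped-in technique, not what traceroute does: the function name is hypothetical, it requires an open TCP port on the target, and the measured value includes connection setup overhead on both hosts.

```python
import socket
import time


def tcp_connect_rtt_ms(host: str, port: int, timeout_s: float = 2.0) -> float:
    """Approximate the round trip time by timing a TCP handshake."""
    start = time.perf_counter()
    # create_connection returns once the three-way handshake has completed,
    # which takes roughly one round trip between client and server.
    with socket.create_connection((host, port), timeout=timeout_s):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0
```

Calling the function several times against the same host and port (both chosen by the operator) yields a series of samples from which the average latency and its variation can be computed.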

3.1.4 Network

Network performance can be affected by numerous factors, such as packet drops due to network congestion or a high number of packets arriving out of order. The impact of those situations can be reduced by monitoring network performance. The metrics that determine the network performance are latency, bandwidth, throughput, jitter, and error rate. Network latency is the amount of time the data takes to travel from one host to another. In an ideal world, the latency is close to zero. Unfortunately, the speed is limited by the medium through which the data is traveling. Throughput is the quantity of data a certain device actually sends or receives per unit of time. Throughput is often confused with bandwidth, although they are different concepts. Bandwidth is the maximum amount of data per second that a certain medium can transport from one side to another.

To simplify the matter, bandwidth can be compared to a bus and throughput to its passengers. The bus has a capacity of 100 passengers, but there are only 75 passengers to transport. Thus, the bus will only transport 75 even though it could transport more. The bandwidth will always limit throughput.

The network jitter is the variation of the latency over time. The more the latency changes, the higher the jitter will be. It is normal to have some jitter, but it is a useful metric to determine the network performance and identify network abnormalities. Network errors are common, especially on the Internet, since there is no control over the Internet routing devices. Those errors degrade the network performance. The most common network errors are packet drop, out-of-order delivery, packet loss, and retransmission.
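Both ideas can be expressed in a few lines. The sketch below is a simplification with hypothetical function names: jitter is taken as the mean absolute difference between consecutive latency samples, which is one of several possible definitions, and throughput is modeled as the offered load capped by the bandwidth.

```python
from typing import Sequence


def jitter_ms(latencies_ms: Sequence[float]) -> float:
    """Mean absolute difference between consecutive latency samples."""
    if len(latencies_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs)


def throughput(offered_load: float, bandwidth: float) -> float:
    """Throughput equals the offered load, but never exceeds the bandwidth."""
    return min(offered_load, bandwidth)


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    print(jitter_ms([10.0, 12.0, 11.0, 14.0]))
    # The bus analogy: capacity of 100, only 75 passengers to transport.
    print(throughput(offered_load=75, bandwidth=100))
    # More passengers than seats: the capacity is the limit.
    print(throughput(offered_load=130, bandwidth=100))
```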

Some of the network performance metrics can be obtained by reading the network device statistics, such as the number of packets received or sent, the number of dropped packets, the number of corrupted packets (errors), or the number of packets which the device could not receive or send. Those metrics can be obtained using netstat, a network statistics tool. Figure 3.7 shows the terminal output of the netstat -i command, which includes an MTU column. The Maximum Transmission Unit (MTU) is the maximum packet size supported by that device. The RX-OK and TX-OK columns are the number of packets received and sent correctly, respectively. RX-ERR and TX-ERR are incorrect packets; a possible cause is data corruption. RX-DRP and TX-DRP are dropped packets. RX-OVR and TX-OVR are packets that the interface could not receive or send.

Figure 3.7: netstat terminal output
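On Linux, comparable per-interface counters are exposed in /proc/net/dev and can be read without any additional tool. The sketch below is an illustration under stated assumptions: the function name and sample values are hypothetical, and mapping the "fifo" counters to the RX-OVR and TX-OVR columns is an assumption about how netstat presents overruns.

```python
from typing import Dict


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Map /proc/net/dev counters to netstat -i style column names."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        name, sep, counters = line.partition(":")
        values = counters.split()
        if not sep or len(values) < 16:
            continue
        v = [int(x) for x in values]
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        stats[name.strip()] = {
            "RX-OK": v[1], "RX-ERR": v[2], "RX-DRP": v[3], "RX-OVR": v[4],
            "TX-OK": v[9], "TX-ERR": v[10], "TX-DRP": v[11], "TX-OVR": v[12],
        }
    return stats


if __name__ == "__main__":
    # Hypothetical content; on a Linux host the text would be read from
    # /proc/net/dev.
    sample = (
        "Inter-|   Receive                            |  Transmit\n"
        " face |bytes packets errs drop fifo frame compressed multicast"
        "|bytes packets errs drop fifo colls carrier compressed\n"
        "  eth0: 500000 4000 2 5 1 0 0 0 300000 3500 0 3 0 0 0 0\n"
    )
    print(parse_net_dev(sample)["eth0"])
```

Because the counters are cumulative since the interface came up, a monitoring system would normally sample them periodically and work with the differences, for example to derive a drop rate per interval.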

The previously demonstrated network metrics are passive, as they do not cause any network overhead. They are retrieved from statistics saved in the network interface, only obtaining a few metrics required to analyze the network performance. The network jitter, throughput, packet loss and out-of-order can only be actively monitored by creating network workload and adding network overhead. Figure 3.8 shows the iperf3 tool that injects packets into the network and measures the network performance. This tool has a client-side that sends the packets and a server-side that receives them, reporting statistics back to the client. The result of active network performance monitoring in Figure 3.8 shows 9.90 Gbits/s of bandwidth, 0.009 ms of network jitter and less than 1% of lost packets.

Figure 3.8: iperf3 terminal output

3.1.5 Application

Multiple processes can constitute the service, and each process can be or not crucial for

From the document Orquestração de Serviços Cloud com Componentes Críticos no SKA (pages 53-71)