Performance and availability monitoring in levels

The availability of an IT component can be derived by measuring (monitoring) the performance of that component. If the performance is below a certain threshold, the IT component is reported as unavailable.
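
As a minimal sketch of this idea, availability can be expressed as the share of measurements in which the component met its threshold. The function name, the sample values, and the 300 ms threshold are assumptions made for illustration only; they do not come from any particular monitoring tool.

```python
def availability_percentage(response_times_ms, threshold_ms):
    """Return the share of samples (in percent) that met the threshold.

    A sample of None means the component did not respond at all,
    which counts as unavailable.
    """
    if not response_times_ms:
        raise ValueError("at least one sample is required")
    available = sum(
        1 for t in response_times_ms if t is not None and t <= threshold_ms
    )
    return 100.0 * available / len(response_times_ms)


# Hypothetical measurements in milliseconds; None is a missed response.
samples = [120, 95, None, 480, 110]
print(availability_percentage(samples, threshold_ms=300))
```

Three of the five samples are within the threshold, so this component would be reported as 60% available over the measured period.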

Monitoring IT systems can be done using a variety of tools. Vendors like IBM, HP, BMC and others provide tools to:

  • Measure performance
  • Capture logging
  • Generate alarms based on thresholds
  • Report the collected data in dashboards or other overviews

Typically, the number of measuring points in an IT landscape is quite overwhelming. When installed out of the box, monitoring tools will typically detect many issues per second, leading to many false alarms. Therefore, it is essential to tune the monitoring system to only generate useful alarms and to create reports containing useful information for specific stakeholders.
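
One common way to tune alarms is to raise them only after a threshold has been exceeded several times in a row, so that a single spike does not page anyone. The sketch below illustrates that idea; the class name, the threshold of 90, and the requirement of three consecutive breaches are assumptions chosen for the example, not settings of any specific product.

```python
class DebouncedAlarm:
    """Fire only after a number of consecutive threshold breaches."""

    def __init__(self, threshold, required_breaches=3):
        self.threshold = threshold
        self.required_breaches = required_breaches
        self._streak = 0  # number of consecutive breaches seen so far

    def observe(self, value):
        """Record one measurement; return True if the alarm fires now."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        # Fire once, at the moment the streak reaches the required length.
        return self._streak == self.required_breaches


alarm = DebouncedAlarm(threshold=90, required_breaches=3)
cpu_load = [95, 50, 92, 93, 97, 99]  # hypothetical CPU load samples (%)
fired = [alarm.observe(value) for value in cpu_load]
print(fired)
```

The isolated spike at the start is ignored; the alarm fires only on the fifth sample, when the load has been above the threshold three times in a row.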

Performance measurement (and, as a derivative, availability detection) can be done on multiple levels:

  • Business process level
  • Application component level
  • Infrastructure component level

It is important to have separate performance measurements on all three levels and to have processes in place to solve issues on each individual level.

For the end user of the system, only the business process level is important: as soon as the performance of this level is too low, the end users will be in trouble. Therefore, the business process level should be measured. Today’s tools are able to measure individual business process steps, either by measuring their normal use or by measuring the effect of generated business actions. For instance, a tool can measure how long it takes to print an invoice, or how long a simulated order takes to be processed in a certain business step.
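
Measuring a generated business action can be sketched as timing a simulated step and comparing the result with a threshold. In the example below, `simulated_order` is a hypothetical stand-in for submitting a clearly marked test order, and the two-second threshold is an assumption; a real check would call the actual business application.

```python
import time


def measure_step(action, threshold_s):
    """Run one business action and report how long it took."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    return {"elapsed_s": elapsed, "within_threshold": elapsed <= threshold_s}


def simulated_order():
    # Placeholder for processing a test order; the sleep only
    # imitates the time such a step might take.
    time.sleep(0.01)


result = measure_step(simulated_order, threshold_s=2.0)
print(result["within_threshold"], round(result["elapsed_s"], 3))
```

Running such a check at a fixed interval gives a performance trend for the business step even at times when no real users are active.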

If the performance on the business process level is below the set threshold, the performance of the underlying application component(s) should be verified first. Since every layer is responsible for its own performance, a problem in the application component layer could be causing the performance issue in the business process layer. In turn, the application component layer could have performance issues due to a problem in the infrastructure component layer. Therefore, it is important to separate these layers and to give systems managers specific responsibilities for a certain layer. Between the layers, Service Level Agreements (SLAs) should be established.

If the performance of the business process level is too low and there is no problem in the underlying application components, the solution to the performance issue must be found in the business process layer itself. If no cause can be found there either, there is a mismatch between the layers: an issue that affects the business process is apparently not being detected in the underlying application component layer.

Of course, this reasoning is also valid for the relation between the application components layer and the infrastructure component layer.
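
The top-down reasoning described above can be sketched as a small function that walks the layers from the business process downwards and stops at the first layer that performs within its threshold. The layer names follow the article; the function name and the input format (a mapping from layer name to "performance too low") are assumptions for illustration.

```python
LAYERS = [
    "business process",
    "application component",
    "infrastructure component",
]


def layer_to_investigate(too_slow):
    """Return the layer where investigation should start, or None.

    too_slow maps each layer name to True when its performance is
    below the agreed threshold. Starting at the top, the search moves
    down as long as the layer underneath is also too slow.
    """
    suspect = None
    for layer in LAYERS:
        if not too_slow[layer]:
            break
        suspect = layer
    return suspect


print(layer_to_investigate({
    "business process": True,
    "application component": True,
    "infrastructure component": False,
}))
```

In this example the infrastructure performs within its threshold, so the application component layer is the place to start looking.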

On the application component level, performance can be measured effectively if the application components contain “hooks” that the monitoring tool can use to verify the performance of a software component. Without these hooks, measuring can only be done at a much coarser granularity. Especially when bespoke software is developed, it is advisable to invest in building these hooks into the software as part of the regular development process. Typical measurements are the number of times (a part of) an application component is used and how long it takes to finish a certain task. Software typically has some hot spots: parts of the code that are used much more frequently than others. By measuring with hooks in the software, these hot spots can be found, monitored, and optimized for performance.

On the infrastructure component level, the performance of each individual component can be measured. Examples are:

  • CPU load
  • Memory usage
  • Network response time
  • Network load
  • Storage response time
  • Storage load

Based on these measurements, low performance or even unavailability of a component or a set of components can be detected.
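
As a rough sketch, this detection comes down to comparing each measurement with a threshold and treating a component that returns no measurements at all as unavailable. The metric names and threshold values below are assumptions chosen for the example; suitable values depend on the component and on the agreed service levels.

```python
# Hypothetical thresholds per metric; values above the limit count as a breach.
THRESHOLDS = {
    "cpu_load_pct": 85,
    "memory_usage_pct": 90,
    "network_response_ms": 50,
    "storage_response_ms": 20,
}


def classify(measurements):
    """Return (status, breached_metrics) for one component.

    measurements is a dict of metric name to value, or None when the
    component could not be reached at all.
    """
    if measurements is None:
        return "unavailable", []
    breaches = sorted(
        name
        for name, limit in THRESHOLDS.items()
        if measurements.get(name, 0) > limit
    )
    return ("low performance" if breaches else "ok"), breaches


print(classify({
    "cpu_load_pct": 97,
    "memory_usage_pct": 40,
    "network_response_ms": 10,
    "storage_response_ms": 35,
}))
```

The example component is reachable but exceeds its CPU and storage thresholds, so it is reported as performing poorly rather than as unavailable.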

Systems managers can react to the detection of low performance by addressing the issue at hand. Early detection and resolution of performance issues at the lower layers is essential to avoid performance problems at the higher layers. It keeps the systems managers busy, but it reduces the risk that end users experience performance issues.

It is like the people who work hard to keep the trains running on time. If they do their work well, no one will notice…


This entry was posted on Friday 09 January 2015

Disclaimer

The postings on this site are my opinions and do not necessarily represent CGI’s strategies, views or opinions.

 

Copyright Sjaak Laan