CPU Analysis
Let's say you have a performance issue that needs to be analyzed and
you want to check how your CPs are performing. This article discusses
how to do that on both the z/OS mainframe and UNIX/LINUX systems, using
RMF Monitor I and VMSTAT reports, respectively, as examples.
RMF Monitor I data is written to SMF type 70 records. There are two
subtypes within this record. Subtype 1 contains CPU, PR/SM, and ICF
activity, which is the information that is typically used. To process
the SMF type 70 records you need to run the RMF Post Processor or write
your own code, for example in SAS with MXG or MICS. I know of people who
have written programs in assembler or PL/I; however, this is laborious
and I would not recommend it. We will use the output of the RMF Post
Processor CPU report for our examples.
At the top of the report is information about each logical processor.
It shows the lpar busy time %, the MVS busy time %, the
logical
processor share %, and a couple of fields related to I/O.
The lpar busy time % is the time that PR/SM dispatched the processor
during the interval. This time has a different meaning depending on
whether you are running dedicated partitions, shared partitions with
wait completion=no, or shared partitions with wait completion=yes.
Dedicated partitions are lpars that have physical processors dedicated
to them. In this case
LPAR BUSY = (ONLINE TIME - WAIT TIME) / ONLINE TIME * 100.
For example, suppose you are measuring RMF in 15 minute intervals. The
processor was online for the full 15 minutes and waited for work for 10
minutes. The lpar busy would be (15-10)/15*100 = 33.3%. Dedicated
partitions provide the best lpar performance; however, you cannot
maximize your processor investment because unused CPU cycles cannot be
used by other lpars, which has the potential to increase the
organization's costs.
Shared partitions, on the other hand, maximize your processor
investment; however, you give up some CPU cycles for managing the shared
processors. Shared partitions can be run with wait completion set to
either yes or no.
Wait completion=yes tells PR/SM to dispatch the lpars based upon a time
driven mechanism. This means that even if the lpar has no more work to
do, it keeps control of the logical processor until the time interval
has elapsed. In this case
LPAR BUSY = (PARTITION DISPATCH TIME - WAIT TIME) / ONLINE TIME * 100.
For example, suppose you are measuring RMF in 15 minute intervals. The
processor was dispatched for 6 minutes, and out of those 6 minutes it
waited for work for 3 minutes. The lpar busy would be
(6-3)/15*100 = 20.0%.
Wait completion=no tells PR/SM to dispatch the lpars based upon an event
driven mechanism. This means that as soon as the lpar has no work to do,
the processor is free to be dispatched on another lpar. In this case
LPAR BUSY = PARTITION DISPATCH TIME / ONLINE TIME * 100.
For example, suppose you are measuring RMF in 15 minute intervals. The
processor was dispatched for 6 minutes. The lpar busy would be
6/15*100 = 40.0%.
You
will notice that in this formula there is no wait time. This is the
mechanism that most organizations use to balance performance with
maximizing their processor investment.
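The three lpar busy formulas above can be sketched in Python as follows. This is only an illustration of the arithmetic, not RMF code; the function and parameter names are my own, and all times must be in the same unit (minutes here).

```python
def lpar_busy_dedicated(online_time, wait_time):
    """Dedicated partition: (ONLINE TIME - WAIT TIME) / ONLINE TIME * 100."""
    return (online_time - wait_time) / online_time * 100


def lpar_busy_wait_completion_yes(dispatch_time, wait_time, online_time):
    """Shared, wait completion=yes:
    (PARTITION DISPATCH TIME - WAIT TIME) / ONLINE TIME * 100."""
    return (dispatch_time - wait_time) / online_time * 100


def lpar_busy_wait_completion_no(dispatch_time, online_time):
    """Shared, wait completion=no:
    PARTITION DISPATCH TIME / ONLINE TIME * 100 (no wait time term)."""
    return dispatch_time / online_time * 100


if __name__ == "__main__":
    # The worked examples from the text, using a 15 minute RMF interval.
    print(round(lpar_busy_dedicated(15, 10), 1))               # 33.3
    print(round(lpar_busy_wait_completion_yes(6, 3, 15), 1))   # 20.0
    print(round(lpar_busy_wait_completion_no(6, 15), 1))       # 40.0
```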
The MVS busy time % is a little bit confusing. Whereas lpar busy is
derived from instrumentation within the hypervisor (the microcode that
controls PR/SM), MVS busy is CPU busy from the perspective of the
operating system (z/OS). Essentially it is the amount of time that z/OS
did not go into a wait state. In a perfect system lpar busy and MVS busy
would be equal. However, many times you see lpar busy less than MVS
busy. This is because there are times when z/OS has dispatched work on
one of its CPs but PR/SM has taken the physical CP away from the logical
CP to give it to another lpar. In this case the z/OS busy clock keeps
ticking while the lpar clock has stopped. If there is a significant
difference between these two values, it signifies that there could be
latent demand.
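As a rough sketch, the comparison of the two values could be automated like this. The function names and the 10 point threshold are assumptions made purely for illustration; what counts as a "significant" difference depends on your environment and is not an IBM or RMF value.

```python
def busy_gap(mvs_busy_pct, lpar_busy_pct):
    """Percentage points by which MVS busy exceeds lpar busy."""
    return mvs_busy_pct - lpar_busy_pct


def possible_latent_demand(mvs_busy_pct, lpar_busy_pct, threshold_pct=10.0):
    """Flag an interval where MVS busy is well above lpar busy.

    threshold_pct is an illustrative assumption, not a documented value.
    """
    return busy_gap(mvs_busy_pct, lpar_busy_pct) > threshold_pct


if __name__ == "__main__":
    # Hypothetical interval values, not taken from a real report.
    print(possible_latent_demand(85.0, 60.0))  # True
    print(possible_latent_demand(62.0, 60.0))  # False
```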
The logical processor share %
signifies the percentage of the physical processor that each logical
processor is entitled to.
This part of the report also has two fields related to I/O interrupts:
I/O total interrupt rate and % I/O interrupts handled via TPI. IBM
provides a parameter in sys1.parmlib(ieaoptxx) called CPENABLE that
manages the number of processors that field I/O interrupts. IBM has
changed its recommendation for this parameter's values over the years as
new hardware has been introduced. Whenever you replace your mainframe
with another model, you have to check what the CPENABLE recommendation
is. The following is a link to a techdoc that discusses this parameter
and IBM's recommendation: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10337
At the bottom of the report is the Partition Data Report. This displays
information for each of the lpars on the machine. The key fields are
under the heading "Average Processor Utilization Percentages". This
shows both the logical processor and physical processor utilization.
Logical processor utilization is the utilization of the lpar and is
based upon the number of logical CPs allocated to the lpar. Physical
processor utilization is how much of the machine the lpar is consuming.
There are also two columns: effective and total. Total is the lpar
utilization including the lpar management time, while effective does not
include the lpar management time. There is also a pseudo lpar in the
report called "physical". This is lpar management time that is measured
but cannot be attributed to any specific lpar. Also, if you have ICFs
they are reported separately from the general purpose processors.
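Since total includes the lpar management time and effective does not, the management time attributed to an lpar is simply the difference between the two columns. A minimal sketch, with a function name of my own choosing and made-up sample values:

```python
def lpar_management_pct(total_pct, effective_pct):
    """Management time attributed to one lpar: total minus effective."""
    return total_pct - effective_pct


if __name__ == "__main__":
    # Hypothetical values for one lpar, not taken from a real report.
    print(round(lpar_management_pct(42.5, 41.0), 1))  # 1.5
```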
The middle of the report provides address space information. The key
fields to look at are "Out Ready" and "Logical Out Rdy". Both of these
signify that tasks are swapped out but are ready to execute. When these
numbers are greater than 0, it means that some work is being delayed.
Click here for an example of an RMF Monitor
I CPU Report
Now let's discuss CPU performance analysis under UNIX/LINUX. Since many
different flavors of UNIX exist (e.g., AIX, HP-UX, SOLARIS, LINUX),
there are subtle differences in the output and flags associated with the
various tools. The following metrics, however, are standard: us, sy, id,
and wa.
The acronym us stands for user cpu, which is the cpu % of the
application code (roughly equivalent to TCB time in z/OS). The acronym
sy stands for system cpu and is the cpu % that the system uses on behalf
of the application. This is also sometimes called kernel time and is
roughly equivalent to SRB time in z/OS. The acronym wa stands for
waiting on I/O. This means that the processors were idle but there was
some disk I/O in progress. The acronym id stands for idle time. This
means that the processors were idle and there was no I/O in progress.
To determine the CPU usage of the machine you add up the us and sy
time. If you see a high wa time (I/O wait) then you could have an I/O
bottleneck that is impacting performance.
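The following Python sketch shows one way to pull us, sy, id, and wa out of VMSTAT output and add up us and sy. The sample text is made up for illustration and follows the column layout typically seen on LINUX; other flavors use different columns, which is why the code looks fields up by header name rather than by position. The function names are my own.

```python
# Hypothetical vmstat output, for illustration only.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 402112    0    0     5    12  110  240 35 10 50  5  0
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header row is the first line that names both the us and sy columns.
    header_idx = next(
        i for i, ln in enumerate(lines)
        if "us" in ln.split() and "sy" in ln.split()
    )
    names = lines[header_idx].split()
    rows = []
    for ln in lines[header_idx + 1:]:
        fields = ln.split()
        if len(fields) != len(names):
            continue  # skip repeated banners or malformed lines
        try:
            rows.append(dict(zip(names, (float(f) for f in fields))))
        except ValueError:
            continue  # skip repeated header rows (non-numeric fields)
    return rows


def cpu_busy(row):
    """CPU usage of the machine: user time plus system time."""
    return row["us"] + row["sy"]


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE_VMSTAT):
        print("busy:", cpu_busy(row), "wa:", row["wa"], "id:", row["id"])
```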
AIX has a virtualization technique called logical partitioning (lpar for
short). Using this you can create multiple images on one physical
machine and can either dedicate or share physical processors among the
various images. When you are running with shared processors there are
two additional metrics: pc and ec. The acronym pc stands for the number
of physical processors consumed. The acronym ec stands for entitled
capacity consumed. In the AIX lpar environment this is a more accurate
measurement of cpu usage than summing us and sy, although those metrics
are still valuable for understanding the ratio of user to system time.
When configuring the AIX lpar environment you specify the minimum,
maximum, and desired processing units that you want to have. The
entitled processing units are determined at boot time, based upon the
order in which the lpars are booted and the total entitled processing
units allocated up to that time. The hypervisor (the microcode that
controls the virtualization) will attempt to provide the desired
capacity units; however, this is not guaranteed. The total entitled
capacity units cannot exceed the number of physical processors on the
machine. The ec, therefore, is the percentage of the entitled capacity
that is consumed. A useful command for determining the current values
is "lparstat -i".
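The relationship between pc and ec described above can be sketched as follows. The function name is my own and the sample numbers are made up; the entitlement would be the entitled capacity of your lpar.

```python
def entitled_capacity_consumed(pc, entitled_capacity):
    """ec: physical processors consumed as a percentage of the entitlement."""
    return pc / entitled_capacity * 100


if __name__ == "__main__":
    # Hypothetical lpar entitled to 0.5 processing units.
    print(entitled_capacity_consumed(0.25, 0.5))  # 50.0
    # An lpar that is allowed to use spare cycles can exceed its entitlement.
    print(entitled_capacity_consumed(0.75, 0.5))  # 150.0
```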
A very useful metric has the acronym 'r', which stands for run-queue.
The run-queue shows the number of threads that are either running or
waiting to run. A useful rule of thumb is that the run-queue should be
no greater than the number of processors available to the image. If the
run-queue is greater than that, it means that some work is being
delayed.
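The run-queue rule of thumb can be expressed as a simple check. This is only a sketch with names of my own; the processor count should be the number of processors available to the image, which you supply yourself.

```python
def threads_delayed(run_queue, processors):
    """Number of runnable threads beyond what the processors can run at once."""
    return max(0, run_queue - processors)


def work_is_delayed(run_queue, processors):
    """Rule of thumb: work is delayed when r exceeds the processor count."""
    return run_queue > processors


if __name__ == "__main__":
    # Hypothetical image with 4 processors.
    print(work_is_delayed(6, 4), threads_delayed(6, 4))  # True 2
    print(work_is_delayed(4, 4), threads_delayed(4, 4))  # False 0
```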
All of the UNIX/LINUX
metrics discussed above are available in SAR, VMSTAT, and TOPAS.
Click here for an example of a LINUX VMSTAT
report.
Click here for an example of an AIX VMSTAT report.
All comments and questions are
appreciated. Please contact Joel
Wolpert at
joel@perfconsultant.com
