Exercise 6: Debugging and Optimization




!! Electronic student evaluation of the courses !!

All students are asked to login on the ABSALON web pages for all the courses they follow in this block and fill out the evaluation form.

You must do this within the period 9th to 26th October.

So why don't you do it right now?!




Introduction:

Making an error free program first time round is not very likely -- as you all know by now -- and as the program package increases in size the likelihood for making mistakes increases too. Errors can occur in many situations, some are easy to detect while others are minor and only make a small signal on the final result. Whatever characteristic the error has, you normally want to eliminate it to get an error free program. There are different approaches to track errors. One can take the simple approach by including print (or write) statements into the program at relevant places. Recompile and rerun the program. Investigate the output and determine where to put new print statements while removing some of the previous ones. This way it is possible after a number of cycles to get close to the area in the code that is responsible for the problem. This approach works well, but it can take a long time. A different and usually faster approach is to use a debugger. These come in two flavors. The simple command language driven debugger, where you slowly step through the program in a shell like environment, typing various commands. To use these you have to learn all, or at least the basic, commands and generally to have one or more editors open showing the source code as you work your way down through the program statements. The second class of debuggers use a GUI, where it is possible to see the source code, set strategic breakpoints and investigate values of all local variables as you work your way through the program statements. In this exercise we are going to try the last type of debugger environment.



Getting the required files:

Download the tar file to your ~/Dat_F directory and unpack it:

> cd ~/Dat-F
> tar -zxvf fortran6.tgz
> rm fortran6.tgz

This creates a new directory "Experiment_6" that contains the files needed for the exercise.



Debuggers:

A debugger is a tool that can help you to see what happens in the code while it is running on the computer. Before the debugger can be used on the code, it has to be compiled in a way that the debugger can step through the code while providing useful information about it:
To reach this point, compile the code with the option: -g. At the same time optimization of the code has to be remove. Optimization gets the code to execute faster, but this is often done by rearranging the order in which the statements in the code are executed. This confuses the debugger and we can't be sure that the place where the code seems to be in the debuggers graphical window is in fact also where it is in the execution process. Therefore use only -O0 (capital o and zero) when recompiling the code. When this is done we can use the debugger on the executable to find problems in the code.

For today's exercise, you have to debug and correct a number of typical errors placed in the program included in the tar file. As you may already have see, the program is an extension of the code you wrote for the three body problem in exercise 4. This version includes the possibility for using both the simple expression for the time derivative of the acceleration (eq(8)) and the full expression (eq(5)) given in that exercise and is able to handle more than 3 objects by using allocatable arrays.

As a start, make the relevant changes to the Makefile, such that the compilation has the correct options. Then compile the code.

As you are already acquainted with the problem there is no introduction to the problem here. If you do not remember the background, go back and check it out in exercise 4.

To run the program from the prompt you simply have to type,

$ ORBIT.x < data.in

This is what the solution may looks like when you are finished debugging the code and correcting the imposed errors. Only the left half are likely to be identical to this one as only very small truncation errors on different CPUs creates exponentially growing differences...



And this image is just to show how the energy conservation is fulfilled for this integration.



The section below gives you information about how to use the debugger and find the problems in the code.




Data Display Debugger --- DDD

The Data Display Debugger (DDD) consist of a GUI interface that can use a number of command line debuggers. DDD is a GNU product and it is as such made to use GNUs debugger gdb. Initially gdb didn't support fortran codes as GNU product didn't include a fortran compiler. But that has changed recently, so you should in principle have two possibilities. Either to do the exercises using the gfortran compiler, or use Intel's Fortran compiler, ifort. For some reason the gfortran compiler doesn't allow you to do what is need in this exercise. Therefore you must use ifort for this exercise. As a part of the ifort package, there is also a more advanced debugger, idb. The older versions of idb has both a command line and a graphical interface, while the newest version, release 11, only provides an advanced graphical interface. idb is gdb compatible, and that makes it possible to use versions 10 or less of idb as the underlying debugger for DDD.

Debugging is a very important task in software development. The most important issue in programming is to make a full proved program that can handle all possible scenarios in a sensible way. Before this state is reached, it is normal that there are parts of the program that for some reason do not behave as they should, and therefore causes the program to either crash or provide you with wrong answers.

To test a program you need to have a series of test cases where you know the answer. Test the program against these and resolve eventually problems that arises. When the numerical results are consistent with your expectations, it is most likely that the program does what it is expected to do. On the way to this you will, repeatedly, experience that various things do not behave to your expectation. To find out where and why the error occurs you debug the code.

We are going to use the ddd debugger interface this time. The reason is that it is sufficiently simple to use that it does not requires a long introduction, while still providing the basic facilities needed when debugging. To start DDD using the Intel debugger, idb, you type

$ ddd --debugger "idb -gdb"
using version 10 or older of ifort (which they have at the fys.ku.dk at present. Or

$ ddd --debugger "idbc -gdb"
when using version 11 on lynx.

Using the gfortran and gdb directly you only have to type

$ ddd

(For this exercise you need to use the ifort compiler as there is a problem with debugging the code I have provided using the gfortran compiler -- it does not all you to view function values of dynamically allocated arrays.) This opens a window containing three different sections (see the image below). The top one is for displaying variable values, the middle frame hosts the source code, while the bottom one contains the command line interface which includes the IO from the running program. To load a program into the interface you choose File from the top line. In the new window, click on open program and choose ORBIT.x. This opens the source file of the main routine and puts the pointer at the top of the main routine.



Before starting to debug the code, you need to set a breakpoint in the source. This is a point where the execution will stop and allow you to look at various values in the code. You could for instance put one in the line containing the call input. This allows you to see how the program reads the initial data into the code. To start the execution choose Program from the top menu, and then click on run. This opens a new window. In the lower input line you must type

<data.in

which makes the code read the input from this file.

At the same time a small window called DDD appears (see the image below). The choices here are the ones needed for running through the program.

Command Impact
run Start running the code
interrupt Stop the execution now
step step one line at a time - going into underlying subroutines/functions. It also steps into standard fortran routines as print and allocate. To avoid this use next to jump over these internal routines.
next step one line, do not step in to routines
up Move one level up, if not at main level
down Move one level down, if possible
cont Continue execution
kill kill the process
undo undo the last command
redo redo the last command
edit opens the source code in an editor
make will in principle recompile the code, but....


To see a value of a variable, place the courser on top of it and its value will appear on the screen. To follow a series of values while progressing through the program mark it with the cursor and click on display in the top bar. This will place a copy in the top frame. The values shown here will be updated each time the program stops. If you are dealing with a 1 or 2D array you may use plot instead. This puts the variable in the top frame, while also starting Gnuplot, which plots the data array in a new window - one for each variable chosen. The plots are also updated each time the debugger stops.

To get the Gnuplot interface to work you have to change the default settings. This is done by clicking on Edit and then choose Preferences. In the window that appears you choose Helpers and choose External for the plot window. You can remove the many comments about x-applications by choosing Suppress X Warnings in the General page in the Preferences window.

You are now ready to start finding the planted errors and correct them.

To help you, here is a table showing the number of errors placed in each files:
File # errors
data.in 1
main.f90 3
data.f90 1
timestp.f90 1
io.f90 3
math.f90 3
orbit.f90 4
Makefile 2


When the program is running you can compare the results with the images above using gnuplot in an independent window. Start with making shorter runs, only allowing the solution to advance 1 or 2 time units.... If this seems OK you can do the full 7 time units to get the same images -- before doing this you need to change the options for the optimisations of the compiler from -g to -O3 and recompile the code -- otherwise it will take a very long time to run.....

I have created a combined PNG image containing both the orbits and the energy plots and called it orbit.png Credits: 2/-1



Performance analysis

Now that the program is working it is time to make a different investigation. Namely looking at the performance of the code. There are different tools that can be used for this purpose and here you will try one of these. As for the debugging, you have to compile the code with special compiler flags to be able to sample the run and obtain the data you need. For this purpose you have to change F90FLAGS to

F90FLAGS = -O3 -p

-O3 implies high optimization of the code, giving a fast execution of the program -- you can try to use different values after O to see the effect. Measure the run-time using the Unix command time -- if you don't know how to check the man page.... The -p option informs the code that it has to provide information needed for profiling while running. For this to take effect the code has to be recompiled from scratch. When you have done this, change the finish time in the *.in file to a smaller time (=1) and run the experiment from the command line by typing

$ ORBIT.x < data.in > log

The result of this, apart from the normal data output, is a file called gmon.out. This file contains information for each short time interval (0.01 second) about where in the code it was at that specific time. If this sampling time is short enough, and the duration of the experiment long enough, it gives a fair picture of the time usage in the various routines. To extract this information to an easy readable format you have to do:

$ gprof -z ORBIT.x > gprof.out

This creates the file, gprof.out, which contains information about the time usage in the various parts of the code, in ASCII format. Open this file with a text editor and look at the numbers. Here is a small part of the file, (for a different experiment!):

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 57.53      0.42     0.42     7569     0.06     0.09  pde_
 15.07      0.53     0.11    98397     0.00     0.00  stagger_mp_xdn_
  5.48      0.57     0.04    52983     0.00     0.00  stagger_mp_ddxup_
  4.11      0.60     0.03    45414     0.00     0.00  stagger_mp_xdn1_
  2.74      0.62     0.02    98397     0.00     0.00  stagger_mp_ydn_
  2.74      0.64     0.02    45414     0.00     0.00  stagger_mp_zdn1_
  1.37      0.65     0.01    98397     0.00     0.00  stagger_mp_zdn_
  1.37      0.66     0.01    45414     0.00     0.00  stagger_mp_ddzup_
  .....

The first table in the file, from which the first few lines above are shown, gives a sorted table of the routines according to their relative percentage of the time usage during the run. The first column shows the percentage of time used in the routine, the second the cumulative time and the third the actual time spend in the routine. Some of the routines contain call to other routines, and the splitting up of time in these are shown in tables further down in the file. Using this information, it is possible to find the routines that are using the largest fraction of time, and therefore are potential targets for using your effort to see if it is possible to improve the performance of the code. Depending on the problem, this may require rewriting some of the routines, restructure others and possible looking for more optimal re-usage of the variables when they are placed in the local ram.

The program we are investigating is smaller than the one shown above, and the spread in time consumption is not nearly as skew as for that case. But there are still routines that uses more time than other routines and it is these and their underlying routines you should concentrate on to see if you can find ways to improve the performance. There is not one, but several smaller issues that together can provide a nice speedup of the code. Lets make this a into a little competition.... How fast can you get the program to run?

Some hints to help you on the way: For each of the files you are going to change to increase speed copy them into a new file and alter the Makefile to use these (while still being able to compile the old version when required)

To get the real timing of the program on a multi user system, you can't rely on the timing gprof provides -- it simply clock the usage of real time. This time includes idle time on the CPU without being able to tell you how much time is spend waiting on other processes. Therefore run the profiling a few times and look at changes. The time command gives you information about how much CPU-time relative usertime the program has used... It is the CPU-time that is important here.

How much is the speedup of the program after rewriting routines? (t_old/t_new) -- use the Unix time command and the relative CPU-time usages Credits: 2/-1

The code you have used here uses dynamical memory allocation. Rewrite the initial code (that you backed up earlier) to use static arrays -- predefined at compile time -- and allow for only a fixed number of objects.

What is the speed change initial code and the same one using static memory (t_dynamic/t_static) Credits: 2/-1

Before you are finished, try to change between the two compilers (ifort <-> gfortran) and compare the execution time.... As you can see then it is not only a task of optimising the code, but it also depends significantly on which compiler one used for the actual run.



Cleanup!

Finally, just a reminder to clean up your disk space by removing all the data files etc that are not needed any more... You may also have some left overs from the last few exercises..... So please, spend a few minutes to cleanup......



$Id: index.php,v 1.7 2009/10/09 08:03:01 kg Exp $ kg@astro.ku.dk