wiki:instruments

Version 4 (modified by agscott1, 9 years ago) ( diff )

--

What is Apple Instruments?

Apple Instruments is an application profiler that can be used to find performance issues by using a multitude of separate "instruments." These instruments collect and display relevant data such as: time, i/o use, network use, memory allocations and leaks, and much more. For simplicity, we will only be using the time instrument to profile ISSM. In this case, the time instrument measures how long it takes to run a certain test in ISSM. It will also generate a list of all functions used during the program's execution, as well as individual times for those functions.

Installing Instruments

First, you must be running Mac OSX in order to access and run Instruments.

If you have XCode installed, then Instruments is already installed! The Instruments application is located in the following folder:

/Applications/Xcode.app/Contents/Applications/

If you want to access the application from Finder, go to Applications, right click on Xcode and select "Show Package Contents." Then navigate to Contents/Applications/ to find the Instruments application. To make finding and opening Instruments easier, you can either create an alias of Instruments.app on your desktop or pin it to the dock.

If you do not have XCode installed, you need to install the command line tools. You can install command line tools by running this command on the terminal:

xcode-select --install

Instruments will then be located in the directory above.

Setting up a profiler test

Before using Instruments, we need to execute a test from MATLAB. Under $ISSM_DIR/test/NightlyRun/ create a profile_test.m file and paste the following code inside the newly created file.

%Test Name: PigTranHO
md=triangle(model(),'../Exp/Pig.exp',8000.);
md=setmask(md,'../Exp/PigShelves.exp','../Exp/PigIslands.exp');
md=parameterize(md,'../Par/Pig.par');
md=extrude(md,10,1.);
md=setflowequation(md,'HO','all');
md.cluster=generic('name',oshostname(),'np',1);
md.timestepping.time_step=0.5;
md.verbose=verbose('all');
md=solve(md,TransientSolutionEnum());

Since Instruments is a sampling profiler where the data collected is statistically approximated, it is important to create a test that will run for more than a few seconds. Tests that run longer will allow Instruments to collect more accurate sampling data. The test above should take ~26 seconds to execute in MATLAB.

When the test is run, a folder is generated under $ISSM_DIR/execution/ that should have the following format:

$ISSM_DIR/execution/profile_test-MONTH-DAY-YEAR-TIME
example: profile_test-06-30-2016-16-07-26-99507

In order to provide correct arguments to Instruments, we need to save the arguments generated in profile_test.queue. The profile_test.queue file and the portion that needs to be saved for later use is shown below.

[profile_test.queue]
#!/bin/sh
mpiexec -np 1 /Users/username/ISSM/issm/trunk/bin/issm.exe TransientSolution /Users/username/ISSM/issm/trunk/execution/profile_test-06-30-2016-16-07-26-99507 profile_test

[arguments of profile_test.queue to save]
TransientSolution /Users/username/ISSM/issm/trunk/execution/profile_test-06-30-2016-16-07-26-99507 profile_test

Setting up Instruments

In order to run Instruments on the test above, we need to open Instruments and select the Time Profiler.

http://i.imgur.com/wJO4Lkt.png

After selecting the Time Profiler, you will need to attach the target process to Instruments. To do this, select the button that says "All Processes" towards the top of the Instruments window. This option is shown highlighted by a red rectangle in step 1 of the image below. Then, select "Choose Target" to choose the ISSM executable. This option is highlighted in step 2 of the image below.

http://i.imgur.com/0zCNfA3.jpg

Next, navigate to $ISSM_DIR/bin/ and select issm.exe as the target process. The final step is adding arguments to the Arguments section below the target process list. The arguments that need to be added are from the end of the "Setting up a profiler test" section. Below is a screenshot of what your target should look like (with the exception of different arguments).

http://i.imgur.com/VWgJEpv.jpg

Profiling your ISSM test

After setting up the target, we can finally start to profile ISSM. To do this, just hit the record button and let Instruments run. The record button is just a red circle at the top by the pause button. In order to get a better idea of how ISSM is performing, it might be necessary to do a few "runs." This just means recording a new run once the previous run has finished. Recording multiple runs allows Instruments and ISSM to overcome any initial setup operations that might falsely inflate the time, compared to a normal scenario. Below is an image showing what happens when we create multiple runs.

We can also view each run by selecting the drop down arrow on the left of the "CPU Usage" label. If a run is unsuccessful or was accidentally paused before completion, we can delete that specific run by selecting it and going to Instrument -> Delete Run # from the main menu.

http://i.imgur.com/K0p992f.jpg

Obtaining Useful Information from a Run

As you might have noticed, there is nothing useful from the default view. In order to see what is happening, we need to change a few settings. To start, we click the gear icon located in the middle right hand side of the screen (next to an icon that looks like an 'E'). This shows us the display settings. Under the "Call Tree" options, we select: Separate by Thread, Invert Call Tree, Hide System Libraries, and Flatten Recursion. Make sure that these are the only selected options.

This will clean up our view of the call tree and will only show relevant functions ordered by time (from highest to lowest). The image below shows all of the settings needed to obtain accurate information about what functions within ISSM take the most time. The first red rectangle highlights where the display settings icon is while the second red rectangle shows where the "Call Tree" options that need to be changed are.

http://i.imgur.com/j8kCtok.jpg

Call Tree Option Explanations

From the previous section, we selected Separate by Thread, Invert Call Tree, Hide System Libraries, and Flatten Recursion. This section will explain these options as well as the other options we did not select, and why we chose those options.

Separate by Thread

For multi-threaded applications, or for applications that call a multi-threaded library, if turn this setting on, we will get to see what is executed under each thread during run time. This is left on by default, and is extremely useful for profiling multi-threaded applications or tests that might call multi-threaded functions.

Invert Call Tree

This is probably the most important option you can enable if you want to find the most time-consuming functions. Within the call tree there is a running time column and a self time column. The running time displays a total running time for that function, including other functions that are called from that specific function. The self time will show you how much time is actually spent inside the function, excluding other function calls.

For example, if functionA() spends 200ms before calling functionB(), and functionB() takes 400ms, then functionA() will have a running time of 600ms while functionB() only has a running time of 400ms. This makes it seem like functionA() takes the most time, but in reality functionB() is actually the slowest function. With invert call tree on we examine leaves of the call tree. This means that functions which do not call other functions are shown first. With this setting on, sorting by run time actually displays the correct run time and therefore gives us a better idea of what we can optimize.

Hide System Libraries

This setting cleans up our call tree view by hiding away functions we don't have to worry about.

Flatten Recursion

Instead of creating a separate symbol for each level of recursion, only a single function name and symbol is generated for the entire recursive call. This allows us to see what the total time is in a recursive function instead of seeing a partial amount of time for each level of recursion.

Top Functions

Note: See TracWiki for help on using the wiki.