Running example models and using a cluster

From BanghamLab
Revision as of 13:23, 15 June 2011 by AndrewBangham (talk | contribs)
Jump to navigation Jump to search

The purpose of these tutorials is to learn how to run the example growth simulations included in GFtbox. We will describe methods for running simulations locally (on your own computer) and remotely (on a computing cluster). It is assumed that you have already downloaded the GFtbox software and have Matlab installed.

Getting Started

The remainder of this page is split into four sub-tutorials, each building on the knowledge you have learned from the preceding ones.

1) Explaining the tools. In these tutorials, we will be using GFtboxCommand and ClusterMonitor. This section explains the purpose of these tools.

2) Computer or cluster? Here, we illustrate when and why you should use a computing cluster for your growth simulations.

3) Running a growth simulation for an example model. This section demonstrates how to use GFtboxCommand to run a growth simulation on your computer. The model used for the simulation is an example model included with the GFtbox.

4) Altering the simulation parameters. Following on from part 2, here we show how to adjust a simulation parameter within GFtboxCommand. Specifically, we alter the value of dt, the temporal resolution of the simulation, and show how it can be used to verify the correctness of the value specified in the published literature.

5) Altering the model parameters. Finally, we demonstrate how a number of model parameters can be varied by specifying a range of values for each model argument. We show how the computationally expensive task of simulating all combinations of specified ranges can be more efficient if a computer cluster is used via the ClusterMonitor tool.

1 Explaining the tools

GFtboxCommand

This is a command line version of the GFtbox. By command line, we mean that all program functions are operated via typed commands, without the GUI. Like GFtbox, GFtboxCommand is capable of running growth simulations of an interaction function, and allows the user to specify model and simulation parameters. Unlike GFtbox, this also allows the user to select ranges of values for a number of parameters, and will automatically spawn multiple simulations which explore the various combinations of those parameters. This can be used to evaluate the effect of various parameters on the growth of a given model.

ClusterMonitor

Provides a graphical interface for managing simulations running remotely on a computer cluster. Specifically, it allows you to see which projects are present and running on the cluster, to retrieve the completed projects, to generate images of the simulations at specified stages of growth, and to remove projects from the cluster. If you do not intend to use a computer cluster, then you will not need to use ClusterMonitor.

2 Computer or cluster?

In basic terms, a computing cluster is effectively a network of many computer processors (often hundreds), centrally managed by a queuing system. When a job is submitted to a cluster, the job is run on a free processor, or queued until one becomes available. By contrast, a typical desktop computer will contain one processor. Jobs which must be run independently and sequentially on a desktop computer can be executed in parallel on a cluster, greatly reducing the total time required to complete all of the jobs. Although the exact details of your cluster might vary from those described in the rest of these tutorials, we aim to illustrate the generic processes involved in using GFtbox on a cluster. We are happy to offer assistance, where possible, to setup GFtbox for your cluster, so please contact us if you have any queries.

3 Running a growth simulation for an example model

This tutorial is aimed at running a growth simulation for one of the example interaction functions included with the GFtbox. The purpose of this exercise is to firstly demonstrate how simulations can be invoked using GFtboxCommand, and secondly to show how to reproduce experimental results (specifically, those published in Kennaway et al (2011)) given an interaction function.

Assuming Matlab is installed on your computer, and the latest GFtbox has been downloaded, you can add the GFtbox directory to Matlab's search path, which makes the toolbox accessible from any other path that you choose to work from. For a short tutorial on how to do this, please click here.

Once the GFtbox is added to Matlab, you are ready to run a growth simulation using GFtboxCommand. In this example, the model that we will simulate is called GPT_CASE_RST. Results generated using this model are published in Kennaway et al (2011). By running this simulation, we can confirm the results in the published literature and investigate the suitability of the various parameters.

3A - Running a simulation on your computer

The following command can be typed into Matlab to run a simulation of the GPT_CASE_RST interaction function, which contains three growth models: R, S and T. Three separate simulations are run sequentially on your computer, one for each model, each producing results corresponding to five intervals in the growth simulation.

       GFtboxCommand('Path','/GrowthToolbox/Models/Published/Kennaway-etal-2011/','Name','GPT_CASE_RST',...
       'Stages',[20 100 140 180 200],'modelname',[1:3]);

GFtboxCommand accepts parameters in pairs, separated by a comma, e.g. 'modelname',[1:3]. The parameters entered here are: Path, Name, Stages and modelname.

% Batches of simulations can be constructed using the State parameter, as described by this short tutorial. Here, the value 'Start' is specified, which means that we have finished constructing our batch and want to run it now. In this case, our batch will contain only the simulations specified by the remaining GFtbox parameters.

The optional Path parameter refers to the location of the folder (or directory) on your computer where the model interaction function you wish to simulate is stored. Name, is the name of the folder itself. In this case, we are using the GPT_CASE_RST folder which is included in the GrowthToolbox. You may wish to copy this folder elsewhere, if you intend to make changes to the interaction function.

During a growth simulation, a mesh can be generated at each time step of the simulation, which provides a visual representation of the growth of the biological tissue, given the various parameters of growth specified by the interaction function and how they have changed over time. Put another way, the mesh shows exactly what the growing tissue actually looks like. Stages refers to the points in the simulation (measured in hours) at which meshes should be generated and saved. In this example, five stages of growth will be written to disk. These values are chosen to best capture the appearance of the tissue at important stages of the tissue growth.

The final parameter listed here is modelname. This is a model-specific parameter, and in this case the GPT_CASE_RST interaction function contains three separate models for plant growth, allowing the desired model to be a selected. Here, the value [1:3] is specified, which is evaluated in the same way as entering [1 2 3]. This instructs GFtboxCommand to run three separate simulations, one for each of the growth models contained in the interaction function.

The function of every permissible parameter is given by keying the following command into Matlab:

       help GFtboxCommand

3B - Getting results from a completed simulation

Once a growth simulation is completed, the project folder (GPT_CASE_RST, in our example) will contain another folder named "movies". Within movies, there are folders which contain results for the executed simulations. As was instructed in 2A, here we can see three folders corresponding to simulations for the three separate models. Within the first folder there are three items:

CommandLine.txt - This file contains the Matlab command which was used to generate the results that this sub-directory contains.

gpt_case_rst.txt - This file, named according to the project name, contains a copy of the interaction function which was used to produce these results.

meshes - This folder contains the mesh files corresponding to the stages of growth specified in the Stages parameter, described in 2A.

The mesh files contain vertices information regarding the shape of the growth model at a particular stage of growth, but are not visualisable in that form. In order to convert these mesh files into viewable images, we can execute the following command in Matlab:

       VMSreport('Path','/GrowthToolbox/Models/Published/Kennaway-etal-2011/','Project','GPT_CASE_RST',...
       'Experiment','All','flattentime',572.5,'morphogen','KPAR','SNAPFIG',true);

Where the Path and Project parameters have the same function as Path and Name in 2A, i.e. Path refers to the location of the project folder and Project is the name of the project folder itself.

Here we can see the generated images, including the command line arguments, for one of the simulations performed in 2A.

NB. The results produced may not be visibly identical to those in the published literature. This is because of small, random perturbations which are applied to the initial model meshes to stop them from containing surfaces which are perfectly flat, and therefore biologically unrealistic. The results produced should be qualitatively but not quantitatively the same.

4 Altering the simulation parameters

One such simulation parameter is dt, which is the time in seconds between iterations in the growth simulation. Large values of dt mean that fewer steps and therefore fewer calculations are required to complete a simulation. Whereas smaller values of dt mean that more steps are required, and therefore more processing time too. Therefore, a value of dt must be selected which is not so small that it is computationally unmanageable, but not so large that the observed growth is an artifact of the value of dt, rather than the underlying growth model. It is necessary therefore to test a range of values for dt to ensure that the patterns of growth observed in a simulation are consistent across the range, and to find a value which is sufficient to demonstrate the model of growth and computationally efficient.

Testing a range of dts can be achieved in several ways. One way is to use the dt parameter when calling GFtboxCommand. This allows a single value to be tested. Another way is to make a batch of jobs, each using a different value for dt, by using the 'State' parameter, as described here. Lastly, a range of dts can be specified in GFtboxCommand, by using the dt parameter pair, where the values within the square brackets are the dts to simulate:

       GFtboxCommand('State','Start','Path','/GrowthToolbox/Models/Published/Kennaway-etal-2011/','Name','GPT_CASE_RST',...
       'Stages',[20 100 140 180 200],'modelname',3,'dt',[0.1 0.5 1 5]);

Here, we have specified four separate values for dt, and this means that four separate simulations will be run sequentially on your computer. As in 2B, images can then be generated for the mesh files produced, and compared to ensure consistency across the range of dts. In this figure, we can see the same model, at the same stage of growth, generated using the four values of dt. Though quantitatively different, they are qualitatively the same, illustrating the suitability of the default dt value of 5.

5 Altering the model parameters

It is easy to see that when even a small number of range variables, mutations or dts are specified, the total number of simulations increases quickly. If many ranges are specified, then the amount of processing time becomes unmanageable on a single computer. Via GFtboxCommand and the ClusterMonitor tool, we can remotely run a number of simulations, in parallel, on a computing cluster. It is assumed that the GrowthToolbox and Matlab are installed on your cluster, and PuTTY is required on your local computer for the tools pscp (for transferring files from your computer to a cluster) and plink (for remotely executing commands on a cluster). Using the dt range parameter from 3 as an example, here we add the 'Use' parameter with the value 'Cluster':

       GFtboxCommand('State','Start','Path','/GrowthToolbox/Models/Published/Kennaway-etal-2011/','Name','GPT_CASE_RST',...
       'Stages',[20 100 140 180 200],'modelname',3,'dt',[0.1 0.5 1 5],'Use','Cluster');

The 'Cluster' option for the 'Use' parameter instructs GFtboxCommand to upload the required project directory to a remote Linux server. Instead of running the simulation on your own computer, GFtboxCommand then works out how many individual simulations are specified by the command that invoked it. In this example, the only parameter which will generate multiple simulations is dt, of which there will be 4. Separate commands for each of these jobs are then automatically generated, each with an accompanying unique ID (The value of the ExpID parameter), and these are submitted as individual jobs to the cluster.


       GFtboxCommand('Name','GPT_CASE_RST','Stages',[20 100 140 180 200],'modelname',3,'dt',0.1,'ExpID','GPT_CASE_RST_1');
       GFtboxCommand('Name','GPT_CASE_RST','Stages',[20 100 140 180 200],'modelname',3,'dt',0.5,'ExpID','GPT_CASE_RST_2');
       GFtboxCommand('Name','GPT_CASE_RST','Stages',[20 100 140 180 200],'modelname',3,'dt',1,'ExpID','GPT_CASE_RST_3');
       GFtboxCommand('Name','GPT_CASE_RST','Stages',[20 100 140 180 200],'modelname',3,'dt',5,'ExpID','GPT_CASE_RST_4');

Once these jobs have been submitted, the ClusterMonitor tool opens and the new job batch ID is added to the list of jobs. Here, we will concentrate on three of the functions of ClusterMonitor: Queue?, Get project results and Make project pngs. As in part 3 of this page, these functions will enable us to visualise the growth simulation results.

Whilst ClusterMonitor is a useful tool for managing your cluster jobs, it is advisable to have at least a rudimentary understanding of and ability to use a Unix-based computer system. In particular, the abilities to list the contents of a folder, view the contents of a file, change the current working folder, delete, and to copy or move files or folders, are essential Unix skills for making sure everything is working as intended. It is beyond the scope of these tutorials to provide an in depth Unix tutorial (many good and simple tutorials exist on the web), but here is a short description of the Unix commands (which may be specific to our Unix-based system) that we believe to be important.