How to use GARLI 2.0

By Peter Unmack

At first glance GARLI seems a little bit complicated, but once you've used it a couple of times it is all very simple. The part about GARLI I like the most is that it is fast while also providing the option to use different models of sequence evolution on separate partitions. Everything here has been gleaned from the GARLI webpage; I'd suggest you spend some time getting familiar with it first.

Before you get started, I find it simpler to put all of your outgroup OTUs first or last in the data file, as GARLI identifies the outgroups by their numbered position, not their OTU name. Input files can be nexus, phylip or fasta, but only nexus format provides the option of using partitions.
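To make the numbering point concrete, here is a minimal Python sketch that turns a list of OTU names (in data file order) into the numbered positions GARLI expects. The function name outgroup_setting and the taxon names are hypothetical, and joining separate ranges with spaces is my assumption about the accepted format, so check it against the GARLI Configuration Settings page.

```python
def outgroup_setting(taxa, outgroup_names):
    """Return 1-based positions of the outgroup taxa, collapsed into ranges."""
    positions = sorted(taxa.index(name) + 1 for name in outgroup_names)
    if not positions:
        return ""
    ranges = []
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos == prev + 1:
            # Still inside a run of consecutive positions.
            prev = pos
            continue
        ranges.append((start, prev))
        start = prev = pos
    ranges.append((start, prev))
    # Assumption: separate ranges are joined with spaces.
    return " ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Hypothetical dataset of 80 OTUs with the outgroups placed last.
taxa = [f"otu{i}" for i in range(1, 81)]
print(outgroup_setting(taxa, ["otu77", "otu78", "otu79", "otu80"]))  # 77-80
```

With the outgroups grouped at the end of the file the result is a single range, which is why putting them first or last keeps things simple.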

GARLI usually runs from the command line, but don't panic, it is all very simple. All of the options / settings are in a configuration file that you provide, whose default name is garli.conf. I provide a list of the settings I usually modify below. Note that searching for the best ML tree and conducting bootstrapping require separate runs of the program (with different settings in the garli.conf file). You also need an additional application to analyze the trees, the details of which are provided under SumTrees. See below for the differences in the garli.conf file for each approach.

Note that you have to run each instance of GARLI in a different directory. I usually set up a directory for a group of runs, and within that directory I make five directories: one for each of the four bootstrapping runs (usually 250 reps each to create a total of 1000 reps; I call each directory bs.1, bs.2, etc.) and one for creating the best ML tree (which I call nobs). I get the datafile, config file and any script command file (if you are running it on a cluster), put them in the first bs.1 directory, check that it works, then copy and paste the identical files to the other three bootstrap directories. Note that the search for the best tree will have a different config file, and probably a different script command file if you use a different name for the .conf file.
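The directory layout described above can be sketched in Python as follows. This is only an illustration: the function name make_run_dirs and its arguments are hypothetical, and it assumes the bootstrap config has already been tested. It copies the datafile and bootstrap config into each bs directory, and only the datafile into nobs, since the best tree search needs its own config file.

```python
import shutil
from pathlib import Path


def make_run_dirs(parent, datafile, boot_conf, n_boot_runs=4):
    """Create bs.1..bs.N and nobs under parent; return the created paths."""
    parent = Path(parent)
    run_dirs = []
    for i in range(1, n_boot_runs + 1):
        run_dir = parent / f"bs.{i}"
        run_dir.mkdir(parents=True, exist_ok=True)
        # Every bootstrap run gets identical copies of the data and config.
        shutil.copy(datafile, run_dir)
        shutil.copy(boot_conf, run_dir)
        run_dirs.append(run_dir)
    best_dir = parent / "nobs"
    best_dir.mkdir(parents=True, exist_ok=True)
    # The best tree search uses a different config, so copy only the data.
    shutil.copy(datafile, best_dir)
    run_dirs.append(best_dir)
    return run_dirs
```

A call such as make_run_dirs("myruns", "your.data.file.name.nex", "garli.conf") would then leave five directories ready to go, with the best tree config still to be added to nobs by hand.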

How you start GARLI depends on your setup.

I always split up bootstrapping across multiple runs so that I get the results quicker. If I need 1000 bootstrap reps then I usually do four runs of 250 reps each; run at the same time, the analysis is completed in about a quarter of the time. Either way, you need to use a different suite of programs to place bootstrap values on the best tree, irrespective of single vs multiple runs.

SumTrees / Python / DendroPy

SumTrees is a neat utility that runs via DendroPy (which needs Python). Note that the command to generate an ML tree with bootstrap values changes slightly depending on which version of SumTrees you use. This is based on DendroPy version 4.4; if you need the old command for version 3.12.0, the instructions are listed in the old version of this file, along with my old Python install instructions. To determine whether you already have Python installed, and which version, see https://wiki.python.org/moin/BeginnersGuide/Download.
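Since the SumTrees command depends on the DendroPy version, it can help to check what is installed before going further. The following is a small sketch that assumes Python 3.8 or newer (for importlib.metadata); the function name dendropy_version is hypothetical.

```python
from importlib import metadata


def dendropy_version():
    """Return the installed DendroPy version string, or None if absent."""
    try:
        return metadata.version("DendroPy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = dendropy_version()
    if version is None:
        print("DendroPy is not installed")
    else:
        print(f"DendroPy {version}")
```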

My most recent installation of Python used Anaconda, as per Rob Lanfear's suggestion for running PartitionFinder. Once that is installed you can install DendroPy using these instructions. I seem to recall that it was a bit convoluted to install and that I had to update pip or something else first, but with a little googling I figured it out. Note that the install will differ on different operating systems depending on how you do it. On my machine the sumtrees script was installed into this directory: C:\ProgramData\Anaconda2\Scripts

Once you have DendroPy successfully installed, start an Anaconda Prompt (via the start menu in Windows) and cd to the directory that has the files you are manipulating. I copy my output files from GARLI (usually the best tree and four bootstrap files) to a directory and rename them to match the command below.

sumtrees.py --decimals=0 --percentages --suppress-annotations --no-taxa-block --output-tree-format newick mybootrun.boot1.phy mybootrun.boot2.phy mybootrun.boot3.phy mybootrun.boot4.phy --target=mysearch.best.phy --output=supportOnBest.nwk

This places the bootstrap values on the best tree and writes the result in newick format. I can then open that file in MEGA for final manipulation prior to making the tree figure for publication.
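If the number of bootstrap files varies between analyses, the command above can be assembled programmatically rather than retyped. The sketch below only rebuilds the same argument list shown above; the function name sumtrees_command is hypothetical and no options beyond those in the command are used.

```python
def sumtrees_command(boot_files, best_tree, output):
    """Build the sumtrees.py argument list for any number of bootstrap files."""
    return [
        "sumtrees.py",
        "--decimals=0",
        "--percentages",
        "--suppress-annotations",
        "--no-taxa-block",
        "--output-tree-format", "newick",
        *boot_files,
        f"--target={best_tree}",
        f"--output={output}",
    ]


boot_files = [f"mybootrun.boot{i}.phy" for i in range(1, 5)]
cmd = sumtrees_command(boot_files, "mysearch.best.phy", "supportOnBest.nwk")
print(" ".join(cmd))
```

The printed line can be pasted into the Anaconda Prompt, or the list can be passed to subprocess.run(cmd, check=True) from a script run in the same directory as the tree files.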

Here I review the changes you should make to the garli.conf file, but you should be familiar with the full list on the GARLI website provided under GARLI Configuration Settings. You should also read the GARLI FAQ. Items in yellow are what I typically alter for each dataset (I've also added notes on changes to the defaults that I made), though your situation may differ. You can copy and paste what is provided below, or grab the default garli.conf file from the installation directory and modify the appropriate values. Or, you can download the garli.conf files that I use to generate the ML tree and for bootstrapping. Be sure to change the datafname and ofprefix values.

Note that searching for the best ML tree and conducting bootstrapping require separate runs of the program (with different settings in the garli.conf file).

[general]

datafname = your.data.file.name.nex [the name of your datafile]

constraintfile = none

streefname = random [changed from the default of stepwise]

attachmentspertaxon = 160 [I set this value to double the number of OTUs]

ofprefix = gad_cytb [prefix of output file names GARLI creates]

randseed = -1

availablememory = 512

logevery = 10

saveevery = 100

refinestart = 1

outputeachbettertopology = 0

outputcurrentbesttopology = 0

enforcetermconditions = 1

genthreshfortopoterm = 10000 [changed from the default value of 20000; I use 10000 for bootstrapping, 100000 for best ML tree searching]

scorethreshforterm = 0.05

significanttopochange = 0.01 [I change the default value of 0.01 to 0.00001 for best ML tree searching]

outputphyliptree = 1 [changed from the default of zero so that nexus and newick trees are exported]

outputmostlyuselessfiles = 0

writecheckpoints = 0

restart = 0

outgroup = 77-80 [identifies the outgroup OTUs in your dataset; can be left out, in which case trees will not be rooted. Either way it does not affect your analysis]

resampleproportion = 1.0

inferinternalstateprobs = 0

outputsitelikelihoods = 0

optimizeinputonly = 0

collapsebranches = 1

searchreps = 1 [if not doing bootstrapping then use a value like 10]

bootstrapreps = 250 [leave as zero if simply searching for best tree]

[model1] [these all depend on what model of sequence evolution you use, see https://molevol.mbl.edu/index.php/Garli_FAQ#MODELTEST_told_me_to_use_model_X._How_do_I_set_that_up_in_GARLI.3F]

datatype = nucleotide

ratematrix = (0 1 2 3 1 4)

statefrequencies = estimate

ratehetmodel = gamma

numratecats = 4

invariantsites = estimate

[if you have multiple partitions, repeat the section above, but call it [model2] through however many partitions you have, read https://www.nescent.org/wg_garli/Using_partitioned_models to fully understand what to do here] (link currently broken!)

[master]

nindivs = 4

holdover = 1

selectionintensity = 0.5

holdoverpenalty = 0

stopgen = 5000000

stoptime = 5000000

startoptprec = 0.5

minoptprec = 0.01

numberofprecreductions = 10

treerejectionthreshold = 20.0 [changed from the default value of 50 for bootstrapping only]

topoweight = 1.0

modweight = 0.05

brlenweight = 0.2

randnniweight = 0.1

randsprweight = 0.3

limsprweight = 0.6

intervallength = 100

intervalstostore = 5

limsprrange = 6

meanbrlenmuts = 5

gammashapebrlen = 1000

gammashapemodel = 1000

uniqueswapbias = 0.1

distanceswapbias = 1.0
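The differences between a bootstrapping config and a best ML tree config noted in the listing above can be collected in one place. The following Python sketch rewrites the relevant "key = value" lines of a config text. The names BOOTSTRAP, BEST_TREE and apply_settings are hypothetical, the values are simply the ones given in the bracketed notes, and a replaced line loses any bracketed note it carried.

```python
import re

# Values taken from the bracketed notes in the listing above.
BOOTSTRAP = {
    "genthreshfortopoterm": "10000",
    "significanttopochange": "0.01",
    "searchreps": "1",
    "bootstrapreps": "250",
    "treerejectionthreshold": "20.0",
}
BEST_TREE = {
    "genthreshfortopoterm": "100000",
    "significanttopochange": "0.00001",
    "searchreps": "10",
    "bootstrapreps": "0",
    "treerejectionthreshold": "50",
}


def apply_settings(conf_text, settings):
    """Return conf_text with each `key = value` line found in settings replaced."""
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*(\w+)\s*=", line)
        if match and match.group(1) in settings:
            key = match.group(1)
            line = f"{key} = {settings[key]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading a tested garli.conf, calling apply_settings(text, BEST_TREE) and writing the result into the nobs directory gives the second config without hand editing. Remember to change ofprefix as well if you want the output files named differently.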