Unmack's Guide to Primer Design

By Peter Unmack

How you go about designing primers depends on what your starting point and goal are!

Ideally, though, you should aim to amplify regions of at least 800 bp to maximize the base pairs obtained per dollar spent. I usually aim to amplify regions of 800-1000 bp, but it depends on the configuration of conserved priming sites.

One common strategy is to get existing primers and try them across your groups: if they work, fine; if they only work on a few, use that sequence to make new primers. However, many existing primers only amplify a short region, and today there are so many sequences on GenBank that it is usually far better to start with completely new primers for your group.

Primers are cheap. If your pcr is not amplifying or sequencing consistently, it will be much cheaper to order some different primers (assuming your problems are not due to poor DNA quality) than to waste reagents and time. Don't waste your effort on primers that work poorly!

When designing primers it is a good idea to first design a set of primers that will allow you to obtain some longer sequences from your study species, which can then be used to design better species-specific primers (this depends on whether GenBank has closely related species from which you can design primers).

I do not use any software to search for primers. I align existing sequences and look for places that would give a product close to the right length, then I look for conserved regions in that vicinity. In other cases I place them in conserved regions like tRNAs (for mitochondrial DNA) or exons (for nuclear DNA), usually 30-40 base pairs from the start/end of the target region. I only use software to check the characteristics of the primers and to check for dimer formation.

Mitochondrial DNA

For mtDNA I almost never use pre-existing primers. I start by obtaining whatever mitochondrial genomes exist for that family and/or some other closely related families. I align those genomes and then work out where the tRNAs and genes are (in GenBank you can click on a specific gene within the whole-genome record to get that gene's sequence, then search for its first 10-20 bp in your alignment to find where that gene sits).
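The "search for the first 10-20 bp" step can be sketched in Python as below. This is only an illustration: the function name `find_in_alignment`, the `-` gap character and the toy sequences are assumptions, and any alignment editor with a search function does the same job.

```python
def find_in_alignment(aligned_seq, query, gap="-"):
    """Return the 0-based alignment column where `query` starts, or -1 if absent.

    Gaps in the aligned sequence are ignored while searching, so a gene start
    copied from a GenBank record is still found after alignment.
    """
    # Alignment columns that hold a real base (not a gap).
    columns = [i for i, base in enumerate(aligned_seq) if base != gap]
    ungapped = "".join(aligned_seq[i] for i in columns).upper()
    hit = ungapped.find(query.upper())
    return columns[hit] if hit != -1 else -1


# Toy example: one gapped row from an alignment and the first bases of a gene.
aligned_row = "ACG--TTGACCATGGCA-CCTAA"
gene_start = "ATGGCACC"
print(find_in_alignment(aligned_row, gene_start))  # 11
```

The returned column can then be used to mark gene and tRNA boundaries on the alignment before looking for priming sites.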

Ultimately, it likely doesn't matter too much which genes you select to amplify. I strongly avoid 12S, 16S and control region because of alignment issues and other problems. Control region is often thought to be the more variable region, and often it is a bit more variable (but not always); in some groups it is so variable that every individual is different. However, unless you are doing population genetic sampling I would avoid it, as it does not seem to perform very well phylogenetically in terms of reconstructing genealogical relationships. By using control region rather than cytochrome b (or any protein-coding gene), all you get is a few more haplotypes that are singletons, which outside of a population genetic study are largely uninformative.

For fish I tend to amplify cytochrome b for most things as that is what I have done historically and I have a large suite of primers for that gene. Ultimately though for phylogeography, any 800-1000 bp chunk of mtDNA coding gene will be informative. My recommendation would be any one of the following genes: cytb, nd1, nd2, atpase 6/8 as they are all mostly flanked by tRNAs and are a nice length to amplify in a single reaction. Keep in mind though that any single mtDNA gene is unlikely to give very good resolution at deeper nodes. To better resolve more difficult relationships I usually amplify 4-8 kb of mtDNA for a small subset of individuals. This typically works very well, both for resolving within and between species relationships.

Nuclear DNA

I rarely use previously designed primers for nuclear genes in fishes, as they were often designed based on either one or a few distantly related taxa. I start by downloading the target gene region from something that has a complete genome sequenced, like Takifugu. I then BLAST it and find examples where the entire gene region has been obtained, then I align each of those sequences, plus any previously designed primers, to see how well they match.

I tend to amplify all nuclear genes via nested pcr, thus I usually design 4 primers for each gene region. This results in cleaner and more reliable pcr products, although it does take an extra round of pcr, plus contamination becomes a greater risk (always run negative controls, but absolutely always run them for nested pcr). For some samples you may only be able to obtain good amplifications via nested pcr.
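As a rough illustration of the four-primer layout, the Python sketch below checks that an inner primer pair sits inside the outer product and reports both product sizes. The function names and the coordinates are made up for the example; each primer is given as a (start, end) pair of 1-based positions on a reference sequence.

```python
def amplicon_size(forward, reverse):
    """Product length in bp, from the first base of the forward primer
    to the last base of the reverse primer (1-based reference coordinates)."""
    return reverse[1] - forward[0] + 1


def is_nested(outer_f, outer_r, inner_f, inner_r):
    """True if the inner pair lies entirely within the outer product."""
    return outer_f[0] <= inner_f[0] and inner_r[1] <= outer_r[1]


# Hypothetical primer positions on a reference gene sequence.
outer_f, outer_r = (101, 121), (1280, 1300)
inner_f, inner_r = (140, 160), (1100, 1120)

print(is_nested(outer_f, outer_r, inner_f, inner_r))  # True
print(amplicon_size(outer_f, outer_r))                # 1200
print(amplicon_size(inner_f, inner_r))                # 981
```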

It is hard to know which genes to target for primer design. Many coding genes previously used in phylogenetics were specifically chosen for deep phylogeny questions, thus using them between sister species is unlikely to be very informative. Introns tend to be more variable (but not always, and sometimes they are too variable), but the results from introns can vary greatly in terms of how well they perform (in all aspects of your study, from pcr to variation between samples) both within and between species. Introns often also have nasty pcr artifacts as well as alignment issues, especially when comparing more distantly related species. My favorite nuclear gene introns are the first two introns of S7; in some groups they are not suitable, but in many they work great. Growth hormone is another, but it can be very problematic. For nuclear genes with coding regions I've developed primers for RAG1 and RAG2. The very first 800 bp or so of RAG1 is the most variable region and will usually have enough variation to discriminate between species.

Designing Primers

There are no hard and fast rules: even the ugliest primers can be good, and the cleanest, perfectly designed primers can fail. I've even had primers that were a perfect fit not amplify in certain random samples. It seems a little chancy at times, but you can increase your chances. I have made hundreds of primers, and only a few did not amplify.

Avoid primers shorter than 20 bp; generally aim for 20-22 bp or so. Adjust the length to achieve a better match for the bases at the 3' end (where extension starts). Longer primers have a higher melting temperature. Avoid really low or really high melting temperatures.
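To see how length feeds into melting temperature, here is a small Python sketch using the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). The rule is an assumption chosen for simplicity and is only a crude guide; primer-checking software uses more accurate nearest-neighbor calculations. The function names and the example primer are made up.

```python
def wallace_tm(primer):
    """Crude melting temperature estimate in degrees C (Wallace rule)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def long_enough(primer, min_len=20):
    """True if the primer meets the minimum length suggested above."""
    return len(primer) >= min_len


primer_20 = "ATGGCACCAAACCTACGAAA"   # hypothetical 20 bp primer
primer_18 = primer_20[:18]           # same primer with two 3' bases removed

print(long_enough(primer_20), wallace_tm(primer_20))  # True 58
print(long_enough(primer_18), wallace_tm(primer_18))  # False 54
```

Trimming or extending a primer by a base or two shifts the estimate by a few degrees, which is one way to bring a pair of primers closer together.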

Avoid self priming stronger (more negative) than about -7 kcal/mol. CCGG strings are bad as they strongly self bond. Avoid long strings of single bases. Avoid significant hairpins. I've seen many primers with these traits that worked just fine, but you will probably be better off if you can avoid them. If you lack other good options, then go ahead and try it.
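The two sequence-level checks above (CCGG strings and long runs of one base) are easy to automate. The Python sketch below is illustrative only: the function name is made up, and the cutoff of 4 identical bases in a row is an arbitrary assumption, not a value from this guide. It does not look at hairpins or binding energies, which need proper primer software.

```python
import re


def sequence_warnings(primer, max_run=4):
    """Return a list of warnings for CCGG strings and long single-base runs."""
    p = primer.upper()
    warnings = []
    if "CCGG" in p:
        warnings.append("contains CCGG")
    # Length of the longest stretch of one repeated base.
    longest = max((len(m.group(0)) for m in re.finditer(r"A+|C+|G+|T+", p)),
                  default=0)
    if longest > max_run:
        warnings.append(f"run of {longest} identical bases")
    return warnings


print(sequence_warnings("ATGCCGGTTAAAAAAGC"))
# ['contains CCGG', 'run of 6 identical bases']
print(sequence_warnings("ATGGCACCAAACCTACGAAA"))
# []
```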

Always check primer-primer binding. Check each primer's binding both to itself and to the other primers that you will use in the same pcr!
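A very crude version of this check can be sketched in Python: flag any case where the last few bases at the 3' end of one primer are perfectly complementary to a stretch of another primer (or of itself). This is a simplification under stated assumptions: the window of 5 bases, the function names and the primers are all made up, and no binding energy is estimated, so it is no substitute for dimer-checking software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_pairs(primer_a, primer_b, k=5):
    """True if the last k bases of primer_a can pair with any part of primer_b."""
    tail = primer_a.upper()[-k:]
    return revcomp(tail) in primer_b.upper()


def check_primer_set(primers, k=5):
    """Check every primer against itself and every other primer in the set."""
    return [(a, b) for a in primers for b in primers
            if three_prime_pairs(primers[a], primers[b], k)]


primers = {
    "F": "ATGGCACCAAACCTACGAAA",   # hypothetical forward primer
    "R": "GGTCTTTCGTAGGCTTGACC",   # hypothetical reverse primer
}
print(check_primer_set(primers))  # [('F', 'R')]
```

Here the 3' end of F (CGAAA) can pair with TTTCG inside R, so the pair would be worth a closer look before ordering.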

The 3' end is what binds first, so always try to put the first 3-5 bp at that end of the primer in a conserved region.

Having 2-3 mismatches in the middle and 5' end regions of the primer should be ok, as long as they are not close together. How many mismatched bases you can get away with is hard to predict. I've gotten sequences from primers that were a very poor match, but I've also seen cases where primers with 2 mismatches would not amplify.
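When comparing a candidate primer against each sequence in an alignment, the points above (total mismatches, mismatches near the 3' end, mismatches next to each other) can be tallied with a few lines of Python. This is a sketch only: the function name, the returned keys and the example sequences are made up, and the 5-base window is taken from the upper end of the "3-5 bp" suggestion. Both sequences must be written 5' to 3' in the primer's orientation.

```python
def mismatch_report(primer, site, three_prime_window=5):
    """Summarize mismatches between a primer and an equally long target site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    # 0-based positions (from the 5' end) where the two sequences differ.
    positions = [i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
                 if p != s]
    cutoff = len(primer) - three_prime_window
    return {
        "total": len(positions),
        "in_3prime_window": sum(1 for i in positions if i >= cutoff),
        "adjacent": any(b - a == 1 for a, b in zip(positions, positions[1:])),
    }


primer = "ATGGCACCAAACCTACGAAA"
site = "ATGGCTCCAAATCTACGAAA"   # target sequence from one species
print(mismatch_report(primer, site))
# {'total': 2, 'in_3prime_window': 0, 'adjacent': False}
```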

If bases mismatch at a position, always use a T in your primer, as a T will bind more readily to other bases than any other base will. If you are designing a reverse primer while looking at the forward strand, make sure you use an A at that position so that it becomes a T in the final (reverse-complemented) primer!

I tend not to use degenerate bases, but I know people who use them quite commonly (with multiple degenerate bases within the primer) and they do not seem to have issues with them. Degenerate codes are provided at the end of this document.

If you design multiple primers at the same site, with the same length, then these can be combined in pcr reactions (it is the same as making one degenerate primer).
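To write down the single degenerate primer that corresponds to a set of same-site, same-length primers, each column can be collapsed to its degenerate code. The Python sketch below is an illustration with made-up names and sequences; the code table is the standard set of degenerate base codes.

```python
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Reverse lookup: set of bases -> degenerate code.
BY_SET = {frozenset(bases): code for code, bases in CODES.items()}


def merge_primers(primers):
    """Collapse same-length A/C/G/T primers into one degenerate primer."""
    if len({len(p) for p in primers}) != 1:
        raise ValueError("primers must all be the same length")
    columns = zip(*(p.upper() for p in primers))
    return "".join(BY_SET[frozenset(col)] for col in columns)


print(merge_primers(["ATGGCACCAAACCTACGAAA",
                     "ATGGCTCCAAATCTACGAAA"]))
# ATGGCWCCAAAYCTACGAAA
```

One difference worth keeping in mind: the degenerate primer above covers four sequences (two codes with two bases each), whereas mixing the two original primers puts only two sequences in the tube.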

Many guides make the following recommendations, but I usually don't pay very close attention to these points, so feel free to ignore them.

Try to keep the ratio of bases roughly equal. G-C pairs bond more strongly than A-T pairs, so more G and C means a higher melting temperature.

Try and keep primers with similar melting temps, avoid extreme differences (folks usually suggest within 5 °C).

Primers should start and end with 1-2 purines (A, G). Some suggest the 3' end should have a GC clamp to help with binding.

Some people pay attention to which codon position a primer starts at (2nd positions tend to be the least variable, 1st positions tend to have medium variation, while 3rd positions are the most likely to differ).
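If you do want to check this, the codon position of any base in a coding gene follows from its distance to the first base of the gene. A minimal Python sketch, with a made-up function name and made-up coordinates (both 1-based, on the coding strand):

```python
def codon_position(coord, cds_start):
    """Codon position (1, 2 or 3) of a base within a coding sequence."""
    if coord < cds_start:
        raise ValueError("coordinate lies before the start of the coding sequence")
    return (coord - cds_start) % 3 + 1


# Hypothetical gene starting at alignment position 100.
for coord in (100, 101, 102, 103, 105):
    print(coord, codon_position(coord, cds_start=100))
# 100 1
# 101 2
# 102 3
# 103 1
# 105 3
```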

Before ordering a primer, always check that you have the correct orientation!
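For a reverse primer picked out on the forward strand of an alignment, the sequence to order is the reverse complement. A short Python sketch (the function name and example sequences are made up; degenerate codes are complemented too):

```python
# Complement table, including degenerate codes (e.g. R=(A, G) pairs with Y=(T, C)).
COMPLEMENT = str.maketrans("ACGTMKRYWSHDBVN", "TGCAKMYRWSDHVBN")


def reverse_complement(seq):
    """Return the reverse complement, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


site_on_forward_strand = "ATGGCACC"
print(reverse_complement(site_on_forward_strand))  # GGTGCCAT
print(reverse_complement("ATGRCAYC"))              # GRTGYCAT
```

Note how each A in the forward-strand site ends up as a T in the primer. Reverse complementing the result should give back the original site, which is a quick way to confirm the orientation.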

Base degenerate codes

two way degenerate positions

M=(A, C), K=(G, T), R=(A, G), Y=(T, C), W=(A, T), S=(G, C)

three way degenerate positions

H=(A, C, T), B=(C, G, T), V=(A, C, G), D=(A, G, T)

completely degenerate positions

N=(A, C, G, T)
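To see exactly which sequences a degenerate primer stands for, the codes above can be expanded in Python. This is an illustrative sketch with a made-up function name and made-up example primers.

```python
from itertools import product

CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every plain A/C/G/T sequence covered by a degenerate primer."""
    choices = (CODES[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("AYW"))
# ['ACA', 'ACT', 'ATA', 'ATT']
print(len(expand_degenerate("ATGGCWCCAAAYCTACGAAA")))
# 4
```

The number of sequences multiplies with every degenerate position, so a primer with several such positions is a mix of many different sequences.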
