Feature: Catching the sequencing bug
Wednesday, 04 January, 2012
When Dr Tony Papenfuss was in his undergraduate years, his sights were set on the stars. Like many aspiring mathematicians, he turned to astrophysics as an outlet for his numerical nous.
At the time, the down-to-earth realm of biology was furthest from his mind, and yet, today Papenfuss heads a lab at the Walter and Eliza Hall Institute (WEHI) applying his mathematical might to the not inconsiderable challenges of genome sequence analysis.
It’s quite a journey, sidling from red shifts and magnetohydrodynamics to sifting genes, but Papenfuss has wholeheartedly embraced his adopted discipline, finding it ripe with engaging challenges for an applied mathematician. After spending a stint doing minerals exploration for BHP he was snapped up by WEHI to focus on bioinformatics.
There he worked on a number of antipodean data sets, including the platypus and the recently-published wallaby genome, but also spent time working on medical projects, such as sequencing transcripts from the malaria parasite, as well as a number of cancer genomes.
“I had a reputation for working on some interesting sequence data sets,” says Papenfuss. “So I got tapped on the shoulder by WEHI director Professor Doug Hilton. Doug is particularly interested in translating fundamental research at the institute to medically-relevant outcomes, and in making a contribution to Indigenous health, which is how we got involved in the scabies project.”
The WEHI had been fostering a relationship with the Menzies School of Health Research in Darwin to partner on particular diseases that were affecting the Indigenous population. One of those was scabies, and according to Papenfuss, the mite that causes it is something many researchers have been wanting to sequence for over a decade.
With the reduction in cost of high-throughput sequencing, it was decided that the time was right to get the sequencing underway, with the hope of helping reduce the health burden of this pernicious mite.
Resistance
Scabies (Sarcoptes scabiei) is a nasty critter. It’s a burrowing ectoparasitic mite that can cause a nasty irritating skin infection; it’s no accident the mite’s name is derived from the Latin scabere, “to scratch”. Scabies can infect humans or animals, and it’s particularly endemic where there is overcrowding, such as in some Indigenous communities.
The infection itself causes tremendous itching and, while the mite itself isn’t particularly harmful, it can lead to secondary infections and skin sepsis which can, in turn, lead to renal disease and rheumatic heart disease.
Some immunocompromised individuals can even acquire an extreme form of the disease called crusted scabies, where the mites breed out of control, causing severe crusting of the skin, which makes treatment even more difficult.
There are several topical and ingested treatments for scabies of various potency and cost, but there are fears that emerging resistance in the mite will continue to reduce the efficacy of some of the most effective treatments.
Once drug resistance sets in, management of scabies will likely become even more difficult, so this is where it’s hoped sequencing can lend a crucial insight into the mite and open new avenues for treatment.
“Having a genome sequence as a resource accelerates acquisition of biological knowledge of the organism,” says Papenfuss. “That’s certainly been the case for the human genome project. It doesn’t give all the answers right away, but it accelerates our ability to learn, and provides us with whole new suites of tools to study these organisms.”
One of the particular areas of interest is searching for resistance genes for acaricides, which are the equivalent of insecticides for ticks and mites. One common treatment is permethrin, which is also used against head lice. Tests in vitro in 1994 found that permethrin was 100 per cent effective against scabies mites, yet resistance began to emerge soon after its introduction.
A study conducted in 2000 found that 35 per cent of mites were still viable after exposure to permethrin. In fact, it may be the very success of permethrin that caused its overuse, allowing the mites to develop resistance.
It’s believed resistance may be due to either target alteration, enzymatic degradation of the drug or removal of the drug from the mite’s system via an efflux pump. If the sequencing can shed some light on the mechanism at work, measures can be taken to reduce resistance. Another possibility is the development of a vaccine against scabies, says Papenfuss.
Collection
Research into the scabies mite has traditionally been slow due to the difficulty in obtaining sufficient quantities of the mite for study. And this is no less a challenge when it comes to sequencing its genome. Each mite is tiny, and it requires a few hundred of the blighters to provide just one microgram of DNA for sequencing.
That might be enough for resequencing, but de novo sequencing has much higher demands in order to achieve sufficient coverage. Ideally, Papenfuss and his colleagues – Dr Deb Holt at the Menzies and Dr Katja Fischer at the Queensland Institute of Medical Research – wanted 10 micrograms from thousands of mites. “My colleagues literally scraped them off patients,” says Papenfuss.
However, they were only able to secure 5 μg of DNA from mites acquired from humans. So they’ve also sourced mites from pigs – which are infected by effectively the same species that infects people – and gathered a healthy 30 μg. They plan to use this to produce their reference genome, and use the 5 μg of human mites to build from there.
There are a few ways of approaching sequencing, such as the fairly straightforward approach of separating the DNA into a few different identically-sized fragment libraries and sequencing them en masse.
However, Papenfuss and his colleagues have taken an alternate approach, developed at the Broad Institute, of sequencing very short fragments to build contigs – i.e. contiguous DNA sequences where the order of bases is known to a high confidence – and then longer fragments to scaffold those contigs together.
This scaffolding process requires building a mate-pair library, which involves fragmenting the DNA into fixed lengths of 2 to 5 kilobases. The ends of these fragments are then end-repaired with biotin labels.
The fragments are then circularised, joining the two ends together, then re-fragmented into pieces of only a few hundred bases each. The fragments without biotin are then filtered out – although often imperfectly – and the remaining fragments are sequenced.
The upshot is a series of relatively short reads that represent the two ends of much longer stretches of DNA. Sorting through these reads indicates the position of each of the sequenced ends in the genome and gives an idea of the size of the gap in between, which can be used as a scaffold to help organise the previously sequenced shorter contigs. The end result is a nice whole genome.
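For readers who like to see the idea in code, the scaffolding step can be sketched roughly as follows. This is a toy illustration, not the assembler the WEHI team used: the function names, the `min_support` threshold and the input format are all assumptions made for the example, and it assumes each mate pair has already been assigned to an upstream and a downstream contig.

```python
from collections import Counter


def count_links(pair_hits):
    """Count mate pairs whose two ends land on different contigs.

    pair_hits: iterable of (upstream_contig, downstream_contig) tuples,
    one per mate pair (hypothetical input format).
    """
    links = Counter()
    for a, b in pair_hits:
        if a != b:  # pairs inside one contig say nothing about ordering
            links[(a, b)] += 1
    return links


def order_contigs(links, min_support=2):
    """Greedily chain contigs along links seen at least min_support times."""
    successor = {}
    # Visit the best-supported links first.
    for (a, b), n in sorted(links.items(), key=lambda kv: -kv[1]):
        if n >= min_support and a not in successor and b not in successor.values():
            successor[a] = b
    # A scaffold starts at a contig that is nobody's successor.
    starts = [c for c in successor if c not in successor.values()]
    scaffolds = []
    for start in starts:
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        scaffolds.append(chain)
    return scaffolds


pair_hits = [("c1", "c2")] * 3 + [("c2", "c3")] * 2 + [("c1", "c3")]
scaffolds = order_contigs(count_links(pair_hits))
print(scaffolds)
```

Requiring more than one supporting pair per link is a design choice in this sketch: a single stray pair is more likely to be noise than a real join.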
Typically, a mate-pair library would require 10 μg of DNA, so with around 30 μg of mite DNA to play with, there should be plenty of headroom. However, constructing a mate-pair library can be fraught with peril for the unwary.
They can easily become corrupted with so-called ‘shadow libraries’, where fragments of the circularised DNA with the correct biotin labels but in the wrong orientation remain in the mix and confuse the end result.
“There’s an element of risk in building mate-pair libraries,” says Papenfuss. “There’s a moderate failure rate with them, and there can be quality issues. You can end up building a mate-pair library that has a substantial fraction of a shadow library. In an assembly process where you don’t already have a genome, this can be a real problem, because these things are hard to detect.
“So there’s an element of trepidation, and even a little nervousness. We have 30 μg to play with, and straight off the bat we’re going to use 10 μg to build a mate-pair library. So we can only do that a couple of times...”
According to Papenfuss, now that the DNA has been collected and is in the process of being sequenced at the Australian Genome Research Facility (AGRF), it’s just a matter of waiting and seeing what the data looks like when it comes back. And then it’s on to assembly.
Genomic Lego
The next step is to assess the quality scores of the libraries, and then have a crack at assembly.
And while doing the assembly, more internal consistency checks are undertaken, one of which is using the short insert contigs to test the long insert mate-pair library to see the extent to which the mate-pair library is contaminated by a shadow library. According to Papenfuss, while these tools are somewhat limited, there is some capacity to clean out the shadow library and get a better quality sequence.
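One simple way to picture such a check, offered here only as a sketch: map both ends of each mate pair onto the short-insert contigs and see how far apart they land. Genuine mate pairs should sit roughly 2 to 5 kilobases apart, so pairs that land much closer together are suspect. The function name and the 1,000-base cut-off below are assumptions for illustration, and the approach only works on contigs long enough to hold a full-length pair.

```python
def shadow_fraction(pair_spans, min_mate_span=1000):
    """Return the fraction of same-contig pairs spanning less than min_mate_span.

    pair_spans: distances in bases between the two ends of each read pair,
    measured on a single contig (hypothetical input).
    min_mate_span: assumed cut-off below which a pair is counted as suspect.
    """
    spans = list(pair_spans)
    if not spans:
        raise ValueError("no read pairs mapped within a single contig")
    short = sum(1 for s in spans if s < min_mate_span)
    return short / len(spans)


fraction = shadow_fraction([250, 300, 2500, 3100, 4000])
print(f"suspect pairs: {fraction:.0%}")
```

A fuller check would also look at the relative orientation of the two reads, since that is where Papenfuss says the shadow fragments go wrong.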
Once they have an assembly, they’ll then test it using an expressed sequence tag (EST) library, which was done a few years ago on the scabies genome. An EST library only sequences expressed genes, so it’s less detailed, and often of lower quality, than a full genome sequence, but it can be compared to the de novo sequence to see to what extent they match up, and where the expressed genes might sit overall. If, for example, they find an expressed gene is split into different parts in the new genome, then that suggests the assembly has scrambled some parts of the genome.
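In code, the EST comparison might look something like the sketch below. It assumes the ESTs have already been aligned to the assembly by some other tool, and that each alignment is recorded as a pair of identifiers; the function name and input format are invented for the example.

```python
from collections import defaultdict


def split_ests(hits):
    """List ESTs whose aligned pieces fall on more than one scaffold.

    hits: iterable of (est_id, scaffold_id) tuples, one per aligned
    piece of an EST (hypothetical input format).
    """
    scaffolds_by_est = defaultdict(set)
    for est_id, scaffold_id in hits:
        scaffolds_by_est[est_id].add(scaffold_id)
    return sorted(est for est, found in scaffolds_by_est.items() if len(found) > 1)


hits = [("est1", "s1"), ("est1", "s1"), ("est2", "s1"), ("est2", "s4")]
suspects = split_ests(hits)
print(suspects)
```

A flagged EST is a prompt for a closer look rather than proof of an error: a gene spread across two scaffolds may simply mean the scaffolding is incomplete rather than scrambled.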
With luck, the contigs will be clean, the mate-pair library uncorrupted by a shadow library, and the genome won’t be scrambled when compared to the EST library. All in all, that’d be a big success. But only time – and a lot of time tinkering at a computer – will tell.
“Genome sequencing is a mixture of science, art and craft, and some luck and insight,” says Papenfuss. “The insight comes in understanding the algorithms that underlie the assembly and using that knowledge to choose the best approach.”
Assuming all goes well, the next step will be to put the genome in an appropriate format and make it available to biologists working on scabies, and that means setting up a genome browser. From there, the hope is researchers will find insights into the mite and figure out new ways of controlling it.
For Papenfuss, it’s a long way removed from the lofty realm of astrophysics, but no less interesting, and even more important to the lives of everyday people. “I love that biology is real and that it can have a positive impact,” he says.
“That’s also the sense I get from a lot of physicists who look to move into bioinformatics, they’re looking to do something more connected to the real world.
“Not everything I do is genome assembly. I still do a reasonable amount that is just mathematics, but it’s mathematics that’s connected to analysing biological data. So my passion now is really in biological data. This is where I’m going to stay.”