## Opinion: Bioinformatics Software: A Buyer’s Guide

How to choose programs that suit your lab’s ’omics analysis needs

By David R. Smith

Today’s ’omics-obsessed scientific marketplace is overflowing with bioinformatics programs. Whatever your sequence analysis problem, there is probably a program or application to solve it. Many of these tools are open source, but they can be difficult to use; some require in-depth computational knowledge. There are, however, various commercial alternatives, which bring together multiple bioinformatic programs into stand-alone, user-friendly packages. Although beautifully designed, these software suites can come with hefty price tags, meaning that most researchers are lucky if they can afford just one. Like buying a car, choosing among the many software suites can be challenging, and there is surprisingly little information out there to evaluate the different programs. Here I describe my own experience with evaluating—and purchasing—some commercial bioinformatics packages.

At first I was reluctant to buy bioinformatics software. I felt that paying for such programs went against the spirit of academic research and that using graphical user interface (GUI) software would weaken my computational skills. But after failing for the fourth time to install an open-source genome assembly algorithm, I gave in and bought a user-friendly bioinformatics bundle, and have yet to regret it.

After testing an assortment of commercial bioinformatics packages, I decided on Biomatters’s Geneious. I chose Geneious largely because the company gives student discounts. Seven years ago, I paid about $200 for a student license, which allowed me to install the software on a single computer. As of May 2014, a student license costs $395 (a standard academic license is $795), which still places it among the least expensive all-in-one commercial suites on the market. I’ve since gone on to test, and in some instances purchase, other bioinformatics platforms. Most often, to get pricing details I had to request quotes from sales representatives, making it difficult to quickly compare the costs of different software packages. After a successful trial of CLC bio’s Genomics Workbench, for example, I filled out an online pricing request form and was contacted two days later by a sales agent who provided me with a formal quote (an estimated $5,500 for a standard academic license). I went through similar processes to get pricing on DNAStar’s Lasergene (approximately $6,000 for an academic license) and Sequencher from Gene Codes ($2,500 for an academic license). When requesting price quotes, expect to receive several follow-up e-mails and phone calls from sales representatives.

So, what do you get?

Last year, for approximately $6,000, I purchased a single academic license of CLC Genomics Workbench and a genome-finishing plugin (more on plugins later) as part of a package deal. Enrollment in the upgrade and support program for the first 12 months, which was mandatory, was an additional $1,500, making the initial cost of the software $7,500. Renewal of the maintenance program was 25 percent of the purchase price per year, and automatic. In other words, nine months after buying the software I was sent an invoice for $1,500, with 2 percent interest per month if left unpaid. Although costly, subscribing to the maintenance agreement can be wise. Commercial bioinformatics programs frequently undergo major changes, which can significantly improve the software. More than once I’ve bought programs anew at full price because I let the maintenance period expire on previous licenses.
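
The figures above can be turned into a rough multi-year estimate. The Python sketch below is illustrative only. The function names are hypothetical. It assumes renewals are billed once per year after the mandatory first year, and it treats the 2 percent monthly late charge as simple (non-compounding) interest, which my quote did not specify.

```python
def ownership_cost(purchase_price, first_year_support, renewal_rate, years):
    """Total paid after `years` years of continuous maintenance.

    Assumption: one renewal per year after the first, each equal to
    renewal_rate * purchase_price.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    annual_renewal = purchase_price * renewal_rate
    return purchase_price + first_year_support + annual_renewal * (years - 1)


def overdue_invoice(amount, monthly_rate, months_late):
    """Invoice balance with simple monthly interest (an assumption)."""
    return amount * (1 + monthly_rate * months_late)


if __name__ == "__main__":
    # Figures taken from the purchase described above.
    for years in (1, 3, 5):
        total = ownership_cost(6000, 1500, 0.25, years)
        print(f"{years} year(s): ${total:,.0f}")
    late = overdue_invoice(1500, 0.02, 3)
    print(f"Renewal paid 3 months late: ${late:,.0f}")
```

With the numbers quoted here, the first year comes to $7,500 and three years of continuous maintenance to $10,500, under the stated assumptions. Plugging in a vendor’s actual quote and renewal terms gives a quick way to compare packages over the expected life of a project.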

Things get even more complicated when purchasing network (or “floating”) licenses of bioinformatics programs. Unlike a single computer license, which only works on one computer, a network/floating license allows multiple people to use a bioinformatics package simultaneously. Network/floating licenses are more expensive than their single-computer counterparts, but they can be more economical for big labs or classroom settings, where purchasing multiple single-user licenses makes less sense.

Cloud computing has also infiltrated bioinformatics. Companies like DNAnexus are selling online access to powerful computers and their associated software, storage, and sharing capabilities. Illumina sells online access to its genomics cloud-computing infrastructure BaseSpace; 10 terabytes of storage will run you $12,000 per year. Alternatively, the popular web-based platform Galaxy is a free, cloud-based bioinformatics tool. It is safe to assume that bioinformatics clouds will only grow larger and more popular over the coming years and are where the most innovative new software will be based.

What does it do?

Once considered inferior to their open-source counterparts, proprietary assembly algorithms have improved immensely in recent years and are now used by some of the top research labs in the world.

Commercial bioinformatics packages bring together, into a single, browser-based platform, a diversity of nucleotide and protein analysis tools. Given the prevalence of high-throughput sequencing in life science research, many of the tools are designed for analyzing and visualizing next-generation sequencing (NGS) data, including fast, efficient, and high-quality de novo genome assemblies.

Read mapping is another core feature of commercial bioinformatics packages. Bioinformatics companies regularly boast about their highly tuned, ultra-fast mapping algorithms for reference-guided alignments. The ultimate test for any assembler or read mapper is whether it’s cited in peer-reviewed journals. There is no question that open-source programs are cited more than proprietary ones. But citations for commercial software suites are on the rise.

In addition to assembling and mapping NGS data, commercial bioinformatics packages are great for organizing molecular sequence information. Their intuitive, graphical interfaces allow users to easily build folder hierarchies and dropdown lists of sequence data, move or export these data to different folders, and change file formats for use in other applications. In most cases, the software can connect to online resources, such as GenBank and UniProt, providing quick, direct access to vast amounts of nucleotide and protein sequence information, which can then be searched, downloaded, interpreted, and analyzed through interactive sequence viewers.

Most packages come with software for aligning nucleotide and amino acid sequences (and entire chromosomes), as well as tools for inferring evolutionary relationships among sequences and constructing phylogenetic trees and distance matrices. Other useful tools include protein structure prediction, nucleotide repeat and motif finders, and primer prediction. An advantage to performing these kinds of analyses within commercial software is that the results are depicted in colorful and editable graphics, which can be exported and used for figures in lectures and publications.

Options and access

If you purchase a bioinformatics program and discover that a particular function is missing, don’t panic: there is probably a plugin that can do the job. Plugins are downloadable applications that provide additional features to software packages. Bioinformatics companies are constantly designing new plugins, which means that the repertoire of tools within packages is continually expanding. Plugins also bring some of the most commonly used open-source software to proprietary programs, giving users the benefits of a user-friendly GUI and the power of peer-reviewed algorithms.

Going forward, innovations in sequencing will result in more sophisticated bioinformatics programs, and it is crucial that these programs are accessible to a broad range of users. We might soon be at a point where walk-in medical clinics have genome sequencing and bioinformatics desks, where patients can play an active role in interpreting their gene sequences and contributing to genetic treatments, and where high-school students assemble genomes for homework. The increasingly integral role of bioinformatics in society also means that it will become a more lucrative industry, one where users will have to pay for the best products.

My own experiences with proprietary bioinformatics software have been positive. The tools I’ve purchased have made me more productive, but at a cost—although I use sequence analysis tools almost every day, my bioinformatics skills, in certain respects, have plateaued. Moreover, the licensing and upgrading costs of using commercial software represent a significant proportion of my lab’s operating budget.

If you are considering commercial programs, I recommend taking advantage of the free trials that most companies offer. You may find that these programs streamline your research and invigorate your classroom, or that they’re a waste of time and you’re better off using open-source alternatives. Wherever you stand on the topic, I urge you to share your opinions and experiences with others.