Title: Efficient visualization of large-scale sequence alignments in the web browser
Presenters: John Rozewicki, Shigehiro Kuraku, Kazutaka Katoh
Abstract: The amount of sequence data available to researchers increases every year. Developers of sequence alignment tools such as MAFFT have worked hard to handle such large-scale data, with many thousands of sequences and many thousands of sites, by developing new algorithms and reimplementing older algorithms to take advantage of resources like cloud computing. A frequent use case in recent years has been the direct comparison of SARS-CoV-2 genomes. Manual inspection of alignment results can help to avoid misapplying alignment methods and overlooking obvious patterns. However, solutions for viewing and interpreting sequence alignments have not kept pace with the growth in data. Most were built at a time when sequence alignment results tended to be much smaller. When used to view large-scale sequence alignments, existing solutions often crash, run extremely slowly, or fail to give a useful overview of the data. This presentation will describe the design and features of a new web browser-based sequence alignment viewer built from the ground up for large-scale data by the MAFFT team, using WebAssembly, WebGL, and HTML5. A key focus of the presentation will be efficient techniques for visualizing large-scale data, which may be applicable to areas beyond sequence alignment visualization. The new viewer has been tested on sequence alignments larger than 500 million letters (equivalent to 100,000 sequences × 5,000 sites, or 100 sequences × 5 million sites) and performs smoothly even on inexpensive systems. Moreover, because it is built using widely compatible web technologies, it runs not only on full computers but also on handheld devices such as phones and tablets.