I am a computer scientist by training, and my current research applies machine learning and big data techniques to pressing problems in the social and life sciences. My prior research focused on high-performance computing, bioinformatics, and scientific visualization.


HIV Transmission and Drug Resistance

In collaboration with Dr. Rami Kantor, we are investigating improved assembly, alignment, and phylogenetic methods for deep sequencing of HIV. These methods can be used to measure drug resistance within HIV-infected individuals at previously undetectable levels, and to infer transmission clusters among infected individuals. Understanding an individual's drug resistance enables improved and personalized treatment. Detecting transmission clusters, especially among newly infected individuals, creates new opportunities for intervention, disruption, and prevention of future transmission.

Historical Transformations in Land Use

Printed directories such as phone books, manufacturing directories, and city directories contain a wealth of historical information on urbanization and how land use has changed over time. In collaboration with sociologist Scott Frickel, we developed a scalable method for collecting, digitizing, and assembling directory data to study industrial churning and relict manufacturing sites in the Providence metro area.

Computer vision techniques for extracting structured information from printed directories.

Statistical Methods for DNA Sequence Assembly

Modern DNA sequencing technologies generate many partially overlapping and redundant fragments from the original genomes under investigation. Sequence assembly is the estimation of the original sequence based on these many fragments. I lead a research project into statistical methods that measure and summarize the uncertainty in the assembly process. In contrast, existing assembly methods use heuristics and ad hoc criteria to report a single point-estimate of the assembly, and cross-validation studies have found that there is often significant disagreement among these point estimates. I have designed and implemented Bayesian methods for sequence assembly that use a generative model of the sequencing process to calculate the likelihood of proposed assemblies, and Markov chain Monte Carlo simulation to estimate a posterior distribution of assemblies.
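
As a rough illustration of the Bayesian approach described above, the following toy sketch scores a handful of candidate assemblies under a deliberately simplified generative model (uniform read start positions, independent substitution errors) and uses Metropolis-Hastings sampling to estimate a posterior over them. It is not the GABI implementation: the function names, the error rate, and the example reads and candidates are all hypothetical, and a real assembler would propose edits to an assembly rather than choose from a short fixed list.

```python
import math
import random
from collections import Counter


def read_log_likelihood(read, assembly, error_rate=0.01):
    """Log-probability of one read under a toy generative model: the read
    starts at a uniformly random position in the assembly, and each base is
    miscalled independently with probability `error_rate`."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        raise ValueError("assembly must be at least as long as every read")
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        for i, base in enumerate(read):
            match = assembly[start + i] == base
            p *= (1 - error_rate) if match else error_rate / 3
        total += p
    return math.log(total / n_starts)


def log_likelihood(reads, assembly, error_rate=0.01):
    """Reads are treated as independent draws, so log-likelihoods add."""
    return sum(read_log_likelihood(r, assembly, error_rate) for r in reads)


def sample_posterior(reads, candidates, n_steps=5000, seed=0):
    """Metropolis-Hastings over a fixed candidate set, assuming a uniform
    prior and a symmetric (uniform) proposal distribution."""
    rng = random.Random(seed)
    loglik = {c: log_likelihood(reads, c) for c in candidates}
    current = rng.choice(candidates)
    counts = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(candidates)
        log_ratio = loglik[proposal] - loglik[current]
        # Accept with probability min(1, likelihood ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        counts[current] += 1
    return {c: counts[c] / n_steps for c in candidates}


# Hypothetical example: four overlapping reads and three candidate assemblies.
reads = ["ACGT", "CGTA", "GTAC", "TACG"]
candidates = ["ACGTACG", "ACGTTCG", "ACGAACG"]
posterior = sample_posterior(reads, candidates)
```

The result is a distribution over assemblies rather than a single point estimate, which is the contrast with heuristic assemblers drawn above.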

Automating and Scaling Phylogenetic Analyses to Study Animal Evolution

Phylogenetics is the study of evolutionary relationships. As part of Dr. Casey Dunn's lab, I lead the development of an open-source software infrastructure, called Agalma, that enables reproducible, large-scale phylogenetic analyses to be implemented and described unambiguously as a series of high-level commands. Agalma is developed in a public git repository according to industry best practices for software development and has been downloaded over 600 times for use in other researchers' phylogenetics projects. In creating this infrastructure, we also researched new methods for understanding phylogenetic relationships, and have applied them to animal transcriptomes to understand the evolution of complex traits. We are particularly interested in deep animal phylogeny, but also work on subgroups of animals including cnidarians and molluscs.

Agalma Web Appliance

The Agalma Web Appliance is an easy-to-deploy instance of the Agalma pipeline for assembling transcriptomes. It can be run directly as a spot instance in Amazon Web Services and requires no command-line expertise.

Video overview of the Agalma Web Appliance

Infrastructure for Research Computing

In my previous role at Brown as Director of Data Science, I led a centralized team of data scientists and served as the technical architect for Brown's secure computing environment that serves over 150 researchers across 17 labs and centers in fields such as public health, biomedical informatics, and economics. I have conducted numerous performance studies on large computing clusters, and developed methods, utilities, documentation, and training materials to help scientists in many disciplines use complex research computing systems and software.

Scientific Visualization

In my role as a Computer Systems Engineer at Berkeley Lab, I led or contributed to several projects that investigated the use of high-performance computing systems for scientific visualization. The increasing size and computational power of compute clusters have allowed scientists in fields such as particle physics and climate science to run larger and more complex simulations. Yet interpreting and gaining insight from these simulations often requires the scientist to visually inspect the results. Our work on scaling and parallelizing visualization methods, such as volume rendering and image denoising, demonstrated that these methods can keep pace with the increasing size of simulations. In one study, we scaled a volume rendering algorithm to run concurrently on 216,000 processors on the largest supercomputer in the world at the time.

Visualization of an astrophysics simulation rendered in VisIt

3D Escher Tiles

For my master's degree, I designed and implemented CAD tools for creating decorative solids that tile 3-space in a regular, isohedral manner. These 3D tilings come in two flavors. The simpler method is to extrude and offset a 2D tiling, producing a "2.5D" tiling. The second method is to derive true 3D tilings from cubic lattices. This research involved several algorithmic problems. To create 2.5D tilings, I designed an algorithm for cutting Delaunay triangulations by an arbitrary Jordan polygon. I also improved on an existing algorithm for Delaunay triangulation (Lawson's incremental algorithm) by tuning its search heuristic to use the inherent symmetry of the tiles, and by applying adaptive-precision arithmetic to solve issues with numerical robustness. While developing these algorithms, I invented visual debugging methods to overcome the difficulties of inspecting geometric data with standard text-based debuggers.
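
To give a sense of why numerical robustness matters in triangulation code, here is a minimal sketch of a robust 2D orientation test, the kind of geometric predicate that Delaunay algorithms evaluate repeatedly. It is a simplified stand-in for true adaptive-precision arithmetic: it tries ordinary floating point first and falls back to exact rational arithmetic only when the result is too close to zero to trust. The function name and the error threshold are assumptions chosen for illustration, and this is not code from the original Java tools.

```python
from fractions import Fraction


def orient2d(a, b, c):
    """Return +1 if points a, b, c turn counterclockwise, -1 if clockwise,
    and 0 if they are exactly collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c

    # Fast path: evaluate the determinant in floating point.
    left = (ax - cx) * (by - cy)
    right = (ay - cy) * (bx - cx)
    det = left - right

    # Hypothetical, deliberately loose error bound. If the determinant is
    # clearly larger than the possible rounding error, its sign is trusted.
    err_bound = 1e-12 * (abs(left) + abs(right))
    if abs(det) > err_bound:
        return (det > 0) - (det < 0)

    # Slow path: redo the computation exactly. Fraction(float) converts a
    # float to the exact rational number it represents, so no rounding occurs.
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (ax, ay, bx, by, cx, cy))
    exact = (ax - cx) * (by - cy) - (ay - cy) * (bx - cx)
    return (exact > 0) - (exact < 0)
```

The design trade-off is that well-separated inputs pay only the cost of the floating-point path, while nearly degenerate inputs, which are the ones that cause inconsistent triangulations, get an exact answer.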

2.5D Escher tiles


ORCID: 0000-0002-0764-4090

Howison M, Coetzer M, Kantor R. 2018. Measurement error and variant-calling in deep Illumina sequencing of HIV. bioRxiv: 276576

Guang A, Howison M, et al. 2017. Preserving Intra-Patient Variance Improves Phylogenetic Inference of HIV Transmission Networks. Poster presented at the 24th International HIV Dynamics & Evolution, 23-26 May 2017, Scotland, UK.

Howison M, Bethel EW. 2017. GPU-accelerated denoising of 3D magnetic resonance images. Journal of Real-Time Image Processing 13(4): 713-724. doi:10.1007/s11554-014-0436-8

Berenbaum D, Deighan D, Marlow T, Lee A, Frickel S, Howison M. 2016. Mining Spatio-temporal Data on Industrialization from Historical Registries. arXiv: 1612.00992

Guang A, Zapata F, Howison M, Lawrence CE, Dunn CW. 2016. An Integrated Perspective on Phylogenetic Workflows. Trends in Ecology & Evolution 31(2): 116-126. doi:10.1016/j.tree.2015.12.007

Zapata F, Goetz F, Smith S, Howison M, et al. 2015. Phylogenomic Analyses Support Traditional Relationships within Cnidaria. PLOS ONE 10(10): e0139068. doi:10.1371/journal.pone.0139068

Zapata F, Wilson NG, Howison M, et al. 2014. Phylogenomic analyses of deep gastropod relationships reject Orthogastropoda. Proc. R. Soc. B 281(1794): 20141739. doi:10.1098/rspb.2014.1739

Howison M, Zapata F, Edwards EJ, Dunn CW. 2014. Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling. PLOS ONE 9(6): e99497. doi:10.1371/journal.pone.0099497

Howison M, Zapata F, Dunn CW. 2013. Toward a statistically explicit understanding of de novo sequence assembly. Bioinformatics 29(23): 2959-2963. doi:10.1093/bioinformatics/btt525

Dunn CW, Howison M, Zapata F. 2013. Agalma: an automated phylogenomics workflow. BMC Bioinformatics 14(1): 330. doi:10.1186/1471-2105-14-330

Howison M, Shen A, Loomis A. 2013. Building Software Environments for Research Computing Clusters. In Proceedings of the 27th Large Installation System Administration Conference (LISA '13), 3-8 November 2013, Washington, DC, USA.

Howison M. 2013. High-throughput compression of FASTQ data with SeqDB. IEEE/ACM Transactions on Computational Biology and Bioinformatics 10(1): 213-218. doi:10.1109/TCBB.2012.160

Bethel EW, Howison M. 2012. Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning. International Journal of High Performance Computing Applications 26(4): 399-412. doi:10.1177/1094342012440466

Howison M, Sinnott-Armstrong NA, Dunn CW. 2012. BioLite, a lightweight bioinformatics framework with automated tracking of diagnostics and provenance. In Proceedings of the 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP '12), 14-15 June 2012, Boston, MA, USA.

Howison M, Bethel EW, Childs H. 2012. Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics 18(1): 17-29. doi:10.1109/TVCG.2011.24

Howison M, Trninic D, Reinholz D, Abrahamson D. 2011. The Mathematical Imagery Trainer: From Embodied Interaction to Conceptual Learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11), pp. 1989-1998, 7-12 May 2011, Vancouver, BC, Canada. doi:10.1145/1978942.1979230

Childs H, Pugmire D, Ahern S, Whitlock B, Howison M, Prabhat, Weber GH, Bethel EW. 2010. Extreme Scaling of Production Visualization Software on Diverse Architectures. IEEE Computer Graphics and Applications 30(3): 22-31. doi:10.1109/MCG.2010.51

Uselton A, Howison M, Wright NJ, Skinner D, Keen N, Shalf J, Karavanic KL, Oliker L. 2010. Parallel I/O performance: From events to ensembles. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 19-23 April 2010, Atlanta, GA, USA. doi:10.1109/IPDPS.2010.5470424

Howison M, Séquin CH. 2009. CAD Tools for the Construction of 3D Escher Tiles. Computer-Aided Design and Applications 6(6): 737-748. doi:10.3722/cadaps.2009.737-748


I am or have been the lead developer of the following open-source software packages:

hivmmer: an alignment and variant-calling pipeline for HIV sequences
GABI: prototype of a Bayesian framework for genome sequence assembly
Agalma: automated transcriptome assembly and phylogenomics workflow
BioLite: bioinformatics framework in Python/C++ with automated tracking of diagnostics and provenance
SeqDB: storage model for Next Generation Sequencing data
PyModules: modules system for managing software environments on research computing clusters
GD3D: GPU-accelerated implementation of three commonly used 3D image denoising methods: bilateral filtering, anisotropic diffusion, and non-local means
Iota: lightweight tracing tool for diagnosing poorly performing I/O operations to parallel file systems, especially Lustre
H5Part: high-performance, parallel I/O library in C for particle physics simulations
jmEscher: constrained Delaunay triangulation library in Java using adaptive-precision arithmetic for robustness
WiiKinemathics: WiiRemote-enabled learning tool that uses gesture to teach 5th graders about proportion and ratio