Mark Howison

I'm a software engineer and data scientist specializing in Artificial Intelligence (AI) and Machine Learning (ML).

Currently, I work at Amazon as a Senior Applied Scientist improving the employee experience (all opinions here are my own). As an engineer and tech leader, I have a portfolio of data-driven solutions I've delivered in the public sector. As a scientist, I have over 50 peer-reviewed publications. I received my M.S. in Computer Science from UC Berkeley and am an AWS Certified AI Practitioner.

Portfolio //

Public Policy

Measuring the labor market with AI. Job postings contain rich information on the labor market but are challenging to analyze because of the variation in how they are written. I developed a Generative AI solution to extract structured information from job postings and measure trends in skills demand, pay, benefits, remote work, and more.
Connecting workers to new careers. Navigating the changing labor market is challenging for job seekers. I was the science and engineering director for a career recommendation product, deployed with 5 state governments, that helps job seekers learn more about in-demand careers that match their skills and could boost their earnings.
Delivering public benefits, faster and cheaper. I led the rapid deployment of Rhode Island's emergency unemployment assistance system during the COVID-19 economic shutdown. My team launched this system in just 10 days after the US CARES Act passed. It accepted over 450,000 applications for benefits and saved the state an estimated $502,000 through automation.
Unlocking data for policy innovation. Governments have valuable data that can help them design more effective and efficient programs, but those data are often siloed across agencies and difficult to use. I led a data science team that integrated over 800 data sets from Rhode Island government agencies into an anonymized and secure database that delivered 75 policy insights.
Navigating college applications. Many talented students from low-income households struggle to attend college because of the challenging application process. I served as the technology director for Rhode2College, an ed-tech program to help low-income students in Rhode Island meet key milestones on their path to college through behavioral nudges and an AI chatbot.
Open data for public safety. Surveys show that trust in law enforcement is at a record low. Most communities lack the data they need to measure safety in their neighborhoods. I served as the lead scientist for Data for Community Trust, a pilot to help communities and law enforcement work together on safety issues through open data and dashboards.

Public Health

Preventing opioid use disorder. As many as 80% of those suffering from an opioid use disorder had a legitimate opioid prescription from a doctor before they were diagnosed. I developed a predictive model for detecting the risk of opioid use disorder before a doctor writes the first prescription.
Monitoring COVID-19 variants. As the COVID-19 pandemic unfolded, public health officials needed better intelligence on how new variants of the virus were spreading. I developed the bioinformatics pipeline used to monitor the emergence of COVID-19 variants in Rhode Island.
Disrupting HIV transmission. The actual chain of HIV transmission between people is usually unknown, but gene sequencing of new infections can reveal patterns of transmission. I served as the lead bioinformatician on an NIH-funded project to disrupt HIV transmission using those patterns.
Improving treatment for HIV. Gene sequencing can monitor HIV infections with high precision to personalize drug therapies. I created methods for measuring HIV drug resistance and collaborated with an international group of scientists to recommend standards for clinical applications.

Data & Technology

Data wrangling. Data in the real world can be messy. I have experience with a variety of techniques for obtaining and cleaning data, including: computer vision to extract industrial land use from printed directories, natural language processing to extract occupation and skills from millions of job postings, and mining the Internet Archive to reconstruct the FDA's drug code directory.
Securing data for research. Researchers in the social sciences often need to analyze sensitive data. I was the technical architect for a secure computing environment at Brown University that served over 150 researchers across 17 labs and centers in fields such as public policy, economics, public health, and biomedical informatics. Then I extended it to a secure cloud-based architecture.
Scalable computing. I have strong foundations in distributed systems, parallel programming, profiling, and optimization from my work in high-performance computing. I have conducted performance studies at the petascale using the largest supercomputer in the world at the time. I have also optimized memory access on GPUs and improved I/O performance for applications in particle physics and genomics.
Reproducible analysis. Reproducibility is a cornerstone of science, but complex analyses with big data are hard for other scientists to replicate. I developed methods for tracking software versioning in scientific computing and tracking complex analyses of genomics data.

Publications //

I've published over 50 peer-reviewed scientific papers on a variety of topics, including AI/ML, cloud computing, labor economics, education, public health, bioinformatics, high-performance computing, and computer graphics. My ORCID is 0000-0002-0764-4090.


2024

Howison M, et al. 2024. Extracting Structured Labor Market Information from Job Postings with Generative AI. Digital Government: Research and Practice, in press. doi:10.1145/3674847

Kantor R, et al. 2024. Prospective Evaluation of Routine Statewide Integration of Molecular Epidemiology and Contact Tracing to Disrupt Human Immunodeficiency Virus Transmission. Open Forum Infectious Diseases 11(10): ofae599. doi:10.1093/ofid/ofae599

Howison M, Long J, Hastings JS. 2024. Recommending Career Transitions to Job Seekers Using Earnings Estimates, Skills Similarity, and Occupational Demand. Digital Government: Research and Practice, 5(3): 31:1-31:9. doi:10.1145/3678261

Aung S, et al. 2024. Acquired Human Immunodeficiency Virus Type 1 Drug Resistance in Rhode Island, USA, 2004–2021. The Journal of Infectious Diseases, jiae344. doi:10.1093/infdis/jiae344

Howison M, Angell M, Hastings JS. 2024. Protecting Sensitive Data with Secure Data Enclaves. Digital Government: Research and Practice, 5(2): 14:1-14:11. doi:10.1145/3643686


2023

Novitsky V, et al. 2023. Added Value of Next Generation Sequencing in Characterizing the Evolution of HIV-1 Drug Resistance in Kenyan Youth. Viruses 15(7): 1416. doi:10.3390/v15071416

Dixon N, et al. 2023. Occupational models from 42 million unstructured job postings. Patterns 4(7): 100757. doi:10.1016/j.patter.2023.100757

Howison M, et al. 2023. An Automated Bioinformatics Pipeline Informing Near-Real-Time Public Health Responses to New HIV Diagnoses in a Statewide HIV Epidemic. Viruses 15(3): 737. doi:10.3390/v15030737

Novitsky V, et al. 2023. Not all clusters are equal: Dynamics of molecular HIV-1 clusters in a statewide Rhode Island epidemic. AIDS 37(3): 389-399. doi:10.1097/QAD.0000000000003426


2022

Hastings JS, Howison M. 2022. Predicting Divertible Medicaid Emergency Department Costs. Digital Government: Research and Practice 3(3): 19:1-19:19. doi:10.1145/3548692

Singh M, et al. 2022. SARS-CoV-2 Variants in Rhode Island; May 2022 Update. Rhode Island Medical Journal 105(6): 6-11.

Howison M, Goggins M. 2022. SIRAD: Secure Infrastructure for Research with Administrative Data. Software Impacts 12: 100245. doi:10.1016/j.simpa.2022.100245

Steingrimsson JA, et al. 2022. Beyond HIV outbreaks: protocol, rationale and implementation of a prospective study quantifying the benefit of incorporating viral sequence clustering analysis into routine public health interventions. BMJ Open 12(4): e060184. doi:10.1136/bmjopen-2021-060184

Earnest R, et al. 2022. Comparative transmissibility of SARS-CoV-2 variants delta and alpha in New England, USA. Cell Reports Medicine 3(4): 100583. doi:10.1016/j.xcrm.2022.100583

Guang A, et al. 2022. Incorporating Within-Host Diversity in Phylogenetic Analyses for Detecting Clusters of New HIV Diagnoses. Frontiers in Microbiology 12: 803190. doi:10.3389/fmicb.2021.803190

Munro C, et al. 2022. Evolution of Gene Expression across Species and Specialized Zooids in Siphonophora. Molecular Biology and Evolution 39(2): msac027. doi:10.1093/molbev/msac027

Novitsky V, et al. 2022. Statewide Longitudinal Trends in Transmitted HIV-1 Drug Resistance in Rhode Island, USA. Open Forum Infectious Diseases 9(1): ofab587. doi:10.1093/ofid/ofab587


2021

Beckwith CG, et al. 2021. HIV Drug Resistance and Transmission Networks Among a Justice-Involved Population at the Time of Community Reentry in Washington, D.C. AIDS Research and Human Retroviruses 37(12): 903-912. doi:10.1089/aid.2020.0267

Angell M, et al. 2021. Estimating Value-added Returns to Labor Training Programs with Causal Machine Learning. OSF Preprints: thg23. doi:10.31219/osf.io/thg23

Novitsky V, et al. 2021. Longitudinal typing of molecular HIV clusters in a statewide epidemic. AIDS 35(11): 1711-1722. doi:10.1097/QAD.0000000000002953

Kantor R, et al. 2021. SARS-CoV-2 Variants in Rhode Island. Rhode Island Medical Journal 104(7): 50-54.

Guang A, et al. 2021. Revising transcriptome assemblies with phylogenetic information. PLOS ONE 16(1): e0244202. doi:10.1371/journal.pone.0244202


2020

Novitsky V, et al. 2020. Empirical comparison of analytical approaches for identifying molecular HIV-1 clusters. Scientific Reports 10(1): 18547. doi:10.1038/s41598-020-75560-1

Kantor R, et al. 2020. Challenges in evaluating the use of viral sequence data to identify HIV transmission networks for public health. Statistical Communications in Infectious Diseases 12(s1). doi:10.1515/scid-2019-0019

Angell M, et al. 2020. Delivering Unemployment Assistance in Times of Crisis. Digital Government: Research and Practice 2(1): 5:1-5:11. doi:10.1145/3428125

Parkin NT, et al. 2020. Multi-Laboratory Comparison of Next-Generation to Sanger-Based Sequencing for HIV-1 Drug Resistance Genotyping. Viruses 12(7): 694. doi:10.3390/v12070694

Hastings JS, Howison M, Inman SE. 2020. Predicting high-risk opioid prescriptions before they are given. Proceedings of the National Academy of Sciences 117(4): 1917-1923. doi:10.1073/pnas.1905355117


2019

Hastings JS, et al. 2019. Unlocking Data to Improve Public Policy. Communications of the ACM 62(10): 48-53. doi:10.1145/3335150

Berenbaum D, et al. 2019. Mining Spatio-temporal Data on Industrialization from Historical Registries. Journal of Environmental Informatics 34(1): 28-34. doi:10.3808/jei.201700381

Howison M, Coetzer M, Kantor R. 2019. Measurement error and variant-calling in deep Illumina sequencing of HIV. Bioinformatics 35(12): 2029-2035. doi:10.1093/bioinformatics/bty919


2018

Munro C, et al. 2018. Improved phylogenetic resolution within Siphonophora (Cnidaria) with implications for trait evolution. Molecular Phylogenetics and Evolution 127: 823-833. doi:10.1016/j.ympev.2018.06.030

Ji H, et al. 2018. Bioinformatic data processing pipelines in support of next-generation sequencing-based HIV drug resistance testing: the Winnipeg Consensus. Journal of the International AIDS Society 21(10): e25193. doi:10.1002/jia2.25193


2017

Howison M, Bethel EW. 2017. GPU-accelerated denoising of 3D magnetic resonance images. Journal of Real-Time Image Processing 13(4): 713-724. doi:10.1007/s11554-014-0436-8


2016

Guang A, et al. 2016. An Integrated Perspective on Phylogenetic Workflows. Trends in Ecology & Evolution 31(2): 116-126. doi:10.1016/j.tree.2015.12.007


2015

Zapata F, et al. 2015. Phylogenomic Analyses Support Traditional Relationships within Cnidaria. PLOS ONE 10(10): e0139068. doi:10.1371/journal.pone.0139068

Bethel EW, et al. 2015. Improving Performance of Structured-Memory, Data-Intensive Applications on Multi-core Platforms via a Space-Filling Curve Memory Layout. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 565-574, 25-29 May 2015, Hyderabad, India. doi:10.1109/IPDPSW.2015.71

Howison M, Shen A. 2015. Bioinformatics Brew: the cross-platform package manager for open-source bioinformatics tools. Poster presented at Bio-IT World, April 21-23, Boston, MA, USA. doi:10.7301/Z0Z60KZD


2014

Zapata F, et al. 2014. Phylogenomic analyses of deep gastropod relationships reject Orthogastropoda. Proceedings of the Royal Society B: Biological Sciences 281(1794): 20141739. doi:10.1098/rspb.2014.1739

Howison M, et al. 2014. Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling. PLOS ONE 9(6): e99497. doi:10.1371/journal.pone.0099497


2013

Howison M, Zapata F, Dunn CW. 2013. Toward a statistically explicit understanding of de novo sequence assembly. Bioinformatics 29(23): 2959-2963. doi:10.1093/bioinformatics/btt525

Dunn CW, Howison M, Zapata F. 2013. Agalma: an automated phylogenomics workflow. BMC Bioinformatics 14(1): 330. doi:10.1186/1471-2105-14-330

Howison M, Shen A, Loomis A. 2013. Building Software Environments for Research Computing Clusters. In Proceedings of the 27th Large Installation System Administration Conference (LISA '13), 3-8 November 2013, Washington, DC, USA.

Howison M. 2013. High-throughput compression of FASTQ data with SeqDB. IEEE/ACM Transactions on Computational Biology and Bioinformatics 10(1): 213-218. doi:10.1109/TCBB.2012.160


2012

Bethel EW, Howison M. 2012. Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning. International Journal of High Performance Computing Applications 26(4): 399-412. doi:10.1177/1094342012440466

Howison M, Sinnott-Armstrong NA, Dunn CW. 2012. BioLite, a lightweight bioinformatics framework with automated tracking of diagnostics and provenance. In Proceedings of the 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP '12), 14-15 June 2012, Boston, MA, USA.

Howison M, Bethel EW, Childs H. 2012. Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics 18(1): 17-29. doi:10.1109/TVCG.2011.24


2011

Howison M, et al. 2011. The Mathematical Imagery Trainer: From Embodied Interaction to Conceptual Learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1989-1998, 7-12 May 2011, Vancouver, BC, Canada. doi:10.1145/1978942.1979230


2010

Howison M, et al. 2010. H5hut: A High-Performance I/O Library for Particle-based Simulations. In Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS '10), 20-24 Sept. 2010, Heraklion, Crete, Greece. doi:10.1109/CLUSTERWKSP.2010.5613098

Howison M, et al. 2010. Tuning HDF5 for Lustre File Systems. In Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS '10), 20-24 Sept. 2010, Heraklion, Crete, Greece.

Childs H, et al. 2010. Extreme Scaling of Production Visualization Software on Diverse Architectures. IEEE Computer Graphics and Applications 30(3): 22-31. doi:10.1109/MCG.2010.51

Uselton A, et al. 2010. Parallel I/O performance: From events to ensembles. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, 19-23 April 2010, Atlanta, GA, USA. doi:10.1109/IPDPS.2010.5470424


2009

Howison M, Séquin CH. 2009. CAD Tools for the Construction of 3D Escher Tiles. Computer-Aided Design and Applications 6(6): 737-748. doi:10.3722/cadaps.2009.737-748