Casper H. Blaauw

Making artificial neurons learn how cells become real neurons

Hi! I'm Cas, a bioinformatician working on gene regulation and genomic machine learning. Currently, I'm a joint PhD student in the Aerts lab in Leuven, Belgium and the van Oudenaarden lab in Utrecht, the Netherlands. I'm interested in building and investigating models that enable deeper insights into the rules behind cell type-specific gene regulatory programmes.


I previously studied bioinformatics in Copenhagen, where I also worked as a student bioinformatician at a Danish biotech scale-up. Before that, I did molecular biology, programming and statistics at Utrecht University, resulting in a degree in Molecular Life Sciences with a minor in Applied Data Science. I also studied at the University of Hong Kong, focusing on politics, societies and nationalism in East Asia.

Besides bioinformatics, I care about responsible data use, algorithmic justice, and debunking biological determinism. In my free time, some of my interests are baking, urban design, languages, and obscure geographical knowledge. To learn more about me, you can read my resume, explore some of my code on GitHub, or scroll down for more details about some of my former projects or my background.


Projects

Modeling gene regulation through sequence-based deep learning

VIB-KU Leuven and Hubrecht Institute - Aerts lab and van Oudenaarden lab

Exploring the potential of large-scale sequence-based genomic machine learning models to further our understanding of gene regulation.

Topics: deep learning · gene regulation · development | Tools: Python · scanpy · PyTorch

2022-

Predicting O-GalNAc glycosylation through transfer learning

University of Copenhagen - Joshi lab

MSc thesis; trained a combined region-site model for O-GalNAc prediction based on protein language model embeddings.

Topics: deep learning · transfer learning · glycosylation | Tools: Python · PyTorch

2022

Visualization of kinase preferences for phosphorylation sites

University of Copenhagen - Jensen lab

Building a tool to represent phosphorylated sites using predicted kinase probabilities in a low-dimensional colour space.

Topics: high-dimensional visualisation · phosphoproteomics | Tools: R · U-CIE

2021

RNA velocities of tissue-tissue interaction spaces from spatial transcriptomics

University of Copenhagen - Won lab

Worked on an approach to investigate the transcriptome dynamics of cells at the borders of different tissues, building interaction-aware RNA velocities from spatial transcriptomics data.

Topics: spatial transcriptomics · RNA velocity | Tools: R · Python · Seurat · scVelo | Code: GitHub

2021

Machine learning for microbiome analysis

Built an ML toolkit to robustly link microbiome abundance with various phenotypes, implementing a variety of ML approaches.

Topics: machine learning · microbiomics | Tools: R · mlr3

2021

Language modeling of proteins for glycosylation site prediction

University of Copenhagen - Joshi lab with Nielsen lab & Winther lab

Used BERT-based deep learning models from language modeling, trained on UniProt rather than Wikipedia, to distill protein sequences to their core grammatical context in the language of life. Armed with the contextual representations for a variety of glycosylated sequences, I set out to figure out the rules behind O-GalNAc glycosylation, a biological process under complex multi-level regulation, by training a CNN. Work presented as a poster at the Danish Bioinformatics Conference 2021.

Topics: deep learning · transfer learning | Tools: Python · PyTorch | Code: GitHub

2021

Automating and standardising report creation

Built a system to automate the creation of production-level reports, integrating data and report text in R Markdown documents and exporting to fully styled Microsoft Office documents.

Topics: tool development · automation | Tools: R · RMarkdown · officedown

2021

Dynamic eQTL mapping in development and stress

Utrecht University - Snoek lab

Investigated the dynamics of transcriptional regulation during cellular stress and development. Integrating these transcriptomic data with eQTL mapping provided new insights into the dynamics of transcriptional regulation and increased eQTL detection accuracy.

Topics: transcriptomics · statistical genetics | Tools: R · random forests | Code: GitHub

2019-2020, 2021

Homomerisation state prediction using symmetry-driven protein docking

Utrecht University - Bonvin lab

Built a proof-of-concept pipeline to predict protein homomer stoichiometry from monomer structures using HADDOCK, an information-driven protein docking software.

Topics: structural bioinformatics · protein docking | Tools: HADDOCK · PyMOL

2018

Skills

Primary competences

  • Data analysis
  • Tool and package development
  • Biological interpretation
  • Statistical machine learning
  • Data visualisation
  • Science pedagogy
  • Societal impact assessment

Programming Languages & Tools

  • R - [tidyverse · Shiny · officedown · mlr3 · DESeq2 · Seurat]
  • Python - [numpy · pandas · PyTorch · scikit-learn · biopython · scanpy]
  • Other tools - [bash · git · LaTeX]

About me

Current interests

My overarching interest is encapsulating the complexity of (biological) systems. I love exploring and developing tools that allow analysis of the molecular biological system as a whole. Most recently, this has brought me to representation-based approaches, primarily in transcriptomics and proteomics. In a more general sense, I enjoy learning about how other fields (from NLP to GIS) approach 'data', and theorising about information more broadly.

Academic background

I always like to say that I ended up in life sciences due to a single gif, which solidified my fascination for the molecular mechanisms of life. The life sciences did face competition when I picked my major (which you do at enrolment in the Netherlands), though, as fields like computer science, systems-oriented sciences, and policy and society were at the back of my mind.

I did really like biology and chemistry's experiment-based approach to science, and so I enrolled in Molecular Life Sciences (now Molecular and Biophysical Life Sciences) in Utrecht. That programme, primarily hosted by the Chemistry department, definitely shaped me as a scientist, emphasising fundamental understanding over rote memorisation, focusing on the big picture instead of navel-gazing on one factor, and prioritising independent thought over blindly following protocols.

After my first year, I did leave the chemistry labs behind, as I realised their (excellent) work in structural biology was not what I was looking for: analysing genomes and transcriptomes, rather than painstakingly figuring out individual protein structures, was much closer to my goal of understanding what makes cells tick. As such, I indulged in programming and statistics, as deciphering (biological) datasets to understand large-scale biology was what really got my attention. In a way, that was a natural fit, bringing back some of the computer science skills and systems outlook that had appealed to me in the past as well. In doing so, I even got to work in the group that originally coined the term bioinformatics!

In keeping with this interdisciplinary trend, I decided to look beyond the Netherlands and study in Hong Kong for a semester. While there, I focused on the politics, society, and governance of East Asia, to gain a scholarly background in radically different academic disciplines and to satisfy my interest in global affairs. Although the different standards of these 'foreign' disciplines definitely took some getting used to, I gained some of the most valuable academic experience of my career there. Although I am a natural scientist at heart, learning to do research in history and political science thoroughly reminded me that our scientific pursuits are fundamentally human endeavours, with all the pitfalls that brings. Many choices in science and technology have real-world consequences, from population genetics studies that use the genomes of mass-incarcerated populations under authoritarian regimes to crime-predicting ML algorithms that lead to self-fulfilling prophecies, and these topics deserve proper thought.

After returning to Utrecht, I finished up my bachelor's by following a minor in Applied Data Science and doing my thesis research on genomic regulation of transcription. At that point, there really was no question that I wanted to continue with a master's in bioinformatics. Looking around Europe, Copenhagen caught my eye: perhaps the biggest life sciences hub in Europe, with great bioinformatics faculty hosting a great MSc programme, and in a lovely city to boot. Ever since, I've been nothing but glad that I was admitted here, as it's been great to get involved with leading bioinformatics work in both academia and industry.


Contact me

Interested in my work, left with questions, or just want to connect? Feel free to send a message; I'd love to chat! You can find my email address here.

I'm always glad to talk bioinformatics, science in society, and experiences with studying abroad.


For those left wondering about my name, I go by Cas in Dutch and by Cas or Casper in English.
Following Dutch naming custom, I was named Cas (IPA: /kɑs/), but received a set of longer full names at birth. These only really functioned as initials, until I moved abroad and every system started referring to me as Casper. For fun, keep an eye out when visiting Dutch academic websites to see this in action: they will almost always list someone's preferred daily name alongside their initials.
Feel free to use whichever you prefer when contacting me!