Projects are posted to help find interested individuals with the appropriate expertise to carry them out

Text Classification from Clinical Notes


Project Description

The use of real-world data as a source of clinical evidence is growing in importance following the widespread adoption of electronic health record (EHR) systems. However, the use of these data is plagued by many challenges, not the least of which is that most clinically meaningful data reside in an unstructured form (i.e., free text). To answer clinically meaningful questions, we will need to develop algorithms that abstract important data elements from the EHR one by one. Doing so may require customized solutions (rule-based, traditional statistical, or deep learning), depending on the specific data element; it will also generally require the development of a gold-standard data set for model training and evaluation.

To answer important open questions about the comparative effectiveness of treatments for inflammatory bowel disease (IBD) and about precision medicine (which drugs for which patients), we propose to develop a series of text classifiers/regressors to abstract the relevant data elements from clinical notes. This process will begin with colonoscopy reports (IBD severity scores) and then extend to medication-related data from clinical notes.
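
As a rough illustration of the rule-based end of that spectrum, the sketch below pulls an explicitly stated severity score out of report text and measures agreement with a small gold-standard set. Everything specific in it is an assumption made for illustration: the choice of a Mayo-style 0 to 3 score, the regular expression, the function names, and the example sentences (which are invented, not real patient data). It is not the project's actual method.

```python
import re

# Hypothetical pattern: matches phrases such as "Mayo endoscopic subscore: 2"
# or "Mayo score of 3". Real colonoscopy reports vary far more than this.
SCORE_PATTERN = re.compile(
    r"mayo\s+(?:endoscopic\s+)?(?:sub)?score\s*(?:of|is|:|=)?\s*([0-3])\b",
    re.IGNORECASE,
)


def extract_severity(report_text):
    """Return the first explicitly stated score as an int, or None if no rule fires."""
    match = SCORE_PATTERN.search(report_text)
    return int(match.group(1)) if match else None


def accuracy(examples):
    """Fraction of (text, gold_label) pairs where the extracted value equals the label."""
    correct = sum(extract_severity(text) == gold for text, gold in examples)
    return correct / len(examples)


# Invented gold-standard examples, for illustration only.
GOLD = [
    ("Findings consistent with Mayo endoscopic subscore: 2.", 2),
    ("Mayo score of 3 in the sigmoid colon.", 3),
    ("Normal mucosa throughout; no score documented.", None),
    # Severity is only implied here, so the rule returns None and is scored wrong.
    ("Moderate erythema and friability noted.", 2),
]

if __name__ == "__main__":
    print(f"Rule-based accuracy on toy gold set: {accuracy(GOLD):.2f}")
```

The last example shows why a rule alone may not be enough: when severity is described rather than stated, a statistical or deep learning classifier trained on manually abstracted labels may be needed, and the same gold-standard set can be used to compare the approaches.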


Required Skills
  • Python
  • HPC/GPU computing
  • Natural Language Processing tools (NLTK, cTAKES, PyTorch-NLP, etc.)
  • Deep Learning
Required Course Work or Level of Knowledge
  • Machine Learning, intermediate to advanced
  • Ideally, candidates would fall into one of the following categories:
  1. Experience with machine learning and natural language processing. Deep learning experience would be a plus but is not critical.
  2. A medical background (to assist in study design and manual data abstraction for model training).
Acceptable Level of Education (e.g., Undergrad, Grad Students, Post Docs, MD, PhD)

Undergraduate students, Graduate students, Postgraduates, Full-time workers who volunteer time




PI/Research group

Atul Butte


Vivek Rudrapatna, [email protected]