Unsupervised learning on a large clinical corpus for biomedical knowledge discovery
ABOUT THE PROJECT
Project Description
Representation learning is a powerful machine learning technique for capturing the intrinsic structure of data and meaningful interdependencies within it. It is growing in popularity across a wide variety of domains, including computer vision and natural language processing.
The availability of the clinical notes corpus at UCSF (~75M) represents an opportunity to apply these unsupervised methods (e.g., concept embeddings, autoencoders) and obtain a representation that captures important semantics about medical practice. We propose to apply these methods and develop a framework to test, evaluate, and compare different models. For example, does the representation properly capture the similarity between brand-name drugs and their generic counterparts? Does it capture ontological information (e.g., that ulcerative colitis is a subtype of IBD)?
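One evaluation probe of this kind can be sketched as a pairwise similarity check. The sketch below uses synthetic stand-in vectors (not real model output) and hypothetical concept names; in practice the vectors would come from a model trained on the clinical notes corpus.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)

# Hypothetical embeddings: a brand-name drug ("Advil") should sit close
# to its generic counterpart ("ibuprofen") and far from an unrelated
# concept ("colonoscopy"). These vectors are synthetic placeholders.
base = rng.normal(size=100)
embeddings = {
    "advil": base + 0.1 * rng.normal(size=100),      # near-duplicate of base
    "ibuprofen": base + 0.1 * rng.normal(size=100),  # near-duplicate of base
    "colonoscopy": rng.normal(size=100),             # independent vector
}

brand_generic = cosine_similarity(embeddings["advil"], embeddings["ibuprofen"])
unrelated = cosine_similarity(embeddings["advil"], embeddings["colonoscopy"])

# The probe the evaluation framework would automate: related concepts
# should score higher than unrelated ones.
assert brand_generic > unrelated
```

A full framework would run many such probes (brand/generic pairs, ontology subtype pairs) and report aggregate accuracy per model.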
If successful, such a representation could form the basis of a future platform for biomedical knowledge discovery. For example, querying it for drugs that improve IBD should reveal the drugs currently used to treat the disease, but might also identify drugs not previously known to reduce disease activity (i.e., candidates for drug repurposing).
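The kind of discovery query described above could be sketched as a nearest-neighbor ranking in embedding space. Again, all vectors and drug/disease names below are hypothetical placeholders, not results from any trained model.

```python
import numpy as np

def rank_by_similarity(query, candidates):
    """Return candidate names sorted by cosine similarity to the query vector."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return sorted(candidates, key=lambda name: cos(query, candidates[name]), reverse=True)

rng = np.random.default_rng(1)

# Hypothetical disease embedding and drug embeddings. Known treatments
# are placed near the disease vector; the unrelated drug is independent.
ibd = rng.normal(size=50)
drugs = {
    "infliximab": ibd + 0.2 * rng.normal(size=50),  # known IBD treatment
    "mesalamine": ibd + 0.2 * rng.normal(size=50),  # known IBD treatment
    "metformin": rng.normal(size=50),               # unrelated drug
}

ranking = rank_by_similarity(ibd, drugs)
# Known treatments should outrank the unrelated drug; a drug that ranks
# unexpectedly high would be a repurposing candidate worth investigating.
assert ranking[-1] == "metformin"
```

In a real platform the candidate set would span the full drug vocabulary, and high-ranking drugs without an existing treatment indication would be the repurposing leads.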
LOOKING FOR
Required Skills
- Python
- HPC/GPU computing
- Natural Language Processing tools (NLTK, cTAKES, PyTorch-NLP, etc.)
- Deep Learning
Required Course Work or Level of Knowledge
- Machine Learning, intermediate to advanced
- Individuals with a background in data science, and ideally deep learning, would be most helpful for this project.
Acceptable Level of Education (e.g., Undergrad, Grad Students, Post Docs, MD, PhD)
Undergraduate students, Graduate students, Postgraduates, Full-time workers who volunteer time
Funding
Potentially
CONTACT INFO
PI/Research group
Atul Butte
Contact
Vivek Rudrapatna, [email protected]