
Removing biases from molecular representations via information maximisation

Abstract: "High-throughput drug screening - using cell imaging or gene expression measurements as readouts of drug effect - is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an information maximisation approach for confounder removal, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalise their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimising correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE."
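For reference, the quantity the abstract names can be written out using the standard definition of conditional mutual information. The symbols are assumptions for illustration and do not come from the abstract: $Z_1$ and $Z_2$ stand for two latent representations (for example, of a drug's chemical structure and of its screening readout), and $B$ is the batch identifier. The abstract does not state the variational lower bound itself, so only the target quantity is shown here.

\[
I(Z_1; Z_2 \mid B) \;=\; \mathbb{E}_{p(z_1, z_2, b)}\!\left[ \log \frac{p(z_1, z_2 \mid b)}{p(z_1 \mid b)\, p(z_2 \mid b)} \right]
\]

Because every term is conditioned on $b$, dependence between $Z_1$ and $Z_2$ that is explained by the batch alone contributes nothing to this quantity, which is consistent with the abstract's stated aim of confounder removal.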

Details

Author(s): Caroline Uhler
Publication date: 1 December 2023
Source: arXiv
Related programme: MIT Jameel Clinic

Generative AI in the era of 'alternative facts' | MIT Open Publishing Services
External data and AI are making each other more valuable | Harvard Business Review Press
Removing biases from molecular representations via information maximisation | arXiv
Effective human-AI teams via learned natural language rules and onboarding | arXiv
A deep dive into single-cell RNA sequencing foundation models | bioRxiv
Antibiotic identified by AI | Nature
LLM-grounded video diffusion models | arXiv
Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features | Pancreas
Leveraging artificial intelligence in the fight against infectious diseases | Science
BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences | Cell Systems
Conformal language modeling | arXiv
Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: An observational study | Radiological Society of North America
Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii | Nature
Algorithmic pluralism: A structural approach towards equal opportunity | arXiv
Artificial intelligence and machine learning in lung cancer screening | Science Direct
Wide and deep neural networks achieve consistency for classification | PNAS
Autocatalytic base editing for RNA-responsive translational control | Nature
DiffDock: Diffusion steps, twists and turns for molecular docking | arXiv
Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography | Journal of Clinical Oncology
Sequential multi-dimensional self-supervised learning for clinical time series | Proceedings of Machine Learning Research
Queueing theory: Classical and modern methods | Dynamic Ideas
Toward robust mammography-based models for breast cancer risk | Science
The age of AI: And our human future | Little, Brown and Company
Uniform priors for data-efficient transfer | arXiv
Machine learning under a modern optimisation lens | Dynamic Ideas
The marginal value of adaptive gradient methods in machine learning | Advances in Neural Information Processing Systems
Efficient graph-based image segmentation | International Journal of Computer Vision
