Removing biases from molecular representations via information maximisation

Abstract: "High-throughput drug screening - using cell imaging or gene expression measurements as readouts of drug effect - is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an information maximisation approach for confounder removal, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalise their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimising correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE."
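For readers unfamiliar with the quantity named in the abstract, the conditional mutual information between two representations given a batch identifier can be written as below. The symbols $Z_d$ (representation of the drug structure), $Z_g$ (representation of the screening readout) and $B$ (batch identifier) are notation assumed here for illustration and do not come from this page. This is only the standard definition; the specific variational lower bound that InfoCORE optimises is given in the paper itself.

```latex
% Standard definition of conditional mutual information.
% Z_d, Z_g and B are illustrative symbols, not notation taken from this page.
\[
I(Z_d; Z_g \mid B)
  = \mathbb{E}_{p(z_d, z_g, b)}
    \left[ \log \frac{p(z_d, z_g \mid b)}{p(z_d \mid b)\, p(z_g \mid b)} \right]
\]
```

This quantity is zero exactly when the two representations are independent within each batch, so maximising it rewards shared information that the batch identifier alone does not explain.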

Details

author(s)

Caroline Uhler

publication date

1 December 2023

source

Arxiv

related programme

MIT Jameel Clinic

Generative AI in the era of 'alternative facts'
27 March 2024 | MIT Open Publishing Services

External data and AI are making each other more valuable
26 February 2024 | Harvard Business Review Press

Removing biases from molecular representations via information maximisation
1 December 2023 | Arxiv

Effective human-AI teams via learned natural language rules and onboarding
7 November 2023 | Arxiv

A deep dive into single-cell RNA sequencing foundation models
23 October 2023 | bioRxiv

Antibiotic identified by AI
11 October 2023 | Nature

LLM-grounded video diffusion models
2 October 2023 | Arxiv

Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features
14 September 2023 | Pancreas

Leveraging artificial intelligence in the fight against infectious diseases
13 July 2023 | Science

BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences
21 June 2023 | Cell Systems

Conformal language modeling
16 June 2023 | Arxiv

Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: An observational study
6 June 2023 | Radiological Society of North America

Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii
25 May 2023 | Nature

Algorithmic pluralism: A structural approach towards equal opportunity
14 May 2023 | Arxiv

Artificial intelligence and machine learning in lung cancer screening
12 May 2023 | Science Direct

Wide and deep neural networks achieve consistency for classification
30 March 2023 | PNAS

Autocatalytic base editing for RNA-responsive translational control
11 March 2023 | Nature

DiffDock: Diffusion steps, twists and turns for molecular docking
11 February 2023 | Arxiv

Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography
12 January 2023 | Journal of Clinical Oncology

Sequential multi-dimensional self-supervised learning for clinical time series
1 January 2023 | Proceedings of Machine Learning Research

Queueing theory: Classical and modern methods
1 January 2022 | Dynamic Ideas

Toward robust mammography-based models for breast cancer risk
26 January 2021 | Science

The age of AI: And our human future
1 January 2021 | Little, Brown and Company

Uniform priors for data-efficient transfer
13 October 2020 | Arxiv

Machine learning under a modern optimisation lens
1 January 2019 | Dynamic Ideas

The marginal value of adaptive gradient methods in machine learning
23 May 2017 | Advances in Neural Information Processing Systems

Efficient graph-based image segmentation
1 January 2004 | International Journal of Computer Vision
