Data Integration Via Analysis of Subspaces (DIVAS)

May 3, 2019 - 10:00am
Jan Hannig, Ph.D., Department of Statistics and Operations Research, UNC-Chapel Hill
StatGen Seminar Series

A major challenge in the age of Big Data is integrating disparate data types into a single analysis. This talk tackles that challenge in the context of data blocks measured on a common set of experimental subjects. This data structure motivates the simultaneous exploration of the joint and individual variation within each data block. The approach scales well to large data sets (with blocks of wildly disparate size), using principal angle analysis, a careful formulation of the underlying linear algebra, and outputs that differ depending on the analytical goals. The ideas are illustrated using cancer and neuroimaging data sets.
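
For readers unfamiliar with principal angle analysis, the following is a minimal sketch of the textbook computation of principal angles between two subspaces. It is not the DIVAS procedure itself. The function name `principal_angles`, the toy matrices `A` and `B`, and the choice of six subjects are assumptions made purely for illustration. Each matrix is assumed to have full column rank, with columns spanning a subspace of subject space (one coordinate per subject).

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Illustrative helper, not part of any DIVAS software. Assumes A and B
    have full column rank and the same number of rows.
    """
    # Orthonormal bases for the two column spaces (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy example with 6 hypothetical subjects: both subspaces contain the
# first coordinate direction, and each has one direction of its own.
e = np.eye(6)
A = np.column_stack([e[:, 0], e[:, 1]])  # span{e1, e2}
B = np.column_stack([e[:, 0], e[:, 2]])  # span{e1, e3}

angles = principal_angles(A, B)
# One angle is close to 0 (the shared direction) and one is close to pi/2.
print(angles)
```

In this toy setting, an angle near zero signals a direction common to both subspaces, which is the intuition behind using principal angles to separate joint from individual variation. The cosine-based formula loses accuracy for very small angles; SciPy's `scipy.linalg.subspace_angles` is an existing alternative for the same quantity.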

Weekly Forum