Increasing the power of DAMACY
Supervisor: Gerjen Tinnevelt
Multicolor Flow Cytometry (MFC) is a powerful analytical platform for measuring the expression of several surface markers on a single cell. A typical MFC sample may contain a very large number of cells (>10000).1 The number of markers that can be measured on the same cell is constantly increasing. Chemometric multivariate analysis is needed to visualize the high-dimensional data based on all measured markers. These analyses can be used to enhance the study of hematopoiesis and immunology, including immune responses to drugs and tumor progression.2 MFC is also used to measure the autofluorescence of algae.3
One of the methods currently under development is Discriminant Analysis of Multi-Aspect CYtometry data (DAMACY). DAMACY describes the cellular distribution of each sample with multidimensional histograms, constructed within an interpretable Principal Component space in which every cell is individually represented. The histogram bins are chosen as small as possible so that each contains only cells with very similar properties. Unlike the cells themselves, these bins can be compared directly between samples. The bins are compared in a second, supervised model that finds phenotype-related differences in expression, using Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA). This second model yields a highly insightful 'cell map' that provides an unprecedented view of the immunological, hematological, or environmental background.
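The first step described above (a shared Principal Component space followed by per-sample histograms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAMACY implementation: the function names `fit_pca` and `sample_histograms`, the use of two components, the fixed equal-width bin grid, and the random stand-in data are all hypothetical. The supervised OPLS-DA step is not shown; the rows of `X` are what such a model would take as input.

```python
import numpy as np

def fit_pca(cells, n_components=2):
    """Fit PCA on pooled cells (rows = cells, columns = markers) via SVD."""
    mean = cells.mean(axis=0)
    _, _, vt = np.linalg.svd(cells - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: markers x components

def sample_histograms(samples, mean, loadings, n_bins=10):
    """Project each sample onto the shared PC space and bin the scores."""
    scores = [(s - mean) @ loadings for s in samples]
    pooled = np.vstack(scores)
    # Shared bin edges, so that a given bin means the same thing in every sample.
    edges = [np.linspace(pooled[:, j].min(), pooled[:, j].max(), n_bins + 1)
             for j in range(2)]
    rows = []
    for sc in scores:
        counts, _, _ = np.histogram2d(sc[:, 0], sc[:, 1], bins=edges)
        rows.append(counts.ravel() / len(sc))  # fraction of cells per bin
    return np.vstack(rows), edges

# Stand-in data (assumption): 4 samples, 500 cells each, 5 markers.
rng = np.random.default_rng(0)
samples = [rng.normal(size=(500, 5)) for _ in range(4)]
mean, loadings = fit_pca(np.vstack(samples))
X, edges = sample_histograms(samples, mean, loadings)  # X: samples x bins
```

Each row of `X` is one sample's cellular distribution, so samples with different numbers of cells become directly comparable bin by bin.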
Ideally, the base model would describe all the different types of cells as well as possible. However, the current base model is built on the maximum variance explained by the first Principal Components, so information on different cell populations can be missed. The literature describes methods for finding different cell populations that are based on distance matrices. With distance matrices, however, the original surface marker intensities are lost. The surface marker contributions can be recovered by applying pseudo samples, a method developed within the department.
The base model should also contain the discriminating information needed for the top model. However, Principal Component Analysis (PCA) is unsupervised, so the information on the different groups is not used. It is possible to teach PCA to model only the most discriminating cells. A learning algorithm can be developed that selects the cells from bins with high weights in the OPLS-DA top model and reruns the DAMACY algorithm with only these selected cells.
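The selection step of such a learning algorithm might look like the sketch below. It is only one possible reading of "bins with high weights": the quantile threshold, the function names, and the `bin_weights` array (standing in for weights taken from a fitted top model) are assumptions made for illustration.

```python
import numpy as np

def select_discriminating_cells(scores, edges, bin_weights, quantile=0.9):
    """Return a boolean mask of cells lying in bins with a high absolute weight.

    scores      : cells x 2 array of PC scores from the base model
    edges       : two arrays of bin edges, one per component
    bin_weights : (n_bins_x, n_bins_y) weights from the top model (hypothetical input)
    quantile    : bins at or above this quantile of |weight| count as "high" (assumption)
    """
    threshold = np.quantile(np.abs(bin_weights), quantile)
    # np.digitize is 1-based; clip so a score equal to the last edge
    # falls in the final bin instead of out of range.
    ix = np.clip(np.digitize(scores[:, 0], edges[0]) - 1, 0, len(edges[0]) - 2)
    iy = np.clip(np.digitize(scores[:, 1], edges[1]) - 1, 0, len(edges[1]) - 2)
    return np.abs(bin_weights[ix, iy]) >= threshold

def refit_pca(cells, n_components=2):
    """Refit the PCA base model on the selected cells only."""
    mean = cells.mean(axis=0)
    _, _, vt = np.linalg.svd(cells - mean, full_matrices=False)
    return mean, vt[:n_components].T

# Toy example: a 2 x 2 bin grid in which only the upper-right bin has a high weight.
edges = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])]
bin_weights = np.array([[0.0, 0.0], [0.0, 5.0]])
scores = np.array([[0.5, 0.5], [1.5, 1.5], [2.0, 2.0], [1.5, 0.5]])
mask = select_discriminating_cells(scores, edges, bin_weights)
```

In a full loop, the cells kept by `mask` would be passed to `refit_pca`, new histograms would be built in the refitted space, and the top model would be fitted again. How many such iterations are useful, and how to avoid overfitting to the group labels, are open questions that the sketch does not address.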