Classification with continuous, ordinal and categorical data
Supervisor: dr. Jeroen Jansen
Many analyses combine variables on different ‘levels’ of measurement. These range from continuous measurements, such as those from spectroscopy, to ordinal measurements such as low, medium and high levels. Other data may come in the form of categories that have no quantifiable relationship to each other, such as race or species. Together, these measurements may be informative about the variability within the data and about the classification of samples into groups. Several methods are available to highlight how combinations of such variables vary between samples, such as CATPCA (http://dx.doi.org/10.1037/1082-989X.12.3.336). However, this approach cannot be used to classify samples into different groups, as is conventionally done in chemometrics by PLS-DA (doi:10.1016/S0169-7439(01)00155-1). Other methods that are specifically suited to classifying samples with variables expressed on different levels of measurement, such as Random Forests (https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm), do not readily provide information on the relationships between the variables that lead to the classification. This severely hampers the interpretation of the data. This internship aims to integrate a PLS model for classification into CATPCA, to allow classification of data with mixed measurement levels while retaining an overview of the relations between the variables.