Publisher's Synopsis
Data preprocessing is a critical part of chemometric / metabonomic / metabolomic data analysis. This book is an introduction to many of the critical concepts involved. The best preprocessing methods are the ones that ultimately produce a robust model with the most accurate predictive ability. Unfortunately, there are no straightforward rules to guide investigators to the best selection of preprocessing options; the resulting trial-and-error optimization process can be time-consuming and confusing. However, spending little or no time investigating preprocessing options is likely to produce suboptimal results.

The primary objective of this book is to present a relatively focused outline of the major options available for data analysis. The most frequently used methods are covered in varying depth, with the more useful methods discussed in more detail. Some of the selected topics are noted below.

Chapter 1 Introduction: Problems / Challenges. Reproducibility Issues.
Chapter 2 Scaling: Mean Centering. Autoscaling. Pareto Scaling. Range Scaling. Level Scaling. Log Transformation. Probabilistic Quotient Normalization. Variable Stability Scaling. Binning. Orthogonal Signal Correction. Histogram Matching.
Chapter 3 Preprocessing: Savitzky-Golay. Differentiation. Smoothing. Baseline Correction. Wavelets. Denoising. Peak Alignment.
Chapter 4 Sample Subset Selection: Sample Size. Representativity. Training / Testing / Validation Sample Selection. Outliers. Bootstrapping. Cross Validation. Mahalanobis Distance. Kennard-Stone. Duplex. SPXY. Rank Select. Kohonen Neural Networks.
Chapter 5 Variable Subset Selection: Missing Values. Imputation. Chance. Generalizability. Bias. Filter/Wrapper/Embedded Methods. Information Leak. F-test. T-test. Fisher Index. Chi-square test. Wilcoxon rank sum. ANOVA. Linear Discriminant Analysis. Principal Component Analysis. Partial Least Squares. IntervalPCA. IntervalPLS. VIP. Non-orthogonalized PLS. Outer Product Analysis. Uninformative Variable Elimination. Genetic Algorithms. Mutual Information. Particle Swarm Optimization. Ant Colony Optimization. CART. Back-Propagation Neural Networks. Probabilistic Neural Networks. Bayesian Belief Network. Support Vector Machines. Minimum Redundancy-Maximum Relevance. Random Forest. Independent Component Analysis. ... and many more.