Preface
Chapter 1: Practical Machine Learning with R
Introduction
Downloading and installing R
Downloading and installing RStudio
Installing and loading packages
Reading and writing data
Using R to manipulate data
Applying basic statistics
Visualizing data
Getting a dataset for machine learning
Chapter 2: Data Exploration with RMS Titanic
Introduction
Reading a Titanic dataset from a CSV file
Converting types on character variables
Detecting missing values
Imputing missing values
Exploring and visualizing data
Predicting passenger survival with a decision tree
Validating the power of prediction with a confusion matrix
Assessing performance with the ROC curve
Chapter 3: R and Statistics
Introduction
Understanding data sampling in R
Operating a probability distribution in R
Working with univariate descriptive statistics in R
Performing correlations and multivariate analysis
Operating linear regression and multivariate analysis
Conducting an exact binomial test
Performing Student's t-test
Performing the Kolmogorov-Smirnov test
Understanding the Wilcoxon Rank Sum and Signed Rank test
Working with Pearson's Chi-squared test
Conducting a one-way ANOVA
Performing a two-way ANOVA
Chapter 4: Understanding Regression Analysis
Introduction
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Generating a diagnostic plot of a fitted model
Fitting a polynomial regression model with lm
Fitting a robust linear regression model with rlm
Studying a case of linear regression on SLID data
Applying the Gaussian model for generalized linear regression
Applying the Poisson model for generalized linear regression
Applying the Binomial model for generalized linear regression
Fitting a generalized additive model to data
Visualizing a generalized additive model
Diagnosing a generalized additive model
Chapter 5: Classification I - Tree, Lazy, and Probabilistic
Introduction
Preparing the training and testing datasets
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring the prediction performance of a recursive partitioning tree
Pruning a recursive partitioning tree
Building a classification model with a conditional inference tree
Visualizing a conditional inference tree
Measuring the prediction performance of a conditional inference tree
Classifying data with the k-nearest neighbor classifier
Classifying data with logistic regression
Classifying data with the Naive Bayes classifier
Chapter 6: Classification II - Neural Network and SVM
Introduction
Classifying data with a support vector machine
Choosing the cost of a support vector machine
Visualizing an SVM fit
Predicting labels based on a model trained by a support vector machine
Tuning a support vector machine
Training a neural network with neuralnet
Visualizing a neural network trained by neuralnet
Predicting labels based on a model trained by neuralnet
Training a neural network with nnet
Predicting labels based on a model trained by nnet
Chapter 7: Model Evaluation
Introduction
Estimating model performance with k-fold cross-validation
Performing cross-validation with the e1071 package
Performing cross-validation with the caret package
Ranking the variable importance with the caret package
Ranking the variable importance with the rminer package
Finding highly correlated features with the caret package
Selecting features using the caret package
Measuring the performance of the regression model
Measuring prediction performance with a confusion matrix
Measuring prediction performance using ROCR
Comparing an ROC curve using the caret package
Measuring performance differences between models with the caret package
Chapter 8: Ensemble Learning
Introduction
Classifying data with the bagging method
Performing cross-validation with the bagging method
Classifying data with the boosting method
Performing cross-validation with the boosting method
Classifying data with gradient boosting
Calculating the margins of a classifier
Calculating the error evolution of the ensemble method
Classifying data with random forest
Estimating the prediction errors of different classifiers
Chapter 9: Clustering
Introduction
Clustering data with hierarchical clustering
Cutting trees into clusters
Clustering data with the k-means method
Drawing a bivariate cluster plot
Comparing clustering methods
Extracting silhouette information from clustering
Obtaining the optimum number of clusters for k-means
Clustering data with the density-based method
Clustering data with the model-based method
Visualizing a dissimilarity matrix
Validating clusters externally
Chapter 10: Association Analysis and Sequence Mining
Introduction
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori rule
Pruning redundant rules
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Chapter 11: Dimension Reduction
Introduction
Performing feature selection with FSelector
Performing dimension reduction with PCA
Determining the number of principal components using the scree test
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using biplot
Performing dimension reduction with MDS
Reducing dimensions with SVD
Compressing images with SVD
Performing nonlinear dimension reduction with ISOMAP
Performing nonlinear dimension reduction with Local Linear Embedding
Chapter 12: Big Data Analysis (R and Hadoop)
Introduction
Preparing the RHadoop environment
Installing rmr2
Installing rhdfs
Operating HDFS with rhdfs
Implementing a word count problem with RHadoop
Comparing the performance between an R MapReduce program and a standard R program
Testing and debugging the rmr2 program
Installing plyrmr
Manipulating data with plyrmr
Conducting machine learning with RHadoop
Configuring RHadoop clusters on Amazon EMR
Appendix A: Resources for R and Machine Learning
Appendix B: Dataset - Survival of Passengers on the Titanic
Index