Shreyas Subramanian

AWS Certified Machine Learning Study Guide



2.1-1. Identify and handle missing data, corrupt data, stop words, etc. 6
2.1-2. Formatting, normalizing, augmenting, and scaling data 6
2.1-3. Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [Data labeling tools (Mechanical Turk, manual labor)]) 1, 5, 6

      Subdomain 2.2: Perform Feature Engineering

Exam Objective Chapter
2.2-1. Identify and extract features from datasets, including from data sources such as text, speech, image, public datasets, etc. 7
2.2-2. Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, One-hot encoding, reducing dimensionality of data) 7

      Subdomain 2.3: Analyze and Visualize Data for Machine Learning

Exam Objective Chapter
2.3-1. Graphing (scatter plot, time series, histogram, box plot) 9
2.3-2. Interpreting descriptive statistics (correlation, summary statistics, p value) 9
2.3-3. Clustering (hierarchical, diagnosing, elbow plot, cluster size) 9

      Domain 3: Modeling

      Subdomain 3.1: Frame Business Problems as Machine Learning Problems

Exam Objective Chapter
3.1-1. Determine when to use/when not to use ML 3
3.1-2. Know the difference between supervised and unsupervised learning 4
3.1-3. Selecting from among classification, regression, forecasting, clustering, recommendation, etc. 4

      Subdomain 3.2: Select the Appropriate Model(s) for a Given Machine Learning Problem

Exam Objective Chapter
3.2-1. XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, Ensemble, Transfer learning 8
3.2-2. Express intuition behind models 8

      Subdomain 3.3: Train Machine Learning Models

Exam Objective Chapter
3.3-1. Train/validation/test split, cross-validation 6
3.3-2. Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc. 8
3.3-3. Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark]) 12, 16
3.3-4. Model updates and retraining 8, 12

      Subdomain 3.4: Perform Hyperparameter Optimization

Exam Objective Chapter
3.4-1. Regularization 8
3.4-2. Cross-validation 9
3.4-3. Model initialization 8
3.4-4. Neural network architecture (layers/nodes), learning rate, activation functions 8
3.4-5. Tree-based models (# of trees, # of levels) 8
3.4-6. Linear models (learning rate) 8

      Subdomain 3.5: Evaluate Machine Learning Models

Exam Objective Chapter
3.5-1. Avoid overfitting/underfitting (detect and handle bias and variance) 9
3.5-2.