Course · 12 modules · 82 lessons · 502 min

Machine Learning Foundations

Mathematical foundations, learning theory, supervised and unsupervised methods, neural networks, and production ML systems.

Mathematical Foundations
· Derivatives and Gradients (5 min): The mathematical machinery for measuring how outputs change with inputs -- the foundation of all learning algorithms.
· Information Theory (5 min): Entropy, KL divergence, and mutual information -- quantifying uncertainty, surprise, and the difference between distributions.
· Matrix Decompositions (5 min): Eigendecomposition, SVD, and Cholesky -- factoring matrices to reveal structure, compress data, and solve systems efficiently.
· Maximum Likelihood Estimation (5 min): Finding the parameter values that make observed data most probable -- the dominant paradigm for fitting ML models.
· Norms and Distance Metrics (6 min): Measuring size and similarity in feature space -- L1, L2, cosine, Mahalanobis, and when each is appropriate.
· Optimization and Gradient Descent (5 min): Iteratively adjusting parameters to minimize a loss function -- the engine that drives model training.
· Probability Fundamentals (5 min): Random variables, distributions, Bayes' theorem, and conditional probability -- the language of uncertainty in ML.
· Statistical Inference (5 min): Drawing conclusions about populations from samples -- hypothesis testing, confidence intervals, and the frequentist-Bayesian divide.
· Vectors and Matrices (5 min): The fundamental data structures of ML -- representing data as points in high-dimensional space and transformations as matrices.
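The gradient-descent idea covered in this module fits in a few lines. A minimal sketch on a toy 1-D quadratic loss f(w) = (w - 3)^2 (the function and its parameters here are illustrative, not from any lesson):

```python
# Gradient descent on f(w) = (w - 3)^2, whose derivative is f'(w) = 2 * (w - 3).
# Stepping against the gradient moves w toward the minimizer w = 3.

def gradient_descent(lr=0.1, steps=100, w0=0.0):
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of the loss at the current w
        w -= lr * grad      # move opposite the gradient, scaled by the learning rate
    return w

w_star = gradient_descent()  # converges very close to 3.0
```

The same loop, with the analytic derivative replaced by automatic differentiation, is what drives every model-fitting routine in the later modules.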
Data Science Fundamentals
· Data Cleaning and Preprocessing (6 min): Handling noise, inconsistencies, and formatting issues -- garbage in, garbage out is the first law of ML.
· Data Splitting and Sampling (8 min): Train/validation/test splits, stratification, and handling class imbalance -- the foundation of honest evaluation.
· Data Types and Structures (5 min): Numerical, categorical, ordinal, text, time series -- understanding your data's nature determines every downstream decision.
· Encoding Categorical Variables (7 min): One-hot, label, target, and embedding-based encoding -- translating categories into numbers without introducing false relationships.
· Exploratory Data Analysis (6 min): Visualizing distributions, correlations, and anomalies before modeling -- the most undervalued step in the ML pipeline.
· Feature Scaling and Normalization (6 min): Standardization, min-max scaling, and robust scaling -- ensuring features contribute equally regardless of their original units.
· Handling Missing Data (7 min): Deletion, imputation, and model-based approaches -- the strategy depends on why data is missing, not just how much.
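Standardization, one of the scaling methods this module covers, is simple enough to sketch directly (toy values; production code would also guard against a zero standard deviation):

```python
import statistics

# Standardization (z-scoring): rescale a feature to mean 0 and standard
# deviation 1, so features in different units contribute comparably.

def standardize(values):
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std; assumed nonzero here
    return [(v - mu) / sigma for v in values]

z = standardize([10.0, 20.0, 30.0, 40.0])  # mean 0, std 1 after rescaling
```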
Core Learning Theory
· Bias-Variance Tradeoff (6 min): The fundamental tension between underfitting and overfitting -- every model navigates this tradeoff whether you manage it or not.
· Curse of Dimensionality (6 min): As dimensions increase, data becomes sparse, distances become meaningless, and exponentially more data is needed.
· Empirical Risk Minimization (6 min): Minimizing average loss on training data as a proxy for true risk -- the theoretical framework underlying most ML algorithms.
· Loss Functions (5 min): The objective being optimized -- MSE, cross-entropy, hinge loss, and how the choice shapes what the model learns.
· Overfitting and Underfitting (6 min): Memorizing training data vs. failing to capture patterns -- the two failure modes of every learning algorithm.
· Regularization (6 min): Constraining model complexity to improve generalization -- L1, L2, dropout, early stopping, and the bias-variance connection.
· Types of Machine Learning (6 min): Supervised, unsupervised, semi-supervised, and self-supervised -- a taxonomy based on what labels are available.
· What Is Machine Learning? (6 min): Learning patterns from data rather than programming rules explicitly -- the three paradigms and when each applies.
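The L2 regularization idea from this module has a one-line closed form in the simplest case. A sketch for one-dimensional regression with no intercept (toy data; real ridge regression handles many features and an intercept):

```python
# Ridge regression in 1-D: w = sum(x*y) / (sum(x^2) + lam).
# The penalty strength lam shrinks the coefficient toward zero,
# trading a little bias for lower variance.

def ridge_1d(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # exact relationship y = 2x
w0 = ridge_1d(xs, ys, 0.0)     # 2.0: no penalty recovers the true slope
w1 = ridge_1d(xs, ys, 14.0)    # 1.0: a large penalty halves the coefficient
```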
Supervised Learning: Regression
· Generalized Linear Models (7 min): Extending linear regression to non-normal responses via link functions -- unifying logistic, Poisson, and other regression types.
· Linear Regression (6 min): Fitting a hyperplane to data by minimizing squared errors -- the most interpretable and foundational predictive model.
· Polynomial Regression (6 min): Capturing nonlinear relationships within the linear regression framework by adding polynomial feature terms.
· Regression Diagnostics (6 min): Residual analysis, heteroscedasticity, multicollinearity, and influence points -- verifying assumptions before trusting results.
· Ridge and Lasso Regression (7 min): L2 and L1 penalties that shrink coefficients toward zero -- Ridge for stability, Lasso for sparsity and feature selection.
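For simple linear regression (one feature), the least-squares fit this module builds on has the textbook closed form b = cov(x, y) / var(x), a = mean(y) - b * mean(x). A sketch on made-up data:

```python
# Ordinary least squares for y ≈ a + b*x, computed from the closed-form
# formulas rather than an optimization loop.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # intercept makes the line pass through the means
    return a, b

a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
```

With more features this generalizes to the matrix normal equations, which the lessons cover in full.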
Supervised Learning: Classification
· Decision Trees (6 min): Recursive binary splitting that produces interpretable if-then rules -- the building block of ensemble methods.
· K-Nearest Neighbors (6 min): Classify by majority vote of the K closest training examples -- no training phase, all computation at prediction time.
· Kernel Methods (6 min): The kernel trick maps data to higher dimensions without explicit computation -- making linear methods handle nonlinear boundaries.
· Logistic Regression (6 min): Linear model with sigmoid output for probability estimation -- the workhorse baseline for binary classification.
· Multi-Class Classification (7 min): Extending binary classifiers to multiple classes via one-vs-rest, one-vs-one, and native multi-class approaches.
· Naive Bayes (6 min): Applying Bayes' theorem with a strong independence assumption -- surprisingly effective despite being "wrong" in theory.
· Support Vector Machines (6 min): Finding the maximum-margin hyperplane that separates classes -- elegant geometry with strong theoretical guarantees.
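K-nearest neighbors is the most compact classifier in this module: there is no training step, just a distance computation at prediction time. A brute-force sketch on two toy clusters (labels and coordinates are illustrative):

```python
import math
from collections import Counter

# K-nearest-neighbors classification: predict by majority vote of the K
# training points closest (in Euclidean distance) to the query point.

def knn_predict(train, query, k=3):
    # train is a list of ((x, y), label) pairs; brute force is fine for toy data
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
label = knn_predict(train, (0.5, 0.5))  # the "a" cluster wins the vote
```

Real implementations replace the sort with spatial indexes (KD-trees, ball trees) to avoid scanning every training point per query.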
Ensemble Methods
· AdaBoost (5 min): Sequentially training weak learners that focus on previously misclassified examples -- boosting accuracy through reweighting.
· Bagging and Bootstrap (5 min): Training multiple models on bootstrapped samples and averaging predictions -- reducing variance through diversity.
· Gradient Boosting (6 min): Building an additive model by fitting each new tree to the residual errors of the ensemble -- the most powerful tabular method.
· Random Forests (5 min): Bagged decision trees with random feature subsets -- robust, parallelizable, and hard to overfit with more trees.
· Stacking and Blending (7 min): Training a meta-learner on base model predictions -- combining diverse model families for competition-winning performance.
· XGBoost, LightGBM, and CatBoost (7 min): Industrial-strength gradient boosting implementations with regularization, histogram binning, and GPU acceleration.
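Bagging in miniature: resample the data with replacement, fit a "model" on each resample, and average the predictions. Here the base model is deliberately trivial (the sample mean), which is enough to show the bootstrap-and-average mechanics; real bagging uses decision trees:

```python
import random
import statistics

def bootstrap_sample(data, rng):
    # draw len(data) points with replacement: some repeat, some are left out
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=200, seed=0):
    rng = random.Random(seed)
    # each "model" is fit (here: a mean) on its own bootstrap resample,
    # and the ensemble prediction averages over all of them
    preds = [statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return statistics.mean(preds)

est = bagged_mean([2.0, 4.0, 6.0, 8.0, 10.0])  # close to the full-sample mean, 6.0
```

The averaging step is where the variance reduction comes from: individual bootstrap fits disagree, but their errors partially cancel.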
Unsupervised Learning
· Anomaly Detection (7 min): Identifying data points that deviate significantly from the norm -- isolation forests, autoencoders, and statistical approaches.
· Association Rules (7 min): Discovering frequent itemsets and co-occurrence patterns in transactional data -- the Apriori algorithm and market basket analysis.
· DBSCAN (8 min): Discovering arbitrarily shaped clusters based on point density -- no need to specify K, naturally identifies outliers.
· Gaussian Mixture Models (7 min): Soft clustering via a weighted sum of Gaussians fitted with EM -- probabilistic assignment captures cluster uncertainty.
· Hierarchical Clustering (6 min): Building a tree of nested clusters via agglomerative merging or divisive splitting -- revealing multi-scale data structure.
· K-Means Clustering (7 min): Partitioning data into K groups by iteratively assigning points to nearest centroids -- simple, fast, and surprisingly effective.
· Principal Component Analysis (8 min): Projecting data onto orthogonal directions of maximum variance -- the foundational dimensionality reduction technique.
· t-SNE and UMAP (8 min): Nonlinear dimensionality reduction for visualization -- preserving local neighborhood structure in 2D/3D plots.
Neural Network Foundations
· Activation Functions (5 min): Nonlinear transforms between layers -- ReLU, sigmoid, tanh, and why the choice matters for gradient flow and expressivity.
· Backpropagation (5 min): Computing gradients layer by layer via the chain rule -- the algorithm that makes deep learning computationally feasible.
· Batch Normalization (5 min): Normalizing layer inputs within each mini-batch -- stabilizing training, enabling higher learning rates, and acting as regularization.
· Dropout and Regularization (6 min): Randomly zeroing activations during training -- an implicit ensemble that prevents co-adaptation of neurons.
· Optimizers (5 min): SGD, momentum, RMSProp, Adam, and AdamW -- adaptive methods that navigate loss landscapes faster than vanilla gradient descent.
· Perceptrons and Multilayer Networks (5 min): From single linear classifiers to universal function approximators -- stacking layers creates representational power.
· Universal Approximation Theorem (7 min): A single hidden layer with enough neurons can approximate any continuous function -- but finding those weights is the hard part.
· Weight Initialization (5 min): Xavier, He, and orthogonal initialization -- breaking symmetry and controlling signal magnitude at the start of training.
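Backpropagation through the chain rule can be seen end to end on a single sigmoid neuron. A sketch with squared loss on one made-up training example (real networks apply the same chain-rule products layer by layer, automatically):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def backprop_step(w, b, x, y, lr=0.5):
    # forward pass
    z = w * x + b
    y_hat = sigmoid(z)
    # backward pass: chain rule through L = (y_hat - y)^2
    dL_dyhat = 2 * (y_hat - y)        # derivative of the squared loss
    dyhat_dz = y_hat * (1 - y_hat)    # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0             # derivatives of the linear layer
    w -= lr * dL_dyhat * dyhat_dz * dz_dw
    b -= lr * dL_dyhat * dyhat_dz * dz_db
    return w, b

w, b = 0.0, 0.0
for _ in range(500):
    w, b = backprop_step(w, b, x=1.0, y=1.0)
# after training, sigmoid(w*x + b) is close to the target 1.0
```

Note how the update is literally a product of local derivatives; stacking layers just extends that product, which is also why vanishing gradients (covered under activation functions) become a concern in deep stacks.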
Probabilistic Methods
· Bayesian Inference (6 min): Updating beliefs with evidence via Bayes' theorem -- treating parameters as distributions rather than fixed values.
· Expectation-Maximization (6 min): Iteratively inferring latent variables (E-step) and optimizing parameters (M-step) -- the workhorse for incomplete data.
· Gaussian Processes (6 min): Nonparametric Bayesian regression defining distributions over functions -- elegant uncertainty quantification with $O(n^3)$ cost.
· Graphical Models (7 min): Bayesian networks and Markov random fields representing conditional dependencies as graphs -- structured probabilistic reasoning.
· Markov Chain Monte Carlo (7 min): Sampling from complex posterior distributions by constructing Markov chains -- when exact inference is intractable.
· Variational Inference (7 min): Approximating intractable posteriors by optimization rather than sampling -- trading exactness for scalability.
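The Bayesian update at the heart of this module, P(H | E) = P(E | H) P(H) / P(E), is worth computing once by hand. A sketch using the classic diagnostic-test setup (the rates below are illustrative numbers, not from the lessons):

```python
# Bayes' theorem for a binary hypothesis: the evidence probability P(E)
# sums over both ways a positive result can occur (true and false positives).

def posterior(prior, sensitivity, false_positive_rate):
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# 1% base rate, 95% sensitivity, 5% false-positive rate:
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
# despite the accurate test, the posterior is only about 16%
```

The counterintuitively low posterior is driven by the low prior: most positives come from the large healthy population. That prior-dominates-likelihood behavior is exactly what the Bayesian lessons formalize.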
Model Selection and Evaluation
· Calibration (6 min): When a model says "80% confidence" it should be right 80% of the time -- reliability diagrams, Platt scaling, and isotonic regression.
· Classification Metrics (5 min): Accuracy, precision, recall, F1, AUC-ROC, and AUC-PR -- choosing the right metric depends on what errors cost.
· Cross-Validation (6 min): K-fold, stratified, and leave-one-out validation -- maximizing use of limited data for both training and evaluation.
· Hyperparameter Tuning (5 min): Grid search, random search, and Bayesian optimization -- finding optimal settings without overfitting to the validation set.
· Learning Curves (6 min): Plotting performance vs. training set size or training iterations -- diagnosing whether you need more data, more capacity, or more regularization.
· Model Comparison (6 min): Paired t-tests, McNemar's test, and Wilcoxon signed-rank -- determining if performance differences are real or noise.
· Regression Metrics (5 min): MSE, RMSE, MAE, MAPE, and R-squared -- each captures different aspects of prediction quality.
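The index bookkeeping behind K-fold cross-validation is a small function: partition the indices into K folds, then hold out each fold once. A sketch (round-robin fold assignment here for brevity; common libraries use contiguous or shuffled folds, and stratified variants additionally preserve class ratios):

```python
# K-fold cross-validation splits: each of the K folds is the validation set
# exactly once, while the remaining folds form the training set.

def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

splits = list(kfold_indices(n=6, k=3))
# 3 splits; every index appears in exactly one validation fold
```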
Feature Engineering
· Automated Feature Engineering (7 min): AutoML, Featuretools, and neural feature learning -- when manual engineering doesn't scale.
· Feature Extraction and Transformation (7 min): Creating new informative features from raw data through domain knowledge, mathematical transforms, and automated methods.
· Feature Selection Methods (7 min): Filter, wrapper, and embedded approaches for identifying the most informative features -- removing noise to improve generalization.
· Handling High-Cardinality Features (7 min): Target encoding, hashing, and embedding approaches for categorical features with thousands of unique values.
· Time-Series Feature Engineering (6 min): Lags, rolling statistics, seasonality decomposition, and calendar features -- encoding temporal patterns for ML models.
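Lags and rolling statistics, the core time-series features in this module, reduce to a few lines of list manipulation. A sketch (None marks rows that lack enough history; dataframe libraries do the same thing with NaN):

```python
# Turn a raw series into supervised-learning features: the value k steps ago
# (a lag) and the mean over a trailing window (a rolling statistic).

def lag(series, k):
    return [None] * k + series[:-k]

def rolling_mean(series, window):
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

y = [1.0, 2.0, 3.0, 4.0, 5.0]
lag1 = lag(y, 1)           # [None, 1.0, 2.0, 3.0, 4.0]
ma3 = rolling_mean(y, 3)   # [None, None, 2.0, 3.0, 4.0]
```

Note that both features only look backward in time; using any future value would leak the target, a pitfall the lessons return to.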
ML Systems and Production
· A/B Testing for ML (6 min): Comparing model versions in production with statistical rigor -- offline metrics don't always predict online impact.
· Data Drift and Model Monitoring (5 min): Detecting when production data diverges from training data -- models degrade silently without monitoring.
· Experiment Tracking (5 min): Logging parameters, metrics, artifacts, and code versions -- reproducing results and navigating the experiment space systematically.
· ML Pipelines (5 min): Chaining data processing, feature engineering, and model training into reproducible, deployable workflows.
· Model Deployment and Serving (6 min): Batch vs. real-time inference, containerization, model registries, and the infrastructure of production ML.
· Responsible AI and Fairness (7 min): Measuring and mitigating bias, ensuring transparency, and building ML systems that are accountable and equitable.
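The simplest possible drift monitor compares a feature's production mean against its training distribution. A sketch of that idea (the threshold and data are illustrative; production monitoring typically uses proper tests such as Kolmogorov-Smirnov or the population stability index, which the drift lesson covers):

```python
import statistics

# Flag a feature as drifted when the production mean moves more than
# `threshold` training standard deviations away from the training mean.

def mean_shift_alert(train_values, prod_values, threshold=3.0):
    mu = statistics.mean(train_values)
    sigma = statistics.pstdev(train_values)  # assumed nonzero here
    shift = abs(statistics.mean(prod_values) - mu) / sigma
    return shift > threshold

train = [10.0, 11.0, 9.0, 10.5, 9.5]
drifted = mean_shift_alert(train, [16.0, 17.0, 15.5])  # True: mean jumped
stable = mean_shift_alert(train, [10.2, 9.8, 10.1])    # False: within range
```

Even this crude check beats no monitoring at all, which is the lesson's point: models degrade silently unless something is watching the inputs.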