Course · 12 modules · 82 lessons · 502 min

Machine Learning Foundations

Mathematical foundations, learning theory, supervised and unsupervised methods, neural networks, and production ML systems.

Mathematical Foundations
· Derivatives and Gradients (5 min): The mathematical machinery for measuring how outputs change with inputs -- the foundation of all learning algorithms.
· Information Theory (5 min): Entropy, KL divergence, and mutual information -- quantifying uncertainty, surprise, and the difference between distributions.
· Matrix Decompositions (5 min): Eigendecomposition, SVD, and Cholesky -- factoring matrices to reveal structure, compress data, and solve systems efficiently.
· Maximum Likelihood Estimation (5 min): Finding the parameter values that make observed data most probable -- the dominant paradigm for fitting ML models.
· Norms and Distance Metrics (6 min): Measuring size and similarity in feature space -- L1, L2, cosine, Mahalanobis, and when each is appropriate.
· Optimization and Gradient Descent (5 min): Iteratively adjusting parameters to minimize a loss function -- the engine that drives model training.
· Probability Fundamentals (5 min): Random variables, distributions, Bayes' theorem, and conditional probability -- the language of uncertainty in ML.
· Statistical Inference (5 min): Drawing conclusions about populations from samples -- hypothesis testing, confidence intervals, and the frequentist-Bayesian divide.
· Vectors and Matrices (5 min): The fundamental data structures of ML -- representing data as points in high-dimensional space and transformations as matrices.
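The gradient-descent idea covered in this module fits in a few lines. A minimal sketch on a toy 1-D quadratic loss f(w) = (w - 3)^2 (the function and its parameters here are illustrative, not from any lesson):

```python
# Gradient descent on f(w) = (w - 3)^2, whose derivative is f'(w) = 2 * (w - 3).
# Stepping against the gradient moves w toward the minimizer w = 3.

def gradient_descent(lr=0.1, steps=100, w0=0.0):
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of the loss at the current w
        w -= lr * grad      # move opposite the gradient, scaled by the learning rate
    return w

w_star = gradient_descent()  # converges very close to 3.0
```

The same loop, with the analytic derivative replaced by automatic differentiation, is what drives every model-fitting routine in the later modules.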
Data Science Fundamentals
· Data Cleaning and Preprocessing (6 min): Handling noise, inconsistencies, and formatting issues -- garbage in, garbage out is the first law of ML.
· Data Splitting and Sampling (8 min): Train/validation/test splits, stratification, and handling class imbalance -- the foundation of honest evaluation.
· Data Types and Structures (5 min): Numerical, categorical, ordinal, text, time series -- understanding your data's nature determines every downstream decision.
· Encoding Categorical Variables (7 min): One-hot, label, target, and embedding-based encoding -- translating categories into numbers without introducing false relationships.
· Exploratory Data Analysis (6 min): Visualizing distributions, correlations, and anomalies before modeling -- the most undervalued step in the ML pipeline.
· Feature Scaling and Normalization (6 min): Standardization, min-max scaling, and robust scaling -- ensuring features contribute equally regardless of their original units.
· Handling Missing Data (7 min): Deletion, imputation, and model-based approaches -- the strategy depends on why data is missing, not just how much.
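Standardization, one of the scaling methods this module covers, is simple enough to sketch directly (toy values; production code would also guard against a zero standard deviation):

```python
import statistics

# Standardization (z-scoring): rescale a feature to mean 0 and standard
# deviation 1, so features in different units contribute comparably.

def standardize(values):
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std; assumed nonzero here
    return [(v - mu) / sigma for v in values]

z = standardize([10.0, 20.0, 30.0, 40.0])  # mean 0, std 1 after rescaling
```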
Core Learning Theory
· Bias-Variance Tradeoff (6 min): The fundamental tension between underfitting and overfitting -- every model navigates this tradeoff whether you manage it or not.
· Curse of Dimensionality (6 min): As dimensions increase, data becomes sparse, distances become meaningless, and exponentially more data is needed.
· Empirical Risk Minimization (6 min): Minimizing average loss on training data as a proxy for true risk -- the theoretical framework underlying most ML algorithms.
· Loss Functions (5 min): The objective being optimized -- MSE, cross-entropy, hinge loss, and how the choice shapes what the model learns.
· Overfitting and Underfitting (6 min): Memorizing training data vs. failing to capture patterns -- the two failure modes of every learning algorithm.
· Regularization (6 min): Constraining model complexity to improve generalization -- L1, L2, dropout, early stopping, and the bias-variance connection.
· Types of Machine Learning (6 min): Supervised, unsupervised, semi-supervised, and self-supervised -- a taxonomy based on what labels are available.
· What Is Machine Learning? (6 min): Learning patterns from data rather than programming rules explicitly -- the three paradigms and when each applies.
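The L2 regularization idea from this module has a one-line closed form in the simplest case. A sketch for one-dimensional regression with no intercept (toy data; real ridge regression handles many features and an intercept):

```python
# Ridge regression in 1-D: w = sum(x*y) / (sum(x^2) + lam).
# The penalty strength lam shrinks the coefficient toward zero,
# trading a little bias for lower variance.

def ridge_1d(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # exact relationship y = 2x
w0 = ridge_1d(xs, ys, 0.0)     # 2.0: no penalty recovers the true slope
w1 = ridge_1d(xs, ys, 14.0)    # 1.0: a large penalty halves the coefficient
```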
Supervised Learning: Regression
· Generalized Linear Models (7 min): Extending linear regression to non-normal responses via link functions -- unifying logistic, Poisson, and other regression types.
· Linear Regression (6 min): Fitting a hyperplane to data by minimizing squared errors -- the most interpretable and foundational predictive model.
· Polynomial Regression (6 min): Capturing nonlinear relationships within the linear regression framework by adding polynomial feature terms.
· Regression Diagnostics (6 min): Residual analysis, heteroscedasticity, multicollinearity, and influence points -- verifying assumptions before trusting results.
· Ridge and Lasso Regression (7 min): L2 and L1 penalties that shrink coefficients toward zero -- Ridge for stability, Lasso for sparsity and feature selection.
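For simple linear regression (one feature), the least-squares fit this module builds on has the textbook closed form b = cov(x, y) / var(x), a = mean(y) - b * mean(x). A sketch on made-up data:

```python
# Ordinary least squares for y ≈ a + b*x, computed from the closed-form
# formulas rather than an optimization loop.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # intercept makes the line pass through the means
    return a, b

a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
```

With more features this generalizes to the matrix normal equations, which the lessons cover in full.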
Supervised Learning: Classification
· Decision Trees (6 min): Recursive binary splitting that produces interpretable if-then rules -- the building block of ensemble methods.
· K-Nearest Neighbors (6 min): Classify by majority vote of the K closest training examples -- no training phase, all computation at prediction time.
· Kernel Methods (6 min): The kernel trick maps data to higher dimensions without explicit computation -- making linear methods handle nonlinear boundaries.
· Logistic Regression (6 min): Linear model with sigmoid output for probability estimation -- the workhorse baseline for binary classification.
· Multi-Class Classification (7 min): Extending binary classifiers to multiple classes via one-vs-rest, one-vs-one, and native multi-class approaches.
· Naive Bayes (6 min): Applying Bayes' theorem with a strong independence assumption -- surprisingly effective despite being "wrong" in theory.
· Support Vector Machines (6 min): Finding the maximum-margin hyperplane that separates classes -- elegant geometry with strong theoretical guarantees.
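K-nearest neighbors is the most compact classifier in this module: there is no training step, just a distance computation at prediction time. A brute-force sketch on two toy clusters (labels and coordinates are illustrative):

```python
import math
from collections import Counter

# K-nearest-neighbors classification: predict by majority vote of the K
# training points closest (in Euclidean distance) to the query point.

def knn_predict(train, query, k=3):
    # train is a list of ((x, y), label) pairs; brute force is fine for toy data
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
label = knn_predict(train, (0.5, 0.5))  # the "a" cluster wins the vote
```

Real implementations replace the sort with spatial indexes (KD-trees, ball trees) to avoid scanning every training point per query.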
Ensemble Methods
· AdaBoost (5 min): Sequentially training weak learners that focus on previously misclassified examples -- boosting accuracy through reweighting.
· Bagging and Bootstrap (5 min): Training multiple models on bootstrapped samples and averaging predictions -- reducing variance through diversity.
· Gradient Boosting (6 min): Building an additive model by fitting each new tree to the residual errors of the ensemble -- the most powerful tabular method.
· Random Forests (5 min): Bagged decision trees with random feature subsets -- robust, parallelizable, and hard to overfit with more trees.
· Stacking and Blending (7 min): Training a meta-learner on base model predictions -- combining diverse model families for competition-winning performance.
· XGBoost, LightGBM, and CatBoost (7 min): Industrial-strength gradient boosting implementations with regularization, histogram binning, and GPU acceleration.
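Bagging in miniature: resample the data with replacement, fit a "model" on each resample, and average the predictions. Here the base model is deliberately trivial (the sample mean), which is enough to show the bootstrap-and-average mechanics; real bagging uses decision trees:

```python
import random
import statistics

def bootstrap_sample(data, rng):
    # draw len(data) points with replacement: some repeat, some are left out
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=200, seed=0):
    rng = random.Random(seed)
    # each "model" is fit (here: a mean) on its own bootstrap resample,
    # and the ensemble prediction averages over all of them
    preds = [statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return statistics.mean(preds)

est = bagged_mean([2.0, 4.0, 6.0, 8.0, 10.0])  # close to the full-sample mean, 6.0
```

The averaging step is where the variance reduction comes from: individual bootstrap fits disagree, but their errors partially cancel.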
Unsupervised Learning
· Anomaly Detection (7 min): Identifying data points that deviate significantly from the norm -- isolation forests, autoencoders, and statistical approaches.
· Association Rules (7 min): Discovering frequent itemsets and co-occurrence patterns in transactional data -- the Apriori algorithm and market basket analysis.
· DBSCAN (8 min): Discovering arbitrarily shaped clusters based on point density -- no need to specify K, naturally identifies outliers.
· Gaussian Mixture Models (7 min): Soft clustering via a weighted sum of Gaussians fitted with EM -- probabilistic assignment captures cluster uncertainty.
· Hierarchical Clustering (6 min): Building a tree of nested clusters via agglomerative merging or divisive splitting -- revealing multi-scale data structure.
· K-Means Clustering (7 min): Partitioning data into K groups by iteratively assigning points to nearest centroids -- simple, fast, and surprisingly effective.
· Principal Component Analysis (8 min): Projecting data onto orthogonal directions of maximum variance -- the foundational dimensionality reduction technique.
· t-SNE and UMAP (8 min): Nonlinear dimensionality reduction for visualization -- preserving local neighborhood structure in 2D/3D plots.
Neural Network Foundations
· Activation Functions (5 min): Nonlinear transforms between layers -- ReLU, sigmoid, tanh, and why the choice matters for gradient flow and expressivity.
· Backpropagation (5 min): Computing gradients layer by layer via the chain rule -- the algorithm that makes deep learning computationally feasible.
· Batch Normalization (5 min): Normalizing layer inputs within each mini-batch -- stabilizing training, enabling higher learning rates, and acting as regularization.
· Dropout and Regularization (6 min): Randomly zeroing activations during training -- an implicit ensemble that prevents co-adaptation of neurons.
· Optimizers (5 min): SGD, momentum, RMSProp, Adam, and AdamW -- adaptive methods that navigate loss landscapes faster than vanilla gradient descent.
· Perceptrons and Multilayer Networks (5 min): From single linear classifiers to universal function approximators -- stacking layers creates representational power.
· Universal Approximation Theorem (7 min): A single hidden layer with enough neurons can approximate any continuous function -- but finding those weights is the hard part.
· Weight Initialization (5 min): Xavier, He, and orthogonal initialization -- breaking symmetry and controlling signal magnitude at the start of training.
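Backpropagation through the chain rule can be seen end to end on a single sigmoid neuron. A sketch with squared loss on one made-up training example (real networks apply the same chain-rule products layer by layer, automatically):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def backprop_step(w, b, x, y, lr=0.5):
    # forward pass
    z = w * x + b
    y_hat = sigmoid(z)
    # backward pass: chain rule through L = (y_hat - y)^2
    dL_dyhat = 2 * (y_hat - y)        # derivative of the squared loss
    dyhat_dz = y_hat * (1 - y_hat)    # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0             # derivatives of the linear layer
    w -= lr * dL_dyhat * dyhat_dz * dz_dw
    b -= lr * dL_dyhat * dyhat_dz * dz_db
    return w, b

w, b = 0.0, 0.0
for _ in range(500):
    w, b = backprop_step(w, b, x=1.0, y=1.0)
# after training, sigmoid(w*x + b) is close to the target 1.0
```

Note how the update is literally a product of local derivatives; stacking layers just extends that product, which is also why vanishing gradients (covered under activation functions) become a concern in deep stacks.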
Probabilistic Methods
· Bayesian Inference (6 min): Updating beliefs with evidence via Bayes' theorem -- treating parameters as distributions rather than fixed values.
· Expectation-Maximization (6 min): Iteratively inferring latent variables (E-step) and optimizing parameters (M-step) -- the workhorse for incomplete data.
· Gaussian Processes (6 min): Nonparametric Bayesian regression defining distributions over functions -- elegant uncertainty quantification with $O(n^3)$ cost.
· Graphical Models (7 min): Bayesian networks and Markov random fields representing conditional dependencies as graphs -- structured probabilistic reasoning.
· Markov Chain Monte Carlo (7 min): Sampling from complex posterior distributions by constructing Markov chains -- when exact inference is intractable.
· Variational Inference (7 min): Approximating intractable posteriors by optimization rather than sampling -- trading exactness for scalability.
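The Bayesian update at the heart of this module, P(H | E) = P(E | H) P(H) / P(E), is worth computing once by hand. A sketch using the classic diagnostic-test setup (the rates below are illustrative numbers, not from the lessons):

```python
# Bayes' theorem for a binary hypothesis: the evidence probability P(E)
# sums over both ways a positive result can occur (true and false positives).

def posterior(prior, sensitivity, false_positive_rate):
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# 1% base rate, 95% sensitivity, 5% false-positive rate:
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
# despite the accurate test, the posterior is only about 16%
```

The counterintuitively low posterior is driven by the low prior: most positives come from the large healthy population. That prior-dominates-likelihood behavior is exactly what the Bayesian lessons formalize.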
Model Selection and Evaluation
· Calibration (6 min): When a model says "80% confidence" it should be right 80% of the time -- reliability diagrams, Platt scaling, and isotonic regression.
· Classification Metrics (5 min): Accuracy, precision, recall, F1, AUC-ROC, and AUC-PR -- choosing the right metric depends on what errors cost.
· Cross-Validation (6 min): K-fold, stratified, and leave-one-out validation -- maximizing use of limited data for both training and evaluation.
· Hyperparameter Tuning (5 min): Grid search, random search, and Bayesian optimization -- finding optimal settings without overfitting to the validation set.
· Learning Curves (6 min): Plotting performance vs. training set size or training iterations -- diagnosing whether you need more data, more capacity, or more regularization.
· Model Comparison (6 min): Paired t-tests, McNemar's test, and Wilcoxon signed-rank -- determining if performance differences are real or noise.
· Regression Metrics (5 min): MSE, RMSE, MAE, MAPE, and R-squared -- each captures different aspects of prediction quality.
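The index bookkeeping behind K-fold cross-validation is a small function: partition the indices into K folds, then hold out each fold once. A sketch (round-robin fold assignment here for brevity; common libraries use contiguous or shuffled folds, and stratified variants additionally preserve class ratios):

```python
# K-fold cross-validation splits: each of the K folds is the validation set
# exactly once, while the remaining folds form the training set.

def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

splits = list(kfold_indices(n=6, k=3))
# 3 splits; every index appears in exactly one validation fold
```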
Feature Engineering
· Automated Feature Engineering (7 min): AutoML, Featuretools, and neural feature learning -- when manual engineering doesn't scale.
· Feature Extraction and Transformation (7 min): Creating new informative features from raw data through domain knowledge, mathematical transforms, and automated methods.
· Feature Selection Methods (7 min): Filter, wrapper, and embedded approaches for identifying the most informative features -- removing noise to improve generalization.
· Handling High-Cardinality Features (7 min): Target encoding, hashing, and embedding approaches for categorical features with thousands of unique values.
· Time-Series Feature Engineering (6 min): Lags, rolling statistics, seasonality decomposition, and calendar features -- encoding temporal patterns for ML models.
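Lags and rolling statistics, the core time-series features in this module, reduce to a few lines of list manipulation. A sketch (None marks rows that lack enough history; dataframe libraries do the same thing with NaN):

```python
# Turn a raw series into supervised-learning features: the value k steps ago
# (a lag) and the mean over a trailing window (a rolling statistic).

def lag(series, k):
    return [None] * k + series[:-k]

def rolling_mean(series, window):
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

y = [1.0, 2.0, 3.0, 4.0, 5.0]
lag1 = lag(y, 1)           # [None, 1.0, 2.0, 3.0, 4.0]
ma3 = rolling_mean(y, 3)   # [None, None, 2.0, 3.0, 4.0]
```

Note that both features only look backward in time; using any future value would leak the target, a pitfall the lessons return to.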
ML Systems and Production
· A/B Testing for ML (6 min): Comparing model versions in production with statistical rigor -- offline metrics don't always predict online impact.
· Data Drift and Model Monitoring (5 min): Detecting when production data diverges from training data -- models degrade silently without monitoring.
· Experiment Tracking (5 min): Logging parameters, metrics, artifacts, and code versions -- reproducing results and navigating the experiment space systematically.
· ML Pipelines (5 min): Chaining data processing, feature engineering, and model training into reproducible, deployable workflows.
· Model Deployment and Serving (6 min): Batch vs. real-time inference, containerization, model registries, and the infrastructure of production ML.
· Responsible AI and Fairness (7 min): Measuring and mitigating bias, ensuring transparency, and building ML systems that are accountable and equitable.
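The simplest possible drift monitor compares a feature's production mean against its training distribution. A sketch of that idea (the threshold and data are illustrative; production monitoring typically uses proper tests such as Kolmogorov-Smirnov or the population stability index, which the drift lesson covers):

```python
import statistics

# Flag a feature as drifted when the production mean moves more than
# `threshold` training standard deviations away from the training mean.

def mean_shift_alert(train_values, prod_values, threshold=3.0):
    mu = statistics.mean(train_values)
    sigma = statistics.pstdev(train_values)  # assumed nonzero here
    shift = abs(statistics.mean(prod_values) - mu) / sigma
    return shift > threshold

train = [10.0, 11.0, 9.0, 10.5, 9.5]
drifted = mean_shift_alert(train, [16.0, 17.0, 15.5])  # True: mean jumped
stable = mean_shift_alert(train, [10.2, 9.8, 10.1])    # False: within range
```

Even this crude check beats no monitoring at all, which is the lesson's point: models degrade silently unless something is watching the inputs.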