The Complete Data Scientist
Data Engineering for Machine Learning
Inputs and Outputs: The Data Warehouse as a Production Service
Create the flexible and evolvable data ingestion systems required to support streaming data pipelines and ML use cases.
Analyze the tradeoffs between row-oriented and column-oriented data formats for use during data ingestion, analysis, model training, and model serving.
Solve a broad class of common ML problems by building tools for moving large datasets from your data warehouse into a low-latency serving system in your production environment.
Training Machine Learning Models: The Data Engineering Perspective
Compose data from multiple sources and time scales into coherent datasets that are designed to avoid the most common sources of error in model training.
Evolve your data models beyond supporting a single ML use case into a shared knowledge resource that lets your company bring machine learning everywhere it is needed.
Create a data platform for feature evaluation and model training that enables data scientists and ML researchers to easily trade off speed, flexibility, and compute costs.
Data Quality and Monitoring in the Data Warehouse and Production
Create tools for linking data profiling and quality checks from model training into your production model deployments.
Understand the benefits and the limitations of using standard application performance monitoring (APM) tools for data and ML monitoring problems.
Balance the need for thorough data quality checks against the cost and performance overhead of running them in both the data warehouse and the production environment.
From Batch to Streaming: Experiments and Contextual Bandits
Understand the unique constraints and opportunities for evaluating ML models in an online serving environment beyond normal A/B testing.
Design streaming data pipelines for performing rapid evaluation of models for recommendations, ranking, and classification problems.
Create the data infrastructure required for reinforcement learning and contextual bandits, so that ML models can learn in real time.
Applied Causal Inference
An Introduction to Causal Inference
Define causal inference and distinguish it from machine learning predictions
Recognize the key challenges to doing causal inference with observational data by exploring a case study on the impact of Amazon Prime on customer spending
Implement simple approaches for causal inference with linear regression
Improving Our Causal Estimates by Comparing “Similar” Users
Use matching-based estimators to compare similar users who received different “treatments” (e.g., one Amazon Prime member compared to a non-member with similar characteristics)
Estimate propensity scores and use them for causal inference
Distinguish between situations where machine learning models may be able to generate reliable causal estimates and situations where they may not
Taking Advantage of Quasi-Experiments
Use event studies and difference-in-differences to estimate the long-run impact of a late delivery
Combine propensity scores and outcome models to make “doubly robust” estimates
Reuse old A/B tests to make new causal estimates by using instrumental variables
How Effective Were My Facebook Ads? A Sobering Tale of the Limitations of Causal Inference
Explain the challenges of accurately measuring the true causal impact of advertising spending
Use double/debiased machine learning to estimate causal effects and drive incremental sales
Understand the limitations of causal inference by evaluating a situation where these approaches failed (even with Facebook-scale data)
Heterogeneous Treatment Effects: Going Beyond Averages
Contrast approaches for estimating user-specific causal effects with estimating the population average effect by examining a case study about targeting advertising to drive incremental sales
Apply ML-based techniques for heterogeneous treatment effects (HTE): S-, T-, and X-learners
Distinguish causal effect estimation from causal decision making
Modern Forecasting in Practice
Should your business problem be solved with forecasting?
Understand which business processes can be optimized, and how, by incorporating (probabilistic) predictions of future outcomes
Differentiate strategic from operational forecasting problems with examples from Zalando and Amazon
Measure and compare the accuracy of different forecasts
Forecasting solutions using a small set of time series
Case Study: Forecasting top-level energy demand and prices
Understand the underlying business problem and the challenges of the resulting data-constrained forecasting problem
Identify the effects and structural components that make up the data, such as trend(s), seasonality, exogenous shocks, and noise
Identify the appropriate method and tool, such as linear regression, ETS, and ARIMAX
Forecasting solutions with a large set of time series
Case Study: Retail demand forecasting
Build an intuition for the data via visualization of individual time series and aggregate summaries
Obtain covariates/features and process them
Use and tune global ML-powered methods such as Gradient Boosted Trees and Neural Network-based methods like DeepAR
Forecasting solutions with dependency structures
Case Study: Forecasting with causal inputs
Forecast demand subject to price changes for millions of products
Build what-if analysis using simple and advanced approaches
Evaluate and improve forecasting in counterfactual situations
What best practices help you avoid common pitfalls in production?
Practical tactics for forecasting exemplified by labor planning
Productionize forecasting models including retraining schemes
Handle missing data and the associated perils
Research approaches to outliers/extreme events such as blizzards and pandemics