Designed to help make organizations

Data Driven

Product Leaders

Program managers and product managers that are focused on metrics like growth and revenue, and prioritization decisions

Data Scientists

Data Scientists and Data Science managers who help map strategic decisions to actionable experimental designs and then interpret the results in a trustworthy manner

The motivation and basics of A/B testing (e.g., causality, surprising examples, metrics, interpreting results, trust and pitfalls, Twyman’s law, A/A tests)
Cultural challenges, humbling results (e.g., failing often, pivoting, iterating), experimentation platform, institutional memory and meta-analysis, ethics
Hierarchy of evidence, Expected Value of Information (EVI), complementary techniques, risks in observational causal studies

Engineering Leaders

Engineering managers, directors, VPs, and CTOs who want to make their organizations data-driven with metrics and A/B tests

Safe deployments
Triggering, especially in evaluating machine learning models
The benefits of agile product development

Designed for ML and Data Engineers

who want to architect the data infrastructure required to support scalable machine learning in production environments

You will:
  • Build and monitor production services for capturing high-quality data for model training and for serving data computed in your data warehouse
  • Design batch data pipelines for training models that integrate diverse data sources, avoid data leakage, and run on-time and on-budget
  • Learn how to make the leap from batch to streaming pipelines to support real-time model features, model evaluation, and even model training
You should:
  • Be proficient at creating data pipelines in SQL/Python using a cloud data warehouse (Snowflake/Databricks/BigQuery)
  • Be comfortable with building simple web APIs and working with key-value stores like Redis or DynamoDB
  • Be familiar with core database join strategies such as hash joins and sort-merge joins
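As a quick refresher on the last prerequisite, the two join strategies can be sketched in plain Python. This is a minimal illustration, not how any particular warehouse implements them. The table contents and the function names `hash_join` and `sort_merge_join` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner join: build a hash table on `right`, then probe it with each row of `left`."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    # .get() avoids creating empty entries for keys that have no match
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Inner join: sort both inputs on the key, then advance two cursors together."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows that share this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-side row carrying this key with that run
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample data
users = [
    {"user_id": 1, "name": "ana"},
    {"user_id": 2, "name": "bo"},
    {"user_id": 3, "name": "cy"},
]
events = [
    {"user_id": 2, "event": "click"},
    {"user_id": 1, "event": "view"},
    {"user_id": 2, "event": "buy"},
]

hashed = hash_join(users, events, "user_id")
merged = sort_merge_join(users, events, "user_id")
```

The hash join needs memory for one side's hash table but reads each input once. The sort-merge join pays for sorting but then streams both inputs, which tends to suit data that is already sorted or too large to hash in memory.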

Over the past decade, leading technology companies have developed and deployed hundreds of machine learning models to help optimize their products. While there is ample material available about training individual models on a static data set, far less information is available about the data engineering best practices required to support production models. During this live course, we will dive into these data engineering best practices and the value they can bring to your ML systems. Specifically, we will answer:

  • How do you design the data collection systems needed to support robust and reliable machine learning models?
  • How do you scale from managing features for a single production machine learning model to many?
  • How do you track the performance of your data pipelines to balance reliability and cost?
  • When and how do you transition from batch to streaming data pipelines for model evaluation?

In traditional software development, CI/CD automates many tasks, including testing, building and deploying software. But CI/CD for ML is a different beast. Testing and deployment of ML can be triggered by many event types, and observability and logging requirements are materially different for ML. Today, no single tool can facilitate end-to-end CI/CD for ML. The process of testing, building and deploying ML requires a symphony of tools and glue code to create an integrated CI/CD system. To offer an entry point that many data scientists and engineers are familiar with, we’ll teach you how to integrate GitHub with other ML tools to build custom CI/CD automations for ML that will increase your engineering efficiency and prevent errors from being released to production.

Session 1: Inputs and Outputs: The Data Warehouse as a Production Service

Tuesday, November 8, 2022
1-3 pm PST
  • Create the flexible and evolvable data ingestion systems required to support streaming data pipelines and ML use cases.
  • Analyze the tradeoffs between row-oriented and column-oriented data formats for use during data ingestion, analysis, model training, and model serving.
  • Solve a broad class of common ML problems by building tools for moving large datasets from your data warehouse into a low-latency serving system in your production environment.

Session 2: Training Machine Learning Models: The Data Engineering Perspective

Thursday, November 10, 2022
1-3 pm PST
  • Compose data from multiple sources and time scales into coherent datasets that are designed to avoid the most common sources of error in model training.
  • Evolve your data models beyond supporting a single ML use case into a shared knowledge resource that lets your company bring machine learning everywhere it is needed.
  • Create a data platform for feature evaluation and model training that enables data scientists and ML researchers to easily trade off speed, flexibility, and compute costs.
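One common source of training error when composing data across time scales is leakage: joining a label to a feature value that was only known after the label's timestamp. The sketch below shows one way to guard against it, a point-in-time ("as-of") join, under the assumption that each feature value carries the timestamp at which it became available. The function name, the tuple layouts, and the sample data are all hypothetical, and this is not presented as the course's own method.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_history):
    """Attach to each label the latest feature value observed at or before the label time.

    labels: list of (entity_id, label_ts, label)
    feature_history: {entity_id: [(feature_ts, value), ...]}
    """
    by_entity = {entity: sorted(history) for entity, history in feature_history.items()}
    rows = []
    for entity_id, label_ts, label in labels:
        history = by_entity.get(entity_id, [])
        timestamps = [ts for ts, _ in history]
        # Index of the first feature timestamp strictly after label_ts
        idx = bisect_right(timestamps, label_ts)
        # None when no feature value existed yet at label time
        value = history[idx - 1][1] if idx > 0 else None
        rows.append((entity_id, label_ts, value, label))
    return rows


# Hypothetical sample data: integer timestamps for readability
feature_history = {
    "u1": [(1, 0.2), (5, 0.9)],
    "u2": [(4, 0.5)],
}
labels = [("u1", 3, 0), ("u1", 6, 1), ("u2", 2, 0)]

training_rows = point_in_time_join(labels, feature_history)
```

Whether a feature stamped at exactly the label time should count as "known" is a design choice; this sketch includes it, and a stricter pipeline could use `bisect_left` instead.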

Session 3: Data Quality and Monitoring in the Data Warehouse and Production

Tuesday, November 15, 2022
1-3 pm PST
  • Create tools for linking data profiling and quality checks from model training into your production model deployments.
  • Understand the benefits and the limitations of using standard application performance monitoring (APM) tools for data and ML monitoring problems.
  • Balance the need for comprehensive and thorough data quality checks with the cost and performance overhead required to perform those checks in both the data warehouse and the production environment.

Session 4: From Batch to Streaming: Experiments and Contextual Bandits

Thursday, November 17, 2022
1-3 pm PST
  • Understand the unique constraints and opportunities for evaluating ML models in an online serving environment beyond normal A/B testing.
  • Design streaming data pipelines for performing rapid evaluation of models for recommendations, ranking, and classification problems.
  • Create the data infrastructure required to support reinforcement learning and contextual bandits in order to support ML models that can learn in real time.
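
To make "models that can learn in real time" concrete, here is a minimal sketch of a contextual bandit. It uses a tabular epsilon-greedy policy, a deliberately simple stand-in rather than anything the session is confirmed to cover. The class name, the contexts, the arms, and the simulated reward rule are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward per (context, arm) pair and updates it after every event."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known arm
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental mean: no need to store the full reward history
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulated_reward(context, arm):
    # Hypothetical environment: arm "A" pays off on mobile, arm "B" on desktop
    best = {"mobile": "A", "desktop": "B"}
    return 1.0 if best[context] == arm else 0.0


bandit = EpsilonGreedyContextualBandit(arms=["A", "B"], epsilon=0.1, seed=0)
for t in range(1000):
    context = "mobile" if t % 2 == 0 else "desktop"
    arm = bandit.choose(context)
    bandit.update(context, arm, simulated_reward(context, arm))

best_for_desktop = max(bandit.arms, key=lambda a: bandit.values[("desktop", a)])
best_for_mobile = max(bandit.arms, key=lambda a: bandit.values[("mobile", a)])
```

Each `update` call is the point where a streaming pipeline would deliver a logged reward back to the model, which is why the feedback latency of that pipeline bounds how quickly the policy can adapt.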
Interested in sending your

Team?

Sphere offers a range of subscription packages that provide discounts on all courses in our library. We help upskill employees at some of the world’s best companies. Learn more about pricing options here or book a time to talk to one of our staff below.

Book a free consultation
Check Expense Approval At Your Company

Learn live from a world-class

Instructor

Josh Wills has built and led data engineering and data science teams at Slack, Cloudera, and Google. As an individual contributor, he was the technical lead for Slack’s search indexing pipeline and Google’s ad auction and experimentation library. Josh has also consulted on data pipeline design and machine learning systems at companies like Spotify, Airtable, Apple, and Capital One. He is the co-author of Advanced Analytics with Apache Spark and has given numerous popular talks and lectures about the practice of data science and engineering over the past decade.

Guest Lectures by

Industry Experts

Jerry Talton
CTO @ Carta

“In my two decades of experience building large-scale data systems in industry and academia, Josh stands out as a singular mentor, practitioner, and teacher. There are few more qualified to impart the benefits of foundational data engineering practices.”

Chip Huyen
Co-Founder @ Claypot.ai

“Josh has an incredible range of experience building data systems at companies of various scales. On top of that, he is a fantastic speaker. I've learned so much from him over the years -- he has an engaging way of explaining difficult concepts!”

Eric Sammer
CEO @ Decodable

“Josh is truly a rare breed with real world experience in applied mathematics, data science, ML, and data infrastructure. His knowledge really gives him a wildly unfair advantage when building data-driven products and systems. He's also one of the best presenters and mentors you'll find.”

Join a diverse and experienced

Community

This cohort gives you access to a rich community of like-minded professionals from some of the best businesses in the world. Even after the course ends, you will continue to learn and build with each other.

Exclusive Content

to advance your business

Get access to exclusive content through live sessions, meetups and our Student Portal (even after you finish the cohort). Ask questions and get personal feedback directly from your instructors and others taking the course.

Still have questions?

We’re here to help!

Do I have to attend all of the sessions live in real-time?

You don’t! We record every live session in the cohort and make each recording and the session slides available on our portal for you to access anytime.

Will I receive a certificate upon completion?

Each learner receives a certificate of completion, which is sent to you upon completion of the cohort (along with access to our Alumni portal!). Additionally, Sphere is listed as a school on LinkedIn so you can display your certificate in the Education section of your profile.

Is there homework?

Throughout the cohort, there may be take-home questions that pertain to subsequent sessions. These are optional, but allow you to engage more with the instructor and other cohort members!

Can I get the course fee reimbursed by my company?

While we cannot guarantee that your company will cover the cost of the cohort, we are accredited by the Continuing Professional Development (CPD) Standards Office, meaning many of our learners are able to expense the course via their company or team’s L&D budget. We even provide an email template you can use to request approval.

I have more questions. How can I get in touch?

Please reach out to us via our Contact Form with any questions. We’re here to help!

Book a time to talk with the Sphere team

Join us for a next-generation

Learning Experience