
Data Engineering on Google Cloud Platform, Chicago
This four-day instructor-led class provides participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants will learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course covers structured, unstructured, and streaming data.

Objectives

This course teaches participants the following skills:
- Design and build data processing systems on Google Cloud Platform
- Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
- Derive business insights from extremely large datasets using Google BigQuery
- Train, evaluate, and predict using machine learning models with TensorFlow and Cloud ML
- Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
- Enable instant insights from streaming data

Prerequisites

To get the most out of this course, participants should have:
- Completed the Google Cloud Fundamentals: Big Data and Machine Learning course, or equivalent experience
- Basic proficiency with a common query language such as SQL
- Experience with data modeling and extract, transform, load (ETL) activities
- Experience developing applications using a common programming language such as Python
- Familiarity with machine learning and/or statistics

Audience

This class is intended for experienced developers who are responsible for managing big data transformations, including:
- Extracting, loading, transforming, cleaning, and validating data
- Designing pipelines and architectures for data processing
- Creating and maintaining machine learning and statistical models
- Querying datasets, visualizing query results, and creating reports

Course Outline

Module 1: Serverless data analysis with BigQuery
- What is BigQuery
- Advanced capabilities
- Performance and pricing
- Lab: Queries and functions
- Lab: Load and export data

Module 2: Serverless, autoscaling data pipelines with Dataflow
- Introduction to Dataflow and capabilities (see the pipeline sketch after this module)
- Lab: Data pipeline
- Lab: MapReduce in Dataflow
- Lab: Side inputs
- Lab: Streaming
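For a flavor of what the Dataflow labs involve, below is a minimal sketch of a batch pipeline written with the Apache Beam Python SDK, which is what Cloud Dataflow executes. This is illustrative only, not course material; the bucket and file names are hypothetical placeholders.

    import apache_beam as beam

    # A minimal word-count-style pipeline. Run locally with the default
    # DirectRunner; passing --runner=DataflowRunner (plus a GCP project
    # and staging bucket) runs the same code on Cloud Dataflow with
    # autoscaling workers.
    with beam.Pipeline() as p:
        (
            p
            | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.txt')  # hypothetical bucket
            | 'Split' >> beam.FlatMap(lambda line: line.split())
            | 'Pair' >> beam.Map(lambda word: (word, 1))
            | 'Count' >> beam.CombinePerKey(sum)
            | 'Format' >> beam.Map(lambda kv: f'{kv[0]}: {kv[1]}')
            | 'Write' >> beam.io.WriteToText('gs://my-bucket/output')     # hypothetical bucket
        )

The same pipeline shape, with a streaming source and windowing added, is what the later streaming modules build on.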
Module 3: Getting started with Machine Learning
- What is machine learning (ML)
- Effective ML: concepts, types
- Evaluating ML
- ML datasets: generalization
- Lab: Explore and create ML datasets

Module 4: Building ML models with TensorFlow
- Getting started with TensorFlow
- Lab: Using tf.learn
- TensorFlow graphs and loops + lab
- Lab: Using low-level TensorFlow + early stopping
- Monitoring ML training
- Lab: Charts and graphs of TensorFlow training

Module 5: Scaling ML models with Cloud ML
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
- Lab: Run an ML model locally and on the cloud

Module 6: Feature engineering
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
- Lab: Feature engineering

Module 7: ML architectures
- Wide and deep
- Image analysis
- Lab: Custom image classification with transfer learning
- Embeddings and sequences
- Recommendation systems

Module 8: Google Cloud Dataproc overview
- Introducing Google Cloud Dataproc
- Creating and managing clusters
- Defining master and worker nodes
- Leveraging custom machine types and preemptible worker nodes
- Creating clusters with the Web Console
- Scripting clusters with the CLI
- Using the Dataproc REST API
- Dataproc pricing
- Scaling and deleting clusters
- Lab: Creating Hadoop clusters with Google Cloud Dataproc

Module 9: Running Dataproc jobs
- Controlling application versions
- Submitting jobs
- Accessing HDFS and GCS
- Hadoop, Spark, and PySpark
- Pig and Hive
- Logging and monitoring jobs
- Accessing master and worker nodes with SSH
- Working with the PySpark REPL (command-line interpreter)
- Lab: Running Hadoop and Spark jobs with Dataproc

Module 10: Integrating Dataproc with Google Cloud Platform
- Initialization actions
- Programming Jupyter/Datalab notebooks
- Accessing Google Cloud Storage
- Leveraging relational data with Google Cloud SQL
- Reading and writing streaming data with Google Bigtable
- Querying data from Google BigQuery
- Making Google API calls from notebooks
- Lab: Big data analysis with Dataproc

Module 11: Making sense of unstructured data with Google's Machine Learning APIs
- Google's Machine Learning APIs
- Common ML use cases
- Vision API
- Natural Language API
- Translate and Speech APIs
- Lab: Adding machine learning capabilities to big data analysis

Module 12: Need for real-time streaming analytics
- What is streaming analytics?
- Use cases
- Batch vs. streaming (real-time)
- Related terminology
- GCP products that help build highly available, resilient, high-throughput, real-time streaming analytics (review of Pub/Sub and Dataflow)
- Lab: Set up project, enable APIs, set up storage

Module 13: Architecture of streaming pipelines
- Streaming architectures and considerations
- Choosing the right components
- Lab: Explore the dataset
- Windowing
- Streaming aggregation
- Events and triggers
- Lab: Create an architecture reference

Module 14: Stream data and events into Pub/Sub
- Topics and subscriptions
- Publishing events into Pub/Sub (see the sketch after this outline)
- Lab: Streaming data ingest into Pub/Sub
- Subscribing options: push vs. pull
- Alerts

Module 15: Build a stream processing pipeline
- Pipelines, PCollections, and Transforms
- Windows, events, and triggers
- Aggregation statistics
- Streaming analytics with BigQuery
- Low-volume alerts
- Lab: Alerting scenario for anomalies

Module 16: High throughput and low latency with Bigtable
- Latency considerations
- Lab: Create streaming data processing pipelines with Dataflow
- What is Bigtable
- Designing row keys
- Performance considerations
- Lab: High-volume event processing

Module 17: Visualizing insights with Data Studio
- What is Google Data Studio?
- From data to decisions
- Lab: Build a real-time dashboard to visualize processed data
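As a taste of the streaming-ingest labs in Module 14, here is a minimal sketch of publishing one event to Pub/Sub with the Python client library. The project and topic names are hypothetical placeholders, not values from the course.

    from google.cloud import pubsub_v1

    # Publish a single JSON-encoded event to a Pub/Sub topic. The client
    # batches and publishes asynchronously; result() blocks until the
    # message is accepted and returns its server-assigned message ID.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path('my-project', 'sensor-events')  # hypothetical names

    future = publisher.publish(topic_path, data=b'{"sensor_id": 42, "temp_c": 21.5}')
    print('Published message', future.result())

A streaming Dataflow pipeline of the kind built in Module 15 would then consume these events from a subscription on the same topic.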

Chicago, Illinois, United States
