Unit 1 Fundamentals, Course 1: Introduction to Data Science

Learn what it takes to become a data scientist. This is the first stop in the Data Science curriculum from Microsoft. It will help you get started with the program, plan your learning schedule, and connect with fellow students and teaching assistants. Along the way, you'll get an introduction to working with and exploring data using a variety of visualization, analytical, and statistical techniques.

How the Microsoft Data Science curriculum works
How to navigate the curriculum and plan your course schedule
Basic data exploration and visualization techniques in Microsoft Excel
Foundational statistics that can be used to analyze data

Duration: 2 weeks
Total effort: 12-24 hours
Level: Introductory
Prerequisite knowledge: none

Unit 1 Fundamentals, Course 2: Querying Data with Transact-SQL

From querying and modifying data in SQL Server or Azure SQL to programming with Transact-SQL, learn essential skills that employers need. Transact-SQL is an essential skill for data professionals and developers working with SQL databases. With this combination of expert instruction, demonstrations, and practical labs, step from your first SELECT statement through to implementing transactional programmatic logic.

Work through multiple modules, each of which explores a key area of the Transact-SQL language, with a focus on querying and modifying data in Microsoft SQL Server or Azure SQL Database. The labs in this course use a sample database that can be deployed easily in Azure SQL Database, so you get hands-on experience with Transact-SQL without installing or configuring a database server.

Create Transact-SQL SELECT queries
Work with data types and NULL
Query multiple tables with JOIN
Explore set operators
Use functions and aggregate data
Work with subqueries and APPLY
Use table expressions
Group sets and pivot data
Modify data
Program with Transact-SQL
Implement error handling and transactions

Duration: 3 weeks
Total effort: 24-30 hours
Level: Intermediate
Prerequisite knowledge: Basic understanding of databases and IT systems.
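As a rough illustration of the JOIN, NULL, and aggregation topics listed above, the sketch below runs a query through Python's built-in sqlite3 module. This is an assumption made for portability: SQLite's dialect is not Transact-SQL, and the course labs use a sample database in Azure SQL Database instead. The customer and sales_order tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical two-table schema, invented for illustration only.
# SQLite is used so the example runs anywhere; it is not Transact-SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sales_order (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO sales_order VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, NULL);
""")

# LEFT JOIN keeps customers that have no orders. SUM ignores NULL amounts,
# and COALESCE turns an all-NULL or empty sum into 0.
rows = conn.execute("""
    SELECT c.name,
           COUNT(o.id) AS order_count,
           COALESCE(SUM(o.amount), 0) AS total
    FROM customer AS c
    LEFT JOIN sales_order AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, order_count, total in rows:
    print(name, order_count, total)
conn.close()
```

The same SELECT, LEFT JOIN, GROUP BY, and COALESCE constructs also exist in Transact-SQL, although data types and some functions differ between the two dialects.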

Unit 1 Fundamentals, Course 3a: Analyzing and Visualizing Data with Excel

Excel is one of the most widely used solutions for analyzing and visualizing data. It now includes tools that enable the analysis of more data, with improved visualizations and more sophisticated business logic. In this data science course, you will get an introduction to the latest versions of these new tools in Excel 2016 from an expert on the Excel Product Team at Microsoft.

Learn how to import data from different sources, create mashups between data sources, and prepare data for analysis. After preparing the data, find out how business calculations can be expressed using the DAX calculation engine. See how the data can be visualized and shared to the Power BI cloud service, after which it can be used in dashboards, queried using plain English sentences, and even consumed on mobile devices.

Do you feel that the contents of this course are a bit too advanced for you and that you need to fill some gaps in your Excel knowledge? Do you need a better understanding of how pivot tables, pivot charts, and slicers work together, and help in creating dashboards? If so, check out DAT205x: Introduction to Data Analysis using Excel.

Gather and transform data from multiple sources
Discover and combine data in mashups
Learn about data model creation
Explore, analyze, and visualize data

Duration: 2 weeks
Total effort: 12-24 hours
Level: Intermediate
Prerequisite knowledge: Understanding of Excel analytic tools such as tables, pivot tables, and pivot charts. Some experience in working with data from databases and from text files will also be helpful.

System Requirements
1) Windows operating system: Windows 7 or later.
2) Microsoft Excel on Windows operating system:
a) Microsoft Excel 2016 Professional Plus or standalone edition
b) Microsoft Excel 2013 Professional Plus or standalone edition
c) Microsoft Excel 2010
d) Other versions of Microsoft Excel are not supported

Syllabus

Week 1
Set up the lab environment by installing Office applications.
Learn how to perform data analysis in Excel using classic tools, such as pivot tables, pivot charts, and slicers, on data that is already in a worksheet / grid data.
Explore an Excel data model, its content, and its structure, using the Power Pivot add-in.
Create your first DAX expressions for calculated columns and measures.
Learn about queries (Power Query add-in in Excel 2013 and Excel 2010), and build an Excel data model from a single flat table.
Learn how to import multiple tables from a SQL database, and create an Excel data model from the imported data.
Create a mash-up between data from text files and data from a SQL database.

Week 2
Get the details on how to create measures to calculate for each cell, filter context for calculation, and explore several advanced DAX functions.
Find out how to use advanced text query to import data from a formatted Excel report.
Perform queries beyond the standard user interface.
Explore ways to create stunning visualizations in Excel.
Use the cube functions to perform year-over-year comparisons.
Create timelines, hierarchies, and slicers to enhance your visualizations.
Learn how Excel can work together with Power BI.
Upload an Excel workbook to the Power BI service.
Explore the use of Excel on the mobile platform.

Unit 1 Fundamentals, Course 3b: Analyzing and Visualizing Data with Power BI

Learn Power BI, a powerful cloud-based service that helps data scientists visualize and share insights from their data. Power BI is quickly gaining popularity among professionals in data science as a cloud-based service that helps them easily visualize and share insights from their organization's data.

In this data science course, you will learn from the Power BI product team at Microsoft with a series of short, lecture-based videos, complete with demos, quizzes, and hands-on labs. You'll walk through Power BI, end to end, starting from how to connect to and import your data, author reports using Power BI Desktop, and publish those reports to the Power BI service. Plus, learn to create dashboards and share with business users on the web and on mobile devices.

Connect, import, shape, and transform data for business intelligence (BI)
Visualize data, author reports, and schedule automated refresh of your reports
Create and share dashboards based on reports in Power BI Desktop and Excel
Use natural language queries
Create real-time dashboards

Duration: 2 weeks
Total effort: 12-24 hours
Level: Introductory
Prerequisite knowledge: Some experience in working with data from Excel, databases, or text files.

Syllabus

Week 1
Understanding key concepts in business intelligence, data analysis, and data visualization
Importing your data and automatically creating dashboards from services such as Marketo, Salesforce, and Google Analytics
Connecting to and importing your data, then shaping and transforming that data
Enriching your data with business calculations
Visualizing your data and authoring reports
Scheduling automated refresh of your reports
Creating dashboards based on reports and natural language queries
Sharing dashboards across your organization
Consuming dashboards in mobile apps

Week 2
Leveraging your Excel reports within Power BI
Creating custom visualizations that you can use in dashboards and reports
Collaborating within groups to author reports and dashboards
Sharing dashboards effectively based on your organization's needs
Exploring live connections to data with Power BI
Connecting directly to SQL Azure, HD Spark, and SQL Server Analysis Services
Introduction to Power BI Development API
Leveraging custom visuals in Power BI

Unit 1 Fundamentals, Course 4: Essential Statistics for Data Analysis using Excel

Gain a solid understanding of statistics and basic probability, using Excel, and build on your data analysis and data science foundation. If you're considering a career as a data analyst, you need to know about histograms, Pareto charts, boxplots, Bayes' theorem, and much more. In this applied statistics course, the second in our Microsoft Excel Data Analyst XSeries, use the powerful tools built into Excel, and explore the core principles of statistics and basic probability from both the conceptual and applied perspectives. Learn about descriptive statistics, basic probability, random variables, sampling and confidence intervals, and hypothesis testing. And see how to apply these concepts and principles using the environment, functions, and visualizations of Excel.

As a data science pro, the ability to analyze data helps you to make better decisions, and a solid foundation in statistics and basic probability helps you to better understand your data. Using real-world concepts applicable to many industries, including medical, business, sports, insurance, and much more, learn from leading experts why Excel is one of the top tools for data analysis and how its built-in features make Excel a great way to learn essential skills.

Before taking this course, you should be familiar with organizing and summarizing data using Excel analytic tools, such as tables, pivot tables, and pivot charts. You should also be comfortable creating (or willing to try creating) complex formulas and visualizations. Want to start with the basics? Check out DAT205x: Introduction to Data Analysis using Excel. As you learn these concepts and gain experience with this powerful tool, which can be extremely helpful in your journey as a data analyst or data scientist, you may also want to take the third course in our series, DAT206x: Analyzing and Visualizing Data with Excel.

This course includes excerpts from Microsoft Excel 2016: Data Analysis and Business Modeling from Microsoft Press, authored by course instructor Wayne Winston.

Descriptive statistics
Basic probability
Random variables
Sampling and confidence intervals
Hypothesis testing

Duration: 2 weeks
Total effort: 12-24 hours
Level: Intermediate
Prerequisite knowledge: Secondary school (high school) algebra. Ability to work with tables, formulas, and charts in Excel. Ability to organize and summarize data using Excel analytic tools such as tables, pivot tables, and pivot charts.

System Requirements
Excel 2016 is required for the full course experience. Excel 2013 will work but will not support all the visualizations and functions.

Syllabus

Module 1: Descriptive Statistics
You will learn how to describe data using charts and basic statistical measures. Full use will be made of the new histograms, Pareto charts, boxplots, and Treemap and Sunburst charts in Excel 2016.

Module 2: Basic Probability
You will learn basic probability, including the law of complements, independent events, conditional probability, and Bayes' theorem.

Module 3: Random Variables
You will learn how to find the mean and variance of random variables and then learn about the binomial, Poisson, and normal random variables. We close with a discussion of the beautiful and important Central Limit Theorem.

Module 4: Sampling and Confidence Intervals
You will learn the mechanics of sampling, point estimation, and interval estimation of population parameters.

Module 5: Hypothesis Testing
You will learn null and alternative hypotheses, Type I and Type II error, one-sample tests for means and proportions, tests for the difference between means of two populations, and the chi-square test for independence.
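To make two of the syllabus topics above concrete, here is a minimal sketch of Bayes' theorem (Module 2) and a confidence interval for a mean (Module 4). It is written in Python rather than Excel, purely as an assumption for illustration, and every number in it is invented. The interval uses the normal approximation; for a sample this small a t-based interval would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# --- Bayes' theorem (all probabilities below are invented) ---
p_d = 0.01            # P(D): prior probability of a condition
p_pos_d = 0.95        # P(+ | D): test sensitivity
p_pos_not_d = 0.05    # P(+ | not D): false-positive rate

p_not_d = 1 - p_d                               # law of complements
p_pos = p_pos_d * p_d + p_pos_not_d * p_not_d   # total probability of "+"
p_d_pos = p_pos_d * p_d / p_pos                 # Bayes' theorem
print(round(p_d_pos, 3))  # → 0.161

# --- 95% confidence interval for a mean (normal approximation) ---
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]  # invented measurements
m = mean(sample)
se = stdev(sample) / sqrt(len(sample))  # standard error of the mean
z = NormalDist().inv_cdf(0.975)         # about 1.96 for a 95% interval
ci = (m - z * se, m + z * se)
print(ci)
```

Even with a 95% sensitive test, the posterior probability stays near 16% because the condition is rare, which is the kind of result conditional probability is meant to expose.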

Unit 2 - Core Data Science, Course 5a: Introduction to R for Data Science

Learn the R statistical programming language, the lingua franca of data science, in this hands-on course. R is rapidly becoming the leading language in data science and statistics. Today, R is the tool of choice for data science professionals in every industry and field. Whether you are a full-time number cruncher or just the occasional data analyst, R will suit your needs.

This introduction to R programming course will help you master the basics of R. In seven sections, you will cover its basic syntax, making you ready to undertake your own first data analysis using R. Starting from variables and basic operations, you will eventually learn how to handle data structures such as vectors, matrices, data frames, and lists. In the final section, you will dive deeper into the graphical capabilities of R, and create your own stunning data visualizations. No prior knowledge in programming or data science is required.

What makes this course unique is that you will continuously practice your newly acquired skills through interactive in-browser coding challenges using the DataCamp platform. Instead of passively watching videos, you will solve real data problems while receiving instant and personalized feedback that guides you to the correct solution.

Introductory R language fundamentals and basic syntax
What R is and how it's used to perform data analysis
Become familiar with the major R data structures
Create your own visualizations using R

Duration: 2 weeks
Total effort: 12-24 hours
Level: Introductory
Prerequisite knowledge: none, but previous experience in basic mathematics is helpful.

Syllabus

Module 1: Introduction to Basics
Take your first steps with R. Discover the basic data types in R and assign your first variable.

Module 2: Vectors
Analyze gambling behaviour using vectors. Create, name, and select elements from vectors.

Module 3: Matrices
Learn how to work with matrices in R. Do basic computations with them and demonstrate your knowledge by analyzing the Star Wars box office figures.

Module 4: Factors
R stores categorical data in factors. Learn how to create, subset, and compare categorical data.

Module 5: Data Frames
When working with R, you'll probably deal with data frames all the time. Therefore, you need to know how to create one, select the most interesting parts of it, and order them.

Module 6: Lists
Lists allow you to store components of different types. Module 6 will show you how to deal with lists.

Module 7: Basic Graphics
Discover R's packages for graphics and create your own data visualizations.

Unit 2 - Core Data Science, Course 5b: Introduction to Python for Data Science

The ability to analyze data with Python is critical in data science. Learn the basics, and move on to create stunning visualizations. Python is a very powerful programming language used for many different applications. Over time, the huge community around this open source language has created quite a few tools to efficiently work with Python. In recent years, a number of tools have been built specifically for data science. As a result, analyzing data with Python has never been easier.

In this practical course, you will start from the very beginning, with basic arithmetic and variables, and learn how to handle data structures, such as Python lists, NumPy arrays, and pandas DataFrames. Along the way, you'll learn about Python functions and control flow. Plus, you'll look at the world of data visualizations with Python and create your own stunning visualizations based on real data.

Explore Python language fundamentals, including basic syntax, variables, and types
Create and manipulate regular Python lists
Use functions and import packages
Build NumPy arrays, and perform interesting calculations
Create and customize plots on real data
Supercharge your scripts with control flow, and get to know the pandas DataFrame

Duration: 2 weeks
Total effort: 12-24 hours
Level: Introductory
Prerequisite knowledge: Some experience in working with data from Excel, databases, or text files.

Syllabus

Module 1: Python Basics
Take your first steps in the world of Python. Discover the different data types and create your first variable.

Module 2: Python Lists
Get to know the first way to store many different data points under a single name. Create, subset, and manipulate lists in all sorts of ways.

Module 3: Functions and Packages
Learn how to get the most out of other people's efforts by importing Python packages and calling functions.

Module 4: NumPy
Write superfast code with Numerical Python, a package to efficiently store and do calculations with huge amounts of data.

Module 5: Matplotlib
Create different types of visualizations depending on the message you want to convey. Learn how to build complex and customized plots based on real data.

Module 6: Control flow and pandas
Write conditional constructs to tweak the execution of your scripts and get to know the pandas DataFrame: the key data structure for data science in Python.
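The list, function, and control-flow topics in the syllabus above can be sketched in a few lines of plain Python. The room names, sizes, and the 12.0 threshold are invented for illustration and are not taken from the course material.

```python
# A list can mix types; here names and sizes (square metres) alternate.
areas = ["hallway", 11.25, "kitchen", 18.0, "bedroom", 10.75]

areas[3] = 20.0                        # lists are mutable: replace one element
areas = areas + ["bathroom", 9.5]      # concatenation builds a longer list

names = areas[0::2]   # slice with a step: every second item from index 0
sizes = areas[1::2]   # every second item from index 1


def label(size, threshold=12.0):
    """Classify a room size; the default threshold is an arbitrary choice."""
    if size >= threshold:
        return "large"
    return "small"


# Control flow: loop over paired names and sizes.
for name, size in zip(names, sizes):
    print(f"{name}: {size} ({label(size)})")

total = sum(sizes)
print("total area:", total)
```

NumPy arrays and pandas DataFrames, covered in Modules 4 and 6, build on the same indexing and slicing ideas but operate on whole columns at once.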

Unit 2 - Core Data Science, Course 6: Data Science Essentials

Explore data visualization and exploration concepts with experts from MIT and Microsoft, and get an introduction to machine learning. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from Duke University and Microsoft.

In this data science course, you will learn key concepts in data acquisition, preparation, exploration, and visualization, taught alongside practical, application-oriented examples such as how to build a cloud data science solution using the Microsoft Azure Machine Learning platform, or with R and Python on Azure stack.

Explore the data science process
Probability and statistics in data science
Data exploration and visualization
Data ingestion, cleansing, and transformation
Introduction to machine learning
The hands-on elements of this course leverage a combination of R, Python, and Microsoft Azure Machine Learning

Duration: 2 weeks
Total effort: 18-24 hours
Level: Intermediate
Prerequisite knowledge: Familiarity with basic mathematics. Introductory level knowledge of either R or Python.

Syllabus

Explore the data science process: An Introduction
Understand data science thinking
Know the data science process
Use AML to create and publish a first machine learning experiment
Lab: Creating your first model in Azure Machine Learning

Probability and statistics in data science
Understand and apply confidence intervals and hypothesis testing
Understand the meaning and application of correlation
Know how to apply simulation
Lab: Working with probability and statistics
Lab: Simulation and hypothesis testing

Working with data: Ingestion and preparation
Know the basics of data ingestion and selection
Understand the importance and process for data cleaning, integration and transformation
Lab: Data ingestion and selection new
Lab: Data munging with Azure Machine Learning, R, and Python on Azure stack

Data Exploration and Visualization
Know how to create and interpret basic plot types
Understand the process of exploring datasets
Lab: Exploring data with visualization with Azure Machine Learning, R and Python

Introduction to Supervised Machine Learning
Understand the basic concepts of supervised learning
Understand the basic concepts of unsupervised learning
Create simple machine learning models in AML
Lab: Classification of people by income
Lab: Auto price prediction with regression
Lab: K-means clustering with Azure Machine Learning
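The syllabus above pairs simulation with hypothesis testing. One common way to combine them is a permutation test, sketched below under stated assumptions: the function name permutation_p_value, the two groups, the iteration count, and the seed are all hypothetical, and the course labs may use a different method entirely.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_iter=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by simulation.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)           # seeded so the run is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_iter


# Invented data: two clearly separated groups.
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

A small p-value means that a difference this large rarely arises when group labels are assigned at random, which is evidence against the null hypothesis of no difference.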

Unit 2 - Core Data Science, Course 7: Principles of Machine Learning

Get hands-on experience building and deriving insights from machine learning models using R, Python, and Azure Machine Learning. Machine learning uses computers to run predictive models that learn from existing data in order to forecast future behaviors, outcomes, and trends.

In this data science course, you will be given clear explanations of machine learning theory combined with practical scenarios and hands-on experience building, validating, and deploying machine learning models. You will learn how to build and derive insights from these models using R, Python, and Azure Machine Learning.

Explore classification
Regression in machine learning
How to improve supervised models
Details on non-linear modeling
Clustering
Recommender systems
The hands-on elements of this course leverage a combination of R, Python, and Microsoft Azure Machine Learning

Duration: 2 weeks
Total effort: 18-24 hours
Level: Intermediate

Syllabus

Module 1: Explore classification
Understand the operation of classifiers
Use logistic regression as a classifier
Understand the metrics used to evaluate classifiers
Lab: Classification with logistic regression taught using Azure Machine Learning

Module 2: Regression in machine learning
Understand the operation of regression models
Use linear regression for prediction and forecasting
Understand the metrics used to evaluate regression models
Lab: Predicting bike demand with linear regression taught using Azure Machine Learning

Module 3: How to improve supervised models
Process for feature selection
Understand the problems of over-parameterization and the curse of dimensionality
Use regularization on over-parameterized models
Methods of dimensionality reduction
Apply cross validation to estimating model performance
Lab: Improving diabetes patient classification using Azure Machine Learning
Lab: Improving bike demand forecasting using Azure Machine Learning

Module 4: Details on non-linear modeling
Understand how and when to use common supervised machine learning models
Applying ML models to diabetes patient classification
Applying ML models to bike demand forecasting

Clustering
Understand the principles of unsupervised learning models
Correctly apply and evaluate k-means clustering models
Correctly apply and evaluate hierarchical clustering models
Lab: Cluster models with AML, R and Python

Module 5: Recommender systems
Understand the operation of recommenders
Understand how to evaluate recommenders
Know how to use alternatives to collaborative filtering for recommendations
Lab: Creating and evaluating recommendations
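For readers who want to see what the k-means clustering topic above involves, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard iterative procedure for k-means. The function k_means, the six points, and the choice of starting centroids are assumptions made for illustration; the course labs use Azure Machine Learning, R, and Python libraries rather than hand-written code.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:     # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Invented data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])

# Within-cluster sum of squares, one common way to evaluate a clustering.
inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
print(labels, centroids, inertia)
```

Results depend on the starting centroids, so in practice the algorithm is usually run several times from different starts and the run with the lowest within-cluster sum of squares is kept.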

Unit 3 - Applied Data Science, Course 8a: Programming with R for Data Science

Learn the fundamentals of programming with R, from reading and writing data to customizing visualizations and performing predictive analysis. In this computer science course from Microsoft, developed in collaboration with the Technical University of Denmark (DTU), get the knowledge and skills you need to use R, the statistical programming language for data scientists, in the field of your choice.

In this course you will learn all you need to get up to speed with programming in R. Explore R data structures and syntax, see how to read and write data from a local file to a cloud-hosted database, work with data, get summaries, and transform them to fit your needs. Plus, find out how to perform predictive analytics using R and how to create visualizations using the popular ggplot2 package.

Explore R language fundamentals, including basic syntax, variables, and types
How to create functions and use control flow
Details on reading and writing data in R
Work with data in R
Create and customize visualizations using ggplot2
Perform predictive analytics using R

Duration: 5 weeks
Total effort: 24-48 hours
Level: Intermediate
Prerequisite knowledge: The course Introduction to R for Data Science will help.

Syllabus
Module 1: Introduction
Module 2: Functions
Module 3: Control flow and Loops
Module 4: Working with Vectors and Matrices
Module 5: Reading in Data
Module 6: Writing Data
Module 7: Reading from SQL Server
Module 8: Working with Data
Module 9: Manipulating Data
Module 10: Simulation
Module 11: Linear model
Module 12: Graphics in R

Unit 3 - Applied Data Science, Course 8b: Programming with Python for Data Science

What machine learning is and the types of problems it is suited to solving
How to represent raw data in a manner conducive to deriving valuable information
How to use various data visualization techniques
How to use principal component analysis and isomap intelligently to simplify your data
How to apply supervised learning algorithms to your data, such as random forest and support vector classifier
Concepts such as model selection, pipelining, and cross validation

Duration: 5 weeks
Total effort: 48-54 hours
Level: Intermediate
Prerequisite knowledge: The course Introduction to Python for Data Science will help.

Unit 3 - Applied Data Science, Course 9a: Implementing Predictive Solutions with Spark in Azure HDInsight

Learn how to use Spark in Microsoft Azure HDInsight to create predictive analytics and machine learning solutions. Are you ready for big data science? In this course, learn how to implement predictive analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight. See how to work with Scala or Python to cleanse and transform data and build machine learning models with Spark ML (the machine learning library in Spark).

Using Spark to explore data and prepare for modeling
Build supervised machine learning models
Evaluate and optimize models
Build recommenders and unsupervised machine learning models

Duration: 3 weeks
Total effort: 18-24 hours
Level: Intermediate
Prerequisite knowledge: Familiarity with Azure HDInsight. Familiarity with databases and SQL. Some programming experience. A willingness to learn actively in a self-paced manner.

System Requirements
To complete the hands-on elements in this course, you will require an Azure subscription and a Windows client computer. You can sign up for a free Azure trial subscription (a valid credit card is required for verification, but you will not be charged for Azure services). Note that the free trial is not available in all regions.

Syllabus

Module 1: Introduction to Data Science with Spark
Get started with Spark clusters in Azure HDInsight, and use Spark to run Python or Scala code to work with data.

Module 2: Getting Started with Machine Learning
Learn how to build classification and regression models using the Spark ML library.

Module 3: Evaluating Machine Learning Models
Learn how to evaluate supervised learning models, and how to optimize model parameters.

Module 4: Recommenders and Unsupervised Models
Learn how to build recommenders and clustering models using Spark ML.

Unit 3 - Applied Data Science, Course 9b: Analyzing Big Data with Microsoft R
Learn how to use Microsoft R Server to analyze large datasets using R, one of the most powerful programming languages.
The open-source programming language R has long been popular (particularly in academia) for data processing and statistical analysis. Among R's strengths are that it is a succinct programming language and has an extensive repository of third-party libraries for performing all kinds of analyses. Together, these two features make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, one deficiency of R is that it traditionally uses a lot of memory, both because it needs to load a copy of the data in its entirety as a data.frame object, and because processing the data often involves making further copies (sometimes referred to as copy-on-modify). This is one of the reasons R has been received more reluctantly by industry than by academia.
The main component of Microsoft R Server (MRS) is the RevoScaleR package, an R library that offers a set of functions for processing large datasets without having to load them into memory all at once. RevoScaleR offers a rich set of distributed statistical and machine learning algorithms, which is extended over time. Finally, RevoScaleR also offers a mechanism by which we can take code that we developed on our laptop and deploy it on a remote server such as SQL Server or Spark (where the infrastructure is very different under the hood), with minimal effort.
In this course, we will show you how to use MRS to run an analysis on a large dataset and provide some examples of how to deploy it on a Spark cluster or a SQL Server database. Upon completion, you will know how to use R for big-data problems.
Read data from flat files into R's data frame object, investigate the structure of the dataset and make corrections, and store prepared datasets for later use
Prepare and transform the data
Calculate essential summary statistics, do crosstabulation, write your own summary functions, and visualize data with the ggplot2 package
Build predictive models, evaluate and compare models, and generate predictions on new data
Duration: 3 weeks
Total effort: 8-16 hours
Level: Intermediate
Prerequisite knowledge: Familiarity with R.
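The idea of processing a large dataset without loading it all into memory, as described for Course 9b, can be illustrated with a small sketch. This is not RevoScaleR code and does not claim to reflect how RevoScaleR works internally. It is a plain Python illustration, using only the standard library, of computing summary statistics one chunk at a time and merging the per-chunk results. The function names, the `fare` column, and the tiny in-memory sample are hypothetical.

```python
import csv
import io
import math
from itertools import islice


def read_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterator of rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean_sd(rows, column, chunk_size=1000):
    """Mean and sample standard deviation of one column, one chunk at a time."""
    n, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    for chunk in read_chunks(rows, chunk_size):
        values = [float(row[column]) for row in chunk]
        c_n = len(values)
        c_mean = sum(values) / c_n
        c_m2 = sum((v - c_mean) ** 2 for v in values)
        # Merge the chunk summary into the running summary
        # (pairwise update for combining two partial variances).
        delta = c_mean - mean
        total = n + c_n
        mean += delta * c_n / total
        m2 += c_m2 + delta * delta * n * c_n / total
        n = total
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, sd


# Hypothetical in-memory "file" standing in for a large CSV on disk.
sample = io.StringIO("fare\n" + "\n".join(str(x) for x in range(1, 11)))
n, mean, sd = chunked_mean_sd(csv.DictReader(sample), "fare", chunk_size=3)
print(n, mean, sd)
```

Only one chunk of rows is held in memory at a time, so the same function could be pointed at a file far larger than available RAM. The trade-off is that each statistic has to be expressed as something that can be merged across chunks.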

Unit 4 - Capstone Project, Capstone Project
Solve a real-world data science problem in this capstone project for the Microsoft Professional Program in Data Science.
Showcase the knowledge and skills you've acquired during the Microsoft Professional Program in Data Science, and solve a real-world data science problem in this program capstone project. The project takes the form of a challenge in which you will explore a dataset and develop a machine learning solution that is tested and scored to determine your grade.
Duration: 2 weeks
Total effort: 12-16 hours
Level: Advanced