Analyzing Big Data with Microsoft R

During this course you will learn how to use Microsoft R Server to create and run an analysis on a large dataset, and how to utilize it in Big Data environments, such as a Hadoop or Spark cluster, or a SQL Server database.

Target audience

The primary audience for this course is people who wish to analyze large datasets within a big data environment.
The secondary audience is developers who need to integrate R analyses into their solutions.

Topics

Module 1: Microsoft R Server and R Client

This module gives an overview of how Microsoft R Server and Microsoft R Client work.

  • What is Microsoft R Server
  • Using Microsoft R Client
  • The ScaleR functions

Module 2: Exploring Big Data

This module covers how to use R Client with R Server to explore big data held in different data stores.

  • Understanding ScaleR data sources
  • Reading data into an XDF object
  • Summarizing data in an XDF object

Module 3: Visualizing Big Data

This module covers how to visualize data by using graphs and plots.

  • Visualizing in-memory data with ggplot2
  • Visualizing big data with rxLinePlot and rxHistogram

Module 4: Processing Big Data

This module explains how to transform and clean big data sets.

  • Transform big data using rxDataStep
  • Perform sort and merge operations over big data sets

Module 5: Parallelizing Analysis Operations

This module explains how to implement options for splitting analysis jobs into parallel tasks.

  • Use the RxLocalParallel compute context with rxExec.
  • Use the RevoPemaR package to write customized scalable and distributable analytics.

Module 6: Creating and Evaluating Regression Models

This module covers how to build and evaluate regression models generated from big data.

  • Cluster big data to reduce the size of a dataset.
  • Create linear and logit regression models and use them to make predictions.

Module 7: Creating and Evaluating Partitioning Models

This module explains how to create and score partitioning models generated from big data.

  • Create partitioning models using the rxDTree, rxDForest, and rxBTrees algorithms.
  • Test partitioning models by making and comparing predictions.

Module 8: Processing Big Data in SQL Server and Hadoop

This module covers how to use R to process big data in SQL Server and Hadoop environments.

  • Using R in SQL Server
  • Using Hadoop MapReduce
  • Using Hadoop Spark

Prerequisites

  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.

To maintain consistently high quality in our technical courses, we use both English- and Swedish-speaking experts as instructors.

Book the course

Book your place today.

About the course

Price: SEK 26,450.00

excluding VAT

Duration: 3 days
Course code: M20773

Software Assurance: SA vouchers are valid for this course
Kompetenskort: Kompetenskort is valid for this course

LiveClass means that the course is delivered as an instructor-led, interactive online course.

30 September

Course registration is binding. For more information and cancellation rules, see our general terms and conditions.