Big Data Expert Course

This five-day training course prepares participants for the Big Data Expert exam.

5 days x 8 hours/day

Big Data Expert Course Agenda

Module 1:


  • Understanding Data Science
  • The importance of Python for Data Science
  • Significance in the industry and understanding the need of the hour
  • Data Science with Python: how leading companies are benefiting from its use
  • Typical Analytics / Data Science projects: the different phases and the role of Python
  • Python vs Anaconda

Module 2:


  • Understanding of Python
  • Installing Python
  • Understanding Python editors and IDEs (Canopy, PyCharm, Jupyter, IPython, etc.)
  • Understanding Jupyter Notebook and customizing its settings
  • The concept of libraries/packages; important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
  • Loading and installing packages; namespaces
  • Data types and data structures (strings, tuples, lists, dictionaries)
  • List & Dictionary Comprehensions
  • Variables & Value Labels – Date & Time Values
  • General Operations – Mathematical, string, date
  • Read and write data
  • Basic Plotting
  • Control flow and conditional statements
  • Code Profiling and Debugging
  • Creating Class and Modules and ways to call them
  • Key Python libraries for Data Science: NumPy, SciPy, Pandas, scikit-learn, statsmodels, NLTK, etc.
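Some of the basics listed above (core data structures, list and dictionary comprehensions, date values) in a minimal standard-library sketch; the names and values below are invented for illustration:

```python
# Minimal sketch of some Module 2 topics: core data structures,
# list/dict comprehensions, and basic date handling (standard library only).
from datetime import date

# Tuples, lists, dictionaries
scores = [72, 85, 91, 60]
person = {"name": "Ana", "joined": date(2020, 5, 1)}

# List comprehension: keep only passing scores
passing = [s for s in scores if s >= 70]

# Dictionary comprehension: label each score
labels = {s: ("pass" if s >= 70 else "fail") for s in scores}

print(passing)                 # [72, 85, 91]
print(labels[60])              # fail
print(person["joined"].year)   # 2020
```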

Module 3:


  • Getting data from various sources (CSV, TXT, Excel, Access, etc.)
  • Connecting to Database
  • Observing Data objects – subsetting and methods
  • Exporting data to various formats
  • Significant Python modules: Pandas, BeautifulSoup
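The read-and-write steps above can be sketched without any third-party dependency using the standard `csv` module; in practice the course relies on `pandas.read_csv`. The in-memory string below stands in for a file on disk, and the column names are invented:

```python
# Sketch of Module 3's "read and write data" step using only the standard
# library; pandas.read_csv would typically be used in the course itself.
import csv
import io

raw = "name,age\nAna,34\nDan,29\n"              # stands in for a CSV file

rows = list(csv.DictReader(io.StringIO(raw)))   # read: list of dicts
rows.append({"name": "Ioana", "age": "41"})     # manipulate

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)                          # write the data back out
```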

Module 4:


  • Using Python for cleansing data
  • Steps for Data Manipulation (sorting, filtering, merging, duplicates, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting etc.)
  • Understanding tools for data manipulation (operators, functions, packages, control structures, loops, arrays etc.)
  • Built‐in functions of Python
  • User defined functions of Python
  • Removing irrelevant information
  • Data normalization
  • Data formatting
  • Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, time, etc.)
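A few of the manipulation steps above (duplicates, filtering, sorting, derived variables, renaming), sketched with pandas; the column names and values are invented for illustration:

```python
# Hedged sketch of some Module 4 manipulation steps using pandas.
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Dan", "Dan"],
                   "salary": [5000, 4200, 4200]})

df = df.drop_duplicates()                           # remove duplicate rows
df = df[df["salary"] > 4000]                        # filter
df = df.sort_values("salary", ascending=False)      # sort
df["bonus"] = df["salary"] * 0.10                   # derived variable
df = df.rename(columns={"salary": "gross_salary"})  # rename
```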

Module 5:


  • Understanding of exploratory data analysis
  • Descriptive statistics and frequency tables
  • Single variable analysis ‐ Distribution of Data and Graphical Analysis
  • Analyzing cross tabs, distributions & relationships, and graphical analysis
  • Creating various graphs: bar, pie, line chart, histogram, box plot, scatter, density, etc.
  • Significant packages for exploratory analysis (NumPy arrays, Matplotlib, Seaborn, Pandas, SciPy, etc.)

Module 6:


  • Basic statistics for measures of central tendency and variance
  • Building blocks: probability distributions, the Normal distribution, the Central Limit Theorem
  • Inferential statistics: sampling and hypothesis testing
  • Statistical methods: z/t-tests, ANOVA, correlation, chi-square
  • Relevant modules for statistical methods: NumPy, SciPy, Pandas
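The Central Limit Theorem covered above can be demonstrated in a few lines: means of repeated samples from a uniform distribution cluster tightly around the population mean. The sample sizes and seed below are arbitrary choices for the sketch:

```python
# Illustration of the Central Limit Theorem from Module 6: sample means
# from a uniform(0, 1) population concentrate around 0.5.
import random
from statistics import mean, stdev

random.seed(42)

# 2000 samples of size 50 each; record the mean of every sample
sample_means = [mean(random.random() for _ in range(50)) for _ in range(2000)]

print(round(mean(sample_means), 2))   # close to the population mean 0.5
print(stdev(sample_means) < 0.1)      # True: spread of the means is small
```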

Module 7:


  • Understanding of Machine Learning and Predictive Modeling
  • Different kinds of Business Problems – Mapping of Techniques – Regression vs classification vs segmentation vs forecasting
  • Classification of Learning Algorithms – Supervised Learning vs Unsupervised Learning
  • Diverse phases of Predictive Modeling (Data Pre‐processing, Sampling, Model Building, Validation)
  • Bias‐Variance Trade Off & Performance Metrics
  • Dimension reduction and Feature engineering
  • Understanding the concepts of optimization and cost functions
  • Understanding the gradient descent algorithm
  • Understanding cross-validation (bootstrapping, k-fold validation, etc.)
  • Understanding model performance metrics (R-square, RMSE, MAPE, AUC, ROC curve, recall, sensitivity, specificity, confusion matrix, etc.)
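Several of the performance metrics listed above can be computed by hand for a binary classifier; the label vectors below are made-up illustration data:

```python
# Sketch of Module 7's metrics: confusion-matrix counts, recall
# (sensitivity), specificity and accuracy for a binary classifier.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)   # true positives
tn = sum(a == 0 and p == 0 for a, p in pairs)   # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)   # false positives
fn = sum(a == 1 and p == 0 for a, p in pairs)   # false negatives

recall      = tp / (tp + fn)        # sensitivity: positives found
specificity = tn / (tn + fp)        # negatives found
accuracy    = (tp + tn) / len(actual)
```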

Module 8:


  • Machine Learning – Linear and Logistic Regression
  • Segmentation, Cluster Analysis (K‐means)
  • Decision Trees (CART/C5.0)
  • Random Forest, Bagging & Boosting
  • Artificial Neural Networks (ANN)
  • Support Vector Machines (SVM)
  • Other techniques (KNN, PCA)
  • Introduction to text mining using NLTK
  • Understanding time series forecasting (decomposition & ARIMA)
  • Significant Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
  • Tuning models using hyperparameters, grid search, pipelines, etc.
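As a taste of the regression topic above, simple linear regression can be fit in closed form; in the course itself scikit-learn's `LinearRegression` would do this fit. The data points below are invented for illustration:

```python
# Hand-rolled sketch of the simple linear regression covered in Module 8:
# ordinary least squares for one predictor, no libraries needed.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# OLS slope = covariance(x, y) / variance(x); intercept from the means
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict(x):
    return intercept + slope * x
```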

Module 9:


  • Introduction and Relevance of Big Data
  • Big Data Analytics usage in various industries like Telecom, E‐commerce, Finance, Insurance, etc.
  • Issues with Traditional Large‐Scale Systems

Module 10:


  • Introduction to Hadoop
  • The various Apache projects
  • The role of these projects in the Hadoop ecosystem
  • Big Data requirements and the key technology foundations that address them
  • Constraints and solutions of the existing data analytics architecture
  • Differentiation between traditional data management systems and Big Data management systems
  • Analyzing the key framework requirements for Big Data Analytics
  • Importance of Real Time Data
  • Usage of Big and Real Time Data as a Business Planning Tool
  • Hadoop Ecosystem & Hadoop 2.x core components
  • Hadoop Master‐Slave design
  • The Hadoop Distributed File System (HDFS): the notion of data storage
  • Various types of cluster setups: fully distributed, pseudo-distributed, etc.
  • Installing and setting up a Hadoop cluster
  • Hadoop 2.x cluster designing
  • Hadoop cluster modes
  • Cluster management tools such as Cloudera Manager / Apache Ambari

Module 11:


  • Understanding HDFS and HDFS data storage
  • Getting the data from local machine to Hadoop and vice versa
  • Understanding of MapReduce and understanding of Mapper and Reducer (Traditional way vs MapReduce way)
  • Getting to know the structure of a MapReduce program
  • Using basic Java to develop a MapReduce program
  • Using the basic Streaming API to develop a MapReduce program
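As a rough illustration of the MapReduce model above, a word count can be written as plain Python mapper and reducer functions, in the spirit of the Streaming API; the shuffle phase that Hadoop performs between them is simulated here with a dict (a local sketch only, not an actual Hadoop job):

```python
# Sketch of Module 11's MapReduce idea: mapper emits (key, value) pairs,
# the shuffle groups values by key, the reducer aggregates each group.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield word.lower(), 1          # emit (word, 1) for every word

def reducer(word, counts):
    return word, sum(counts)           # aggregate the values per key

lines = ["Big Data", "big data big plans"]

shuffled = defaultdict(list)           # simulates Hadoop's shuffle/sort
for line in lines:
    for word, one in mapper(line):
        shuffled[word].append(one)

result = dict(reducer(w, c) for w, c in shuffled.items())
print(result)                          # {'big': 3, 'data': 2, 'plans': 1}
```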

Module 12:


  • Integrating Hadoop with an existing enterprise
  • Using Sqoop to load data from an RDBMS into HDFS
  • Using Flume to manage real-time data
  • Accessing HDFS from legacy systems

Module 13:


  • Introduction to data analysis tools
  • Apache Pig: MapReduce vs Pig, Pig use cases
  • The Pig data model
  • Pig streaming
  • Pig Latin programs and their execution
  • Understanding Pig Latin: relational operators, file loaders, the GROUP operator, diagnostic operators, Pig UDFs, joins and COGROUP
  • Writing Java UDFs
  • Embedding Pig in Java
  • Pig macros
  • Parameter substitution
  • Automating the design with Pig and implementing MapReduce applications
  • Using Pig to apply structure to unstructured Big Data

Module 14:


  • Apache Hive: Hive vs Pig, Hive use cases
  • Understanding the Hive data storage principle
  • Understanding the file formats and record formats supported by the Hive environment
  • Performing operations on data in Hive
  • HiveQL: joining tables, dynamic partitioning, custom Map/Reduce scripts
  • Hive UDFs, Hive scripts
  • Methods of loading data into Hive
  • Serialization and deserialization
  • Hive persistence formats
  • Using Hive to handle text data
  • Integrating external BI tools with Hadoop Hive

Module 15:


  • Introduction to Impala and its format
  • Executing queries through Impala and its relevance
  • Hive vs Pig vs Impala
  • Extending Impala with user-defined functions

Module 16:


  • NoSQL databases: HBase
  • Introduction to Oozie

Module 17:


  • Understanding Apache Spark
  • In-memory data vs streaming data
  • Spark vs MapReduce
  • The different modes of Spark
  • Demonstration of a Spark installation
  • Understanding Spark on a cluster
  • Spark Standalone cluster

Module 18:


  • Invoking Spark Shell
  • Establishing the Spark Context
  • Loading a file in the shell
  • Performing basic operations on files in the Spark shell
  • Overview of caching
  • Distributed Persistence
  • Overview of Spark Streaming

Module 19:


  • Examining the Hive and Spark SQL formats
  • Analyzing Spark SQL
  • The Spark SQL context
  • Integrating Hive and Spark SQL
  • Support for the JSON and Parquet file formats
  • Implementing data visualization in Spark
  • Data loading
  • Querying Hive through Spark
  • Spark performance tuning tips
  • Shared variables: broadcast variables and accumulators

Module 20:


  • Using Spark Streaming to extract and analyze data from Twitter
  • Comparing Spark and Storm

Module 21:


  • Understanding of GraphX module in Spark
  • Designing graphs with GraphX

Module 22:


  • Overview of the Machine Learning framework
  • Implementing some of the Machine Learning algorithms using Spark MLlib

Other Beingcert courses you may be interested in

Our portfolio includes more than 25 different types of Beingcert courses for companies (organized on any date and at any location in the country).

Quality standard

In line with the very core principle of the Quality Management System implemented by IT Learning, our objective is customer satisfaction. To achieve this objective, evaluating the quality of the services we deliver is essential.

To that end, we encourage you to use any channel and means of communication (on-the-spot and follow-up feedback, written testimonials, phone, e-mail, blog, forum, social networks, etc.) to express not only your satisfaction with the quality of our services, which is in fact our firm commitment assumed by contract, but especially, where applicable, any dissatisfaction with our performance, which will help us improve the standard of these services, for your benefit.

Enrollment / Invoicing / Payment / "Money back guarantee"

  • If you are an individual, or a group of at most 5 participants from a company, you can enroll only in our open classes, announced on the website on the "Calendar Cursuri Open" page
  • Seats in the Open classroom are reserved by phone (0787.692.238) or by e-mail at the address , after which you will receive in reply the enrollment form, which must be completed by you and sent back to us together with your billing details
  • The invoice is then issued and sent to you by e-mail; based on it, you make the payment (bank transfer or cash deposit into the IT Learning account) and send us the corresponding confirmation
  • You will in turn receive our firm confirmation that you are enrolled in the course, together with the organizational details (course schedule, coffee breaks, lunch break, etc.)
  • The course is then delivered and, at its end, you complete the feedback evaluation form, on the basis of which the "Money back guarantee" clause can be invoked: "If the satisfaction level reflected in the feedback form is below 75%, we guarantee a refund of the participation fee or a retake of the course at no cost"
  • If you are a company and wish to enroll a group of more than 5 participants, we will prepare a personalized offer with a volume discount, exclusively for your company, according to the specifications received by phone, by fax (0371.602.780) or by e-mail at the address
  • Once the offer is accepted (deliverables, costs, agenda, period and location of the course), the contractual stage follows
  • Once the final form of the contract is agreed by the parties' legal teams, the contract is signed and the agreed services are delivered; their invoicing and payment take place only after receiving the feedback forms completed by the graduates at the end of the training, and only under the "Money back guarantee" clause: "If the satisfaction level reflected in the attached form is below 75%, we guarantee to waive the cost of the training or to repeat the course with another trainer, at no additional cost"

Enroll in the course


