Course intended for:

Data analysts and programmers who want to start analyzing large data sets.

Course objective:

The training prepares participants for the role of a big data analyst. It focuses on a smooth introduction to the basics of each tool, so that participants can later navigate the Hadoop ecosystem with ease.

Course strengths

Participants learn multiple tools and programming languages. The training aims to show how easy it is to analyze data without using the console or IDE tools.


Course requirements

Basic SQL and basic programming skills, especially in Python, R, or Java

Course parameters

2 × 8 hours (2 × 7 net hours) of lectures and workshops. Maximum group size: 8-10 people

Course curriculum

  1. Introduction to Big Data and MapReduce
  2. Apache Hadoop ecosystem
  3. Big Data Analyst ecosystem
  4. Hadoop architecture
    1. HDFS
    2. YARN
    3. MapReduce, Tez
    4. Basic operations in Hadoop
  5. Hive
    1. Introduction
    2. Basic commands
    3. Basic SQL queries
    5. Functions
    6. Workshop on data exploration
  6. Pig
    1. Introduction
    2. Pig Latin
    3. Pig Shell
    4. Creating ETL processes
    5. Functions
    6. User-defined functions
    7. Using other sources of data
    8. Data exploration
  7. Spark
    1. Introduction
    2. RDD
    3. Transformations and actions
    4. Spark SQL
    5. Integration with Hive
    6. Data exploration using PySpark and Spark
  8. Machine Learning
    1. Introduction
    2. Supervised and unsupervised learning
    3. Machine Learning tasks
    4. Solving typical Machine Learning tasks
      1. Spark ML
      2. H2O

Any questions?

Phone +48 22 2035600
Fax +48 22 2035601