Who the course is for:

The training is intended for analysts and programmers who want to take their first steps with Big Data technologies, which are built for data volumes that exceed the capabilities of traditional architectures and systems such as relational databases or even data warehouses.

Course objective:

Participants will acquire basic knowledge of problems at Big Data scale, understand the MapReduce programming model, and get to know BigTable, NoSQL databases (using HBase as an example) and distributed file systems (using HDFS as an example). They will also become familiar with the Pig and Hive analytical tools. Participants will be able to identify the strengths and weaknesses of specific technologies and will know when to use each one.

Course strengths:

The program offers a quick review of the core technologies of the Apache Hadoop ecosystem. In addition to presentations, participants take part in workshops and explore data sets on their own.

Course requirements:

Participants should have a basic knowledge of SQL, bash, Java, and Python (or another scripting language).

Course parameters:

8 hours (7 hours excluding breaks) of lectures and workshops, with the emphasis on the workshops.

Group size: at most 8-10 participants

Course curriculum:

  1. Introduction to Big Data

    1. Definition

    2. BI, Big Data and data warehouses

    3. Origins and history: BigTable, MapReduce, GFS

    4. Problem classification

    5. Real-time vs. batch data processing

    6. Data storage – files, NoSQL databases

    7. A review of Big Data systems and platforms

    8. A review of the Hadoop ecosystem

    9. Big Data distributions

  2. Introduction to MapReduce – the example of the Hadoop platform

    1. Architecture

    2. HDFS and YARN

    3. MapReduce framework

    4. MapReduce streaming (Hadoop Streaming)

    5. Workshop

      • HDFS

      • MapReduce

  3. Introduction to data processing – the example of Pig

    1. Architecture

    2. Operating modes

    3. Data types, keywords

    4. Syntax

    5. Pig workshop

  4. Introduction to data analysis – the example of Hive

    1. Architecture

    2. Operating modes

    3. Data types

    4. Syntax

    5. Data formats

    6. Comparison with Pig

    7. Hive workshop

  5. Introduction to NoSQL – the example of HBase

    1. What is NoSQL? NoSQL vs. relational databases

    2. A review of non-relational databases; the CAP theorem

    3. Designing non-relational databases

    4. HBase architecture

    5. Data model

    6. Usage

    7. CLI

    8. Writing and reading data

    9. HBase workshop

  6. Cluster monitoring and management – the example of Ambari

    1. CLI

    2. A review of Apache Ambari

Any questions?

Phone +48 22 2035600
Fax +48 22 2035601