Course intended for:

The training is aimed at developers, architects and application administrators who want to create or maintain data warehouses (DWH) using the Pentaho Business Intelligence Suite.

Course objective:

Course participants will learn how to design, implement, monitor, start up, and tune data warehouse components, and will become familiar with the general principles of data warehousing, such as the most popular relational/ROLAP data warehouse schemas. After the training, participants will be able to choose the right set of tools and techniques for their projects. In addition to a general introduction to DWH concepts, the training focuses on the Pentaho Business Intelligence Suite.

Course strengths:

The program includes both a general introduction to ETL, DWH, and OLAP and an overall presentation of the Pentaho Business Intelligence product stack. The training is unique because its subject is not fully covered in the literature and knowledge about DWH is highly fragmented. The training program is constantly updated to keep pace with the rapid development of ETL solutions.


Participants are required to have basic knowledge of databases and basic programming skills in Java.

Course parameters:

5 * 7 hours of lectures and workshops at a ratio of 1:3. During the workshops, in addition to simple exercises, participants will solve problems by implementing their own ETL processes, modeling DWH data structures, and performing basic administrative tasks. Group size: max. 8-10 people.

Course curriculum

  1. Program of training

  2. Introduction

    1. Basic data warehouse concepts:

      1. OLTP, OLAP, database, data mart, data warehouse


      3. Normalization, aggregation, facts, dimensions

      4. SQL, MDX, XML/A

      5. ETL

      6. Big Data, Bigtable, NoSQL, non-relational databases and data warehouses

      7. Others

    2. Pentaho BI Suite

  3. DWH physical data structures

    1. Fact and dimension database tables

    2. Indices, views and materialized views

    3. ROLAP: star, snowflake, and constellation database schemas

    4. The TPC Benchmark H

    5. ROLAP vs MOLAP

    6. Database performance tuning in DWH

    7. Pentaho Aggregate Designer and database tuning

    8. Time dimension in DWH

    9. GeoSpatial DWH and other domain specific DWHs

  4. ETL and Pentaho Data Integration (PDI)

    1. ETL

      1. Extraction of data

      2. Transformation, cleaning, replenishment of data

      3. Loading

      4. Data quality

      5. Staging

      6. Real-time DWH

      7. ETL performance problems

      8. ETL tools

    2. Pentaho Data Integration

      1. Architecture

        1. Kettle

        2. Spoon

        3. Pan

        4. Kitchen

        5. Carte

    3. Working with Spoon

      1. Installation, starting up, look & feel

      2. Variables

      3. Hops

      4. Working with XML files and repositories

      5. Data sharing

      6. Transformations

        1. Working with data sources

          1. Inputs and Outputs

          2. Table input/output

          3. Text file input/output

          4. XML file input/output

          5. Deserialize from/Serialize to

          6. Others

      7. Jobs

        1. Jobs (kjb) and transformations (ktr)

        2. Complex jobs

        3. Custom code jobs

        4. Workflows

        5. Working with files

        6. Monitoring

        7. Versioning

    4. Kitchen and Pan

      1. Running jobs and transformations

      2. Scheduling

      3. Error handling

      4. IO redirection

    5. Carte

  5. OLAP and Pentaho Analysis Services (Mondrian)

    1. OLAP cubes

    2. Pentaho Schema Workbench (PSW)

    3. Logical and physical structures

      1. cubes

      2. metrics

      3. dimensions, hierarchies, levels

      4. tables

      5. relations

      6. aggregations

      7. expressions

    4. MDX

    5. Slice, Dice, Drill

    6. OLAP tuning

  6. Pentaho Report Designer (PRD)

    1. Defining reports with PRD

      1. Data sources

      2. Queries

      3. Data extraction

      4. Filtering and narrowing

      5. Visualization

    2. Embedding reports

    3. PDF, HTML, RTF export

    4. Report Wizard

  7. Ad-hoc reporting and analytics

    1. Pentaho BI Platform/Portal (BA Platform)

    2. Pentaho Interactive Reporting (PIR)

    3. Pentaho Metadata Editor (PME)

    4. Pentaho Analyzer (PAZ)

  8. Pentaho Dashboard Designer (PDD)

    1. Dashboards in PDD

      1. simple bar, line, area, pie, dial charts

      2. Tables

      3. Reports

      4. Parameters

      5. Templates

    2. Embedding dashboards

  9. Pentaho Data Mining (WEKA)

    1. Architecture

    2. WEKA Explorer

    3. Preprocessing

      1. ARFF data format

      2. Preparing data for mining

      3. Choosing proper data attributes (data correlation)

      4. Filtering with WEKA, e.g. filtering, normalization, discretization

      5. Visualization

      6. Large datasets on a 32-bit JVM

      7. Stream mining, incremental learning

    4. Data mining

      1. Classification

      2. Clustering

      3. Association rules

      4. Reducing dimensions of data

      5. Other techniques

    5. Extending WEKA

    6. Using WEKA with Pentaho BI Suite

  10. Pentaho Mobile BI

Any questions?

Phone +48 22 2035600
Fax +48 22 2035601