SMACK Stack for Data Science Training Course

SMACK is a collection of data platform software: Apache Spark, Apache Mesos, Akka, Apache Cassandra, and Apache Kafka. Using the SMACK stack, users can create and scale data processing platforms.

This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use the SMACK stack to build data processing platforms for big data solutions.

By the end of this training, participants will be able to:

Implement a data pipeline architecture for processing big data.
Develop a cluster infrastructure with Apache Mesos and Docker.
Analyze data with Spark and Scala.
Manage unstructured data with Apache Cassandra.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.

This course is available as onsite live training in Australia or online live training.

Course Outline

Introduction

SMACK Stack Overview

What is Apache Spark? Apache Spark features
What is Apache Mesos? Apache Mesos features
What is Akka? Akka features
What is Apache Cassandra? Apache Cassandra features
What is Apache Kafka? Apache Kafka features

Scala Language

Scala syntax and structure
Scala control flow

Preparing the Development Environment

Installing and configuring the SMACK stack
Installing and configuring Docker

Akka

Using actors

Apache Cassandra

Creating a database for read operations
Working with backups and recovery

Connectors

Creating a stream
Building an Akka application
Storing data with Cassandra
Reviewing connectors

Apache Kafka

Working with clusters
Creating, publishing, and consuming messages

Apache Mesos

Allocating resources
Running clusters
Working with Apache Aurora and Docker
Running services and jobs
Deploying Spark, Cassandra, and Kafka on Mesos

Apache Spark

Managing data flows
Working with RDDs and DataFrames
Performing data analysis

Troubleshooting

Handling service failures and errors

Summary and Conclusion

Requirements

An understanding of data processing systems

Audience

Data Scientists

14 Hours

Open Training Courses require 5+ participants.

Dates are subject to availability and take place between 09:30 and 16:30.

Testimonials (1)

very interactive...

Richard Langford

Course - SMACK Stack for Data Science

Provisional Upcoming Courses (Require 5+ participants)

SMACK Stack for Data Science; 2026-04-27 09:30; 14 hours; London Circuit; 5000 AUD (Online); 11860 AUD (Classroom)

SMACK Stack for Data Science; 2026-05-11 09:30; 14 hours; 200 Mary Street, Brisbane; 5000 AUD (Online); 11860 AUD (Classroom)

SMACK Stack for Data Science; 2026-05-25 09:30; 14 hours; Adelaide City Central; 5000 AUD (Online); 11860 AUD (Classroom)

SMACK Stack for Data Science; 2026-06-08 09:30; 14 hours; Melbourne 385 Bourke Street; 5000 AUD (Online); 11860 AUD (Classroom)

Related Courses

Introduction to Data Science and AI using Python

35 Hours

This is a 5-day introduction to Data Science and Artificial Intelligence (AI).

The course is delivered with examples and exercises using Python.

Apache Airflow for Data Science: Automating Machine Learning Pipelines

21 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at intermediate-level participants who wish to automate and manage machine learning workflows, including model training, validation, and deployment using Apache Airflow.

By the end of this training, participants will be able to:

Set up Apache Airflow for machine learning workflow orchestration.
Automate data preprocessing, model training, and validation tasks.
Integrate Airflow with machine learning frameworks and tools.
Deploy machine learning models using automated pipelines.
Monitor and optimize machine learning workflows in production.

Anaconda Ecosystem for Data Scientists

14 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at data scientists who wish to use the Anaconda ecosystem to capture, manage, and deploy packages and data analysis workflows in a single platform.

By the end of this training, participants will be able to:

Install and configure Anaconda components and libraries.
Understand the core concepts, features, and benefits of Anaconda.
Manage packages, environments, and channels using Anaconda Navigator.
Use Conda, R, and Python packages for data science and machine learning.
Get to know some practical use cases and techniques for managing multiple data environments.

AWS Cloud9 for Data Science

28 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at intermediate-level data scientists and analysts who wish to use AWS Cloud9 for streamlined data science workflows.

By the end of this training, participants will be able to:

Set up a data science environment in AWS Cloud9.
Perform data analysis using Python, R, and Jupyter Notebook in Cloud9.
Integrate AWS Cloud9 with AWS data services like S3, RDS, and Redshift.
Utilize AWS Cloud9 for machine learning model development and deployment.
Optimize cloud-based workflows for data analysis and processing.

Introduction to Google Colab for Data Science

14 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at beginner-level data scientists and IT professionals who wish to learn the basics of data science using Google Colab.

By the end of this training, participants will be able to:

Set up and navigate Google Colab.
Write and execute basic Python code.
Import and handle datasets.
Create visualizations using Python libraries.

A Practical Introduction to Data Science

35 Hours

Participants who complete this training will gain a practical, real-world understanding of Data Science and its related technologies, methodologies and tools.

Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class.

The course starts with an introduction to elemental concepts of Data Science, then progresses into the tools and methodologies used in Data Science.

Audience

Developers
Technical analysts
IT consultants

Format of the Course

Part lecture, part discussion, exercises and heavy hands-on practice

Note

To request customized training for this course, please contact us.

Data Science Programme

245 Hours

The explosion of information and data in today’s world is unparalleled, and our ability to innovate and push the boundaries of the possible is growing faster than it ever has. Data Scientist is one of the most in-demand roles across industry today.

We offer much more than learning through theory; we deliver practical, marketable skills that bridge the gap between the world of academia and the demands of industry.

This 7-week curriculum can be tailored to your specific industry requirements. Please contact us for further information or visit the NobleProg Institute website.

Audience:

This programme is aimed at postgraduate-level participants as well as anyone with the required prerequisite skills, which will be determined by an assessment and interview.

Delivery:

Delivery of the course will be a mixture of Instructor Led Classroom and Instructor Led Online; typically the 1st week will be 'classroom led', weeks 2 - 6 'virtual classroom' and week 7 back to 'classroom led'.

Data Science for Big Data Analytics

35 Hours

Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

Data Science essential for Marketing/Sales professionals

21 Hours

This course is meant for Marketing/Sales professionals who intend to go deeper into the application of data science in Marketing/Sales. The course provides detailed coverage of different data science techniques used for “upsell”, “cross-sell”, market segmentation, branding and CLV.

Difference between Marketing and Sales: how are sales and marketing different?

In very simple words, sales is a process that targets individuals or small groups. Marketing, on the other hand, targets a larger group or the general public. Marketing includes research (identifying the needs of the customer), product development (producing innovative products) and promotion (through advertisements) to create awareness of the product among consumers. As such, marketing means generating leads or prospects. Once the product is on the market, it is the task of the salesperson to persuade the customer to buy it. Sales means converting leads or prospects into purchases and orders. Marketing is aimed at the longer term, while sales pertains to shorter-term goals.

Kaggle

14 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.

By the end of this training, participants will be able to:

Learn about data science and machine learning.
Explore data analytics.
Learn about Kaggle and how it works.

Accelerating Python Pandas Workflows with Modin

14 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.

By the end of this training, participants will be able to:

Set up the necessary environment to start developing Pandas workflows at scale with Modin.
Understand the features, architecture, and advantages of Modin.
Know the differences between Modin, Dask, and Ray.
Perform Pandas operations faster with Modin.
Implement the entire Pandas API and functions.

PySpark and Machine Learning

21 Hours

This training provides a practical introduction to building scalable data processing and Machine Learning workflows using PySpark. Participants learn how Apache Spark operates within modern Big Data ecosystems and how to efficiently process large datasets using distributed computing principles.

GPU Data Science with NVIDIA RAPIDS

14 Hours

This instructor-led, live training in Australia (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, applying machine learning algorithms with XGBoost, cuML, and similar libraries.

By the end of this training, participants will be able to:

Set up the necessary development environment to build data models with NVIDIA RAPIDS.
Understand the features, components, and advantages of RAPIDS.
Leverage GPUs to accelerate end-to-end data and analytics pipelines.
Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
Learn how to perform machine learning tasks with XGBoost and cuML algorithms.
Build data visualizations and execute graph analysis with cuXfilter and cuGraph.

Python and Spark for Big Data (PySpark)

21 Hours

In this instructor-led, live training in Australia, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

Learn how to use Spark with Python to analyze Big Data.
Work on exercises that mimic real world cases.
Use different tools and techniques for big data analysis using PySpark.

Stratio: Rocket and Intelligence Modules with PySpark

14 Hours

Stratio is a data-centric platform that integrates big data, AI, and governance into a single solution. Its Rocket and Intelligence modules enable rapid data exploration, transformation, and advanced analytics in enterprise environments.

This instructor-led, live training (online or onsite) is aimed at intermediate-level data professionals who wish to use the Rocket and Intelligence modules in Stratio effectively with PySpark, focusing on looping structures, user-defined functions, and advanced data logic.

By the end of this training, participants will be able to:

Navigate and work within the Stratio platform using Rocket and Intelligence modules.
Apply PySpark in the context of data ingestion, transformation, and analysis.
Use loops and conditional logic to control data workflows and feature engineering tasks.
Create and manage user-defined functions (UDFs) for reusable data operations in PySpark.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.
