Overview

The Big Data Engineer career path prepares learners to use Big Data platforms and methodologies to collect and analyse large amounts of data from different sources. This requires skills in Big Data technologies such as Apache Hadoop, Ambari, Spark, Big SQL, HDFS, YARN, MapReduce, ZooKeeper, Knox, Sqoop, and HBase.

Learning Outcomes

After completing this course, you should be able to understand the following topics:
  • Big Data and Data Analytics
  • Hortonworks Data Platform (HDP)
  • Apache Ambari
  • Hadoop and the Hadoop Distributed File System
  • MapReduce and YARN
  • Apache Spark
  • Storing and Querying data
  • ZooKeeper, Slider, and Knox
  • Loading data with Sqoop
  • DataPlane Service
  • Stream Computing
  • Data Science essentials
  • Drew Conway’s Data Science Venn Diagram, and those of others
  • The Scientific Process applied to Data Science
  • The steps in running a Data Science project
  • Languages used for Data Science (Python, R, Scala, Julia)
  • Survey of Data Science Notebooks
  • Markdown language with notebooks
  • Resources for Data Science, including GitHub
  • Jupyter Notebook
  • Essential packages: NumPy, SciPy, Pandas, Scikit-learn, NLTK, BeautifulSoup...
  • Data visualisations: matplotlib, PixieDust
  • Using Jupyter “Magic” commands
  • Using Big SQL to access HDFS data
  • Creating Big SQL schemas and tables
  • Querying Big SQL tables
  • Managing the Big SQL Server
  • Configuring Big SQL security
  • Data federation with Big SQL
  • IBM Watson Studio
  • Analysing data with Watson Studio

Who Should Attend?

  • New entrants to the industry who want to pick up a working knowledge of how to analyse large amounts of data from different sources using big data platforms

Eligibility Criteria

Course attendees are expected to have:
  1. Academic Level of at least GCE O Level or equivalent
  2. English language proficiency of at least IELTS 5.0 or equivalent
  3. Basic SQL knowledge
  4. Working knowledge of big data and Hadoop technologies
  5. A basic understanding of notebook technologies for data science
    (Attendees can take free courses at www.bigdatauniversity.com to acquire the necessary prerequisite knowledge)
This course is endorsed under the Critical Infocomm Technology Resource Programme Plus (CITREP+).
To find out more about CITREP+ funding, please refer to Programme Support on the CITREP+ page.


Information is accurate as of 10 June 2019