Big Data Analysis: MapReduce, Spark, BigTable/HBase, Distributed Data Analysis with Alexey Dral at Harbour.Space University

BIG DATA ANALYSIS: MAPREDUCE, SPARK, BIGTABLE/HBASE, DISTRIBUTED DATA ANALYSIS

ALEXEY DRAL

We offer innovative university degrees taught in English by industry leaders from around the world, aimed at giving our students meaningful and creatively satisfying top-level professional futures. We think the future is bright if you make it so.

During this course, students will strengthen their knowledge of the core technologies of the modern Big Data landscape: HDFS, MapReduce, Spark, and NoSQL. A particular focus is efficient data warehousing with Hive, Spark SQL, and Spark DataFrames.

Under supervision, they will study the internals of these systems and their applications, and learn what distributed file systems are, why they exist, and how they are used. Students will also practice with the MapReduce framework, a workhorse of many modern Big Data applications. Applying this knowledge in practice, by processing texts and solving sample business cases, is an integral part of the course.

Participants will work with Spark, a next-generation computational framework, from its basic concepts to advanced applications tuned for maximum performance. Finally, they will build and deploy their own service that uses SQL or NoSQL databases at scale.

Founder, CEO at BigData Team

Alexey Dral is Head of Big Data and Machine Learning and a Senior Lecturer at the Department of Algorithms and Programming Technologies at the Moscow Institute of Physics and Technology. He is also Head of the Data Science School at the Corporate University of Sberbank.

His 10 years of experience at top Russian and international companies, including Amazon AWS, Yandex, and Rambler, spans large-scale problems, leading R&D teams, and defining strategy for whole departments, making him a mentor eager to share insights with his students.

He teaches around the globe, launches new onsite and online courses to narrow the gap between industry and academia, and supervises Data Science and Data Engineering initiatives.

After completing this course, a student will be able to:

- Explain how NoSQL databases work and when to use them, compared with traditional RDBMSs

- Identify practical problems which can be solved with machine learning

- Build, tune and apply linear models with Spark MLlib

- Build their own Big Data service

- Apply the acquired skills in finance, social networks, telecommunications and many other fields

SKILLS:

- Big Data

- Machine Learning

- Recommender Systems

DATE: 19 Aug - 6 Sep, 2019

DURATION: 3 Weeks

LECTURES: 3 Hours per day

LANGUAGE: English

LOCATION: Barcelona, Harbour.Space Campus

COURSE TYPE: Offline

COURSE OUTLINE

Session 1

Working with distributed file systems (HDFS)

Session 2-3

Understanding and working with MapReduce

Session 4-5

SQL over BigData: Hive

Session 6-7

Spark: in-memory computational model

BIBLIOGRAPHY

"Hadoop: The Definitive Guide" by Tom White (O'Reilly Media, 2015)

"Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames" (Coursera)

"Big Data Essentials: HDFS, MapReduce and Spark RDD" (Coursera)