BIG DATA ANALYSIS: MAPREDUCE, SPARK, BIGTABLE/HBASE, DISTRIBUTED DATA ANALYSIS
ALEXEY DRAL
We offer innovative university degrees taught in English by industry leaders from around the world, aimed at giving our students meaningful and creatively satisfying top-level professional futures. We think the future is bright if you make it so.
During this course, students will develop and sharpen their knowledge of the core technologies of the modern Big Data landscape, namely HDFS, MapReduce, Spark, and NoSQL. A particular focus of the course is efficient data warehousing using Hive, Spark SQL, and Spark DataFrames.
Under supervision, they will study the internals of these systems and their applications, and learn about distributed file systems: why they exist and how they are used. Participants will also practice using the MapReduce framework, a workhorse for many modern Big Data applications. Putting this knowledge into practice by processing texts and solving sample business cases is an integral part of the course.
Participants will work with Spark, the next-generation computational framework, from its basic concepts up to advanced applications tuned for maximum performance. Finally, they will build and deploy their own service that uses SQL or NoSQL databases at scale.
Alexey Dral is the Head of Big Data and Machine Learning and a Senior Lecturer at the Department of Algorithms and Programming Technologies at the Moscow Institute of Physics and Technology. He is also the Head of the Data Science School at the Corporate University of Sberbank.
His 10 years of experience at top Russian and international companies, including Amazon AWS, Yandex, and Rambler, working on large-scale problems, leading R&D teams, and defining strategy for whole departments, make him a strong mentor eager to share insights with his students.
He teaches around the globe, launches new onsite and online courses to reduce the gap between industry and academia, and supervises Data Science and Data Engineering initiatives.
After completing this course, a student will be able to:
- Explain how NoSQL databases work and when to use them compared with traditional RDBMSs
- Identify practical problems which can be solved with machine learning
- Build, tune and apply linear models with Spark MLlib
- Construct their own Big Data service
- Apply the acquired skills in finance, social networks, telecommunications and many other fields
SKILLS:
- Big Data
- Machine Learning
- Recommender Systems
DATE: 19 Aug - 6 Sep, 2019
DURATION: 3 Weeks
LECTURES: 3 Hours per day
LANGUAGE: English
LOCATION: Barcelona, Harbour.Space Campus
COURSE TYPE: Offline
HARBOUR.SPACE UNIVERSITY
All rights reserved. 2017
COURSE OUTLINE
Session 1
Working with distributed file systems (HDFS)
Session 2-3
Understanding and working with MapReduce
Session 4-5
SQL over Big Data: Hive
Session 6-7
Spark: in-memory computational model