Pavel graduated from Lomonosov Moscow State University in 2009, Department of Computational Mathematics and Cybernetics. In 2013 he completed his PhD program. Pavel started his professional career as a software engineer at Beeline, one of the largest Russian telecom companies. He then moved to Rambler&Co, the largest Internet media company in Russia, where he developed machine learning algorithms for Rambler/News (the second largest news aggregator in Russia).

He then worked as the technical leader for Rambler/News and Rambler/Portal. For the last 4 years Pavel worked as the Head of the Machine Learning Department at Rambler&Co, supervising machine learning solutions across a range of business areas (computer vision, advertising, recommendations, credit scoring). He now works as the Chief Data Scientist at NVIDIA, supervising all machine learning tasks that use telemetry data collected from GPUs all over the world. Pavel is also the founder of the Moscow Spark community, which unites several hundred professional users and developers of Apache Spark.

The graduates will be able to:

• Explain how NoSQL databases work and how they are used compared to traditional RDBMSs

• Identify practical problems which can be solved with machine learning

• Build, tune and apply linear models with Spark MLlib

• Construct their own Big Data Service

• Apply the acquired skills in finance, social networks, telecommunications and many other fields

SKILLS:

- Algorithms

- Machine Learning

- MapReduce

- Python

- C++

- Software Development

- Natural Language Processing

DATE: 11 – 29 Jun, 2018

DURATION: 3 Weeks

LECTURES: 3 Hours per day

LANGUAGE: English

LOCATION: Barcelona, Harbour.Space Campus

COURSE TYPE: Offline

Session 1

Working with distributed file systems (HDFS)

Session 2-3

Understanding and working with MapReduce

Session 4-5

SQL over BigData: Hive

BIG DATA ANALYSIS: MAPREDUCE, SPARK, BIGTABLE/HBASE, DISTRIBUTED DATA ANALYSIS

During this course, students will master the core technologies of the modern Big Data landscape: HDFS, MapReduce, Spark and NoSQL. A particular focus of the course is efficient data warehousing using Hive, Spark SQL and Spark DataFrames.

Under supervision, students will study the internals of these systems and their applications, and learn about distributed file systems: why they exist and how they are used. Participants will also practice using the MapReduce framework, a workhorse of many modern Big Data applications. Applying this knowledge in practice to process texts and solve sample business cases is an integral part of the course. Participants will work with Spark, the next-generation computational framework, from its basic concepts up to advanced techniques for squeezing out maximum performance. Finally, they will build and deploy their own service that uses SQL or NoSQL databases at scale.

ALEXEY DRAL

Founder, CEO at BigData Team

We offer innovative university degrees taught in English by industry leaders from around the world, aimed at giving our students meaningful and creatively satisfying top-level professional futures. We think the future is bright if you make it so.

HARBOUR.SPACE UNIVERSITY

Alexey Dral is the Head of Big Data and Machine Learning and a Senior Lecturer at the Department of Algorithms and Programming Technologies at Moscow Institute of Physics and Technology. He is also the Head of the Data Science School at the Corporate University of Sberbank.

He has 10 years of experience at top Russian and international companies, including Amazon AWS, Yandex and Rambler. His background in solving large-scale problems, leading R&D teams and defining strategy for whole departments makes him a strong mentor who is eager to share insights with his students. He teaches around the globe, launches new onsite and online courses to reduce the gap between industry and academia, and supervises Data Science and Data Engineering initiatives.

Research/Academic Interests: Big Data, Machine Learning, Recommender Systems

ABOUT PAVEL

BIBLIOGRAPHY

Learning Spark: Lightning-Fast Big Data Analysis by Karau, H., Konwinski, A., Wendell, P., & Zaharia, M.

PAVEL KLEMENKOV

Session 6-7

Spark: in-memory computational model

Chief Data Scientist (Data Platform) at NVIDIA
Founder of the Moscow Spark tech community
Co-author of “Big Data for Data Engineers” Coursera specialization

BigData Administrator & DevOps engineer, BigData Team.

OLEG IVCHENKO

Oleg is a PhD student at Moscow Institute of Physics and Technology, Department of Algorithms and Programming Technologies. Oleg started working with Big Data in 2015. He is now the Head of the BigData course at the Department of Algorithms and Programming Technologies and co-developer of the testing framework for the “Big Data for Data Engineers” Coursera specialization. He is also a Hadoop and HPC administrator at Yandex School of Data Analysis.

Under the direction of Alexey Dral, he developed HJudge, a testing system for applications in the Hadoop ecosystem (Rospatent num. 2016660616). This next-generation testing framework is used for automated testing of students' applications in the "Big Data Analysis" course.

Research/Academic Interests: Big Data, Neural Networks, DevOps, Java

ABOUT OLEG (course TA)
