Pavel graduated from Lomonosov Moscow State University, Department of Computational Mathematics and Cybernetics, in 2009 and completed his PhD program in 2013. He started his professional career as a software engineer at Beeline, one of the largest Russian telecom companies. He then moved to Rambler&Co, the largest Internet media company in Russia, where he developed machine learning algorithms for Rambler/News (the second-largest news aggregator in Russia).
He went on to work as the technical leader for Rambler/News and Rambler/Portal. After that, Pavel spent four years as the Head of the Machine Learning Department at Rambler&Co, supervising machine learning solutions across a range of business areas (computer vision, advertising, recommendations, credit scoring). He now works as the Chief Data Scientist at NVIDIA, supervising all machine learning tasks that use telemetry data collected from GPUs all over the world. Pavel is also the founder of the Moscow Spark community, which unites several hundred professional users and developers of Apache Spark.
WHAT YOU WILL LEARN
Graduates will be able to:
• Explain how NoSQL databases work and how they are used, compared to traditional RDBMSs
• Identify practical problems which can be solved with machine learning
• Build, tune and apply linear models with Spark MLlib
• Construct their own Big Data Service
• Apply the acquired skills in finance, social networks, telecommunications and many other fields
SKILLS:
- Algorithms
- Machine Learning
- MapReduce
- Python
- C++
- Software Development
- Natural Language Processing
DATE: 11 – 29 Jun, 2018
DURATION: 3 Weeks
LECTURES: 3 Hours per day
LANGUAGE: English
LOCATION: Barcelona, Harbour.Space Campus
COURSE TYPE: Offline
COURSE OUTLINE
Session 1
Working with distributed file systems (HDFS)
Session 2-3
Understanding and working with MapReduce
Session 4-5
SQL over Big Data: Hive
During this course, students will master the core technologies of the modern Big Data landscape: HDFS, MapReduce, Spark, and NoSQL. A particular focus of the course is efficient data warehousing using Hive, Spark SQL, and Spark DataFrames.
Under supervision, students will study the internals of these systems and their applications, and learn what distributed file systems are, why they exist, and how they are used. They will also practice using the MapReduce framework, a workhorse for many modern Big Data applications. Putting this knowledge into practice by processing texts and solving sample business cases is an integral part of the course. Participants will then work with Spark, the next-generation computational framework, from its basic concepts to advanced techniques for squeezing out maximum performance. Finally, they will build and deploy their own service that uses SQL or NoSQL databases at scale.
HARBOUR.SPACE UNIVERSITY
ABOUT ALEXEY
Alexey Dral is the Head of Big Data and Machine Learning and a Senior Lecturer at the Department of Algorithms and Programming Technologies at the Moscow Institute of Physics and Technology. He is also the Head of the Data Science School at the Corporate University of Sberbank.
His ten years of experience at top Russian and international companies, including Amazon AWS, Yandex, and Rambler, spent working on large-scale problems, leading R&D teams, and defining strategy for whole departments, make him a strong mentor who is eager to share insights with his students. He teaches around the globe, launches new onsite and online courses to narrow the gap between industry and academia, and supervises Data Science and Data Engineering initiatives.
Research/Academic Interests: Big Data, Machine Learning, Recommender Systems
Session 6-7
Spark: in-memory computational model
BIBLIOGRAPHY
Learning Spark: Lightning-Fast Big Data Analysis by Karau, H., Konwinski, A., Wendell, P., & Zaharia, M.
ABOUT PAVEL
PAVEL KLEMENKOV
Chief Data Scientist (Data Platform) at NVIDIA
Founder of the Moscow Spark tech community
Co-author of “Big Data for Data Engineers” Coursera specialization
All rights reserved. 2018
ABOUT OLEG (course TA)
OLEG IVCHENKO
Oleg is a PhD student at the Moscow Institute of Physics and Technology, Department of Algorithms and Programming Technologies. He started working with Big Data in 2015. He is now the Head of the Big Data course at the Department of Algorithms and Programming Technologies and a co-developer of the testing framework for the “Big Data for Data Engineers” Coursera specialization. He is also a Hadoop and HPC administrator at the Yandex School of Data Analysis.
Under the direction of Alexey Dral, he developed HJudge, a testing system for applications in the Hadoop ecosystem (Rospatent no. 2016660616). This next-generation testing framework is used for autonomous testing of students' applications in the "Big Data Analysis" course.
Research/Academic Interests: Big Data, Neural Networks, DevOps, Java