This course aims to provide students with the knowledge needed to design and implement databases dedicated to Big Data. It addresses two aspects: the organization of data (representation, storage, distribution, scaling, etc.) and the organization of operations on data (definition, distribution, retrieval of results, etc.). Course overview:
  • Introduction to distributed databases for Big Data: requirements and characteristics
  • Basic concepts of NoSQL DBMSs (vs. SQL): implicit schema, key-value, document-oriented and column-oriented databases
  • BASE properties (vs. ACID), CAP theorem, NewSQL
  • Development with distributed NoSQL databases and Big Data processing frameworks (e.g. Hadoop, Spark and Storm)