Learning outcomes

The course is intended as a natural extension of the database courses given in the Bachelor's programmes in Computer Science, Management Engineering and Mathematical Sciences. Its objective is to extend knowledge and skills in database engineering in a "big data" context, along three main dimensions:

1) Technological dimension: we will study the different families of contemporary "scalable" databases, in particular NoSQL databases: graph-based, document-based, column-based, key-value and multi-model.
2) Structural dimension: we will no longer consider a single database with a single schema, but several (usually heterogeneous) data sources with several schemas to be integrated, aggregated, consolidated or mapped. The methodological aspects of this process will be emphasised.
3) Algorithmic dimension: we will discuss languages and families of algorithms that allow large volumes of data to be handled efficiently. In particular, we will cover parallel processing with, among others, the MapReduce framework and its implementations (e.g. Hadoop).

The course also aims to take a critical look at the different technological paradigms it covers (strengths, weaknesses and risks) and to explain the links between the choice of paradigm, the choice of data model and the type of processing.

Assessment method

The evaluation will consist of a group project (written report and oral defence). It will focus on the understanding and appropriation of the theoretical concepts seen in the course and on the application of those concepts as practised during the practical exercises. Particular attention will be paid to the student's ability to take a critical view of each paradigm and to choose the most appropriate one for a given context.

Sources, references and any support material

The course material will consist of a series of slide decks. A list of additional readings may also be made available to students.

Language of instruction

French