Big data: engineering and processing
- UE code IDASM101
- Schedule 30 15, Quarter 2
- ECTS Credits 5
- Language French
- Teacher Cleve Anthony
The course is a natural extension of the database courses given in the Bachelor programmes in Computer Science, Management Engineering and Mathematical Sciences. Its objective is to extend knowledge and skills in database engineering to a "big data" context, along three main dimensions:
1) Technological dimension: we will study the different families of contemporary "scalable" databases, in particular NoSQL databases: graph-based, document-based, column-based, key-value and multi-model.
2) Structural dimension: we will no longer consider a single database with a single schema, but several (usually heterogeneous) data sources with several schemas to be integrated, aggregated, consolidated or mapped. The methodological aspects of this process will be emphasised.
3) Algorithmic dimension: we will discuss languages and families of algorithms that allow large volumes of data to be handled efficiently. In particular, we will cover parallel processing with, among others, the MapReduce framework and its implementations (Hadoop).
The course also aims to take a critical look at the technological paradigms it covers (strengths, weaknesses and risks) and to clarify the links between the choice of paradigm, the choice of modelling and the type of processing.
The course alternates, for each of the three dimensions addressed, between a presentation of the theoretical concepts and a practical application of these concepts. The practical application may take various forms, such as:
- preparation of exercises at home and resolution in class;
- preparation and solution of exercises in class;
- demonstration of the use of technology in the classroom;
- a group project.
These exercises will be done on paper or on a machine.
The evaluation consists of a group project (written report + oral defence). It will focus on the understanding and appropriation of the theoretical concepts seen in the course and on the application of those concepts as practised during the practical exercises. Particular attention will be paid to the student's ability to take a critical view of each paradigm and to choose the most appropriate one according to the context.
The course material will take the form of a set of slide decks. A list of additional readings may also be made available to students.
Training | Study programme | Block | Credits | Mandatory |
---|---|---|---|---|
Master in Computer Science, Professional focus in Data Science | Standard | 0 | 5 | |
Master in Business Engineering, Professional focus in Data Science | Standard | 0 | 5 | |
Master in Mathematics, Professional focus in Data Science | Standard | 0 | 5 | |
Certificat d'université d'Executive Master en Data Science | Standard | 0 | 5 | |
Master in Computer Science, Professional focus in Data Science | Standard | 1 | 5 | |
Master in Business Engineering, Professional focus in Data Science | Standard | 1 | 5 | |
Certificat d'université d'Executive Master en Data Science | Standard | 1 | 5 | |
Master in Mathematics, Professional focus in Data Science | Standard | 2 | 5 | |