DISTRIBUTED SYSTEMS AND BIG DATA
Academic Year 2022/2023 - Teacher: Antonella DI STEFANO
Expected Learning Outcomes
Most of today's enterprise software systems and successful applications are wide-area distributed architectures with large volumes of data and large codebases. This course aims to present systematically the main topics in the design of these systems and to provide basic information on the related technologies.
In this rapidly changing scenario, the main objective is to help students improve their ability to apply the fundamental algorithms and patterns to the design of large-scale, complex solutions.
Knowledge and Understanding
Today there are many continuously evolving solutions, driven by the efforts of research and industry to meet the requirements of an ever-growing customer base. The goal is therefore to give students not only the tools to meet today’s needs in the workplace but also the ability to deal with emerging challenges in distributed software architecture, by maturing their capacity to adapt to novel solutions and technologies.
Among the main topics of the course:
- Models and tools for software development, delivery, and deployment.
- Working with the "as a service" (XaaS) approach.
- Algorithms and patterns for resource sharing, concurrency control, and distributed data management.
- Strategies for handling interoperability, QoS, scalability, consistency, and fault tolerance.
- Transaction and replica management.
- Microservices and the main related patterns and technologies.
- Virtualization and containerization.
- Fundamentals of Big Data storage and processing.
Applied Knowledge and Understanding
The course encourages students to design software solutions and to improve their ability to engage autonomously, individually or in teams, in the design, development, and deployment of today’s real-world large distributed software architectures. The course uses practical activities and homework to achieve:
- The ability to apply the right strategies to guarantee adequate QoS in terms of performance, scalability, availability, safety, and robustness in distributed systems.
- Practical experience and know-how in the design, continuous development, and deployment of microservices on container-based platforms (Docker) with the Kubernetes orchestrator.
Course Structure
Lectures (ex cathedra) and laboratory activities.
Detailed Course Content
- Distributed algorithms: synchronous vs. asynchronous systems; clock models: Lamport clocks and vector clocks; event ordering and concurrency; global snapshot. Fault tolerance: classification of faults (crash and Byzantine), dependability, consensus algorithms in the presence of various types of faults, failure detectors. Cooperative algorithms: election, mutual exclusion, deadlock detection. Distributed transactions. Consistency and replication. Replica placement. Distributed file systems.
- Software design paradigms and models for large-scale systems: Quality of Service (QoS); patterns for distributed, component-based, service-based, and container-based software architectures; Service Oriented Architecture (SOA and XaaS); multitier systems; message-oriented systems; P2P; Cloud and Fog computing.
- Big Data. Characteristics of Big Data. CAP theorem. Big Data storage. Partitioning, sharding. Saga. Data stream processing.
- Communication, interoperability, naming, and location: client/server (C/S) and group communication. Multicast and broadcast. Messaging systems.
- P2P paradigms: structured (DHT) vs. unstructured (flooding).
- Microservices: Docker and Kubernetes.
Textbook Information
- “Distributed Systems: Concepts and Design”, 5th edition, G. Coulouris, J. Dollimore, T. Kindberg, G. Blair, Addison Wesley, 2011.
- “Microservices Patterns: With Examples in Java”, Chris Richardson, Manning.
For technologies and frameworks, refer to their official websites, as will be done during the course.
Other textbooks you can consult:
- “Distributed Computing: Principles, Algorithms, and Systems”, A. D. Kshemkalyani, M. Singhal.
- “Pattern-Oriented Software Architecture”, vol. 1-4, Buschmann, Schmidt et al. (with particular reference to vol. 2 and 4).
- “Distributed and Cloud Computing”, K. Hwang, J. Dongarra, G. C. Fox, Morgan Kaufmann, 2011.
- “Cloud Computing: Theory and Practice”, 2nd edition, Dan Marinescu, Morgan Kaufmann, 2018.
- “Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services”, K. P. Birman, Springer, 2012.
- “Kubernetes: Up and Running: Dive into the Future of Infrastructure”, K. Hightower, O’Reilly, 2017.
Course Planning
No. | Subjects | Text References |
---|---|---|
1 | 1_types of Distributed Systems (DS) | |
2 | 2_characteristics of Distributed Systems (DS) | |
3 | 3_the cloud scenario | |
4 | 4_software architecture: components, connectors, C/S and P2P roles | |
5 | 4a_communication models | |
6 | 5_placement, vertical and horizontal distribution | |
7 | 6_SOA and REST | |
8 | 6b_Spring Boot | |
9 | 7_monolithic vs microservice architecture | |
10 | 8_Docker and Docker Compose | |
11 | 9_Kubernetes | |
12 | 10_messaging | |
13 | 11_Kafka | |
14 | 12a_HDFS, MapReduce, Function as a Service | |
15 | 12_Big Data. Data Lake and Data Pipeline | |
16 | 12b_partitioning & sharding; NoSQL data model | |
17 | 13_graph DB and Neo4j | |
18 | 14_time series | |
19 | 15_Prometheus | |
20 | 1alg_Time | |
21 | 2alg_GlobalSnapshot | |
22 | 3alg_FaultDelivery | |
23 | 4alg_Consensus | |
24 | 5alg_Paxos; Raft | |
25 | 6alg_Flooding & DHT | |
26 | 6alg_a gossip | |
27 | 7alg_transactions & ACID properties | |
28 | 8alg_saga | |
29 | 9alg_replicas | |
30 | 10alg_election | |
31 | 11alg_CAP theorem & BASE properties |