DISTRIBUTED SYSTEMS AND BIG DATA

Academic Year 2022/2023 - Teacher: Antonella DI STEFANO

Expected Learning Outcomes

Most of today's enterprise software systems and successful applications are wide-area distributed architectures characterized by large volumes of data and large codebases. This course aims to present systematically the main topics related to the design of these systems and to provide basic information on the related technologies.

In this rapidly changing scenario, the main objective is to help students improve their ability to apply fundamental algorithms and patterns to the design of large-scale, complex solutions.

Knowledge and Understanding

Many different solutions exist today, and they evolve continuously thanks to the efforts of research and industry to meet the requirements of an ever-growing customer base. The goal is therefore to give students not only the tools to meet today’s needs in the work environment, but also the ability to deal with emerging challenges in distributed software architecture and to adapt to novel solutions and technologies.

Among the main topics of the course:

  • Models and tools for software development, delivery, and deployment.

  • Working with the “as a service” (XaaS) approach.

  • Algorithms and patterns for resource sharing, concurrency control, and distributed data management.

  • Strategies for handling interoperability, QoS, scalability, consistency, and fault tolerance.

  • Transaction and replica management.

  • Microservices and the relevant main patterns and technologies.

  • Virtualization and containerization.

  • Fundamentals of Big Data storage and processing.

Applied Knowledge and Understanding

The course encourages students to design software solutions and to improve their ability to engage autonomously, individually or in teams, in the design, development, and deployment of today’s real-world large distributed software architectures. Practical activities and homework are used to achieve:

  • The ability to apply the right strategies to guarantee adequate QoS in terms of performance, scalability, availability, safety, and robustness in distributed systems.

  • Practical experience and know-how in the design, continuous development, and deployment of microservices with container-based platforms (Docker) and the Kubernetes orchestrator.

Course Structure

Lectures (ex cathedra) and laboratory activities.

Detailed Course Content

  • Distributed algorithms: synchronous vs. asynchronous systems; clock models: Lamport clocks and vector clocks; event ordering and concurrency; global snapshot. Fault tolerance: classification of faults (crash and Byzantine), dependability, consensus algorithms in the presence of various types of faults, fault detectors. Cooperative algorithms: election, mutual exclusion, deadlock detection. Distributed transactions. Consistency and replication. Replica distribution. Distributed file systems.

  • Software design paradigms and models for large-scale systems: Quality of Service (QoS); patterns for distributed, component-based, service-based, and container-based software architectures; Service Oriented Architecture (SOA and XaaS); multitier systems; message-oriented systems; P2P; Cloud and Fog computing.

  • Big Data. Characteristics of Big Data. CAP theorem. Big Data storage. Partitioning, sharding. Saga. Data stream processing.

  • Communication, interoperability, naming, and location: client/server (C/S) and group communication. Multicast and broadcast. Messaging systems.

  • P2P paradigms: structured (DHT) vs. unstructured (flooding).

  • Microservices: Docker and Kubernetes.

Textbook Information

  • “Distributed Systems: Concepts and Design”, G. Coulouris, J. Dollimore, T. Kindberg, G. Blair, 5th edition, Addison Wesley, 2011.

  • “Microservices Patterns: With Examples in Java”, Chris Richardson, Manning.

For technologies and frameworks, refer to their websites, as will be done during the course.

Other textbooks you can consult:

  • “Distributed Computing: Principles, Algorithms, and Systems”, A. Kshemkalyani, M. Singhal.

  • “Pattern-Oriented Software Architecture”, vol. 1-4, Buschmann, Schmidt et al. (with particular reference to vol. 2 and 4).

  • “Distributed and Cloud Computing”, K. Hwang, J. Dongarra, G.C. Fox, Morgan Kaufmann, 2011.

  • “Cloud Computing: Theory and Practice”, Second Edition, Dan Marinescu, Morgan Kaufmann, 2018.

  • “Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services”, K.P. Birman, Springer, 2012.

  • “Kubernetes: Up and Running: Dive into the Future of Infrastructure”, K. Hightower, O’Reilly, 2017.

Course Planning

 No.  Subjects                                                              Text References
 1    1_Types of Distributed Systems (DS)
 2    2_Characteristics of Distributed Systems (DS)
 3    3_The cloud scenario
 4    4_Software architecture: components, connectors, C/S and P2P roles
 5    4a_Communication models
 6    5_Placement, vertical and horizontal distribution
 7    6_SOA and REST
 8    6b_Spring Boot
 9    7_Monolithic vs microservice architecture
 10   8_Docker and Docker Compose
 11   9_Kubernetes
 12   10_Messaging
 13   11_Kafka
 14   12a_HDFS, MapReduce, Function as a Service
 15   12_Big Data. Data Lake and Data Pipeline
 16   12b Partitioning & sharding; NoSQL data model
 17   13 Graph DB and Neo4j
 18   14 Time series
 19   15 Prometheus
 20   1alg_Time
 21   2alg_GlobalSnapshot
 22   3alg_FaultDelivery
 23   4alg_Consensus
 24   5alg_Paxos; Raft
 25   6alg_Flooding & DHT
 26   6alg_a Gossip
 27   7alg_Transactions & ACID properties
 28   8alg_Saga
 29   9alg_Replicas
 30   10alg_Election
 31   11alg_CAP theorem & BASE properties