DISTRIBUTED SYSTEMS AND BIG DATA

Academic Year 2019/2020 - 2nd Year
Teaching Staff: Antonella DI STEFANO
Credit Value: 9
Scientific field: ING-INF/05 - Information processing systems
Taught classes: 49 hours
Exercise: 30 hours
Term / Semester:

Learning Objectives

Today, virtually all enterprise software systems are distributed systems.

The course addresses the design of large-scale distributed systems in today's scenario. It aims to provide knowledge of modern technologies for distributed architectures and, above all, to give students the ability to apply algorithms and design methodologies to the realization of complex, large-scale solutions.

 

In more detail, the course aims to:

1) provide knowledge of:

  • software structures for development, delivery and deployment;
  • main components of "as a Service" (XaaS) architectures;
  • algorithms and patterns for resource sharing, concurrency control and data management;
  • algorithms and techniques for managing heterogeneity and interoperability of components and services, QoS, scalability, consistency and fault tolerance;
  • transaction and replica management;
  • microservice design patterns and technologies;
  • resource virtualization techniques in containerization systems;
  • Big Data storage and processing features.

 

2) provide the ability to apply technologies and tools for the design, development and deployment of large distributed systems in today's real-world scenarios. In particular:

  • ability to apply the main strategies for guaranteeing given levels of QoS in terms of performance, scalability, availability and robustness in distributed systems;
  • knowledge of techniques for the design and deployment of microservices on Docker-based container platforms and for their orchestration with Kubernetes;
  • use of Storm and Spark for data stream processing.

Course Structure

Ex cathedra lectures and laboratory activities.


Detailed Course Content

  • Distributed algorithms: synchronous vs. asynchronous systems; clock models: Lamport clocks and vector clocks; event ordering and concurrency; global snapshot. Fault tolerance: classification of faults (crash and Byzantine), dependability, consensus algorithms in the presence of various types of faults, fault detectors. Cooperative algorithms: election, mutual exclusion, deadlock detection. Distributed transactions. Consistency and replication. Replica distribution. Distributed file systems.

  • Software design paradigms and models for large-scale systems: Quality of Service (QoS); patterns for distributed, component-based and service-based software architectures, and containers; Service Oriented Architecture (SOA and XaaS); multitier systems; message-oriented systems; P2P; Cloud and Fog computing.

  • Big Data. Characteristics of Big Data. CAP theorem. Big Data storage. Partitioning, sharding. Saga. Data stream processing. The Apache Storm and Spark ecosystem.

  • Communication, interoperability, naming and location: client/server (C/S) and group communication. Multicast and broadcast. Messaging systems.

  • P2P paradigms: structured (DHT) vs. unstructured (flooding).

  • Microservices: Docker and Kubernetes.


Textbook Information

Main references

  • “Distributed Systems: Concepts and Design”, G. Coulouris, J. Dollimore, T. Kindberg, G. Blair, 5th edition, Addison Wesley, 2011.

  • “Microservices Patterns: With Examples in Java”, C. Richardson, Manning.

  • For the technologies, students are advised to consult the websites indicated during the course.

Other reference texts

  • “Distributed Computing: Principles, Algorithms, and Systems”, A. D. Kshemkalyani, M. Singhal.

  • “Pattern-Oriented Software Architecture”, vol. 1-4, Buschmann, Schmidt et al. (with particular reference to vol. 2 and 4).

  • “Distributed and Cloud Computing”, K. Hwang, J. Dongarra, G.C. Fox, Morgan Kaufmann, 2011.

  • “Cloud Computing: Theory and Practice”, 2nd edition, D. Marinescu, Morgan Kaufmann, 2018.

  • “Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services”, K.P. Birman, Springer, 2012.

  • “Learning Spark”, H. Karau, A. Konwinski, P. Wendell, M. Zaharia, O’Reilly.

  • “Kubernetes: Up and Running: Dive into the Future of Infrastructure”, K. Hightower, O’Reilly, 2017.