[Audio] Good morning everyone! Today we will be discussing SS ZG526 - Distributed Computing. We'll be covering topics such as Logical Clocks, Scalar Time and Lamport Timestamps, Vector Clocks, Matrix Clocks, Singhal–Kshemkalyani's Differential Technique, and Fowler–Zwaenepoel's Direct-Dependency Technique. Please stay tuned for an interesting and informative session.
IMP Note to Self. SS ZG526 - Distributed Computing.
Contact Session – 2 Logical Clocks & Vector Clocks [T1: Chap - 3].
[Audio] Before we begin, a note on attendance: simply joining the session does not guarantee attendance. To be marked present, you must stay in the session until it is complete. You should also make the session interactive by responding to the professor's questions and calls. Whenever your name or number is called, respond promptly, or it may be counted as an absence.
[Audio] Two books serve as the primary texts for this course: "Distributed Computing: Principles, Algorithms, and Systems" by Ajay D. Kshemkalyani and Mukesh Singhal, an in-depth reference on the theoretical foundations, and "Distributed and Cloud Computing: From Parallel Processing to the Internet of Things" by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, which covers modern distributed computing methods and practice. Additionally, "P2P Networking and Applications" by John F. Buford, Heather Yu, and Eng K. Lua offers a comprehensive overview of peer-to-peer networks and applications. These books distill complicated distributed computing concepts and are valuable for researchers, developers, and students alike.
Module Details. SS ZG526 - Distributed Computing.
[Audio] Logical time is an essential concept in distributed computing, and this module covers it in depth: logical clocks, scalar time, and Lamport timestamps; vector time and vector clocks, along with the rules for updating them; matrix clocks; Singhal–Kshemkalyani's differential technique and Fowler–Zwaenepoel's direct-dependency technique, which reduce the overhead of maintaining clocks in larger systems; and finally physical clock synchronization, including the widely used Network Time Protocol (NTP).
[Audio] A distributed computation consists of many interdependent parts executing concurrently on different machines. Because an asynchronous distributed system has no global physical clock, we cannot rely on physical time to track the order of events in the underlying computation. Instead we use causality, the happened-before relation between events, to understand and reason about distributed systems. By tracking causality, powerful reasoning and inference can be applied to understand and refine the system.
[Audio] Tracking causality has many applications in distributed systems, ranging from the design of distributed algorithms, to maintaining liveness and fairness in mutual exclusion algorithms, to preserving consistency in replicated databases. It also underpins deadlock detection algorithms, helping us avoid phantom and undetected deadlocks. All of these are essential for effective and efficient coordination between distributed processes.
[Audio] Causality tracking also lets us observe events occurring on multiple machines and measure the progress of processes within a distributed computation. This makes it possible to record a consistent global state for checkpointing and resuming operations, and to detect and repair inconsistencies between replicas. It is immensely useful for distributed debugging, recovering systems after a failure, and replicated databases.
[Audio] Causality also gives us a measure of concurrency, the degree of overlap between activities in a distributed program. Events that are causally related must be ordered and may need synchronization, whereas causally independent events can be executed concurrently, to the program's benefit. Knowing how many events are causally dependent on one another therefore tells us how much concurrency is present in the computation.
[Audio] We now discuss the implementation of logical clocks in a distributed system. Two issues must be addressed: the data structures local to every process that represent logical time, and a protocol that ensures the consistency condition is met. Together these guarantee the integrity and validity of timestamps across the system.
[Audio] Each process maintains two data structures: a local logical clock, which tracks that process's own progress, and a logical global clock, which represents the process's view of global time and is used to timestamp its local events. This combination allows multiple processes to execute concurrently while remaining logically synchronized.
[Audio] The protocol ensures that a process's logical clock, and its view of global time, are managed consistently. It consists of two rules, R1 and R2, which govern local and global logical clock updates respectively.
[Audio] In scalar time, the time domain is the set of non-negative integers, and each process's logical local clock and its view of global time are compressed into a single integer variable C_i. Rule R1 updates the clock for local events: before executing an event, process p_i adds d (a value greater than 0) to C_i. Typically d is kept at 1, the smallest value that still identifies each event at a process uniquely while keeping the clock's rate of growth low.
[Audio] Rule R2 governs message exchange. Every message carries the clock value of its sender at send time. When process p_i receives a message with timestamp C_msg, it first sets C_i to the maximum of C_i and C_msg, then executes R1, and only then delivers the message. These two rules give all processes a consistent view of logical time.
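The rules R1 and R2 above can be sketched in a few lines of Python. This is a minimal illustration, not from the text itself; the class and method names are my own, and d defaults to 1 as the narration suggests.

```python
class ScalarClock:
    """Scalar (Lamport) clock: logical time as a single integer.

    R1: before each event, add d to the clock (here d = 1 by default).
    R2: on receiving a message with timestamp t, set the clock to
        max(local, t), then apply R1 before delivering the message.
    """

    def __init__(self, d=1):
        self.time = 0
        self.d = d

    def local_event(self):
        # R1: advance before executing any event.
        self.time += self.d
        return self.time

    def send(self):
        # R1, then return the value to piggyback on the message.
        self.time += self.d
        return self.time

    def receive(self, t_msg):
        # R2: take the maximum, then apply R1, then deliver.
        self.time = max(self.time, t_msg)
        self.time += self.d
        return self.time
```

For example, if process p1 sends its first message (timestamp 1) to a fresh process p2, p2's clock jumps to 2 on receipt, preserving the happened-before order.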
SCALAR TIME. SS ZG526 - Distributed Computing.
[Audio] The Lamport timestamp algorithm plays a central role in distributed computing, as it establishes a partial ordering of events with very little overhead. The algorithm was created by Leslie Lamport, a pioneer of distributed computing. Systems that use it can order events consistently without synchronized physical clocks, and so scale more easily.
[Audio] To see why such ordering matters, consider two processes that communicate with each other and with a shared disk, requesting access. The disk grants access in the order the messages arrive. If one process sends a request to the disk and then sends a message to the other process, and the second process then sends its own request to the disk, the requests may arrive at the disk in an unintended order.
[Audio] With two messages sent from two different nodes, it can be difficult to determine which was sent first. A logical clock algorithm addresses this by assigning a consistent order to events within a distributed system, at low cost and without relying on synchronized physical clocks, helping maintain the system's integrity.
[Audio] Lamport's solution records the order of events numerically. Lamport logical clocks are counters maintained in each process. Before each event, a process increments its counter, and it attaches the counter's value to every message it sends. On receiving a message, the process sets its counter to the maximum of the message's counter and its own, increments it, and only then considers the message received. This provides a reliable and efficient way of ordering events in a distributed system.
[Audio] Conceptually, each process's logical clock has meaning only in relation to the messages moving between processes. When a process receives a message, it resynchronizes its logical clock with the sender's, enabling the system to track the ordering of events that occur across processes.
[Audio] In vector time, each process maintains a vector that captures its view of logical time, with one entry per process. This vector is used to timestamp events. Vector timestamps let the system determine precisely which updates a process has seen from every other process, capturing causality more accurately than a single scalar can.
[Audio] Process p_i updates its vector clock by following two rules. Rule 1 states that before an event occurs, the process increments its own entry of the vector by a positive value d, typically 1. This ensures that event timestamps are distinct and that events at each process are sequenced correctly.
[Audio] Rule 2 governs communication. Each process works on its own data or tasks but also exchanges information with other processes, and every message sent carries the vector clock of the sender's process as it was when the message was sent. When the recipient process receives the message, it updates its own view of global logical time before acting on the message. This guarantees that causal relationships between processes are recorded correctly.
[Audio] Recall that a distributed system is a network of computers that communicate and coordinate their actions via messages. Such systems can solve problems that would demand immense time and resources on a single machine, processing larger and more complex datasets and delivering results faster than ever before.
[Audio] Vector clocks are an algorithm for establishing a partial ordering of events in a distributed system and for detecting causality violations. Each process maintains a local vector of clock values, one entry per process: its own entry is its local logical clock, and the remaining entries record its latest knowledge of every other process's clock.
[Audio] The update rules work as follows. Every time a process executes an internal event, it increments its own entry in the vector by one. Every time a process sends a message, it increments its own entry and attaches the entire vector to the message. Whenever a message is received, the receiver updates each element of its vector to the maximum of its own value and the corresponding value in the received vector, and then increments its own entry by one. This keeps the causal history recorded at every process consistent and exact.
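The vector clock rules just described can be sketched as follows. This is a minimal illustration with names of my own choosing; each process is identified by an index pid into a vector of n entries.

```python
class VectorClock:
    """Vector clock for process `pid` in a system of n processes.

    - Internal event: increment own entry.
    - Send: increment own entry, attach a copy of the whole vector.
    - Receive: take the elementwise maximum with the received vector,
      then increment own entry.
    """

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def internal(self):
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        self.v[self.pid] += 1
        return list(self.v)  # timestamp piggybacked on the message

    def receive(self, v_msg):
        self.v = [max(a, b) for a, b in zip(self.v, v_msg)]
        self.v[self.pid] += 1
        return list(self.v)
```

For instance, with two processes, if p0 sends its first message with timestamp [1, 0], then p1's clock becomes [1, 1] on receipt, recording that the receive causally follows p0's send.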
[Audio] Matrix clocks extend vector clocks, capturing both the chronological and causal links between events and what each process knows about every other process's view of time. Each host can thereby determine a lower bound on the knowledge that every other host possesses, which is advantageous in scenarios such as checkpointing and garbage collection. Matrix clocks make it possible for distributed systems to track both the sequence and the propagation of events taking place in them.
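A matrix clock can be sketched by giving each process an n-by-n matrix: its own row is its vector clock, and row j is its latest knowledge of process j's vector clock. The sketch below uses one common formulation of the update rules; the class and method names are my own.

```python
class MatrixClock:
    """Matrix clock for process `pid` in a system of n processes.

    Row `pid` is this process's own vector clock; row j records what
    this process knows about process j's vector clock. The column
    minimum min over k of m[k][i] is a lower bound on how many of
    process i's events every process has learned of (useful for
    garbage collection).
    """

    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.m = [[0] * n for _ in range(n)]

    def send(self):
        self.m[self.pid][self.pid] += 1
        return [row[:] for row in self.m]  # attach a deep copy

    def receive(self, sender, w):
        # Merge everything the sender knew, entry by entry.
        for k in range(self.n):
            for l in range(self.n):
                self.m[k][l] = max(self.m[k][l], w[k][l])
        # Fold the sender's own vector clock into our row.
        for k in range(self.n):
            self.m[self.pid][k] = max(self.m[self.pid][k], w[sender][k])
        self.m[self.pid][self.pid] += 1

    def known_by_all(self, i):
        """Every process has seen at least this many events at i."""
        return min(self.m[k][i] for k in range(self.n))
```

Once known_by_all(i) reaches some value t, information about process i's events up to t can safely be discarded, since every process is known to have it.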
[Audio] Singhal–Kshemkalyani's differential technique reduces the cost of carrying vector clocks on messages. It is founded on the observation that, between successive message sends to the same process, only a few entries of the sender's vector clock are likely to change. The technique therefore includes in each message only those entries that have changed since the last message sent to that process. This is especially advantageous in large-scale systems where only a few processes interact frequently.
[Audio] The technique minimizes message size, communication bandwidth, and buffer requirements. In the worst case, every element of the vector clock changes between communications, and each message from p_i to p_j must carry a full vector timestamp of size n. In practice, however, the timestamp carried on a message is usually much smaller than n, making the technique an efficient way to cut these costs.
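The differential technique can be sketched as follows: the sender remembers the vector it last sent to each destination and transmits only the changed entries as {index: value} pairs. This is an illustrative sketch with names of my own choosing; like the technique itself, it assumes FIFO channels between processes.

```python
class DiffVectorClock:
    """Singhal-Kshemkalyani differential technique (sketch).

    Instead of the full vector, a sender transmits only the
    {index: value} pairs that changed since its last message to the
    same destination. Assumes FIFO channels.
    """

    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.v = [0] * n
        self.last_sent = {}  # dest -> copy of vector at last send

    def send(self, dest):
        self.v[self.pid] += 1
        prev = self.last_sent.get(dest, [0] * self.n)
        # Only entries that grew since the last send to `dest`.
        diff = {i: t for i, t in enumerate(self.v) if t > prev[i]}
        self.last_sent[dest] = list(self.v)
        return diff  # usually far smaller than the full vector

    def receive(self, diff):
        for i, t in diff.items():
            self.v[i] = max(self.v[i], t)
        self.v[self.pid] += 1
```

With four processes, a first message from p0 to p1 carries only {0: 1} rather than the full four-entry vector, and later messages stay equally small as long as few entries change.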
Singhal–Kshemkalyani’s differential technique. SS ZG526 - Distributed Computing.
[Audio] Fowler–Zwaenepoel's direct-dependency technique goes further: each message carries only a scalar value, the sender's local event count, so processes need not compute full vector clocks for events in real time. Processes instead track only their direct dependencies on other processes, and the full vector time of an event can be reconstructed off-line from the logged dependencies. This results in fewer and smaller messages and, consequently, improved scalability.
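The direct-dependency idea can be sketched as follows. Each process keeps a local event counter and a record of the latest event it directly depends on at each other process; messages carry only the scalar counter. The names are my own, and the off-line reconstruction step (recursively following logged dependencies to rebuild full vector times) is only hinted at by the log.

```python
class DirectDependency:
    """Fowler-Zwaenepoel direct-dependency technique (sketch).

    Each message carries only the sender's local event count, a
    scalar. The receiver records the direct dependency; full vector
    timestamps can be reconstructed off-line by recursively following
    the logged dependencies.
    """

    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0        # local event counter only
        self.dep = [0] * n    # direct dependencies observed so far
        self.log = []         # (local_time, dependency snapshot) per event

    def send(self):
        self.clock += 1
        self.dep[self.pid] = self.clock
        self.log.append((self.clock, list(self.dep)))
        return self.clock     # scalar timestamp only

    def receive(self, sender, t):
        self.clock += 1
        self.dep[sender] = max(self.dep[sender], t)
        self.dep[self.pid] = self.clock
        self.log.append((self.clock, list(self.dep)))
```

Note the contrast with the differential technique: here a message carries a single integer regardless of n, at the cost of deferring transitive causality to an off-line pass over the logs.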
Fowler–Zwaenepoel’s direct-dependency technique. SS ZG526 - Distributed Computing.
[Audio] Clock synchronization is essential in distributed systems because different machines in the system have different clocks. If they are not synchronized, events occurring on different machines may appear out of order. One approach is to maintain a single reference clock that all machines adhere to, so that the order of events is consistent across the system and ambiguity is avoided.
[Audio] A distributed system has no global clock or shared memory with which to synchronize all the components precisely, making coordination a challenge. This lack of agreement on a common time creates issues for applications that rely on it. Each processor therefore keeps its own internal clock, which must be calibrated regularly to correct discrepancies. This adds complexity to the setup, but ultimately provides a dependable way to maintain a consistent notion of time.
[Audio] Clock synchronization matters for secure systems, fault identification and recovery, scheduled operations, database systems, and much more. By keeping distributed processors close in time and giving them a single, unified time reference, network protocols and applications can make use of timeouts, simplifying their design and improving efficiency. Precise clock synchronization is thus vital in distributed computing and significantly affects many fields.
[Audio] Distributed physical clocks must not only be synchronized with one another but must also stay close to a real-time standard such as Coordinated Universal Time (UTC). Because individual clocks run at slightly different rates, they drift apart over time and must be corrected periodically to remain accurate and reliable.
[Audio] NTP (Network Time Protocol) has long been used to synchronize clocks across the internet. Its time servers form a hierarchy: primary servers at the root synchronize directly to UTC reference sources, secondary servers synchronize to the primaries, and at the lowest level of the tree the clients receive time updates either from a primary server or from the secondary servers. Through this organized hierarchy, every client is tied back to the same reference time.
[Audio] We've now finished our discussion of logical clocks and clock synchronization in distributed computing, and there is much more left to learn and explore about this topic. I urge you to keep learning and to share your findings with others. Until we meet again.