ABSTRACT
Distributed application development requires the programmer to specify inter-process communication, a daunting task when the program involves complex data structures. The programmer must also explicitly replicate data at nodes to reduce network usage, yet system performance is very sensitive to how those replicas are placed and managed. Keeping the replicated data consistent further burdens the programmer.
This project creates a middleware for Java-based distributed application developers that provides transparent distribution, replication, and consistency of objects using Distributed Shared Memory (DSM). The DSM runtime system intercepts user accesses to remote objects and translates them into messages appropriate to the underlying communication media. The programmer is thus given the illusion of a large global object space encompassing all contributing nodes. The DSM approach is attractive because most programmers find it easier to use than a message-passing paradigm, which requires them to explicitly manage communication.
The system uses prediction to dynamically replicate objects and to relocate object replicas according to the objects' changing access patterns. The replica set of each object can grow, shrink, and migrate depending on object usage. Replication is transparent to the application developer, and the middleware also transparently maintains the replicas in a consistent state using the adaptive home-based lazy release consistency (AHLRC) protocol.
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Developing applications over distributed systems is non-trivial. Such applications require the programmer to specify inter-process communication. When complex data structures are involved, developing such distributed applications is a daunting task: the programmer has to explicitly manage the communication alongside the algorithmic development. Distributed middleware such as CORBA and .NET alleviates some of these problems by hiding lower-level network issues from the programmer.
Replication is an important issue in distributed systems, and distributed object middleware does not address it naturally. The programmer has to explicitly replicate data at a node to reduce network usage. If replicas are well distributed, most accesses hit locally and good performance can be achieved. If replicas are unevenly distributed, system performance may be greatly degraded by the increased traffic of updating replicas and unnecessarily fetching them again and again from other nodes. Hence object replication is a key factor in the performance of distributed applications.
Maintaining the replicas in a consistent state, and synchronizing all of them, is another important issue that is left to the programmer in distributed systems. Application development in a distributed environment is therefore a daunting task, and adequate infrastructure that provides a sufficient level of abstraction is necessary.
Distributed Shared Memory (DSM) is an attempt to combine the simplicity of shared-memory programming with the inexpensiveness of a message-passing implementation. The idea of emulating a cache-coherent multiprocessor through the virtual memory mechanism was proposed in [1], [2]. DSM provides an illusion of globally shared memory in which processes can share data without the application developer needing to specify where data is stored or how it should be accessed. This approach is attractive because most programmers find it easier to use than a message-passing paradigm, which requires them to explicitly partition data and manage communication. With a global address space, the programmer can focus on algorithmic development rather than on managing partitioned data sets and communicating values. In DSM systems, replication and consistency are the key issues and are handled extensively; such systems also focus on reducing the communication required for consistency maintenance, providing software implementations of more relaxed forms of consistency.
Recent increases in PC performance, the exceptionally low cost of PCs relative to workstations, and the introduction of advanced PC operating systems have combined to make networks of PCs an attractive platform for large scientific computations. Improvements in commodity general-purpose networks and processors have likewise made networks of PCs an inexpensive alternative to large monolithic multiprocessor systems. By providing an abstraction of globally shared memory on top of the physically distributed memories of networked workstations, it is possible to combine the programming advantages of shared memory with the cost advantages of distributed memory. These distributed shared memory (DSM), or shared virtual memory (SVM), runtime systems transparently intercept user accesses to remote memory and translate them into messages appropriate to the underlying communication media [31]. The programmer is thus given the illusion of a large global address space encompassing all available memory, as seen in Figure 1.1.
Figure 1.1 Distributed Shared Memory
Several factors limit the performance of shared virtual memory (SVM). Software handlers and expensive network communication between clusters to maintain data consistency greatly limit system performance. There are two avenues for improvement: relaxed consistency models, which aim at reducing communication traffic, and additional hardware support in the communication architecture, which can reduce the cost of communication. Since the first solution increases programming complexity while the second increases the cost of the system, the research challenge has been to determine how far to push for better performance without compromising the advantages of the software approach.
1.2 LITERATURE SURVEY
Researchers have proposed many relaxed consistency models. The first shared virtual memory (SVM) implementation [2] used the sequential consistency (SC) model [4], which meant that coherence operations had to be propagated immediately and processes had to wait for memory operations to complete before moving on to new ones. Progress was slow until the release consistency (RC) model [5] breathed new life into the software approach in the early 1990s and led to the eager release consistency (ERC) implementation in Munin [6] and lazy release consistency (LRC) [7] in TreadMarks. Entry consistency (EC) [8], home-based lazy release consistency (HLRC) [9], and scope consistency (ScC) [10] are other relaxed consistency models.
In eager release consistency (ERC), a processor delays propagating its modifications to shared data until it releases the lock on that data; at that time it propagates the modifications to all other processors that cache the modified pages. In lazy release consistency (LRC), propagation is delayed further, until the lock is next acquired, and the updates are sent only to the processor that acquires the lock.
HLRC [9] is a variant of the lazy release consistency (LRC) protocol [7] that requires no hardware support and can be easily implemented on workstation clusters or multicomputers with traditional network interfaces. For these reasons, HLRC has been used in many software DSMs, including Tuple Spaces [3], GeNIMA [11], ORION [12], SHRIMP [13], ADSM [14], and KDSM [15].
Good performance results have been reported for these models. Software DSM protocols such as lazy release consistency are able to minimize false sharing and the resulting network messages by delaying the propagation of page invalidations or updates until the latest possible time. However, these protocols introduce substantial memory overhead and other coherence-related overhead.
Home-based software DSM [16] provides a conceptually simpler way to build software DSMs. LRC systems maintain changes to shared pages locally, and multiple messages may be necessary to bring a stale page up to date. HLRC protocols, on the other hand, require changes to be flushed to a designated home node (assigned on a per-page basis). A request to bring a stale page up to date can be satisfied with a single message to the home node, which sends the entire page back to the requester. HLRC has several advantages over LRC. First, the average critical-path delay of each page access fault is reduced to one round trip. Second, the coherence-related metadata kept for each page is smaller. Finally, memory overhead on each node is smaller because local page versioning is not required.
1.2.1 Home-based Lazy Release Consistency (HLRC) Protocol
The key idea in the HLRC protocol is that one node is assigned as the home node of each shared page; the home node is the node where the up-to-date copy of the page resides. Shared pages are invalidated on non-home nodes as required to maintain consistency, and an access to an invalid page on a non-home node triggers a fetch of the updated page from the home node. Details of the protocol can be found in [16].
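To make the mechanism concrete, the following is a minimal Java sketch of the non-home read path, under simplified hypothetical types; the real protocol in [16] is page-table and virtual-memory based, so this is exposition, not an implementation.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative non-home read path in an HLRC-style protocol: an access to an
 * invalidated page on a non-home node triggers a fetch of the whole page from
 * its designated home node. All types here are assumptions for exposition.
 */
public final class HlrcNode {

    /** Minimal page record: contents plus a valid/invalid flag. */
    static final class Page {
        byte[] contents;
        boolean valid = true;
        final int homeNode; // designated once; typically unchanged in plain HLRC

        Page(byte[] contents, int homeNode) {
            this.contents = contents;
            this.homeNode = homeNode;
        }
    }

    private final ConcurrentHashMap<Integer, Page> pages = new ConcurrentHashMap<>();

    /** Reads a page, fetching the up-to-date copy from the home node if stale. */
    byte[] read(int pageId) {
        Page page = pages.get(pageId); // assumes the page is already mapped locally
        if (!page.valid) {
            // One round trip: the home node replies with the entire page.
            page.contents = fetchFromHome(page.homeNode, pageId);
            page.valid = true; // valid until the next invalidation at synchronization
        }
        return page.contents;
    }

    /** Invalidation applied on non-home nodes as consistency requires. */
    void invalidate(int pageId) {
        pages.get(pageId).valid = false;
    }

    /** Placeholder for the network fetch; elided in this sketch. */
    private byte[] fetchFromHome(int homeNode, int pageId) {
        throw new UnsupportedOperationException("network layer elided in sketch");
    }
}
```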
In HLRC, each shared page is assigned a single home node, which typically does not change; the initial distribution of home nodes is therefore important for good performance. Round-robin, first-touch, and block distribution are common page distribution algorithms. Some systems let the application programmer set the home node for a given shared address range in an attempt to assign the best home node for each page. As an example of the effect of poor home assignment, suppose node 0 is initially assigned as the home node for shared page i but never accesses the page. If node 1 reads and writes page i frequently, this assignment is detrimental to performance: node 1 has to repeatedly fetch the whole page from node 0, and node 0 is frequently interrupted by incoming updates for that page from node 1, which also hinders its forward computational progress.
Home-based software DSM system performance is very sensitive to the distribution of home pages. If the homes of shared pages are well distributed, most accesses will hit locally and good performance can be achieved. Otherwise, system performance may be greatly degraded due to increased traffic caused by updating home nodes and unnecessarily fetching pages repeatedly from the same home node.
1.2.2 Adaptive Home Protocols
There have been many adaptive protocols that seek to reduce the impact of poor home node distribution [12], [14], [17], [18], [19], [20], [21]. The idea behind these mechanisms is to detect specific application sharing patterns such as one producer-arbitrary consumer(s) [12], migratory [14], single writer [14], [19], etc., discussed in section 1.3.4, and to redistribute the home pages accordingly. Although these schemes can achieve some performance improvement, they are tailored to specific memory access patterns and cannot solve home assignment problems under other patterns, such as multiple-writer sharing. As an example, consider two nodes that write to the same page frequently. In home-based software DSMs with HLRC and the above adaptive variants, at most one writer can be the home, and the other node still has to fetch the updated page from that home node when it wants to access it. The page fetch remains on the critical path of the second node, preventing further performance improvement. Moreover, if the home node is initially neither of the two writers, it is difficult for the above adaptive protocols to decide how to migrate the home node for the best result, limiting performance improvement in those cases.
To the best of our knowledge, all adaptive HLRC protocols suffer from the following two limitations: (1) The protocols change home-distribution only after a specific memory access pattern is detected; therefore, home-redistribution lags behind changes in the memory sharing pattern. (2) Many adaptive protocols only deal with specific memory access patterns such as single writer or single producer-multiple consumer patterns. The performance may degrade for dynamically changing memory access behavior and other general memory access patterns such as multiple-writer, which are nevertheless common in parallel applications [22].
1.2.3 Adaptive HLRC (AHLRC)
Adaptive HLRC [23] is a home-based protocol that makes the redistribution of home pages general enough to be applied to any sharing access pattern. Like HLRC, each page is assigned a home node, and changes to shared pages are propagated to each home node at release synchronization events. Like the adaptive variants above, AHLRC detects memory access patterns and changes the home page distribution accordingly. However, in AHLRC every shared page can have more than one home node, with each home node maintaining an updated copy of the page after synchronization. Every node adaptively decides whether to be a home node of each specific shared page, independent of the other nodes participating in the computation. Home pages can thus be redistributed well for general memory sharing patterns, including the migratory and single-writer cases discussed in section 1.3.4. Such redistribution is based on predictions made by local online home predictors, not on system-wide sharing pattern detection [12], [14], [18]. Consequently, AHLRC is able to redistribute homes quickly and without costly global coordination between nodes. Hence AHLRC is a good candidate for the system.
1.2.4 Object-based DSM on Middleware
Distributed shared memory is implemented using one or a combination of specialized hardware, conventional paged virtual memory, and middleware. Hardware-based solutions are costly, and paged virtual memory implementations are suited to collections of homogeneous computers with common data and paging formats.
On the other hand, languages such as Orca [24] and middleware such as Linda [25] and its derivatives JavaSpaces [26] and TSpaces [27] support forms of DSM without any hardware or paging support, in a platform-neutral way. In this type of implementation, sharing is implemented by communication between instances of the user-level support layer in client and server. Processes call into this layer when they access local data items, and the layer communicates as necessary to maintain consistency.
Object-based DSM performs better than page-based DSM because the larger granularity of sharing in page-based DSMs [28] leads to false sharing; object-based DSM alleviates the problem through more fine-grained sharing. Examples of object-based DSM include Linda [25], JDSM [29], and an object-based DSM in the .NET environment [30]. Hence object-based middleware is a good candidate for the system.
1.3 OBJECTIVES
The goal is to design and implement a software DSM system called HDSM: an object-based middleware for Java that uses the adaptive home-based lazy release consistency (AHLRC) protocol [23]. The work is inspired by the research on AHLRC [23], which targeted page-based software DSM; the novelty of this work is to borrow from AHLRC and apply it to object-based middleware for Java. The developer will be able to use the HDSM middleware to build Java-based distributed applications without specifying inter-process communication, without managing the creation, migration, and destruction of replicas, and without handling consistency maintenance.
1.4 HDSM SYSTEM LAYERS
The local HDSM API will provide the necessary functionality. The layers of the HDSM middleware are shown in Figure 1.2.
Figure 1.2 HDSM system layers
The HDSM middleware provides these functionalities transparently to the client application, which uses the local HDSM API to access the middleware. The middleware provides transparency for the distribution of objects, for their replication, and for maintaining the replicas in a consistent state. The architecture of the system is shown in Figure 1.3. A sample distributed application for Java objects using HDSM is discussed in section 4.2.
The APIs provided by the HDSM middleware are: creating a new object in HDSM, getting the object ID of an object in HDSM, reading objects from HDSM, writing to objects in HDSM, and removing objects from HDSM. The distributed application developer uses these APIs without handling any inter-process communication, replication issues, or consistency of the replicated objects; the middleware handles these issues on the developer's behalf.
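To make the API surface concrete, here is a minimal Java sketch; the interface name, the use of a long object ID, and the exact signatures are illustrative assumptions rather than the actual HDSM API.

```java
import java.io.Serializable;

/**
 * Illustrative sketch of the local HDSM API surface described above.
 * Names, the ID type, and signatures are assumptions for exposition.
 */
public interface HdsmApi {

    /** Creates a new shared object in HDSM and returns its object ID. */
    long create(Serializable initialValue);

    /** Gets the object ID bound to a symbolic name, if one exists. */
    long getObjectId(String name);

    /** Reads a shared object (fetched from a home node when the local copy is invalid). */
    Serializable read(long objectId);

    /** Writes a shared object; the middleware propagates the update to all home nodes. */
    void write(long objectId, Serializable newValue);

    /** Removes the object from HDSM on all contributing nodes. */
    void remove(long objectId);
}
```

Each call hides the inter-process communication, replication, and consistency machinery described above.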
The HDSM middleware contains four layers. Clients' distributed applications are written in the bottom layer, the Client Application layer. A client application directly uses the Local HDSM API available at each contributing node; these APIs are provided by the HDSM API layer. The third layer is the Local HDSM Manager layer, which takes care of all local HDSM middleware operations. The fourth layer is the HDSM Home Set Manager layer, which joins all the contributing nodes into HDSM.
The Local Coordinator coordinates all Local HDSM Manager layer operations. All objects at a home node are stored in the Local Object Store, and all prediction-related data is stored in the Local Predictor Store. The Local Lock Manager handles all locks for objects present at the current node. The Online Home Predictor makes predictions for objects present at the current node, and the Online Home Statistics Recorder records all prediction-related data into the Local Predictor Store. The Remote Reader allows non-home nodes to read objects from a home node, and the Remote Object Requester performs the remote read operation from a non-home node to a home node.
The Update Sender, Multicast Receiver, and Acknowledgement Receiver perform the multicast operations of a write. The Update Sender sends the multicast messages; the multicast message kinds are lock request, unlock request, updated object, lock acknowledgement, unlock acknowledgement, and update acknowledgement. The Multicast Receiver receives the lock, unlock, and update messages from updating nodes, and the Acknowledgement Receiver receives the lock, unlock, and update acknowledgements sent from home nodes.
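As a rough illustration, the six message kinds listed above could be modeled with a simple enum; this is an expository assumption, not the actual HDSM wire format.

```java
/** Illustrative enumeration of the multicast message kinds named above. */
public enum MessageKind {
    LOCK_REQUEST,    // sent when a writer requests the object lock
    UNLOCK_REQUEST,  // sent when the writer releases the lock
    OBJECT_UPDATE,   // the updated object, flushed to all home nodes
    LOCK_ACK,        // home node acknowledges the lock request
    UNLOCK_ACK,      // home node acknowledges the unlock request
    UPDATE_ACK       // home node acknowledges applying the update
}
```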
The Home Set Coordinator coordinates all HDSM Home Set Manager layer operations. Home Set Data stores all the home-set-related data, and Home Set performs the home-set operations on that data. The Nodes List holds the list of home nodes for an object.
Figure 1.3 HDSM system architecture
1.4.1 Object Updating on Multiple Home Nodes
In the system, the same object can have multiple home nodes. Shared objects must therefore be kept updated on all home nodes when the coherence protocol requires it. To achieve this, a set of homes (the object home set) is maintained for each shared object, recording the current list of home nodes. When updates are sent out, they are propagated to all nodes in the home set, and each home node applies the update to its copy of the object. Since every home node keeps an updated copy, a non-home node that wants to fetch the object can do so from any of the available home nodes. This strategy eliminates the possible “hot spots” of HLRC, since fetch requests for the same object are not necessarily directed to a single location. When a node needs to fetch a copy of an object from a home node, the system currently selects a random node from the home set.
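The following sketch illustrates this home-set bookkeeping, assuming long object IDs and integer node IDs for exposition; the actual Home Set Manager structures may differ.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative home-set bookkeeping: each shared object ID maps to the set
 * of node IDs currently holding an up-to-date home copy. Types and method
 * names are assumptions for exposition.
 */
public class HomeSetTable {

    private final ConcurrentHashMap<Long, Set<Integer>> homeSets = new ConcurrentHashMap<>();

    /** An update must reach every node in the home set so all home copies stay current. */
    public Set<Integer> updateTargets(long objectId) {
        return homeSets.getOrDefault(objectId, Set.of());
    }

    /** A non-home node may fetch from any home copy; assumes the home set is non-empty. */
    public int chooseFetchSource(long objectId) {
        List<Integer> homes = List.copyOf(homeSets.get(objectId));
        return homes.get(ThreadLocalRandom.current().nextInt(homes.size()));
    }

    /** Adds or removes a home node as the adaptive protocol redistributes homes. */
    public void setHomeMembership(long objectId, int nodeId, boolean isHome) {
        homeSets.compute(objectId, (id, set) -> {
            Set<Integer> s = (set == null) ? ConcurrentHashMap.newKeySet() : set;
            if (isHome) s.add(nodeId); else s.remove(nodeId);
            return s;
        });
    }
}
```

Picking the fetch source at random, as in chooseFetchSource above, is what spreads fetch requests across home copies and avoids the single-home hot spot.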
1.4.2 Online Statistics Recording
Home nodes in HLRC do not incur the delay of remote object fetches, since an object is always up to date on a home node; however, a home node is frequently interrupted by incoming updates sent by other nodes and must apply those changes. Conversely, a non-home node saves the time of receiving and processing updates, but it has to fetch whole objects from a home node whenever it accesses an invalid object. Consequently, if a node accesses a particular shared object very frequently, better performance would likely be achieved were it a home node; if a node accesses a shared object rarely, it should not be a home node for that object. The system therefore compares the cost of being a home node (the object updating time tupd, including the time to receive object updates and apply them to the object) with the cost of not being one (the object fetch time tfetch, including the time to send out the object request, wait for the incoming updated copy, and apply that copy to the object). If tupd > tfetch, the node should not be a home node during the current interval; if tupd < tfetch, it should be made a home node during the current interval.
To make this comparison, the node must know the object fetch time (constant, to a first-order approximation, for a given system) and the object update time. The node measures tupd dynamically by recording (V, t) pairs on the home node, where each pair gives the total object update time between the last object version number and the current one. The object version number is updated on each home node after processing all updates flushed from other nodes.
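A minimal sketch of this statistics recording and the resulting home decision follows, assuming nanosecond timings and a fixed first-order tfetch estimate; the names and types are illustrative, not HDSM's actual recorder.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative online statistics recorder for one shared object. It records
 * (version, updateTime) pairs on a home node and compares the accumulated
 * update cost tupd against the roughly constant fetch cost tfetch.
 */
public class HomeCostRecorder {

    /** (V, t): object version and the update time spent reaching that version. */
    private record UpdateRecord(long version, long updateTimeNanos) {}

    private final List<UpdateRecord> records = new ArrayList<>();
    private final long fetchTimeNanos; // tfetch: first-order constant for a given system

    public HomeCostRecorder(long fetchTimeNanos) {
        this.fetchTimeNanos = fetchTimeNanos;
    }

    /** Called on a home node after applying the updates flushed for a new version. */
    public void record(long version, long updateTimeNanos) {
        records.add(new UpdateRecord(version, updateTimeNanos));
    }

    /** Sums tupd over the versions after lastVersion up to and including currVersion. */
    public long updateTimeBetween(long lastVersion, long currVersion) {
        return records.stream()
                .filter(r -> r.version() > lastVersion && r.version() <= currVersion)
                .mapToLong(UpdateRecord::updateTimeNanos)
                .sum();
    }

    /** Being a home node pays off only while tupd stays below tfetch. */
    public boolean shouldRemainHome(long lastVersion, long currVersion) {
        return updateTimeBetween(lastVersion, currVersion) < fetchTimeNanos;
    }
}
```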
1.4.3 Online Home Prediction
When a node first accesses a shared object after a release synchronization event, it uses a local online home predictor to decide whether to become a home node, drop from the home set, or do neither. Memory sharing patterns in applications are normally strongly correlated with past history, so predictions based on past history are fairly accurate [31]. Also, since the decision has two possible outcomes, “become a home node” or “drop from the home set”, a two-level adaptive branch predictor [32] is a good candidate for the online home predictor. HDSM implements the online home predictor as a PAp branch predictor in which each shared object has a separate history register (HR) that indexes a separate pattern history table (PHT) for that object, and each PHT entry is a saturating counter. Comparing the indexed PHT entry against a predefined threshold yields a binary prediction; afterward, the PHT entry and the HR are updated according to the predicted and actual outcomes.
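A minimal sketch of such a per-object predictor follows; the 4-bit history length, 2-bit saturating counters, and threshold value are illustrative assumptions, not HDSM's actual parameters.

```java
/**
 * Illustrative PAp-style two-level predictor for one shared object: a
 * per-object history register (HR) indexes a per-object pattern history
 * table (PHT) of saturating counters.
 */
public class HomePredictor {

    private static final int HISTORY_BITS = 4;        // length of the HR (assumed)
    private static final int PHT_SIZE = 1 << HISTORY_BITS;
    private static final int MAX_COUNTER = 3;         // 2-bit saturating counter (assumed)
    private static final int THRESHOLD = 2;           // predict "be a home node" at or above this

    private int historyRegister = 0;                  // last HISTORY_BITS outcomes, 1 = "be home"
    private final int[] pht = new int[PHT_SIZE];      // saturating counters, indexed by the HR

    /** Predicts whether this node should be a home node for the object. */
    public boolean predictBecomeHome() {
        return pht[historyRegister] >= THRESHOLD;
    }

    /** Trains the indexed counter with the real outcome, then shifts it into the HR. */
    public void update(boolean becameHome) {
        int c = pht[historyRegister];
        pht[historyRegister] = becameHome ? Math.min(c + 1, MAX_COUNTER) : Math.max(c - 1, 0);
        historyRegister = ((historyRegister << 1) | (becameHome ? 1 : 0)) & (PHT_SIZE - 1);
    }
}
```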
Online Home Prediction on a Home Node
Suppose the current version number of object i is Vi,curr, and the version number when a home node last accessed this object is Vi,last. The home node retrieves the object update records and calculates the actual total object update time as the sum of the recorded update times over that interval:

tupd = Σ t(V), summed over the recorded versions V with Vi,last < V ≤ Vi,curr

where t(V) is the update time recorded with version V.