Scalability


A discussion of scalability in Lilith, including timing results

Lilith's principal task is to span a tree of machines executing user-defined code. The tree structure is chosen to provide logarithmic scaling. Beginning from a single object, Lilith recursively links host objects on adjacent machines until the entire tree is occupied. The hosts propagate user code down the tree. The user code performs user-designated functions on every machine. Finally, the hosts propagate the results of the user code back up the tree. The user code can do further processing on the results from lower nodes as they are passed back up the tree. Overall, the user code undergoes a three-phase process: distribution, execution, and result collection.

Figure: Three-phase process for user code under Lilith: distribution, execution, and result collection. Lilith handles the details of the user code distribution and result collection, as well as communication amongst nodes on the tree. The tree structure is used for scalability.
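The three-phase pattern can be pictured as a recursive traversal of the host tree. The Python sketch below is purely illustrative and is not Lilith's API; `host.children`, `user_code.execute`, and `user_code.combine` are assumed stand-ins for the real host and user-code interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

def visit(host, user_code):
    """Distribute user_code down the subtree rooted at host, execute it
    locally, and collect the combined result back up the tree."""
    with ThreadPoolExecutor() as pool:
        # Phase 1: distribution -- hand the user code to each child so the
        # subtrees can start working while this host executes locally.
        futures = [pool.submit(visit, child, user_code) for child in host.children]

        # Phase 2: execution -- run the user-designated function on this host.
        result = user_code.execute(host)

        # Phase 3: result collection -- fold each child's result into the
        # local result as it comes back up; the user code may do further
        # processing on the children's results here.
        for future in futures:
            result = user_code.combine(result, future.result())
    return result
```

Forwarding to the children before executing locally lets every subtree work concurrently with the local execution, which is what produces the logarithmic depth cost described above.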


Tools can be created using Lilith by suitable development of the user code. The logarithmic scaling of Lilith allows Lilith-based tools to complete their tasks in a fraction of the time that analogous tools relying on serial queries and code distribution would require: a one-second operation performed on a thousand nodes would take roughly 16 minutes under a serial distribution, as opposed to about 10 seconds via a binary tree distribution.
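The arithmetic behind that comparison is simple to reproduce; the figures below are the illustrative one-second-per-node numbers from the text, not measurements.

```python
import math

nodes = 1000
op_time = 1.0  # seconds for the hypothetical per-node operation

# Serial distribution visits the nodes one after another.
serial_time = nodes * op_time                      # 1000 s, about 16.7 min

# A binary tree distribution only pays for the depth of the tree.
tree_time = math.ceil(math.log2(nodes)) * op_time  # 10 levels -> about 10 s

print(f"serial:      {serial_time / 60:.1f} minutes")
print(f"binary tree: {tree_time:.0f} seconds")
```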

This logarithmic scalability greatly improves performance on large clusters.


To demonstrate the scaling behavior of Lilith, we consider two cases: constant work per processor (increasing total work) and constant total work. In both cases, we first establish the tree. Then the wall-clock time is measured in the client, and the user code is sent to the tree. After the return to the client, the wall-clock time is measured again, and the results are tabulated for varying numbers of hosts in the tree. At each node, the user code is first passed to the children in the tree, then executed locally, and finally the results from the children are collected. Timings were generated on a 32-processor SGI Origin 2000, with one host per processor.
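A minimal sketch of that measurement loop, assuming hypothetical `tree.send_to_tree` and `build_tree` entry points (neither is Lilith's actual interface):

```python
import time

def time_user_code(tree, user_code):
    # The tree is already established; only distribution, execution, and
    # result collection are timed.
    start = time.perf_counter()             # wall-clock time measured in the client
    results = tree.send_to_tree(user_code)  # user code sent down, results returned
    elapsed = time.perf_counter() - start   # measured again after the return
    return elapsed, results

# Tabulate timings for varying numbers of hosts (one host per processor):
# for n_hosts in (2, 4, 8, 16, 32):
#     elapsed, _ = time_user_code(build_tree(n_hosts), user_code)
#     print(n_hosts, elapsed)
```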

The plot of scaling behavior for constant work per processor, i.e., increasing total work, covers two cases: the outer plot presents results for 100 random prime numbers generated per processor; the inset shows 500. This calculation was chosen since each node could be assigned the same seed, guaranteeing the same amount of work per node. We observe the overall expected logarithmic scaling. The step-like behavior occurs because adding hosts increases the tree depth, so additional time is required for communications. The steps are still seen for the case of greater work per processor, although the curve flattens out. As the ratio of work to communications time increases, the work overwhelms the communications time, causing all nodes to run essentially in parallel. It is no surprise, then, that the greatest benefit of the logarithmic scaling comes in cases where the communications-to-work ratio is kept small.

Figure: Scaling behavior for constant work per processor, i.e., increasing total work. The larger plot shows 100 prime numbers generated per processor; the inset shows 500. Scaling is overall logarithmic. Steps occur as increasing tree depth requires additional communication time. The steps persist as the work-to-communications ratio increases, though the curve flattens out.
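The per-node work in this benchmark amounts to drawing random candidates from a fixed seed and keeping the primes; the Python stand-in below is an approximation of that kernel, not the original benchmark code.

```python
import random

def is_prime(n):
    # Trial division is enough for an illustrative benchmark kernel.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def random_primes(count, seed=0):
    # Every node uses the same seed, so every node draws the same candidate
    # sequence and therefore performs the same amount of work.
    rng = random.Random(seed)
    primes = []
    while len(primes) < count:
        candidate = rng.randrange(2, 1 << 20)
        if is_prime(candidate):
            primes.append(candidate)
    return primes

# Constant work per processor: 100 (outer plot) or 500 (inset) primes per host.
# local_result = random_primes(100)
```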


The plot of scaling behavior for constant total work, i.e., decreasing work per processor, shows the timings for calculation of a fixed total number of random primes. In this case, the number of primes generated per host decreases as the number of participating hosts increases. As expected, the overall time for the calculation decreases, and logarithmic scaling is again observed.

Figure: Scaling behavior for constant total work (a fixed total number of random primes calculated), i.e., decreasing work per processor. The overall time for the calculation decreases, with logarithmic scaling.
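For the constant-total-work case, each host's share of the fixed prime count shrinks as hosts are added; the even-split helper below is an illustrative assumption, not taken from the Lilith sources.

```python
def per_host_count(total_primes, num_hosts, host_index):
    # Split the fixed total evenly; the first `extra` hosts absorb the
    # remainder so the per-host counts sum exactly to total_primes.
    base, extra = divmod(total_primes, num_hosts)
    return base + (1 if host_index < extra else 0)

# As the number of participating hosts grows, each host's share shrinks:
# per_host_count(10_000, 8, 0)   -> 1250
# per_host_count(10_000, 32, 0)  -> 313
```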