NZ HPC Applications Workshop 2012 presentations
An MPI N-body Close Encounter Code
Philip W. Sharp, University of Auckland, W.I. Newman, UCLA, K.R. Grazier (Jet Propulsion Laboratory)
N-body simulations of the Sun, the four major planets, and a large number n of small bodies such as asteroids and comets are used extensively to study the long-term evolution of the Solar System. In many applications, it is realistic to assume the small bodies do not interact with one another. The simulations are then easily parallelised over k processors by putting n/k small bodies along with a copy of the Sun and the four planets on each processor.
As described above, the parallelisation is trivial and close to linear speedup is assured when n is large. In practice, the parallelisation for most simulations is noticeably more complicated because small bodies are removed if they have a close encounter with the Sun or a planet, or are ejected from the Solar System. This removal means different processors can have different numbers of small bodies, leading to load imbalance.
Much of this imbalance can be avoided by transferring asteroids between processors at simulation-dependent intervals. This transfer increases the communication costs of the simulation, leading to a trade-off between load imbalance and communication overhead in the overall wall clock time.
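The rebalancing step described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the greedy transfer plan and the function name are my assumptions.

```python
def plan_transfers(counts):
    """Given the number of surviving small bodies on each processor,
    compute a greedy plan of (source_rank, dest_rank, n_bodies) transfers
    that evens the load across all ranks."""
    n, k = sum(counts), len(counts)
    # Ideal per-rank load; the first n % k ranks take one extra body.
    target = [n // k + (1 if i < n % k else 0) for i in range(k)]
    senders = [[i, counts[i] - target[i]] for i in range(k) if counts[i] > target[i]]
    receivers = [[i, target[i] - counts[i]] for i in range(k) if counts[i] < target[i]]
    plan, si, ri = [], 0, 0
    while si < len(senders) and ri < len(receivers):
        moved = min(senders[si][1], receivers[ri][1])
        plan.append((senders[si][0], receivers[ri][0], moved))
        senders[si][1] -= moved
        receivers[ri][1] -= moved
        if senders[si][1] == 0:
            si += 1
        if receivers[ri][1] == 0:
            ri += 1
    return plan
```

In an MPI code, each (src, dst, m) entry would become a matched send/receive of m body states; the plan itself can be computed redundantly on every rank after an all-gather of the per-rank counts, so no extra coordination is needed.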
We used Open MPI to implement a parallelised version of our N-body code that handles close encounters, and tested the implementation on the NeSI node at the University of Auckland. We report on the results of our tests.
HPC challenges in large-scale bioengineering models
Chris Bradley, Thiranja Babarenda Gamage, Randall Britten, David Ladd, David Nickerson, Adam Reeve, Soroush Safaei, Xiani Yan and Ting Yu, Auckland Bioengineering Institute, University of Auckland
Over the last few decades there has been a dramatic increase in the computational power available to scientists and engineers. This increase in power has enabled mathematical and computer modelling of physical systems to advance from looking at simple phenomena in isolation to the analysis of complex coupled multi-scale and multi-physics problems. An example of such a model is an integrated model of the heart being developed at the Auckland Bioengineering Institute.
The integrated heart model couples together a number of different physical processes including the electrical activation of heart muscle, the mechanics of the muscle contractions and the fluid mechanics of the blood ejected from the heart as part of the heartbeat. This model must deal with multiple physical processes that occur on spatial scales ranging from µm to m and temporal scales ranging from ns to ks and above.
The multiple scales in the heart model generate a very large number of equations involving a very large number of degrees of freedom. The efficient solution of these equations presents a number of challenges in high performance computing. These challenges include performance and scalability, I/O, adaptive meshing and dynamic load balancing in a distributed environment, and the efficient use of high performance computer systems that involve a hierarchy of heterogeneous computing elements. This presentation illustrates these challenges in the context of an integrated heart model and discusses the development of OpenCMISS, a custom code developed at the Auckland Bioengineering Institute for solving bioengineering problems on high performance computers.
Integration of HPC and Global Biosecurity
Dr. Laura M. Boykin, Bio-Protection Research Centre, Lincoln University
The research to be presented will highlight the utility of the HPC facilities available in New Zealand for species delimitation of highly invasive species. Species delimitation directly impacts on global biosecurity. It is a critical element in the decisions made by national governments in regard to the flow of trade and to the biosecurity measures imposed to protect countries from the threat of invasive species. Here we outline a novel approach to species delimitation, 'tip to root', for two highly invasive insect pests, Bemisia tabaci (sweetpotato whitefly) and Lymantria dispar (Asian gypsy moth). Both species are of concern to biosecurity, but illustrate the extremes of phylogenetic resolution that present the most complex delimitation issues for biosecurity: B. tabaci has extremely high intra-specific genetic variability, while L. dispar is composed of relatively indistinct subspecies. This study tests a series of analytical options, utilizing BeSTGRID, to determine their applicability as tools to provide more rigorous species delimitation measures and consequently more defensible species assignments and identification of unknowns for biosecurity. Data from established DNA barcode datasets (COI), which are increasingly being considered for adoption in biosecurity, were used here as an example. The analytical approaches included the commonly used Kimura two-parameter (K2P) inter-species distance plus four more stringent measures of taxon distinctiveness: 1) Rosenberg's reciprocal monophyly, P(AB); 2) Rodrigo's P(randomly distinct); 3) the genealogical sorting index, gsi; and 4) the General Mixed Yule-Coalescent (GMYC), which identifies the transition point from population-level to species-level divergence. For both insect datasets, a comparative analysis of the methods revealed that the K2P distance method does not capture the same level of species distinctiveness revealed by the other three measures; in B. tabaci there are more distinct groups than previously identified using the K2P distances, and for L. dispar far less variation is apparent within the predefined subspecies. A consensus of the results from P(AB), P(randomly distinct) and gsi offers greater statistical confidence as to where genetic limits might be drawn. In the species cases here, the results clearly indicate that there is a need for more gene sampling to substantiate either the new cohort of species indicated for B. tabaci or to detect the established subspecies taxonomy of L. dispar. Given the ease of use through Geneious plugins, similar analysis of such multi-gene datasets would be easily accommodated. Overall, the approach here is recommended where careful consideration of species delimitation is required to support crucial biosecurity decisions based on accurate species identification.
- A novel approach ('tip to root') to delineate species objectively is described herein; this approach can be effectively applied to any phylogeny.
- K2P genetic distances alone do not identify all taxonomic distinctiveness present in a given phylogeny; other measures such as P(AB), P(RD), gsi and GMYC are more useful in identifying taxonomic distinctiveness.
- A consensus of such analyses statistically supported five more distinct clades within the Bemisia tabaci mtDNA phylogeny than are taxonomically described to date, while delineation of Lymantria dispar subspecies remains problematic due to lack of phylogenetic resolution.
- As part of the regulation of trade under international biosecurity arrangements, the capacity to delimit species using DNA data will have a direct bearing on whether trade takes place, and will represent a significant departure from current processes, which are mostly reliant on morphological separation.
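The K2P distance used as the baseline measure above separates transition from transversion substitutions. A minimal sketch of its computation follows; the function name is my own, and real barcode pipelines would additionally handle alignment gaps and ambiguity codes.

```python
from math import log

# Transition pairs under the Kimura two-parameter model: A<->G and C<->T.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences:
    d = -0.5*ln(1 - 2P - Q) - 0.25*ln(1 - 2Q), where P and Q are the
    proportions of sites differing by a transition and a transversion."""
    assert len(seq1) == len(seq2), "sequences must be aligned to equal length"
    n = len(seq1)
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    p, q = transitions / n, transversions / n
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)
```

The correction terms diverge as sequences saturate with substitutions, which is one reason a single pairwise distance cannot capture the same information as the tree-based measures (P(AB), P(RD), gsi, GMYC) compared in the study.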
Long-Timescale Simulations of Urease Proteins Using Highly Parallel Code
Benjamin P. Roberts, University of Auckland, Billy R. Miller III (University of Florida), Kenneth M. Merz Jr. (University of Florida)
Parallel computing approaches greatly increase our ability to study the time evolution of complex systems of all sizes, ranging from the huge (e.g., weather systems) to the tiny (atoms and molecules). We present the application of a highly parallel molecular simulation approach to ureases, a family of bacterial, plant and fungal enzymes implicated in disruption to soil pH balance and stomach ulcers. Long-timescale (> 0.1 µs) simulations, requiring dozens of CPUs over a period of months, shed new light on these enzymes' dynamic behaviour. Of particular interest is a protein flap that appears to regulate substrate ingress to and product egress from the catalytic site. This flap, and other structural features of the protein, potentially offer novel targets for drug design, in an example of significant medical applications of high performance computing.
Making Computational Chemistry More Accessible
Joseph R. Lane, University of Waikato
Computational chemistry is one of the fastest growing chemistry disciplines and is now a core requirement of most leading international degree programs. However, many academics and students are hesitant to utilize computational chemistry methods because of both real and perceived technical barriers.
We have recently tried to address these issues and now teach computational chemistry at both the senior undergraduate and graduate level. Our graduate students are actively encouraged to use computational chemistry methods to complement their existing experimental research projects.
To manage an appreciable number of relatively inexperienced users, computational chemistry software is installed only on a central high performance Linux cluster. Users securely access this cluster from their Windows-based desktop computers, either on campus or at home, using MobaXterm. MobaXterm is a free, portable (no install necessary) terminal and X server that incorporates most common Linux commands and network utilities. Encrypted SSH keys, pre-configured with MobaXterm so that users simply click a bookmark to connect, are used for authentication. Most of our computational chemistry users set up their jobs and interpret their results with the GaussView graphical user interface. As such, we have bootstrapped GaussView directly to the cluster's scheduling software with some simple shell scripts.
In summary, users can access computational chemistry software from any computer with a suitable broadband connection, set up their jobs, submit them to the scheduler and finally interpret the results, completely avoiding the command line interface. This low-barrier approach to computational chemistry has already resulted in several combined theoretical/experimental publications by graduate students with no particular expertise in high performance computing.
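The scripts that bridge GaussView to the scheduler are not shown in the abstract; the following is only a hypothetical sketch of such a bridge, here generating a PBS-style submission script for a Gaussian input file. The queue name, resource directives and the g09 command line are all assumptions and would be site-specific.

```python
def build_submit_script(input_file, ncpus=4, mem_gb=8, queue="chem"):
    """Generate a PBS submission script that runs Gaussian on one input
    file prepared in GaussView. All scheduler directives here are
    illustrative placeholders for a site-specific configuration."""
    jobname = input_file.rsplit(".", 1)[0]
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {jobname}",
        f"#PBS -q {queue}",
        f"#PBS -l ncpus={ncpus},mem={mem_gb}gb",
        "cd $PBS_O_WORKDIR",
        # Run Gaussian, writing the log next to the input file.
        f"g09 < {input_file} > {jobname}.log",
    ])
```

A thin wrapper like this, invoked from a GUI bookmark or menu entry, is what lets users go from a GaussView-prepared input to a queued job without ever seeing the command line.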
Multiple Device Parallel Acceleration for High Performance Applications
Daniel Playne, Director, Centre for Parallel Computing, Massey University
Graphics Processing Units (GPUs) and other devices have proved a valuable mechanism for speeding up many applications, using a data-parallel programming model to accelerate a conventional CPU. It is becoming economically viable to host multiple GPUs on a single CPU host node, and clusters using this multi-GPU-assisted node model are becoming prevalent. We describe our experiments with hosting up to eight GPUs on a single CPU using PCI bus extension technology, and report on the attainable performance of a range of simulation and complex systems applications. We have explored a range of combinations of multi-core CPUs running various thread management software systems to manage their multiple GPU accelerators.
We report on the scalability achievable using this approach and discuss its implications for future generation HPC systems including clusters and supercomputer facilities.
Parallel learning in neural networks using MPI
Lech Szymanski, University of Otago
Artificial neural networks are trainable models that are inherently parallel. This gives a certain freedom in the way their parallelisation can be carried out, but also makes it necessary to carefully consider the impact of the chosen strategy on the learning dynamics. In this presentation I will share my experience of writing multi-process code for training feed-forward neural networks on HPC, which I developed during my PhD studies when I needed to run simulations on very large datasets.
The tropospheric oxidizing capacity: Is it influenced by stratospheric change?
Olaf Morgenstern and Guang Zeng, NIWA, Lauder, New Zealand.
Ozone recovery and climate change are both anticipated to influence the composition of the troposphere, particularly its ozone burden and the oxidizing capacity. Increasingly, links between tropospheric chemistry and stratospheric changes are being recognized; these include transport of ozone-rich air into the troposphere, attenuation of tropospheric UV radiation by changing stratospheric ozone with an impact on photolysis, and tropospheric meteorology and its coupling with stratospheric dynamics. However, few studies have been published that adequately account for these links using a high-top climate model with a whole-atmosphere chemistry mechanism. I will present results from a study using the UKCA whole-atmosphere model, assessing the consequences of climate change and stratospheric ozone recovery on tropospheric chemistry. I will assess the HPC challenges which need to be overcome in modelling the whole-atmosphere chemistry-climate system.
Towards Petascale DFT Calculations: a new parallelization approach for linear response and exact exchange
Dr Nicola Varini, Supercomputing specialist, IVEC
Simulating complex materials and biological systems remains a major challenge, even with today's supercomputers. One of the most promising techniques for studying the properties of these systems is the Density Functional Theory (DFT) approach and its extensions.
This technique is widely used in solid state physics, where the sample is usually periodic. Unfortunately, disordered solids and biophysical systems cannot be treated under this assumption, so a large supercell is necessary.
Here I present a new parallelization approach aimed at overcoming the limits of the traditional parallelization approach in plane-wave DFT codes. In the framework of the PRACE I and II Implementation Phase projects, a new parallelization strategy over the electronic bands has been introduced. To show the efficacy of this approach, two challenging kernels have been identified: the linear response (used to compute NMR spectra) and the exact exchange (which evaluates the Fock exchange operator in hybrid-DFT functionals).
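Band parallelization of this kind amounts to partitioning the N electronic bands across G process groups, each group then working on its own contiguous block of bands. A minimal sketch of such a partition follows; the function name is my own illustration, not code from Quantum ESPRESSO.

```python
def band_range(nbands, ngroups, group):
    """Return the half-open [start, end) range of band indices owned by
    one band group, distributing any remainder over the first groups."""
    base, rem = divmod(nbands, ngroups)
    start = group * base + min(group, rem)
    size = base + (1 if group < rem else 0)
    return start, start + size
```

Within each band group the usual plane-wave (G-vector) distribution still applies, so the two levels of parallelism multiply, which is what pushes the usable process count toward the many thousands of CPUs reported below.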
These two kernels are well isolated inside the Quantum ESPRESSO distribution and consist of several nested loops. Moreover, they are very time consuming, so any improvement in their scalability shows up directly in the total wall-clock execution time.
The benchmark results I report show that this parallelization approach achieves good scaling on many thousands of CPUs, and strongly suggest spreading the band parallelization to the whole QE package. This strategy is a breakthrough for plane-wave DFT codes and will be essential to fully exploit petascale hardware and beyond.