
eResearch Symposium 2010 Presentations and Recordings

eResearch Symposium 2010 opening speech

Dr. George Slim
Acting General Manager
Science Group
Ministry of Research Science + Technology

In the event's opening speech, Dr George Slim of MoRST talked about how, in the future, all research will be eResearch, and about the need to start dealing with it now rather than face problems later. MoRST has been involved with the field over the last few years, both with 'easy wins' such as video conferencing and with more under-the-radar work such as data access management. He also talked about problems such as unpublished data languishing in archives and desk drawers rather than being part of the knowledge base of science.


The Past, Present and Future of Research Data

Dr. Andrew Treloar
Director of Technology
Australian National Data Service
+61 3 990 2057

Research data is increasingly becoming important in its own right, not just as the means to deriving a publication. We have been dealing with the data deluge since the turn of the millennium, and the scale of the challenges continues to increase. This presentation will review how we got to where we are today, looking at the pivotal role of data and data management in the history of communication. It will then move to consider the present role of data in scholarly communication by examining a range of problems in the published literature. It will conclude by examining some of the initiatives being taken to start to fix the future of data, and the sorts of services and approaches that will be required.

Presenter Bio: 

Dr Andrew Treloar is the Director of Technology for the Australian National Data Service (ANDS) (http://ands.org.au/), with particular responsibility for the Data Capture and Metadata Stores programs. In 2008 he led the project to establish ANDS. Prior to that he was associated with a number of e-research projects as Director or Technical Architect: ARCHER (http://archer.edu.au/ - an e-Research support environment), DART (http://dart.edu.au - data acquisition and analysis), and ARROW (http://arrow.edu.au/ - institutional repository software), as well as the development of an Information Management Strategy for Monash University (http://www.monash.edu.au/staff/information-management/). His research interests include data management, institutional repositories and scholarly communication. He never seems to be able to make enough time for practising his cello, or reading, but does try to prioritise talking to his chooks and working in his vegetable garden and orchard. Further details at http://andrew.treloar.net/.


Cyberinfrastructure and the Environmental Sciences

Prof. William Michener
Director of e-Science Initiatives
University Libraries
University of New Mexico

The data challenges in the environmental sciences lie in discovering relevant data, dealing with extreme data heterogeneity, and converting data to information and knowledge. Addressing these challenges requires new approaches for managing, preserving, analyzing, and sharing data. In this talk, I first introduce several environmental science challenges and relate those to current cyberinfrastructure challenges. Second, I introduce DataONE (Data Observation Network for Earth), which represents a new virtual organization that will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. DataONE encompasses a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Third, I conclude by presenting several opportunities for international collaboration in the environmental sciences and cyberinfrastructure areas.

Presenter Bio: 

William Michener is Professor and Director of e-Science Initiatives for University Libraries at the University of New Mexico. He has authored four books related to ecological informatics and more than 70 journal articles and book chapters. He is a Certified Senior Ecologist and serves as Editor of Ecological Archives and Associate Editor of the International Journal of Ecological Informatics. He has directed several large interdisciplinary research programs including the National Science Foundation’s (NSF) Biocomplexity Program, the Development Program for the U.S. Long-Term Ecological Research Network, and numerous cyberinfrastructure research and development projects. His current efforts focus on developing information technologies for the biological, ecological, and environmental sciences through DataONE—a large, multi-institutional, international research project funded by NSF.  


A City of Four Million People?

Dr. Shaun Hendy
Deputy Director
Advanced Materials and Nanotechnology
MacDiarmid Institute
+64 4 463 5809

Most commentators agree that the only way forward for New Zealand is to forge a high-productivity knowledge-based economy. However, in the late twentieth and early twenty-first centuries, it is the large global cities that have driven innovation and the generation of knowledge. If New Zealand is to take the high productivity path, it must overcome its geographical isolation and low population density by learning to act like a city of four million people. In this talk, I will discuss the nature and magnitude of this challenge by looking quantitatively at innovation and the generation of knowledge around the world. I will discuss how eResearch will play an essential role in building scale and collaboration within New Zealand and in extending the Kiwi knowledge network around the world.

Presenter Bio: 

Dr Shaun Hendy is the Deputy Director of the MacDiarmid Institute for Advanced Materials and Nanotechnology (based at the School of Chemical and Physical Sciences), and a Distinguished Scientist at Industrial Research Ltd. His PhD in physics from the University of Alberta was followed by a NZ Science and Technology post-doctoral fellowship at Industrial Research Ltd in Wellington. He joined Victoria University in 2003. Shaun writes a blog, 'A measure of science', as part of Sciblogs.co.nz, a hub for New Zealand's science bloggers. Shaun also has a regular slot on Radio New Zealand Nights as physics correspondent.


Personalized Medical Genomics

Dr. Nicole Cloonan
Senior Research Officer
Queensland Centre for Medical Genomics
University of Queensland
+61 7 3346 2088

Cancer is Australia's largest disease burden, and arises from the accumulation of genetic damage. Typically cancers accumulate multiple mutations, and these will vary from one cancer type to another, from person to person, and may even vary between different tumour sites in the same person. This variation could mean the best treatment for one patient might have no effect for another, or that a treatment that worked in the past might have no effect when the cancer relapses. The ultimate dream for cancer patients would be to determine exactly what mutations caused the disease, and exactly what treatments would work the best - a concept known as personalized medical genomics. Although conceptually simple, collecting, storing, and analysing the large scale biological data generated as part of medical genomics studies represents a huge informatics challenge - eclipsed only by the challenge of integrating this data with existing biological resources and knowledge.

Presenter Bio: 

Nicole Cloonan is an ARC Fellow working with Professor Sean Grimmond at the Queensland Centre for Medical Genomics (QCMG), based at The University of Queensland. Her work is multi-disciplinary in nature, involving computational biology and bioinformatics, biochemistry, cell biology, and molecular biology - all of which she uses to understand the complexity, function, and systems biology of RNA. Recently, she has worked at the intersection of genomics and bioinformatics to establish the technology enabling complete surveys of RNA, DNA, and epigenome content in mammalian systems, work that has been fundamental to the contribution of the QCMG to the International Cancer Genome Consortium.


Novel computational approaches to understanding tumour biology

Cristin Print
Associate Professor
School of Medical Sciences
University of Auckland
+64 9 373 7599 ext 85062
Dr. Mik Black
Senior Lecturer
Department of Biochemistry
Otago University
+64 3 479 7831

Deep engagement between biologists, clinicians and computational experts greatly increases the amount of biological understanding we can gain from high-content human data. In an example of this approach, we have performed a meta-analysis of breast cancer microarray data from around the world. Several novel analytic methods were applied to this data set, most of which would not be feasible without the use of collaboration tools, grid computing and high performance computing.

Drs Lance Miller (Wake Forest University, USA), Anita Muthukaruppan (University of Auckland) and Mik Black (University of Otago) assembled and annotated a collection of Affymetrix microarray datasets comprising breast tumours from 950 women (including NZ women) with extensive clinical details. This data set was large enough to allow studies of small subgroups of breast cancer patients that previously we could not explore with any degree of statistical power. Four novel analysis methods were developed in collaboration with clinicians and applied to this data set, to answer key questions about breast cancer:

1) What transcription factors in tumours are most relevant to the survival of breast cancer patients?

2) What gene networks are active in breast cancer patients?

3) What clinically significant genes are amplified in breast cancer?

4) What genes modulate transcription factor activity in breast cancer?

The use of these novel, computationally intensive methods to analyse a large clinical data set provides a good example of the power of generating a small collaborative eResearch community focused on a specific problem. With publicly available collections of clinical and molecular data continuing to grow rapidly, there are tremendous opportunities for biological discovery using approaches such as those outlined here, for which eResearch tools are essential.


Computing to stimulate academic drug discovery

Dr. Jack Flanagan
Senior Research Fellow
Auckland Cancer Society Research Centre
University of Auckland
+64 9 373 7599 ext 86155

The classical approach to drug discovery and development is to test large collections of chemical compounds for therapeutic activity in "wet labs", in solution within biological assays that report on a disease-specific target. Once compounds active in the assays ("hits") have been identified, a medicinal chemistry programme is initiated that explores the chemical space around a molecule by making a large series of directly related compounds, known as analogues. It is the resulting chemical structure and biological activity relationship that identifies the drug lead. This is known as hit-to-lead development, and while this phase can be accomplished within an academic setting, engaging in hit discovery is much less accessible, limited by the absence of High Throughput Chemical Screening facilities in New Zealand and the cost of accessing Australian facilities.

When the 3-dimensional structure of a target is known at atomic resolution, it is possible to use this information to screen digital libraries of compounds by matching computed physico-chemical properties - this process is known as virtual screening. This virtual screening process is a digital equivalent to the High Throughput Chemical Screening approach, and with low setup costs represents a more readily accessed discovery platform capable of stimulating wet lab drug discovery within an academic setting by identifying small numbers of likely active compounds.
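As a rough illustration of the screening idea described above, the sketch below filters a tiny in-memory compound library against a set of property cut-offs. Everything in it is hypothetical: the compound names, the property values, the thresholds and the function names are invented for illustration, and a real virtual screen would score each compound against the target's atomic structure rather than apply simple cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """One entry in a digital compound library (all values are made up)."""
    name: str
    mol_weight: float   # molecular weight, g/mol
    log_p: float        # computed lipophilicity estimate
    h_bond_donors: int  # count of hydrogen-bond donors


# Hypothetical property profile assumed to suit the target.
TARGET_PROFILE = {"max_mol_weight": 500.0, "max_log_p": 5.0, "max_h_bond_donors": 5}


def matches(compound, profile):
    """Return True when every computed property falls inside the profile."""
    return (
        compound.mol_weight <= profile["max_mol_weight"]
        and compound.log_p <= profile["max_log_p"]
        and compound.h_bond_donors <= profile["max_h_bond_donors"]
    )


def virtual_screen(library, profile):
    """Keep only the compounds worth sending on to wet-lab testing."""
    return [c for c in library if matches(c, profile)]


LIBRARY = [
    Compound("cmpd-A", 320.4, 2.1, 2),
    Compound("cmpd-B", 712.9, 6.3, 7),
    Compound("cmpd-C", 455.0, 4.8, 1),
]

hits = virtual_screen(LIBRARY, TARGET_PROFILE)
print([c.name for c in hits])  # ['cmpd-A', 'cmpd-C']
```

Because each compound is evaluated independently, a library like this can be split across many processors, which is what makes the approach a natural fit for grid computing.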

Within some of the drug discovery and development programmes at the Auckland Cancer Society Research Centre, virtual screening has proved useful for new "hit" discovery where the atomic structure of the target was known, with one screen taking approximately 7 weeks to complete on a desktop machine using 1 cpu. In a collaboration between the Auckland Cancer Society Research Centre and the Centre for eResearch, we are building on this discovery success by using the high performance computing environment provided by BeSTGRID to develop a large scale virtual screening environment based on the Grisu framework that will facilitate an increase in hit discovery performance in the University of Auckland environment.

Early results are showing significant improvements in time to discovery, with the current increases in scale of computing environments leading to a 7x speed up in analysis run times. Moreover, it has facilitated a change in the use of virtual screening, to include concurrent focussed screening around specific chemical features of hit compounds. Our current plans further increase the scale of analysis possible, both in terms of the digital libraries of compounds and with respect to the number of projects enabled by both Grisu and the scientific technology.


Computer aided drug design. Defining known drug space from first principles

Dr. Johannes Reynisson
Lecturer
Chemistry
University of Auckland
+64 9 373 7599 ext 83746

The use of calculated molecular descriptors is now an integral part of drug design. Nowadays medicinal chemists can rely on defined areas within chemical space such as fragment space, lead-like chemical space, drug-like chemical space and known drug space based on these descriptors. The molecular descriptors reflect the physicochemical properties of the chemicals under investigation, which in turn affect their pharmacokinetic profile. The molecular descriptors employed are based on empirical data, or are even simple counts of phenomena in the chemical structure, leading to a large theoretical uncertainty. In order to put the definition of regions within chemical space on a robust theoretical footing, quantum chemical calculations were performed, which can be directly related to the physical properties of the molecules under investigation, e.g., polarisability and dipole moments. In order to perform these calculations a robust computational infrastructure is essential.


Leveraging BeSTGRID to power biomedical systems research

Dr. Mike Cooling
Research Fellow
Bioengineering Institute
University of Auckland
+64 9 373 7599 ext 81492

Biomedical problems are often found in complex environments consisting of vast numbers of interrelated components at multiple temporal and spatial scales. The exponentially expanding number of possible interactions renders the development of understanding of many phenomena impossible through conventional experimentation alone. Systems Biology offers an alternative approach using computational analysis of quantitative models in order to discern the causes and effects of emergent behaviour in complex systems. Adopting this strategy of enquiry, the limiting factor for analysis leading to insight generation becomes one of available computer processing cycles. At present, our science is artificially limited to the questions we can tractably pursue in the computer time available.

BeSTGRID, which provides online access to distributed computing resources, represents a significant scientific opportunity. Here we have developed a prototype model analysis software tool and user interface for developers of quantitative biomedical models. The tool understands the model-exchange protocol CellML, a proven format for Systems and Synthetic Biology. Each model is marked up with metadata providing simulation instructions, and instances of the model, each parameterised with pre-determined parameter sets, are scheduled on BeSTGRID via a Java-based user interface built using the Grisu framework. These BeSTGRID jobs are then computed using the open-source CellML Simulator, before time-course results are compiled and returned to a user-accessible area.

To verify the applicability of this prototype, we conducted a duplicate of the computationally expensive portion of the analysis of a significant intracellular signalling system in cardiac myocytes - previously performed using more limited computational resources. Briefly, a model of a signal transduction pathway was analysed to determine the molecular species and reactions most significant to cells’ production of a key signalling molecule (inositol 1,4,5-trisphosphate). Testable parameter sets were determined via the Morris method as previously, and BeSTGRID jobs scheduled to perform the ~600,000 model simulations required. As an indicator, the simulation computation was completed in less than 50 hours real-time on BeSTGRID, as opposed to 2 weeks using the high performance computing platform in the original study.
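For readers unfamiliar with the Morris method mentioned above, the following is a minimal sketch of the underlying idea: perturb one parameter at a time along random trajectories and average the absolute "elementary effects" to rank parameters by influence. It is a simplified variant (continuous random start points rather than a discrete grid), and the toy model, function names and settings are assumptions made for illustration, not the study's actual model or code.

```python
import random


def toy_model(x):
    """Hypothetical stand-in for one model simulation (not the real pathway model)."""
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


def morris_mu_star(model, n_params, n_trajectories=20, delta=0.25, seed=0):
    """Estimate mu* (mean absolute elementary effect) for each parameter.

    Each trajectory costs n_params + 1 model runs, which is why a screening
    study over many parameters needs a large number of simulations.
    """
    rng = random.Random(seed)
    sums = [0.0] * n_params
    for _ in range(n_trajectories):
        # Random start point in the unit cube, leaving room for a +delta step.
        x = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_params)]
        y = model(x)
        order = list(range(n_params))
        rng.shuffle(order)
        for i in order:
            x[i] += delta              # move one parameter at a time
            y_next = model(x)
            sums[i] += abs((y_next - y) / delta)
            y = y_next
    return [s / n_trajectories for s in sums]


mu_star = morris_mu_star(toy_model, n_params=3)
ranking = sorted(range(3), key=lambda i: mu_star[i], reverse=True)
print(ranking)  # parameter indices, most influential first
```

Every model run inside a trajectory is an independent simulation once its parameter set is known, so the full batch of parameter sets can be generated up front and dispatched as separate grid jobs.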

The ~6.7-fold decrease in the time taken to generate results enables signalling research scientists to examine systems of greater complexity. Where it was possible to analyse one pathway previously, we can now examine the behaviour of a network of 6 or 7 interacting pathways. Moreover, since we have used a generic model format and analysis technology, research speed increases here are applicable not just to signalling but any biomedical research model that can be expressed in CellML, including tissue- and organ-level work.
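The ~6.7-fold figure follows directly from the run times quoted above. The short calculation below reproduces it, treating "2 weeks" as 14 full days and using the stated upper bound of 50 hours, so the result is a lower bound on the actual speed-up. The variable names are illustrative only.

```python
HOURS_PER_WEEK = 7 * 24

original_hours = 2 * HOURS_PER_WEEK  # 2 weeks on the original HPC platform
bestgrid_hours = 50                  # "less than 50 hours" on BeSTGRID
simulations = 600_000                # approximate number of model runs

speedup = original_hours / bestgrid_hours
throughput = simulations / bestgrid_hours

print(f"speed-up: at least {speedup:.2f}x")                    # 6.72x
print(f"throughput: at least {throughput:.0f} simulations/h")  # 12000
```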

The increase in computational power will now enable additional innovations in model simulation technology. Our future work includes developing the simulation software and associated metadata specifications to include more flexible specification of the simulations and post-processing. This will facilitate the examining of not just one cellular activity at a time, but life-cycles of biological entities such as cells, tissues, organs or complete organisms. This is essential to the development of meaningful biomedical insights into the functioning of whole systems of life.


Remote research environment

Chris Myers
Chief Infrastructure Architect
VeRSI
+61 3 8540 4226

These services all form part of the VBL (Virtual Beamline) program of VeRSI, in collaboration with the Australian Synchrotron, La Trobe University, Bragg at ANSTO, Monash University, ANU and CSIRO.

In this talk we will showcase (with a live demo) our remote experimentation environment for accessing high-value research assets at the Australian Synchrotron and La Trobe University, demonstrating remote control, collaboration, oversight, data transfer and metadata extraction technologies.

We will also show examples of metadata extraction using the MetaMan service across multiple instruments (the Australian Synchrotron's IR Beamline, MX1/2 Beamline and Powder Diffraction Beamline, and La Trobe University's XPS instrument) and data file formats, its integration within the protein crystallography discipline with MyTARDIS (developed by Monash University), and the distribution of the data and metadata across its federated nodes. MetaMan can also be used to rescale and send image information to our high-resolution multi-screen display.

We will also demonstrate remote analysis of proprietary Bruker spectra taken on the Infrared Beamline at the Australian Synchrotron, using NoMachine NX remote access to run Bruker's Opus software, and to run ANU's Drishti software to visualize CT reconstructions from the Australian Synchrotron's Medical Imaging Beamline that have been processed by CSIRO's XLI software on the MASSIVE 0 test cluster.

All of these tools, services and techniques have been integrated within, or work in collaboration with, the VBL remote research environment.


Coordinated Services Initiative (CSI) for (e)Research: Bridging the inch wide mile deep gap

Kate Nolan
Research Development Advisor
Massey University
+64 6 350 5701 ext 81326

A divide exists between researchers and support services. Yet support services have potential to positively transform and enhance the Research/Project experience. In a research landscape shaped by the information era, this is particularly true. We produce more data and information than we can manage or use and we have new expectations about how we should or can work.  

1. Researchers work across discipline boundaries and manage complex multi-disciplinary and multi-institution projects including the complexity and quantity of data and information they generate; 

2. Researchers collaborate, share ideas and information, create synergies for knowledge production and create new kinds of knowledge; 

3. Researchers need administrative and management platforms of greater power and sophistication to support and facilitate new research enterprises, and the potentially paradigm changing approaches they are inventing. 

The Coordinated Services Initiative (CSI) is a new approach being piloted at Massey University. It aims to create a relationship between researchers and support services which benefits research project development capability and capacity.  CSI offers coordinated support that improves upon the existing silo based provision of core services, facilitates open communication between research academics and support services, and creates scope for joint project collaboration at concept stage and beyond. The coordinated service team includes (as required), Research Management Services (RMS), Information Technology Services (ITS), Library, Graduate Research School (GRS), Ethics, People and Organisational Development (POD), Centre for Academic Development and eLearning (CADeL) and Marketing.

The CSI approach is being piloted across two distinct projects. 

(i) The Global Entrepreneurial Leadership (GEL) project to deliver training and professional development, work with stakeholder groups to create research and development opportunities and work alongside communities of professional practice.

(ii) The ‘Manawatu Our Region Our River’ project engages Massey University in community-based research linked to our region.

This presentation documents and shares the CSI story so far.


Building e-Research Services, Data Capture and Metadata Storage

Russel Sim
Senior Software Developer
Monash e-Research Centre
Monash University
+61 3 990 20795

The Australian National Data Service (ANDS) is a centralised repository containing inter-linked records of publicly funded data, researchers and grants. Its purpose is to promote data reuse and collaboration by making general descriptions of that data available to other researchers.

The core competency of the Monash e-Research Centre (MeRC) is to provide the "bridge", "link" or glue between researchers, the ITS Service Division, Monash Library, the Faculty of IT and DVC Research, and to add value by ensuring that the ICT solution chosen meets the researchers' requirements. MeRC’s key areas are Collaboration Services, High Performance Computing, Data Storage and Management, and Visualisation Services.

ANDS has funded the Monash University Data Capture and Metadata Store Program, and MeRC has identified 8 areas that can benefit from the ANDS program of work:

* Climate and weather - Storage and Sharing of data.

* Ecosystem measurements - Storage and distribution of CO2 data

* Molecular biology - Storage and distribution of x-ray crystallography

* Multimedia collections and ARROW - Publishing multimedia collections

* History of Adoption - Automation & publication of data

* Interferome - Integration and analysis of data

* Microscopy - Storage and processing of data

* General Metadata Store Infrastructure

Each project can be classified under the following digital curation model:

* Data Capture e.g. Instrument, experiment, raw data or processed data

* Data Management e.g. store, retrieve, annotate, search etc.

* Data Re-Use e.g. linking other experiments in different disciplines e.g. ANDS Service

These projects are managed using PMBOK for project management and Agile/Scrum for software development. Seven of the eight projects focus on specific client interactions, while the eighth is a general case for which a generic solution will be developed to suit most disciplines. The projects chosen are diverse in nature and offer examples of how research data can be digitally collected, recorded and catalogued, thus demonstrating how researchers can benefit from better organised and annotated data.


Challenges, successes, and prospects in developing open-access global biodiversity databases

Dr. Mark Costello
Associate Professor
Leigh Marine Laboratory
University of Auckland
+64 9 373 7599 ext 83608

There are about 1.9 million species described on Earth, with several times this number of species names, including common names, misspellings, and multiple scientific names applied to the same species. This knowledge may represent only half of all species on Earth. No single person can be knowledgeable about more than a fraction of this number, so hundreds of experts are needed to quality control nomenclature in global biodiversity. Thousands of experts are required to expand the biodiversity content into ecology, physiology, and other areas of biology. In turn their knowledge builds on millions of publications over four centuries. The past decade has seen the emergence of open-access online biodiversity databases providing authoritative information on species taxonomy (e.g. Species 2000, World Register of Marine Species), information on introduced pest species (e.g. Global Invasive Species Database, Delivering Alien Invasive Species Information for Europe), and data on the geographic distribution of species (e.g. Global Biodiversity Information Facility, Ocean Biogeographic Information System).

Here, we provide examples of how these databases can now be used to conduct world-scale studies on biodiversity with and without modelling techniques. We then propose that these databases must work more closely together to (a) facilitate data quality control, (b) provide a more comprehensive (complete) and integrated biodiversity resource that is of more value to researchers, and (c) make most efficient use of the limited pool of scientific expertise. This synergy in infrastructures may be achieved in parallel with engagement of more experts, greater recognition of contributing individuals, institutions and funding agencies, and result in more substantial and prestigious global databases that provide services from national to global scales. 


Using High-Performance Computing with Census Microdata

Dr. Lyndon Walker
Senior Lecturer
Department of Accounting and Finance
Unitec New Zealand, Auckland

This presentation will explain how social simulation was used to supplement the traditional statistical analysis in examining inter-ethnic partnership patterns over the period 1981–2006 using Census microdata. It will then discuss how the BeSTGRID cluster was used for the parallel processing of the simulation model, in order to use an evolutionary optimisation algorithm to search for optimal combinations of the partnering parameters.

The project was a Marsden funded study that investigated changes in the social structure of New Zealand by examining patterns of inter-ethnic partnering in married and de-facto relationships. The two main components of this study were a series of log-linear models examining the existing ethnic patterns, and a social simulation model of partnership formation that was populated with unit-level data from the New Zealand Census.

The simulation was written in Java and run on the Auckland cluster of the BeSTGRID computer network (www.bestgrid.org). The processing power of the cluster allowed the simulation to be run at a city level, with unit-level data that provided demographic information for all of the single eighteen to thirty year olds listed in the census in the Auckland, Wellington and Canterbury regions.
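To make the evolutionary optimisation step mentioned above more concrete, here is a minimal sketch of such a search with fitness evaluations farmed out to a worker pool. It is not the study's Java code: the two-parameter search space, the target values, the fitness function and all names are hypothetical, and a thread pool stands in for the cluster, where each evaluation would be a full simulation run submitted as its own job.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "best" partnering parameters the search should recover.
TARGET = (0.2, 0.7)


def fitness(params):
    """Stand-in for one simulation run scored against observed patterns."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def mutate(params, rng, scale=0.1):
    """Perturb each parameter with Gaussian noise, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, p + rng.gauss(0.0, scale))) for p in params)


def evolve(generations=30, pop_size=20, n_keep=5, seed=1):
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            # Candidates are independent, so they can be scored in parallel.
            scores = list(pool.map(fitness, population))
            ranked = [p for _, p in sorted(zip(scores, population), reverse=True)]
            parents = ranked[:n_keep]  # elitism: the best always survive
            history.append(fitness(parents[0]))
            children = [
                mutate(rng.choice(parents), rng) for _ in range(pop_size - n_keep)
            ]
            population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, history[-1])
```

Keeping the best candidates unchanged between generations means the best score never gets worse, at the cost of re-evaluating them each round; with expensive simulations one would cache those scores.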


SKA and Trans-Tasman eResearch collaboration in radio astronomy

Prof. Sergei Gulyaev
Director
Institute for Radio Astronomy and Space Research
Auckland University of Technology, New Zealand
+64 9 921 9541

For twelve hours on 9 May 2010, a combination of six radio telescopes in Australia and New Zealand (including the first SKA dish in Western Australia and the AUT radio telescope in Warkworth) observed the core of the radio galaxy Centaurus A. A few weeks earlier the same set of Australian and NZ radio telescopes successfully observed an active galactic nucleus with a supermassive black hole and relativistic jet structure (PKS 1934-638). Both Centaurus A and PKS 1934-638 are objects of great scientific interest. Following the installation of the KAREN connection at the AUT radio telescope, Warkworth data was transferred to Western Australia, where it was correlated, calibrated and imaged. The main objective of this activity was to virtually create a “skeleton” of the Australasian SKA to demonstrate the advantage of the 5500 km baseline and provide the first science from this Australasian SKA “prototype”. It was achieved on time and resulted in significant science return. Warkworth is now available to the international radio astronomy community for VLBI (very long baseline interferometry) and its real-time eResearch version, eVLBI, as a part of the Australian Long Baseline Array. Challenges and future plans for this exciting international eResearch are outlined.


The World's Southernmost Grid Site for the Compact Muon Solenoid Experiment

Dr. Philip Allfrey
Department of Physics
University of Auckland
+64 9 373 7599 ext 84892

The Large Hadron Collider (LHC) and its associated high-energy physics experiments were one of the early drivers of Grid computing, due to the large quantities of data (Petabytes per year) they are expected to produce. A mature Grid computing and data storage infrastructure, and associated middleware, has been developed and deployed worldwide to handle this. Over the past year we have used the gLite middleware to set up a Grid site at the University of Auckland. This Grid site, designated NZ-UOA, is currently the only New Zealand Grid site belonging to the EGI Grid, and the southernmost site supporting the Compact Muon Solenoid (CMS) experiment at the LHC.

The NZ-UOA site consists of five components – a Computing Element (CE), one Worker Node (WN), a Storage Element (SE), a site-level Information Index (BDII) and an Accounting Box (APEL). NZ-UOA has been set up in collaboration with the Centre for eResearch at the University of Auckland, and shares some of their infrastructure. The CE and BDII are co-located in one Xen Virtual Machine; the SE and APEL run on two additional VMs. The 8-core worker node is part of the Centre for eResearch cluster, and shares the same Torque server, but is dedicated to a separate queue.

To support the Virtual Organisation (VO) of the CMS experiment, three further components are needed – the production versions of the CMS data analysis software, a Squid proxy server to reduce the load on the central CMS databases, and the CMS data transfer application PhEDEx based on the srm v2.2 protocol.

All of these components have been successfully installed, after the necessary site-specific configuration. NZ-UOA has passed commissioning tests by the Asia-Pacific Regional Operations Centre (APROC) to become a fully-fledged Production site, and is monitored on an ongoing basis via Nagios probes and automated tests by both APROC and CMS.

Resources: 

A web-based interactive, flexible translation service for classification systems and taxonomies

Dr. Tawan Banchuen
School of Environment
University of Auckland

 

From Linnaeus to Chomsky, classification systems and taxonomies are useful for our own understanding as well as for sharing that understanding with others. Scientists and organisations in various fields throughout history have developed and applied such systems; the information and knowledge they have accumulated according to their systems could be of great value to other parties, but often cannot be used because other parties do not understand how to translate the information and knowledge into their own systems. Biologists in the past gathered data according to the Linnaean system; what a great loss it would be if biologists today could not translate those data into the modern system. Numerous databases of the Earth's history were constructed based on various geologic timescales; geoscientists need to be able to translate one timescale to another to construct a comprehensive picture (see www.chronos.org). These two examples demonstrate the importance of the ability to translate between classification systems and taxonomies. This paper describes ongoing work to create a web-based semantic translation service that allows users to: (i) experiment with the mapping between classification systems and taxonomies; (ii) visually explore translation results; and (iii) persist and share their translation maps with others. Semantic equivalence and similarity are supported via underlying ontologies, which also facilitate the merging and re-grouping of classes.

The technologies we use are fully open and standards compliant: Sesame for the ontology store,  OWL for ontology encoding, SPARQL for ontology queries, WMS and WFS for transferring geographic maps through the Web, and SLD for styling them and experimenting with new classification schemes.  Our translation service is illustrated herein by experimenting with and interoperating between some of the various standard land cover and land use classifications in New Zealand, namely LCDB1, LCDB2, LUCAS and EcoSat.  The service is fully extensible to cover other kinds of classified geospatial data sets, including biology, geology, soil, forestry and agricultural data.

Semantic translation services are a relatively new technology.  The systems built to date are typically very limited in terms of flexibility and extensibility; the computer scripts used for describing the supported translations are hard-coded; and those systems provide little support for users to experiment with new mapping schemes.  On balance, our work makes two significant contributions: (i) our service has a highly interactive graphical interface, allowing users to compare two classification systems or taxonomies, and to plan, test and refine new mapping schemes; and (ii) mapping schemes, once created, can be serialised into the repository, browsed through and applied in new situations by the same or different users.
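
The idea of a reusable, shareable mapping scheme can be sketched in a few lines of Python. This is a minimal illustration, not the service described above: it swaps the ontology-based semantic matching for a plain lookup table, and the class names, the mapping and the function names are all invented for the example (they are not taken from LCDB1, LCDB2, LUCAS or EcoSat).

```python
import json

# Hypothetical mapping from a fine-grained source scheme to a coarser
# target scheme. All class names are invented for illustration.
EXAMPLE_MAPPING = {
    "Broadleaved forest": "Forest",
    "Conifer plantation": "Forest",
    "High-producing grassland": "Grassland",
    "Tussock": "Grassland",
}


def translate(labels, mapping):
    """Translate source labels; report classes the mapping does not cover."""
    translated = []
    unmapped = set()
    for label in labels:
        target = mapping.get(label)
        if target is None:
            unmapped.add(label)
        translated.append(target)
    return translated, unmapped


def serialise(mapping):
    """Turn a mapping scheme into text so it can be stored and shared."""
    return json.dumps(mapping, sort_keys=True, indent=2)


def deserialise(text):
    """Load a previously stored mapping scheme."""
    return json.loads(text)


observations = ["Tussock", "Broadleaved forest", "Urban", "Conifer plantation"]
translated, unmapped = translate(observations, EXAMPLE_MAPPING)
restored = deserialise(serialise(EXAMPLE_MAPPING))

print(translated)
print("Unmapped classes:", sorted(unmapped))
```

Reporting the unmapped classes gives a user something to refine, which loosely mirrors the plan, test and refine cycle the abstract describes.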

Resources: 

Developing for Grisu

Markus Binsteiner
Center for eResearch
University of Auckland

 

Grid middleware technology is relatively complex, and developers unfamiliar with it usually need considerable time to learn about grid security, job submission and monitoring, file transfer (using GridFTP) and information providers. To ease the pain and allow quicker development of client applications for the grid, we developed Grisu. Grisu is a Java framework that encapsulates a set of commonly needed functionality for such clients and exposes it through an easy-to-use, well-documented API. Developers who want to use Grisu to develop grid clients or connect existing applications to the grid can do so in several ways:

  • write a simple (text-) template for the default grid client implementation, the "Grisu template client"
  • write a job submission work-flow using the Grisu jython library
  • use the Java Grisu client library to write a plug-in/extension for an existing (Java-) application or develop a Java grid client from scratch
  • use the Grisu SOAP and/or ReSTful API to develop grid clients in any other programming language

The poster we present will outline and explain those options and also show some example code to give a better idea of how it all works. It’ll also highlight online resources which support developers who want to apply Grisu.

Resources: 

Using Grisu

Markus Binsteiner
Center for eResearch
University of Auckland

To submit jobs to a compute grid, users need to learn about grid middleware and security, and how to use complex and mostly unintuitive command-line clients; quite a few users prefer to do their calculations using different technologies just to avoid this complexity. Sometimes projects spend effort to create user-friendly grid clients such as web portals or desktop applications, but this usually requires a substantial amount of time and money, as well as developers who know the grid technologies involved.

Grisu is a framework to support and ease the development of such grid clients. As an example as well as a generic default implementation of such a grid client, Grisu includes the so-called "Grisu template client". This generic, non-application-specific grid client uses templates to render application-specific job parameter input masks for a wide range of applications that are used in eScience (e.g. blast, Namd, MrBayes, R...). Those templates are relatively easy to create -- even by non-computer scientists.

The poster displays the steps involved in submitting a job to the grid and controlling it using the template client. It shows the login process, how to select the proper application, how to enter job parameters and how to submit and control the job.

Resources: 

R for the Grid

Stuart Charters
Department of Applied Computing
Lincoln University

 

The R language is used for a wide range of scientific and statistical applications. Traditionally these applications have been written in a single threaded manner and are therefore constrained in their execution to a single core of a single processor. This constraint limits the size of computation that can be achieved with R. To overcome the single threaded nature of R, a number of solutions have been developed including the Simple Network of Workstations (SNOW) package and Rmpi.

SNOW and Rmpi have both been used to run parallel R programs on cluster and high performance computers. We extend that work to show how R can be run within a grid computing environment.

We discuss two components of our work: the deployment and configuration of the R, Rmpi and SNOW packages on grid computing resources; and the implementation of parallel R code, together with the packaging and submission requirements for a grid job using the Grisu tool.

In the deployment and configuration component we discuss how R, Rmpi and SNOW were deployed and configured across a range of different clusters running different local resource managers but all providing services via Globus to grid computing users.

In the implementation and submission component we discuss the issues that need to be considered in developing R code for deployment into the grid and approaches to packaging input and output data. We discuss the use of the Grisu tool and its template feature for identifying and deploying jobs to appropriate resources.

Resources: 

A Comparison of distribution scheme and computation times for embarrassingly parallel jobs in a grid computing environment

Stuart Charters
Department of Applied Computing
Lincoln University

 

A large number of problems requiring computational power can be classed as embarrassingly parallel; that is, the problem contains a number of identical sub-problems which are independent of each other. This type of problem can be decomposed and tackled in a number of ways; however, it is not always clear which approach to decomposing and distributing this type of job is most appropriate.
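
As a minimal sketch of what "embarrassingly parallel" means in practice, the Python example below runs the same independent sub-problem over several inputs, first serially and then through a worker pool. The sub-problem (a sum of squares) and all function names are placeholders chosen for illustration. Threads are used only to keep the example portable; for CPU-bound work in CPython they would not give a real speed-up, and on a grid each sub-problem would instead be submitted as a separate job.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_problem(n: int) -> int:
    """Stand-in for one independent unit of work: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_serial(inputs):
    """Solve every sub-problem one after another."""
    return [sub_problem(n) for n in inputs]


def run_parallel(inputs, workers: int = 4):
    """Solve the same sub-problems through a pool of workers.

    Because the sub-problems share no state, they can be handed out in
    any order; pool.map still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sub_problem, inputs))


inputs = [10, 100, 1000, 10000]
serial_results = run_serial(inputs)
parallel_results = run_parallel(inputs)

print(serial_results == parallel_results)
```

The key property is that the two approaches give identical results, so the choice between them is purely a question of turnaround time and convenience.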

We present a comparison of a number of mechanisms for the distribution of embarrassingly parallel computing jobs in a variety of grid computing environments. The comparison examines the computational performance of each approach, measured as the time to complete one job and the time to complete a number of jobs; that is, it compares serial performance with parallel performance. The comparison will also examine the ease of use of the alternative distribution mechanisms and the time required to submit jobs for execution via each method. From the measurements taken we will present a theoretical analysis of the performance of the platforms analysed, to help determine the appropriate type of resource for embarrassingly parallel jobs of different sizes.
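
To make the trade-off between submission time and execution time concrete, here is a deliberately simple toy model in Python. It is not the authors' analysis: it assumes jobs are submitted one after another with a fixed overhead each, that all jobs take the same time, and that they then run in waves limited by the number of available slots. The function name and all numbers are hypothetical.

```python
import math


def estimated_makespan(n_jobs: int, job_time_s: float, slots: int,
                       submit_overhead_s: float = 0.0) -> float:
    """Toy estimate of total turnaround time for identical independent jobs.

    Assumes sequential submission (n_jobs * overhead) followed by execution
    in ceil(n_jobs / slots) waves, each lasting job_time_s.
    """
    if n_jobs <= 0 or slots <= 0:
        raise ValueError("n_jobs and slots must be positive")
    waves = math.ceil(n_jobs / slots)
    return n_jobs * submit_overhead_s + waves * job_time_s


# Made-up figures: 100 one-minute jobs.
one_slot = estimated_makespan(100, 60.0, slots=1)
eight_slots = estimated_makespan(100, 60.0, slots=8, submit_overhead_s=2.0)

print(f"1 slot, no overhead:       {one_slot:.0f} s")
print(f"8 slots, 2 s per submit:   {eight_slots:.0f} s")
```

Even this crude model shows why per-job submission overhead matters: as the number of slots grows, the fixed submission cost becomes a larger share of the total time.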

The platforms under analysis are a local Condor Scavenging Grid and cluster computers within the BeSTGRID environment. Jobs are submitted using a variety of submission tools, including the Condor Submit tool, Gricli and Grisu. Distribution schemes for the computational jobs include Condor scheduling, MPI, submitting jobs to a single cluster, manual distribution of jobs between clusters and Grisu scheduling.

The results of this analysis will aid users in understanding the appropriate way to structure and submit their jobs for grid computing resources in order to reduce the total time of execution.

Resources: 

Promoting the eResearch Agenda: What the New Zealand Social Science Data Service has to offer

Dr. Gerard Cotterell
COMPASS Research Centre
The University of Auckland
Martin von Randow
COMPASS Research Centre
The University of Auckland

 

The burgeoning eResearch agenda supports, among other things, the formation and operation of effective digitally-supported research communities. The New Zealand Social Science Data Service (NZSSDS) is a prime example of such a community, as this presentation demonstrates.

Set up in 2007, the NZSSDS was intended to provide a space for the maintenance of data and metadata related to past surveys of note from the social sciences in New Zealand. The service was established following international best practice, with close support from the Australian Social Science Data Archive (ASSDA) in the setup phase, including the use of Nesstar middleware for online distribution. Locally we consulted with the BeSTGRID team, who provided the server infrastructure for Nesstar and the archived data, including enabling seamless, user-friendly remote access for the NZSSDS team.

NZSSDS currently holds 50 data sets, all available for perusal and secondary analysis, covering political studies, health and social science topics; new data sets are added as they become available.

It was always envisaged that the archive would have uses beyond the simple preservation of data and metadata, hence the inclusion of the term ‘service’ in the title. Thus more recent developments have seen the incorporation of NZSSDS into the teaching of a postgraduate statistics course at The University of Auckland in 2009, and into a social research methods course in the first semester of 2010. In addition, guidebooks for generic teaching of basic statistics using IBM SPSS Statistics have been developed, with accompanying ‘teaching data sets’ and metadata in NZSSDS. In the same timeframe, materials were prepared to enable the presentation of ‘enhanced publications’ through Nesstar. These ‘enhanced publications’ are typically, but not necessarily, an article in a journal and contain the article itself, along with supplementary material. The supplementary material consists of, for example, research data, illustrative images, metadata sets, or post-publication data such as comments or rankings. The option of changing post-publication data allows an enhanced publication to develop over the course of time.

Proposed future developments include the addition of qualitative data sets to the service, along with associated teaching resources; the addition of further high value social science data sets; and the expansion of enhanced publications.

Resources: 

Getting Started with BeSTGRID

Aaron Hicks
UNIX Administrator and Research Assistant
Informatics
Landcare Research New Zealand Limited
Stephen Cope
Research Computing Analyst
Centre for eResearch
The University of Auckland
Resources: 

MataNui - Building a Grid Data Infrastructure that doesn't suck!

Guy Kloss
Institute of Information and Mathematical Sciences
Massey University


In science and engineering the problem of "sanity" in data management is quite common, particularly if partners within a project are geographically distributed and require access to this data. These partners would ideally like to access or store data using a local server, yet the data has to be accessible to the remote partners as well, without manual intervention. As a target environment for this data management project, the collaborative Microlensing Observations in Astrophysics (MOA) project between researchers in New Zealand and Japan is used. Within New Zealand, partners from Massey University, the Universities of Auckland and Canterbury, as well as the observatory on Mount John (close to Lake Tekapo) require access to the data. Furthermore, data doesn't "live alone," but has a "partner," the meta-data. The problem in itself is not new, and neither are solutions for it. However, the currently available solutions all cater for different needs, and do not fully address the problems in our own environment. These solutions are, for example, not built to work seamlessly within a Grid infrastructure, they are complicated to deploy and maintain, and they often do not feature easy-to-use GUIs that are still flexible enough to adapt to local working needs.

The most common way to integrate data services into a Grid environment is by using the GridFTP protocol. It is commonly used for scripts and automation, for compatibility with other Grid enabled tools, and it features, among other things, the capability of using Grid certificate based authentication and third-party transfers. DataFinder is a data management tool designed for scientific and engineering purposes, and it allows for the definition and enforcement of project specific workflows, data policies, automation and integration with other tools. One of the big benefits is that it enables extensive use of arbitrary meta-data for organising and querying stored content. The last noteworthy access front end to the data server is a direct file system mount into the local infrastructure of a workstation or server.

For the development of such a data infrastructure the storage back-end needs to be capable of handling larger files or large collections of data, as well as managing arbitrary amounts of meta-data associated with data items. The data server needs to be useful in distributed projects. To reduce access latencies and improve throughput a federation of data storage back-ends is beneficial. This federation has to be able to operate also over geographic distances, abstracting storage and retrieval of data to the user. Another means to drastically improve performance for queries on the stored meta-data is to enable these queries to be executed on the server side, avoiding transfers of potentially large numbers of data sets to a client. Lastly, a new implementation of such a system is only worth undertaking, if it is envisioned to be robust as well as easy to deploy and use.

As the cornerstone of the MataNui infrastructure, the Grid file system (GridFS) of the high performance database MongoDB is used. Out of the box, it handles the management of potentially large data items along with arbitrary meta-data. For Grid compatible access, the Grin GridFTP server is being equipped with a storage back-end accessing the GridFS. GridFTP allows only very limited access to meta-data. To enable the required meta-data capabilities through the DataFinder, a RESTful web service is placed in front of GridFS. This web service also authenticates using the same certificates used for the rest of the Grid infrastructure. Grin as well as the RESTful web service can be installed as accessible front ends to each of the federated MongoDB servers.
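
The benefit of running meta-data queries on the server side can be illustrated without any database at all. The Python sketch below is a hypothetical in-memory stand-in, not MongoDB, GridFS or the MataNui web service: the class, its methods and the meta-data fields are invented for the example. The point it shows is that a query returns only the names of matching items, so the potentially large payloads stay where they are until one is explicitly requested.

```python
class InMemoryDataStore:
    """Hypothetical stand-in for a data server holding files plus meta-data."""

    def __init__(self):
        self._items = {}  # name -> (payload bytes, meta-data dict)

    def put(self, name, payload, **metadata):
        """Store a data item together with arbitrary meta-data."""
        self._items[name] = (payload, dict(metadata))

    def find(self, **criteria):
        """'Server-side' query: match on meta-data, return names only."""
        return sorted(
            name
            for name, (_, meta) in self._items.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def get(self, name):
        """Transfer one payload, only when the client asks for it."""
        return self._items[name][0]


store = InMemoryDataStore()
# Invented example items; the 1 KiB payloads stand in for large files.
store.put("frame-001", b"\x00" * 1024, field="A", band="red")
store.put("frame-002", b"\x00" * 1024, field="A", band="blue")
store.put("frame-003", b"\x00" * 1024, field="B", band="red")

matches = store.find(field="A", band="red")
payload = store.get(matches[0])

print(matches, len(payload))
```

In a federated deployment the same pattern would let a client locate data across several sites while moving only the items it actually needs.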

By choosing the existing building blocks mentioned above, it becomes comparatively simple to implement a consistent Grid data infrastructure. The implementation is making good progress, and is expected to be simple to deploy and configure, and to integrate seamlessly into the intended infrastructure of this and other projects.

Resources: 

Earthquake Data Shared Over BeSTGRID DataFabric

Vladimir Mencl
BlueFern Computing Services
University of Canterbury

On September 4th, a magnitude 7.1 earthquake struck Christchurch and the Canterbury region. As part of the relief effort, a good deal of spatial information needed to be gathered and shared between the first-responder agencies, including: Ministry of Civil Defence and Emergency, New Zealand Defence Force, Environment Canterbury, NZ Aerial Mapping and Christchurch City Council. Advances in imaging technology have provided data products that are very useful to the responding agencies, such as high resolution photos to quickly assess broad-scale harm, and massive LiDAR (Light Detection and Ranging) datasets to calculate subtle changes in elevation and determine the most likely points of damage to buried infrastructure. However, at the time of the disaster, no data infrastructure or high-speed network was universally available to allow the responders to share these large datasets in an efficient and timely manner. The same problem manifested during the 2005 Hurricane Katrina episode in the USA; agencies needed relevant data fast, but data infrastructure for efficient sharing was not in place beforehand.

BeSTGRID / Centre for eResearch at Auckland were approached by the New Zealand Defence Force, who needed the ability to rapidly upload, share, and download imagery between sites and organizations. Access to the KAREN network would provide the missing high-bandwidth connectivity, and the services offered by BeSTGRID via KAREN would provide the required data services. The fabric needed to be accessible to different individuals and organizations throughout the country. The BeSTGRID / Centre for eResearch team was able to provide a working solution within hours of being approached, building on work already underway to allow researchers around the country to share large datasets.

The BeSTGRID DataFabric was identified as the suitable platform for sharing files both within the closed group of the emergency response agencies' staff and with the broader research community. Designed to handle large files and providing a web and WebDAV interface, the DataFabric uses iRODS (integrated Rule-Oriented Data System) as the distributed storage backend and Davis (WebDAV interface for iRODS and SRB) as the frontend. Together, these two technologies provide tools for authentication and for bulk upload and download of data, and allow the DataFabric to be mounted as a ‘local’ drive in many operating systems.

It has been the right match for this project: the many contributors scattered over different government agencies and other institutions have been able to share all earthquake related data via a common infrastructure that abstracts away all the details of where or how the data is stored, and makes upload and download simple.

Parts of the BeSTGRID infrastructure (including the DataFabric) are hosted at the University of Canterbury. These systems are housed in a state-of-the-art Data Centre, mounted on earthquake-resistant mountings and powered by an Uninterruptible Power Supply (UPS) and a diesel generator. Not only did these systems go through the earthquake unscathed, but within days after the earthquake, the systems were already helping scientists share data about the earthquake.

Resources: 

Linking your HPC resources into BeSTGRID

Vladimir Mencl
BlueFern Computing Services
University of Canterbury
Resources: