Summer of eResearch 2010 Projects
Geospatial Lattices with HEALPix
This project explores non-relational databases and tests input and output speeds for large volumes of geological data (in the 100 TB range). The data will be spread across a cluster of nodes, so networking capabilities are required.
Many geospatial science models use latitude-longitude grids as their mesh. Because meridians converge at the poles, grid cells shrink and become distorted at high latitudes, which causes mathematical problems; so does the growing need to represent both low- and high-resolution features and to move fluidly between them. This project seeks to use Adaptive Mesh Refinement (AMR) techniques to solve these problems in the context of New Zealand geospatial datasets.
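To make the pole problem concrete: on a sphere of radius R, a latitude-longitude cell spanning latitudes φ1 to φ2 and a longitude width Δλ (in radians) has area R²·Δλ·(sin φ2 − sin φ1), so cells of equal angular size shrink sharply toward the poles. HEALPix, by contrast, divides the sphere into 12·Nside² pixels of identical area. The sketch below compares the two; the spherical-Earth radius, the 1-degree cell size and the Nside value are illustrative assumptions, not project parameters.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def latlon_cell_area_km2(lat_deg: float, dlat_deg: float, dlon_deg: float) -> float:
    """Area of a lat-lon cell whose southern edge lies at lat_deg."""
    phi1 = math.radians(lat_deg)
    phi2 = math.radians(lat_deg + dlat_deg)
    dlam = math.radians(dlon_deg)
    return EARTH_RADIUS_KM ** 2 * dlam * (math.sin(phi2) - math.sin(phi1))

def healpix_pixel_area_km2(nside: int) -> float:
    """Every HEALPix pixel has the same area: 4*pi*R^2 / (12 * nside^2)."""
    npix = 12 * nside ** 2
    return 4 * math.pi * EARTH_RADIUS_KM ** 2 / npix

if __name__ == "__main__":
    equator = latlon_cell_area_km2(0.0, 1.0, 1.0)   # cell from 0 to 1 degree N
    polar = latlon_cell_area_km2(89.0, 1.0, 1.0)    # cell touching the pole
    print(f"1x1 degree cell at the equator: {equator:,.0f} km^2")
    print(f"1x1 degree cell at the pole:    {polar:,.0f} km^2")
    print(f"ratio: {equator / polar:.0f}x")
    print(f"HEALPix pixel, nside=64:        {healpix_pixel_area_km2(64):,.0f} km^2")
```

The ratio works out to cot(0.5°), so the equatorial cell is roughly 115 times larger than the polar one, while every HEALPix pixel at a given resolution has the same area.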
This project will focus on data storage and retrieval for newly encoded AMR geospatial datasets, comparing different data storage strategies, such as implementations for Cassandra (Java), CouchDB (NoSQL/JavaScript), PostGIS (SQL), Neo4j (NoSQL/Java), HDFS ..., in a way that can be executed efficiently on a grid data/compute resource. It is expected to produce abstract models and prototype implementations rather than final production implementations.
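One storage strategy that suits key-value and column stores such as Cassandra is to flatten each AMR cell into a single integer key. The sketch below assumes HEALPix NESTED numbering, in which the four children of pixel p at order k are pixels 4p to 4p+3 at order k+1, and packs (order, pixel) into one integer using the NUNIQ scheme from the IVOA MOC standard, uniq = 4·4^order + pixel. The project description does not say which encoding was used; the function names and the dict standing in for a database are hypothetical.

```python
def to_uniq(order: int, ipix: int) -> int:
    """Pack a NESTED (order, pixel) pair into a single NUNIQ integer key."""
    npix = 12 * 4 ** order
    if not 0 <= ipix < npix:
        raise ValueError(f"pixel {ipix} out of range for order {order}")
    return 4 * 4 ** order + ipix

def from_uniq(uniq: int) -> tuple[int, int]:
    """Recover (order, pixel): keys at order k lie in [2^(2k+2), 2^(2k+4))."""
    order = (uniq.bit_length() - 3) // 2
    return order, uniq - 4 * 4 ** order

def children(order: int, ipix: int) -> list[tuple[int, int]]:
    """The four NESTED children of a pixel at the next finer order."""
    return [(order + 1, 4 * ipix + i) for i in range(4)]

def parent(order: int, ipix: int) -> tuple[int, int]:
    """The coarser pixel that contains this one."""
    if order == 0:
        raise ValueError("order-0 pixels have no parent")
    return order - 1, ipix // 4

# Hypothetical stand-in for a key-value store such as Cassandra.
store: dict[int, float] = {}
store[to_uniq(0, 5)] = 1.0          # one coarse base pixel
for o, p in children(0, 5):         # refine it locally into four finer cells
    store[to_uniq(o, p)] = 0.25
```

Because coarse and refined cells share one integer key space, a mesh with mixed resolutions can live in a single table, and moving between levels is plain integer arithmetic rather than a geometric lookup.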
Invader Genetics for New Zealand Conservation
New Zealand has many native birds and animals that need to be protected from predators such as rats and stoats. Conservation initiatives currently try to eradicate these pests from islands, parks and reserves by laying traps. Researchers then send samples from trapped pests for DNA analysis to discover where the pests are coming from, in the hope of finding better ways to eradicate them and keep them out of protected areas.
These initiatives are run by volunteers, researchers and employees, who lay the traps and bait and record all the data. Researchers then use statistical analysis to try to determine where pests are coming from and plan eradication attempts accordingly.
Information sharing is one of the problems faced by volunteers and researchers. Many initiatives have their own small system for recording information, but only they can access it. A centralised system is needed that allows every initiative to submit its information to a single location. Researchers would then have access to far more genetic data for their statistical analyses, leading to better eradication plans and better protection for native birds. This centralised system is what we aim to create.
Open Access Databank and Model Ecosystem
The need to preserve ecosystems is growing. When a species becomes extinct, the loss directly affects other organisms throughout the ecosystem, so preserving and maintaining the different species within an ecosystem is vital. This project aims to contribute to that goal. A team of researchers is working to maintain ecological balance by surveying species in a particular area (Little Barrier Island); based on the results, the approach is intended to be extended to the rest of New Zealand.
The researchers went on a field trip to Little Barrier Island and collected samples of different species. The samples are then sent to a laboratory for DNA sequencing, and the resulting sequences are compared with the NCBI databases to help identify rare and endangered species so that they can be preserved.
The tool (modelEcosystem) is intended to meet the needs of the ecological research community. It will provide an interface for researchers to enter, update and view genetic information about species, and a visual representation (using Google Maps) of where each species was found.
The project focuses on creating a data model to manage the data collected on the island and the laboratory results. It also aims to create a website that lets users add and retrieve data, and uses Google Maps to present location details visually. Integrating statistical packages (such as R) is, however, out of the project's scope.
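A minimal sketch of such a data model, assuming each field sample carries a collection location and may later be linked to a laboratory sequencing result that has been matched against NCBI. All class and field names are hypothetical. The join function flattens each sample into a `{lat, lng, ...}` record, the same latitude/longitude shape that the Google Maps JavaScript API accepts as a `LatLngLiteral`, so it could feed map markers directly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldSample:
    sample_id: str
    species_name: str   # field identification, possibly provisional
    latitude: float     # decimal degrees (WGS84 assumed)
    longitude: float

@dataclass
class LabResult:
    sample_id: str                        # links back to FieldSample.sample_id
    sequence: str                         # DNA sequence returned by the laboratory
    ncbi_accession: Optional[str] = None  # best NCBI match, if one was found

def map_markers(samples: list[FieldSample], results: list[LabResult]) -> list[dict]:
    """Join samples to lab results, producing one map-ready record per sample."""
    by_sample = {r.sample_id: r for r in results}
    markers = []
    for s in samples:
        result = by_sample.get(s.sample_id)
        markers.append({
            "lat": s.latitude,
            "lng": s.longitude,
            "label": s.species_name,
            "sequenced": result is not None,
            "ncbi_accession": result.ncbi_accession if result else None,
        })
    return markers
```

Keeping field samples and lab results as separate records reflects the two data sources the project names (the island and the laboratory) and lets a sample appear on the map before its sequencing result arrives.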
Research Desktop
This project consists of two parts. The first is to provide an authentication service for the grid, exposed as a RESTful web service. This will allow both BeSTGRID and third-party grid-based services to delegate authentication to an external server. Doing so removes many firewall-related issues and centralises authentication in one place, which should make the overall system more secure. The second part is to create a desktop application that provides easily accessible, extensible access to grid services and tools. The overall aim is to give users of the Research Desktop application a single place where all authentication is performed and, through it, access to a range of services available on the grid.
BeSTGRID Online
BeSTGRID coordinates New Zealand research organisations that provide services and infrastructure for collaborative research. These services enable researchers and research collaborations to access computational resources, shared datasets and storage, and support distributed work through collaboration tools. Grisu Web is a web-based application that allows researchers to submit and monitor jobs on the Grid. The BeSTGRID Online project aims to research and provide appropriate security for the Grisu Web application, and to make the application more accessible to researchers. These improvements introduce a single sign-on (SSO) service, which reduces password fatigue and provides a more secure authentication process. Other tasks include helping other projects deploy their web applications so that they are visible on the University of Auckland campus.
BLAST on BeSTGRID
BLAST is a tool, frequently used by bioinformatics researchers, for finding regions of local similarity between sequences. While the BLAST software supports parallelism only through threading (i.e., within a single address space), the mpiBLAST package extends BLAST with MPI-based parallelism, allowing it to scale well on a cluster or a massively parallel system, for example the Blue Gene/L and Blue Gene/P systems.
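"Local similarity" means the best-scoring alignment between any substring of one sequence and any substring of the other. BLAST finds such regions heuristically, seeding with short word matches and extending them, rather than searching exhaustively; the exact dynamic-programming formulation it approximates is the Smith-Waterman algorithm, sketched below purely for illustration with a linear gap penalty. The scoring values (match +2, mismatch −1, gap −2) are arbitrary assumptions rather than BLAST defaults, and this is not how BLAST or mpiBLAST are implemented.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return the best local alignment score and the two aligned substrings."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix; row/column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The quadratic cost of filling `H` for every query-subject pair is why heuristic search like BLAST, and parallel versions such as mpiBLAST, matter for database-scale searches.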
BLAST and mpiBLAST search against databases chosen by the researcher, most commonly those provided by NCBI (National Center for Biotechnology Information, www.ncbi.nlm.nih.gov). NCBI updates these datasets frequently, so local copies at each site need to be refreshed regularly. For implementation reasons, mpiBLAST needs each NCBI dataset to be pre-formatted (split into fragments) according to the maximum number of processors a job will use. This must be done carefully so that jobs running at the time of an update are not affected.
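One common way to refresh a database without disturbing running jobs is to build each new release in its own versioned directory and then repoint a stable symbolic link at it with a single atomic rename. If each job is pinned to the resolved, versioned path when it is submitted, jobs already running keep using the old release, which stays in place until it is cleaned up, while new jobs pick up the new one. The sketch assumes a POSIX filesystem and a hypothetical directory layout (`releases/<name>` plus a `current` link); the download and mpiBLAST formatting step is only a placeholder, since its invocation depends on the site.

```python
import os
import tempfile
from pathlib import Path

def publish_release(db_root: Path, release_name: str) -> Path:
    """Create a versioned release directory and atomically repoint 'current'."""
    release_dir = db_root / "releases" / release_name
    release_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a release
    # Placeholder: a real site would download the NCBI dataset and
    # pre-format it for mpiBLAST into release_dir at this point.
    (release_dir / "READY").write_text(release_name)

    current = db_root / "current"
    tmp_link = db_root / f".current.{os.getpid()}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted earlier run
    tmp_link.symlink_to(Path("releases") / release_name)  # relative link target
    os.replace(tmp_link, current)  # rename(2) is atomic on POSIX filesystems
    return release_dir

def resolve_for_job(db_root: Path) -> Path:
    """Pin a job to the release that is current when it is submitted."""
    return (db_root / "current").resolve()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        publish_release(root, "release-1")
        pinned = resolve_for_job(root)      # a job submitted now uses release-1
        publish_release(root, "release-2")  # update while that job "runs"
        print(pinned.name, "->", resolve_for_job(root).name)  # release-1 -> release-2
```

Because each release is formatted for a fixed fragment count, a site that must support several processor counts could keep one release directory per count under the same scheme.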
The goal of this project is to make BLAST and mpiBLAST available to users of BeSTGRID computational resources in the form of a Grisu job template.
- ARCS has already invested considerable effort and built up experience, and would be willing to share it (Simon Yin / Intersect).
- Several researchers have already indicated strong interest in having BLAST available on BeSTGRID, and would be available to help steer this project.
- Anthony Poole (University of Canterbury School of Biological Sciences) will be available to provide input into this project.
Web-based genetic marker design
Science needs to be reproducible. Traditionally, reproducibility was ensured by publishing journal articles that describe the methods used in an experiment; however, with the increasing use of software with extensive configuration parameters in experiments, this is no longer sufficient. This is particularly true in bioinformatics, where many solutions are homegrown.
Scientific tools need to be reusable. In bioinformatics in particular, scientists have developed custom, ad hoc tools and scripts to manipulate sequence information. Often these tools can be used for only one task, and perhaps only by the person who developed them. The result is a large number of tool variants developed and reworked by multiple scientists, difficulty sharing tools, and a lack of reproducibility in scientific methods.
Plant and Food Research has a large number of custom-developed tools for genetic marker design. The aim of this project is to make those tools reusable, and to make scientific workflows that use them reproducible and publishable.
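One lightweight way to make a tool run reproducible and publishable is to record, alongside its output, exactly which tool version, parameters and input files produced it. The sketch below builds such a run manifest; the field names and the use of SHA-256 checksums for inputs are illustrative assumptions, not a description of Plant and Food Research's actual tooling.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum an input file in chunks so large sequence files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def run_manifest(tool: str, version: str, params: dict, inputs: list[Path]) -> str:
    """Return a canonical JSON record describing one tool run."""
    record = {
        "tool": tool,
        "version": version,
        "parameters": params,
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }
    # sort_keys makes identical runs produce identical text, so manifests
    # can be compared, diffed or published alongside results.
    return json.dumps(record, sort_keys=True, indent=2)
```

Publishing the manifest with the results lets another scientist rerun the same tool version with the same parameters and confirm, via the checksums, that they are using the same input data.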
Data analysis of drug discovery datasets
The underlying aim of this project is to analyze chemical compounds that could play a major role in discovering drugs to treat cancer. Until now, Dr. Jack Flanagan and his team had been carrying out laboratory tests (a process termed lab screening) to determine which compounds, among several thousand or more, show drug-like behaviour. This process was time-consuming and tedious, so they adopted a different form of testing and analysis, termed virtual screening. In virtual screening, large libraries of chemical compounds are evaluated with computational models that predict how the structure of each compound binds to a drug target, typically a protein. Our sponsor used one such program, GOLD, which docks candidate compounds into a protein's binding site; its results can then be visualized and manipulated to help the sponsor determine which compounds would work best as potential drugs. This project aims to help predict the likelihood of a molecule having ‘drug-like’ characteristics. It builds on Dr. Flanagan's GOLD-based analysis: the output of his experiments is the input for this project. We will provide a tool to visualize this input in 2-D and 3-D, helping him identify compounds that may be candidates for future cancer drugs.
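The description does not say how drug-likeness is to be predicted. A widely used first-pass filter is Lipinski's rule of five, whose criteria are molecular weight over 500 Da, calculated logP over 5, more than 5 hydrogen-bond donors, and more than 10 hydrogen-bond acceptors; a compound is commonly flagged when it breaks more than one of them. The sketch below, offered only as an illustration and not as the project's method, applies that rule to compounds whose descriptors are assumed to be precomputed and ranks the survivors by a hypothetical `docking_score` field, assuming that a higher score is better (check the scoring function actually in use).

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float     # daltons
    logp: float           # calculated octanol-water partition coefficient
    h_donors: int
    h_acceptors: int
    docking_score: float  # hypothetical field; higher assumed to be better

def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def shortlist(compounds: list[Compound], max_violations: int = 1) -> list[Compound]:
    """Keep compounds within the violation limit, best docking score first."""
    kept = [c for c in compounds if rule_of_five_violations(c) <= max_violations]
    return sorted(kept, key=lambda c: c.docking_score, reverse=True)
```

A filter like this would only narrow the candidate list; the 2-D and 3-D visualization the project proposes is still what lets a chemist judge individual binding poses.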

