Summer of eResearch 2010 Projects

This project involves developing a tablet application for audio geotagging. The student will work towards building a system to capture time-stamped audio and touchscreen map interactions. Ideally, the student would produce a prototype application that records where and when users touch a map display during an interview, and automatically relates the locational data to the time in the interview when the gesture was made. Following a recording, it should be possible to search within and across interviews using geographic queries (e.g. "show me all interviews that mention Lake Taupo" or "take me to the point in this interview when the interviewee referred to Mairangi Bay"). It would be desirable for the project to include a small set of pilot interview usability tests.
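As a rough illustration of how map touches could be tied to interview time and then searched by place, here is a minimal Python sketch. The `MapTouch` record, its field names, and the `touches_near` helper are assumptions made for this example, not part of the project specification; distances use the standard haversine formula on a spherical Earth.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class MapTouch:
    """One touch on the map, tied to a moment in the recording (hypothetical schema)."""
    interview_id: str
    offset_s: float  # seconds since the start of the audio recording
    lat: float       # latitude of the touched point, in degrees
    lon: float       # longitude of the touched point, in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def touches_near(touches: Iterable[MapTouch], lat: float, lon: float,
                 radius_km: float) -> List[MapTouch]:
    """Return touches within radius_km of (lat, lon), ordered by interview and time."""
    hits = [t for t in touches if haversine_km(t.lat, t.lon, lat, lon) <= radius_km]
    return sorted(hits, key=lambda t: (t.interview_id, t.offset_s))
```

Each hit carries an interview identifier and an offset in seconds, so a player could jump straight to the moment the place was touched. A real system would presumably also need place-name lookup and a spatial index rather than a linear scan.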

Although this project is focused on geographic maps, it may be extended to any visual display (e.g. collections of photographs, diagrams). Audio created using this tool (in both English and Maori) would have rich geographic metadata generated automatically for the digital item. This streamlines an important aspect of cataloguing and description, and ensures that place and time, two key organising principles for the retrieval and citation of information, are points of entry for search. These audio and/or video recordings could be oral history recordings for family, community, local and public history projects, public lectures, or teaching sessions. The tool also has public good applicability beyond the library and information professions: it would be invaluable for health professionals working with Alzheimer's patients, trauma victims, etc. Scientists and geographers would also be able to use the tool for mobile data capture in their fieldwork.

Submitted by Tim McNamara on

BeSTGRID coordinates New Zealand research organisations that provide services and infrastructure for collaborative research. These services enable researchers and research collaborations to access computational resources, shared datasets and storage, and support distributed work using collaboration tools. Grisu Web is a web-based application that allows researchers to submit and monitor jobs on the Grid. The objective of the BeSTGRID Online project is to research and provide appropriate security for the Grisu Web application, and to make the application more accessible to researchers. These improvements include introducing a single sign-on service, which reduces password fatigue and provides a more secure authentication process. Other tasks involve helping other projects to deploy their web applications so that they are visible within the Auckland University campus.

Submitted by richard on

BLAST is a tool for finding regions of local similarity between sequences, and is frequently used by bioinformatics researchers. While the BLAST software supports parallelism only by threading (i.e., within a single address space), the mpiBLAST package extends BLAST with parallelism based on MPI, allowing it to scale well on a cluster or a massively parallel system, for example the BlueGene systems (both L and P).

BLAST and mpiBLAST can search against databases provided by the researcher, commonly databases provided by NCBI (National Center for Biotechnology Information). These datasets are frequently updated by NCBI, and their local copies at a site need to be updated regularly. For implementation reasons, mpiBLAST needs the NCBI dataset to be pre-formatted according to the maximum number of processors that will be used to run a job. The update needs to be done carefully so that jobs running at the time of the update are not affected.
The goal of this project is to make BLAST and mpiBLAST available to users of the BeSTGRID computational grid, in the form of a Grisu job template.
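One common way to update a dataset without disturbing running jobs is to keep each formatted copy in its own versioned directory and switch a `current` symbolic link atomically. The Python sketch below assumes a POSIX filesystem; the directory layout and the `publish_database` helper are hypothetical, and the formatting step itself is outside the sketch.

```python
import os
from pathlib import Path


def publish_database(root: Path, version: str) -> Path:
    """Point root/'current' at root/<version> using an atomic rename.

    Assumes root/<version> already holds a fully formatted database copy.
    """
    target = root / version
    if not target.is_dir():
        raise FileNotFoundError(f"no such database version: {target}")

    # Build the new link under a temporary name first.
    tmp_link = root / f".current.{os.getpid()}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(version, tmp_link)  # relative link, so the tree can be moved

    # rename() replaces the old link in one step on POSIX, so readers see
    # either the old version or the new one, never a missing link.
    os.replace(tmp_link, root / "current")
    return (root / "current").resolve()
```

A job that resolves `current` to a real directory when it starts keeps reading the old copy after the switch, so old versions should only be deleted once no job is still using them.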

  • ARCS have already invested a lot of effort and built up experience, and would be willing to share it (Simon Yin / Intersect).
  • Several researchers have already indicated strong interest in having BLAST available on BeSTGRID, and would be available to provide input into steering this project.
  • Anthony Poole (University of Canterbury School of Biological Sciences) will be available to provide input into this project.

Submitted by richard on

The underlying aim of this project is to analyze chemical compounds that could play a major role in discovering drugs to treat cancer. Until now, Dr. Jack Flanagan and his team have carried out lab tests, a process termed Lab Screening, to determine which compounds (among several thousand, if not more) have drug-like behaviour. However, this process is time-consuming and tedious. They have therefore adopted a new form of testing and analysis termed Virtual Screening. In virtual screening, large libraries of chemical compounds are run through computational models to analyze how the chemical structure of each compound binds to the drug target. Our sponsor used one such algorithm, GOLD. GOLD is a program for visualizing and manipulating the results obtained after running the compounds through this algorithm. These results help the sponsor to determine which compounds would work best as potential drugs. As stated, this project is to predict the likelihood of a molecule having ‘drug-like’ characteristics. Dr. Flanagan has carried out analysis based on GOLD, and this project follows on from that experiment: the output from the experiments performed by our sponsor is the input for this project. We will provide a tool to visualize this input in 2-D and 3-D. This will help him to determine which compounds may be candidates for future cancer drugs.

Submitted by richard on

This project is about exploring non-relational databases and testing input and output speeds for large amounts of geological data (in the 100 TB range). The data is to be spread across a cluster of nodes, so networking capabilities are needed.

Many geospatial science models use latitude-longitude grids for their mesh. Converging meridians at the poles present mathematical problems, as does the increasing requirement to maintain both low and high resolution features and support moving fluidly between them. This project seeks to use Adaptive Mesh Refinement (AMR) techniques to solve these problems in the context of New Zealand geospatial datasets.
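The two issues above can be made concrete with a short Python sketch. `cell_width_km` shows how the east-west extent of a fixed-size longitude cell shrinks with the cosine of latitude, and `refine` performs a simple quadtree-style subdivision wherever a cell holds too many data points. This is only one basic form of adaptive refinement, chosen for illustration; the `Cell` class, the point-count criterion, and all names are assumptions, not the project's method.

```python
from dataclasses import dataclass
from math import cos, radians
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # spherical approximation


def cell_width_km(lat_deg: float, dlon_deg: float) -> float:
    """East-west extent of a cell dlon_deg wide at latitude lat_deg."""
    return EARTH_RADIUS_KM * radians(dlon_deg) * cos(radians(lat_deg))


@dataclass(frozen=True)
class Cell:
    south: float  # southern edge, degrees latitude
    west: float   # western edge, degrees longitude
    size: float   # edge length in degrees
    level: int    # refinement depth, 0 for the root cell

    def contains(self, lat: float, lon: float) -> bool:
        return (self.south <= lat < self.south + self.size
                and self.west <= lon < self.west + self.size)

    def children(self) -> List["Cell"]:
        half = self.size / 2
        return [Cell(self.south + i * half, self.west + j * half, half, self.level + 1)
                for i in (0, 1) for j in (0, 1)]


def refine(cell: Cell, points: Sequence[Tuple[float, float]],
           max_points: int, max_level: int) -> List[Cell]:
    """Split cells holding more than max_points points, down to max_level."""
    inside = [p for p in points if cell.contains(*p)]
    if len(inside) <= max_points or cell.level >= max_level:
        return [cell]
    leaves: List[Cell] = []
    for child in cell.children():
        leaves.extend(refine(child, inside, max_points, max_level))
    return leaves
```

Sparse regions stay as a few coarse cells while dense regions are subdivided, which is the low and high resolution mix the paragraph describes. Refining in latitude-longitude space as done here does not by itself remove the pole problem.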

This project will focus on data storage and retrieval for newly encoded AMR geospatial datasets, seeking to compare different data storage strategies, such as implementing the above for Cassandra (Java), CouchDB (NoSQL/JavaScript), PostGIS (SQL), Neo4J (NoSQL/Java), HDFS ..., in a way that can be executed efficiently on a GRID data/compute resource. The project is expected to produce abstract models and prototype implementations, rather than final production implementations.
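A comparison of storage strategies needs a common interface so the same workload can be timed against each backend. The Python sketch below is one possible shape for such a harness. The `CellStore` protocol, the `(level, row, col)` key, and the in-memory stand-in backend are all assumptions made for illustration, and no real database client is used.

```python
import time
from typing import Dict, Iterable, Protocol, Tuple

Key = Tuple[int, int, int]  # hypothetical AMR cell key: (level, row, col)


class CellStore(Protocol):
    """Minimal interface each storage backend would need to implement."""
    def put(self, key: Key, value: bytes) -> None: ...
    def get(self, key: Key) -> bytes: ...


class InMemoryStore:
    """Stand-in backend, used only to exercise the harness."""
    def __init__(self) -> None:
        self._data: Dict[Key, bytes] = {}

    def put(self, key: Key, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: Key) -> bytes:
        return self._data[key]


def benchmark(store: CellStore, keys: Iterable[Key], payload: bytes) -> Dict[str, float]:
    """Write payload under every key, read it all back, and time both phases."""
    key_list = list(keys)
    t0 = time.perf_counter()
    for key in key_list:
        store.put(key, payload)
    t1 = time.perf_counter()
    bytes_read = sum(len(store.get(key)) for key in key_list)
    t2 = time.perf_counter()
    return {"cells": len(key_list), "bytes_read": bytes_read,
            "write_s": t1 - t0, "read_s": t2 - t1}
```

Wrapping each candidate database in a class with the same two methods would let one workload run unchanged against all of them. Timings from a single machine would of course say little about behaviour across a cluster.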

Submitted by richard on

The project aims to undertake the first New Zealand analysis of the Hathi corpus to assess the nature, range, and quality of its New Zealand content.

The Hathi Trust describes itself as “a partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. There are more than sixty partners in HathiTrust, and membership is open to institutions worldwide.” The HathiTrust Digital Library stores the collections of partner institutions in digital form, preserving them securely and providing various degrees of access, depending on copyright status. The collection currently holds over 10 million scanned books. Primary contributors include Google, the Internet Archive, and major US research libraries.

Because of copyright issues, not all of the books, including many on New Zealand topics, are available to the New Zealand public or scholars. The Hathi Trust, however, is developing ‘non-consumptive research services’ that will enable researchers to query the available dataset (currently approximately 2 million books) and have results returned to them. This can be done using their API or the Meandre workbenches. Topic models are one primary output of the Meandre workbenches. Meandre itself is described as a ‘semantic enabled web-driven, dataflow execution environment’ tailored for digital humanities applications. It is part of the Software Environment for the Advancement of Scholarly Research (SEASR) package. Workbenches are accessible online, with other components available as Eclipse plugins.

Submitted by Tim McNamara on

New Zealand has many native birds and animals that need to be protected from the detrimental effects of predators such as rats and stoats. Conservation initiatives currently try to eradicate these pests from various islands, parks and reserves by laying traps for them. Researchers then send samples from these pests away for DNA analysis to discover where the pests are coming from, in the hope of finding better ways to eradicate these predators and keep them out of these protected areas.

These initiatives are run by volunteers, researchers and employees, who lay the traps and the bait and record all the data. Researchers then use statistical analysis to try to determine where pests are coming from and plan eradication attempts accordingly.

Information sharing is one of the problems faced by the volunteers and researchers. Many initiatives have their own small system for recording information, but only they have access to it. We need to create a centralised system that allows all initiatives to submit their information to a single location. Researchers would then have access to a large amount of genetic data for whatever statistical analysis they wish to do. This would lead to better eradication plans and more safety for birds. This centralised system is what we aim to create.

Submitted by richard on

Our research group (COMPASS) supports a data service, NZSSDS, which is built around an architecture derived from the Australian Data Archive (ADA) and NESSTAR (a proprietary middleware). We seek to move this system to a new open-source architecture, the Dataverse environment developed at Harvard. This is an open-source solution for managing data that is DDI-based and, we believe, able to import content from NESSTAR. It can also be integrated with the look and feel of our existing websites. It could complement our teaching with data resources, as it allows storage of code and publications as well as data. It is also based on R, which might provide future opportunities based on the research group’s simulation studies. The research group’s data manager, who has been responsible for the development of the data service, is available for consultation, as are other team members and possibly Australian colleagues.

Submitted by Shubham Sharma on

Increasing computing power, along with the increasing availability of digital records, has led to a coming-of-age for network science. Techniques from computer science, mathematics and physics now make it possible for network scientists to mine digital resources and then construct, analyse and visualise the resulting networks. These networks may contain millions of nodes and edges, with rich metadata associated with each of them.

While the study of such networks is interesting in and of itself, the end-users of the information in the networks are typically not network scientists. This project involves constructing a self-contained user interface so that a user with no background in network science is able to visualise, explore and manipulate complex networks.

Prospective students must be familiar with data visualisation. A knowledge of Java and the D3 visualisation library would be ideal.

The Complex Systems research group at IRL is involved in numerous projects which involve analysis of complex networks. While there is plenty of existing software which allows network scientists to manipulate and explore these networks, such software is generally inaccessible to our collaborators, and the end-users of our data, who generally do not have a technical background. This project would produce a stand-alone interface which would allow these collaborators and end-users to explore networks and would provide a far richer experience than the static, one-off network visualisations which are currently available.

Submitted by Tim McNamara on

The need to preserve ecosystems is growing day by day. When a species becomes extinct, it directly affects the other living organisms in the ecosystem, so it is vital to preserve and maintain the different species in an ecosystem. This project aims to achieve a part of this goal. A team of researchers is trying to maintain the ecological balance by surveying species information in a particular area (Little Barrier Island). Based on the results obtained, the approach is to be extended to the whole of New Zealand.

A team of researchers went on a field trip to Little Barrier Island and collected samples of different species. The collected samples will then be sent to a laboratory for DNA sequencing. Once the DNA has been sequenced, the sequences will be compared with the NCBI database, with the aim of helping to preserve rare and near-extinct species.

The tool (modelEcosystem) will try to meet the needs of the ecological research community. It will provide an interface for researchers to enter, update, and view genetic information about the species. It will also provide a visual representation (using Google Maps) of the location of the species.

The project focuses on creating a data model to manage the data collected on the island and from the lab results. It also aims to create a website that allows users to add and retrieve data, and makes use of Google Maps to present location details visually. Integrating statistical packages (such as R) is, however, outside the project scope.
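To give a sense of what such a data model might look like, here is a minimal Python sketch using the standard library's `sqlite3` module. The table and column names, and the choice of SQLite, are assumptions for illustration only; the project's actual schema is not described here. The `species_locations` query returns the coordinates a map view would need to place markers.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: field samples, and lab results linked to them.
SCHEMA = """
CREATE TABLE sample (
    id           INTEGER PRIMARY KEY,
    collected_on TEXT NOT NULL,      -- ISO date of collection
    latitude     REAL NOT NULL,
    longitude    REAL NOT NULL
);
CREATE TABLE sequence_result (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    species   TEXT NOT NULL,         -- species assigned after comparison
    sequence  TEXT NOT NULL          -- DNA sequence from the lab
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the two tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn: sqlite3.Connection, collected_on: str,
               latitude: float, longitude: float) -> int:
    cur = conn.execute(
        "INSERT INTO sample (collected_on, latitude, longitude) VALUES (?, ?, ?)",
        (collected_on, latitude, longitude))
    return cur.lastrowid


def add_result(conn: sqlite3.Connection, sample_id: int,
               species: str, sequence: str) -> int:
    cur = conn.execute(
        "INSERT INTO sequence_result (sample_id, species, sequence) VALUES (?, ?, ?)",
        (sample_id, species, sequence))
    return cur.lastrowid


def species_locations(conn: sqlite3.Connection, species: str) -> List[Tuple[float, float]]:
    """Coordinates of every sample identified as the given species."""
    rows = conn.execute(
        "SELECT s.latitude, s.longitude FROM sample s "
        "JOIN sequence_result r ON r.sample_id = s.id "
        "WHERE r.species = ? ORDER BY s.id", (species,))
    return rows.fetchall()
```

Keeping field samples and lab results in separate tables lets a sample be recorded on the island before its sequencing result arrives.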

Submitted by richard on