
WEHI Research Computing Platform

Public page for the WEHI Research Computing Platform (RCP)


RCP Internship Program - Intake 13 - Semester 2 2025 (In Progress)

This is the list of projects for this intake.

These reports are written by each group to share information with future interns and interested parties.

For more examples, see the Intake 11 - Summary Report Page

Please update this page and submit a pull request via GitHub here.

AIVE

OBJECTIVE

Our team was divided into two groups, each tasked with its own objective.

Organelle Segmentation

The objective for this group was to contribute to the AIVE project by addressing the intensive manual effort required for organelle segmentation in microscopy images. Identifying organelle structures within electron microscopy (EM) images is an important task in the AIVE pipeline, and it is currently done largely through time-consuming manual segmentation rather than machine learning techniques such as deep learning models, owing to limited investment in automating this step. Our task was to explore and evaluate software tools that can perform segmentation of 3D EM images, allowing researchers to judge each tool's viability against the AIVE workflow's integration requirements. We created documentation covering how to use these tools, the results obtained, and observations on their performance and technical capabilities.

AI Probabilities and AIVE Integration

The second group's objective was to streamline the AIVE workflow in WEKA. In the current workflow, AIVE is applied as a separate step after the AI predictions to enhance the results. The goal was to explore WEKA and determine whether adding an extra layer to the model could integrate the AI predictions with the original EM image.
For AIVE integration, one main task was to perform the AI segmentation in WEKA, which includes training and evaluating a Random Forest classifier and producing the per-class probability maps. The second main task was to preprocess the raw EM image using Fiji/ImageJ, applying contrast normalisation and denoising techniques.
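
The Fiji/ImageJ preprocessing itself is done interactively in the GUI, but the two steps are conceptually simple. As a rough illustrative sketch only (not part of our actual workflow), equivalent contrast normalisation and slice-wise denoising could look like this in Python with scikit-image; the file names and parameter values here are placeholders.

```python
# Illustrative analogue of the Fiji/ImageJ preprocessing steps, not the actual workflow.
# Assumes scikit-image and tifffile are available; file names and parameters are placeholders.
import numpy as np
from tifffile import imread, imwrite
from skimage import exposure, restoration

em = imread("raw_em_stack.tif").astype(np.float32)  # 3D EM volume, shape (z, y, x)

# Contrast normalisation: stretch intensities between the 1st and 99th percentiles.
p1, p99 = np.percentile(em, (1, 99))
em_norm = exposure.rescale_intensity(em, in_range=(p1, p99), out_range=(0.0, 1.0))

# Denoising: slice-by-slice non-local means, one common choice for EM noise.
sigma = float(np.mean([restoration.estimate_sigma(s) for s in em_norm]))
em_denoised = np.stack([
    restoration.denoise_nl_means(s, h=0.8 * sigma, sigma=sigma, fast_mode=True)
    for s in em_norm
])

imwrite("preprocessed_em_stack.tif", em_denoised.astype(np.float32))
```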

WHAT WE DID AS A GROUP

Comprehending the AIVE Workflow

As a group, we first reviewed the project background, research paper, and previous interns' work to get a clear understanding of our starting point. After this, we began narrowing down the problem space to decide what we would work on, eventually landing on the two objectives described above.

Tasks Undertaken:

We collaborated actively to identify bottlenecks, benchmark results, and keep documentation up to date for both future interns and project integration.

FINAL STATUS

Organelle Segmentation

By the end of the internship, we had completed and documented an initial analysis of tools for organelle segmentation. We established a reproducible workflow, highlighted tool limitations, and proposed recommendations for further improvements. While the initial tool-analysis objectives were achieved, additional development will be needed for full automation and deeper integration with the wider AIVE project.

AI Probabilities and AIVE Integration

By the end of the internship, we were able to partially reproduce the core AIVE workflow, which covers three main steps: AI segmentation, data preprocessing, and voxel multiplication. Unlike the original AIVE workflow, we integrated the AI segmentation and the voxel multiplication within WEKA by using a mathematical expression tool.
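
The voxel multiplication step is the easiest part to describe concretely: the per-class probability map from the classifier is multiplied voxel-by-voxel with the preprocessed EM intensities, so that predicted structures retain their original image detail. A minimal sketch of that operation is shown below using NumPy and hypothetical file names; in our work the multiplication was carried out inside WEKA with its mathematical expression tool rather than in Python.

```python
# Minimal sketch of the voxel-wise multiplication behind AIVE-style integration.
# File names are hypothetical; we performed this step inside WEKA using its
# mathematical expression tool rather than in Python.
import numpy as np
from tifffile import imread, imwrite

em = imread("preprocessed_em_stack.tif").astype(np.float32)    # EM intensities scaled to [0, 1]
prob = imread("class_probability_map.tif").astype(np.float32)  # per-voxel class probabilities in [0, 1]

assert em.shape == prob.shape, "probability map must match the EM volume voxel-for-voxel"

# Each voxel keeps its EM intensity, scaled by the classifier's confidence that
# the voxel belongs to the organelle class of interest.
aive_like = em * prob

imwrite("aive_like_output.tif", aive_like)
```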

BioNix

OBJECTIVE

This project aims to improve the reproducibility of bioinformatics workflows through the development of BioNix, an extension of the Nix package manager. BioNix enables scientists to create reproducible environments for complex scientific analyses that can be shared and version controlled. Our main goal was to identify barriers to computational reproducibility and make useful improvements by documenting code, implementing tests, and packaging tools.

WHAT WE DID AS A GROUP

As a team, we explored the Nix ecosystem, including Bash scripting, Nix, Nixpkgs, and BioNix. The learning curve was steep at first because we lacked experience with these tools. We concentrated on:

• Understanding how Nix ensures consistent builds and dependency management.
• Exploring reproducibility challenges in existing bioinformatics workflows.
• Testing and documenting small, reproducible pipelines to demonstrate workflow consistency.
• Improving onboarding materials by creating an “Introduction to BioNix Project” page, which provides guidance on installation, key concepts, and troubleshooting.
• Maintaining detailed technical diaries to record progress, errors, and solutions for future contributors.

We worked collaboratively, shared progress in regular meetings, and updated documentation to support long-term project continuity and accessibility.

HOW FAR WE GOT

By the end of the project, we achieved several tangible outcomes:

• Developed a foundational understanding of reproducible build systems and workflow management using Nix and BioNix.
• Demonstrated understanding of genome alignment concepts and workflows.
• Investigated reproducibility barriers in packaging complex bioinformatics tools, such as Gridss.
• Created and published onboarding documentation on GitHub to support new users and interns with minimal programming background.

Although full integration of advanced tools like Gridss into BioNix remains ongoing work, the team established a reproducible foundation, identified technical bottlenecks, and provided documentation that will streamline future development.

FINAL INSIGHTS

Through this project, we learned how to apply reproducibility principles in real-world bioinformatics contexts. We moved beyond traditional learning styles to self-directed exploration, iterative testing, and reflective documentation. Our collective effort resulted in practical contributions to BioNix’s infrastructure and onboarding resources, setting the stage for continued development by future contributors.

Duplex Sequencing

Our objective was to embed duplex-specific QC metric generation and reporting into the core pipeline in a reproducible and version-controlled way, improving upon earlier approaches that operated outside the main workflow. Duplex-specific metrics are currently calculated outside the main processing pipeline using custom R Markdown notebooks, which are neither containerized nor version-controlled, making them difficult to reproduce and maintain. Additionally, some of these scripts rely on GPL-licensed components, which conflict with the MIT license of the in-house duplex pipeline, creating licensing incompatibilities. The duplex QC results were also only available in standalone R-generated reports, while standard read-level metrics were presented in MultiQC. This fragmentation can make it harder for users to assess experimental quality efficiently.

To resolve these issues, we modularized the R Markdown notebook into a standalone R script to improve computational efficiency, enhance version control, facilitate automation within pipelines, and simplify testing. This was packaged within a Docker container to ensure reproducibility and compatibility by removing GPL-licensed dependencies. A custom MultiQC plugin was developed to render duplex-specific metrics and plots directly within the Duplex MultiQC report generated after consensus sequence creation, while leaving the standard read-level QC report unchanged. This integration ensures that visuals are clear, accessible, and tailored specifically for duplex quality assessment.
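
For readers unfamiliar with MultiQC plugins: a plugin is an ordinary Python package that registers a module through a `multiqc.modules.v1` entry point; the module then finds its input files and adds sections and plots to the report. The snippet below is a heavily simplified sketch of that pattern, not our actual plugin: the search key `duplex_qc`, the metric names, and the JSON input format are hypothetical, and class and import paths can differ between MultiQC versions.

```python
# Heavily simplified sketch of a MultiQC module for duplex-specific metrics.
# Not the actual plugin: the "duplex_qc" search pattern, the metric names, and the
# JSON input format are hypothetical, and import paths may differ by MultiQC version.
import json

from multiqc.modules.base_module import BaseMultiqcModule
from multiqc.plots import bargraph


class MultiqcModule(BaseMultiqcModule):
    def __init__(self):
        super().__init__(
            name="Duplex QC",
            anchor="duplex_qc",
            info="duplex-specific consensus metrics from the duplex pipeline.",
        )

        # Collect per-sample JSON files matched by a "duplex_qc" search pattern
        # defined in the plugin's configuration.
        data = {}
        for f in self.find_log_files("duplex_qc"):
            data[f["s_name"]] = json.loads(f["f"])
        if not data:
            return

        # Headline number in the General Statistics table.
        self.general_stats_addcols(
            data,
            headers={"duplex_yield": {"title": "Duplex yield", "format": "{:,.0f}"}},
        )

        # Dedicated report section comparing duplex vs single-strand consensus reads.
        self.add_section(
            name="Consensus reads",
            anchor="duplex_qc_consensus",
            plot=bargraph.plot(data, ["duplex_reads", "single_strand_reads"]),
        )
```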

How far we got was completing all the planned work, including refactoring the R scripts, containerizing the workflow, developing the custom MultiQC plugin, and integrating everything into the main pipeline. The system has been tested, with team members reproducing each other’s results to ensure consistency and reliability. However, there remains considerable potential for extension on all fronts — from expanding the testing framework, to customizing the MultiQC plugin with additional metrics and visuals, as well as further optimizing the pipeline integration.

Genomics Invoicing

Our objective was …

What we did as a group was …

How far we got was …

REDMANE Demo

The objective of my internship was to revive and modernize REDMANE, a modular Research Data Management (RDM) platform designed to help researchers securely manage, organize, and share biomedical datasets. The aim was to restore its functionality, improve stability, and prepare the system for future scalability and integration.

What I did was focus on rebuilding the system infrastructure by setting up and configuring the cloud environment, containers, and network layers to ensure stable communication across services. I worked on migrating the database system and successfully connected the database to the backend, improving backend functionality and deployment reliability. I also streamlined the setup process and created comprehensive technical documentation to support future maintenance and development.

How far I got was a nearly complete system, with the backend and database fully connected and running smoothly, and the final step being to establish the connection between the backend and frontend. This progress laid the groundwork for completing the full data flow and advancing REDMANE toward a production-ready stage.

REDMANE Web Dev

The objective of our team this semester was to turn the REDMANE Data Registry into a functional MVP that could be shown to stakeholders. We aimed to remove hardcoded content, make pages responsive, and connect the frontend to the PostgreSQL database.

This internship, we completed three core features: creating new datasets, a project summary page, and uploading file metadata. We also updated the files view page for a dataset to be responsive and to filter between raw, processed, and summarised files. The datasets and projects are now also read from the database. These features all interact directly with the database and now allow for basic Create and Read functionality (no Update and Delete just yet) in the demo.
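
For context on what “Create and Read” means here, both operations are plain SQL against the PostgreSQL database. The sketch below shows the shape of those two operations using psycopg2 with hypothetical table and column names; the real REDMANE schema and backend code differ.

```python
# Illustrative sketch of the Create and Read pattern against PostgreSQL.
# Table and column names are hypothetical, not the real REDMANE schema,
# and the connection settings are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="redmane", user="redmane", password="changeme")

with conn, conn.cursor() as cur:
    # Create: register a new dataset for a project.
    cur.execute(
        "INSERT INTO datasets (project_id, name) VALUES (%s, %s) RETURNING id",
        (1, "Example dataset"),
    )
    dataset_id = cur.fetchone()[0]
    print("created dataset", dataset_id)

    # Read: list the datasets shown on the project summary page.
    cur.execute("SELECT id, name FROM datasets WHERE project_id = %s", (1,))
    for dataset in cur.fetchall():
        print(dataset)

conn.close()
```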

This intake, we were able to turn what was previously a basic framework for REDMANE into a functional demo with core create and upload features, with some advanced features still in progress, paving the way for future development.

Student Organiser

Objective

The objective of our project was to improve the Student Organiser application to support Rowland in managing interns, projects, and intakes more effectively. Additionally, our team aimed to design a new Student Engagement feature that would allow Rowland to track attendance, questions asked, and overall engagement throughout each internship period.

What We Did as a Group

As a team, we began by reviewing the existing system, including documentation and previous interns’ work, to fully understand its workflow and current limitations. We identified several areas for improvement and prioritised features that would enhance usability, data accuracy, and supervisor efficiency.

Our key contributions included:

Final Status

By the end of the internship, the team had successfully implemented and tested the new features for project capacity tracking, job description management, and intake communication using automated email templates.

The Student Engagement system was fully designed, with both the database structure and UI prototypes completed, but not yet implemented. This provides a clear and well-documented roadmap for future interns to continue building upon.

Overall, the Student Organiser is now more user-friendly, efficient, and aligned with the supervisor’s needs, setting a strong foundation for future improvements in intern management and engagement tracking.

Final presentation video