
WEHI Research Computing Platform

Public page for the WEHI Research Computing Platform (RCP)

RCP Internship Program - Intake 11 - Summer 2024/2025

This page lists the projects for this intake, with a summary of each team's challenge, approach, and learnings.

REDMANE Clinical Dashboards

The challenge that we were trying to solve was how to make use of publicly available clinical metadata while ensuring patient privacy and addressing potential security concerns, all while keeping the data useful for research. Clinical data often includes sensitive fields such as Medicare number, date of birth, and location of residence. Our solution was therefore to artificially, or 'synthetically', generate clinical data that replicates real-world datasets.

The way we tried to solve this was by developing code that renamed public clinical data files from cBioPortal and randomly sampled a publicly available FASTQ file from Genome in a Bottle to generate a corresponding FASTQ file for each patient in the clinical data file. While these files aren't real genome sequences, they are in the correct FASTQ format and can be used in other teams' data workflows.
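The sampling step can be sketched roughly as follows. This is a minimal illustration only; the function name, file names, and read counts are placeholders, not the team's actual code.

```python
import random
from pathlib import Path

def sample_fastq(source: Path, dest: Path, n_reads: int, seed: int = 0) -> None:
    """Randomly sample n_reads four-line FASTQ records from a public source
    file to build a synthetic per-patient file. The output is format-valid
    FASTQ but carries no real patient sequence data."""
    lines = source.read_text().splitlines()
    # A FASTQ record is exactly four lines: header, sequence, '+', quality.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    sampled = rng.choices(records, k=n_reads)  # sample with replacement
    dest.write_text("\n".join("\n".join(rec) for rec in sampled) + "\n")
```

In use, one would loop over the patient IDs from the clinical file and call `sample_fastq` once per patient to produce each synthetic file.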

Not only did we learn about methods for generating synthetic clinical data, but we also gained valuable experience in writing clean, maintainable code that integrates seamlessly into larger team workflows. Since our work was used by multiple intern teams, it was crucial to distribute data efficiently and document our code thoroughly. We quickly realised that clear cross-team communication and well-structured documentation were essential to the success of both our team and others. While we each improved our technical skills, we found that soft skills, such as collaboration and effective communication, were just as critical. Additionally, developing a strong understanding of the high-level context of our work significantly reduced redundancy and saved time, allowing us to make more informed decisions throughout the internship.

REDMANE Data Ingestion

The challenge that we were trying to solve was ensuring that metadata uploaded to the REDMANE data registry and data portals (specifically cBioPortal) were formatted in standardised ways. Different points of data ingestion required different metadata formats. For example, each data portal has its own specific format for metadata, and without a streamlined way to generate these metadata files, users would struggle to verify and upload their data correctly. This lack of consistency could lead to errors in data ingestion and disorganisation in REDMANE’s database.

We solved part of this by developing a script for registering files onto the REDMANE data registry. The script scans a specified local directory, extracts relevant metadata, and compiles it into a JSON summary to be uploaded to the registry. This JSON report ensures the metadata conforms to the registry's standard, which was designed in collaboration with the REDMANE Web Development team. We also looked at converting the JSON report into RO-Crate.
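A registration script of this kind might look something like the sketch below. The field names are illustrative only, not the registry's actual schema, and the function names are ours.

```python
import json
from pathlib import Path

def build_report(root: Path) -> dict:
    """Walk a data directory and summarise per-file metadata into a dict
    of the kind a registry could ingest as JSON."""
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            files.append({
                "path": str(path.relative_to(root)),  # portable, root-relative
                "size_bytes": stat.st_size,
                "extension": path.suffix,
            })
    return {"root": str(root), "file_count": len(files), "files": files}

def write_report(root: Path, out: Path) -> None:
    """Serialise the report to a JSON file ready for upload."""
    out.write_text(json.dumps(build_report(root), indent=2))
```

The real script would add whatever fields the registry's standard requires (checksums, sample identifiers, and so on); the structure above only shows the scan-then-summarise pattern.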

While we gained technical insights into data organisation, including the importance of aligning data ingestion processes with the needs of both users and system components, our key takeaways concerned soft skills. Our work is heavily interlinked with other REDMANE processes, requiring collaboration between the teams implementing those processes. We found that efficient cross-team communication is necessary to create solutions that work seamlessly across an ecosystem. Beyond that, our experience taught us that a deep understanding of the problem is crucial for successful implementation: by first addressing the most challenging aspect, grasping the problem's context at a high level, we could implement a solution more efficiently.

REDMANE Demo and Quality

The challenge we were trying to solve was bridging the gap between proof-of-concept and production by establishing a permanent demo environment that would ensure long-term availability. This environment was designed to mimic the REDMANE ecosystem, allowing other teams to deploy, test, and refine their code in a controlled setting. Key objectives included creating non-expiring VMs, improving quality control through cross-team collaboration, and providing support for integrating new code into the demo environment.

The way we tried to solve this was by setting up permanent VMs on Nectar Research Cloud and making them externally accessible. We dockerised the data registry, implemented SSL connections, set up a Keycloak instance to manage authentication, and tested using synthetic datasets provided by other teams. Our work also included documenting App Store standards to define the criteria for integrating new data portals into the REDMANE ecosystem.
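At a high level, such a setup can be sketched with Docker Compose along these lines. The service names, images, ports and environment values here are illustrative placeholders, not the team's actual configuration.

```yaml
# Illustrative sketch only; not the deployed configuration.
services:
  registry:
    build: ./data-registry            # the dockerised REDMANE data registry
    ports:
      - "443:8000"                    # in practice, SSL is terminated by a proxy
    environment:
      OIDC_ISSUER: https://keycloak.example.org/realms/redmane  # placeholder URL
  keycloak:
    image: quay.io/keycloak/keycloak  # official Keycloak container image
    command: start-dev                # development mode; production differs
    environment:
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: change-me
```

The point of the sketch is the shape of the environment: the registry and the authentication service run as separate containers, with the registry pointed at Keycloak's OIDC realm.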

What we learned through this process extended beyond technical skills to include cross-team collaboration and project coordination. We gained hands-on experience with Docker, CI/CD, Nectar VMs, Keycloak, and security protocols, deepening our understanding of software deployment. Working across teams emphasised the importance of clear communication and alignment, ensuring smooth operation within the REDMANE ecosystem. Additionally, we found that having a high-level understanding of system architecture was crucial for effectively implementing solutions and setting up a cohesive demo environment.

REDMANE Omero Data Portal

The challenge we were trying to solve was adapting OMERO to meet the app store requirements outlined by the Demo and Quality team. This involved automating OMERO's deployment, integrating secure OIDC-based authentication for seamless Single Sign-On, and efficiently managing image sample data. Achieving consistent configuration across environments while maintaining robust access control in a research setting demanded a secure and scalable approach, including integrating OMERO into the REDMANE ecosystem.

The way we tried to solve this was by developing a Docker Compose script to streamline OMERO’s setup and drafting a conceptual Kubernetes installation guide for scalable container orchestration. We integrated OIDC components to synchronize user data between Auth0 and OMERO, enabling users to bypass redundant logins, while concurrently preprocessing and uploading sample images to simulate real-world data workflows.

What we learned was that there are many approaches to deploying OMERO and configuring access controls to meet app store standards. However, hardware constraints limited our options for deployment and data storage, making it challenging, though not impossible, to find the optimal solution. By comparing the various options and consulting with other teams, we were able to refine our approach. Additionally, involving experts in the field could help streamline the process and reduce the time needed to integrate these solutions effectively.

REDMANE Web Dev

The challenge that we were trying to solve was creating a web-based platform for the REDMANE project that facilitates efficient data management and user interaction. The project required a scalable, accessible, and well-integrated frontend and backend infrastructure to support research and development efforts. Additionally, ensuring a smooth development workflow with cloud-based hosting was essential.

The way we tried to solve this was by developing a React-based frontend hosted on a Nectar Virtual Machine (VM), enabling researchers and developers to interact with the system seamlessly. We configured security groups for controlled access, implemented port forwarding to resolve connectivity issues, and deployed the application using Vite for optimal performance. On the backend, we integrated API endpoints to manage authentication, data handling, and application logic, ensuring efficient communication between components.
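As a toy illustration of the endpoint pattern (standard library only; the real application's framework, routes and authentication are not shown in this summary), a minimal JSON health-check endpoint that a frontend could poll might look like:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class ApiHandler(BaseHTTPRequestHandler):
    """Serves a single illustrative JSON endpoint at /api/health."""
    def do_GET(self):
        if self.path == "/api/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

def serve(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A React frontend would fetch such endpoints over HTTP; the security groups and port forwarding described above control which of those ports are reachable from outside the VM.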

What we learned was the importance of setting up secure and scalable cloud-based infrastructure to support web applications in a research environment. We gained hands-on experience with Nectar VM configuration, network security management, and optimizing frontend-backend integration for a seamless user experience. Overcoming challenges like port forwarding and deployment issues enhanced our problem-solving skills and deepened our understanding of cloud-hosted web applications.

REDMANE Workflows

The challenge that we were trying to solve was converting raw data files into processed and summarised files.

The way we tried to solve this was:

1. Through Nextflow and Seqera on Milton HPC: We learned about Nextflow, a workflow manager commonly used in bioinformatics, and gained experience with nf-core/sarek, an open-source pipeline for variant calling. This pipeline allowed us to efficiently process WGS data by identifying genetic variants from sequencing datasets. We deployed the pipeline on Seqera and executed it on Milton HPC.

2. Through Galaxy: Galaxy is a user-friendly interface where many bioinformatics tools are available and ready to use.

What we learned includes the reproducibility and scalability of the two platforms; we compared their barriers to entry, used established tools to create workflows, and trialled known pipelines. To run Nextflow pipelines through Seqera, we explored how to set up the environment on HPC, write config files, and set parameters. We also performed the conversion manually on the command line using packages like bowtie2, samtools and bcftools. Finally, we learned how to communicate effectively across teams, sharing files and sourcing information about unfamiliar topics in daily stand-ups and co-working sessions.
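A manual command-line conversion of the kind described might look like the following sketch. The paths, index names and reference files are placeholders; consult each tool's documentation before adapting it.

```shell
# Illustrative raw-reads-to-VCF sketch; not the exact commands we ran.
# Align paired-end reads and coordinate-sort the output in one pipe.
bowtie2 -x ref_index -1 sample_R1.fastq -2 sample_R2.fastq \
  | samtools sort -o sample.sorted.bam -

# Index the BAM so downstream tools can seek into it.
samtools index sample.sorted.bam

# Pile up reads against the reference and call variants to a VCF.
bcftools mpileup -f reference.fa sample.sorted.bam \
  | bcftools call -mv -Ov -o sample.vcf
```

The resulting `.vcf` file is what gets inspected in IGV in the validation step below.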

We’ve validated the final output (.vcf files) by visualising it using IGV.

Student Organiser PDF Coding

The challenge that we were trying to solve was the time-consuming nature of reviewing internship applications. This involved downloading and sifting through numerous PDF resumes, extracting key information such as skills, experience, and education, and then comparing these across applicants. The process is not only slow but also prone to human error and can make it difficult to identify suitable candidates.

The way we tried to solve this was by creating a web-based application using HTML, CSS, and JavaScript. The application provides a user interface where PDF files can be uploaded, viewed, and highlighted. The goal of the site is to streamline the resume review process by making it easy to comment on selected text and tag it with categories. Ideally, the application would also connect directly to the student organiser, so reviewers can open resumes and see previous comments in one place. All tasks were intended to be completed using open-source tools and libraries.

What we learned was that open-source software can be either a great help or a major obstacle. On one hand, many open-source projects have communities built around them, so answers are often just a Google search away. On the other hand, it can be hard to find solutions to your specific problems because of outdated information or poor documentation. However, because the software is open source, you can read through the source code to find your answers.

Student Organiser RAG LLM

The challenge that we were trying to solve was the difficult navigation of existing WEHI websites, resources and documentation. Because of this, student interns and open-source contributors found it difficult to quickly answer even basic questions and needed a supervisor to point them in the right direction. Ultimately, this friction significantly slows down initial onboarding and thereby delays the whiteboard presentation to Week 4.

The way we tried to solve this was through a Retrieval-Augmented Generation Large Language Model (RAG LLM). By creating a pipeline that connected the user to a vector database of existing WEHI documentation via an LLM, users should be able to seamlessly input their queries into a chatbot interface and retrieve a brief summary response, supported by sources that informed the answer. This would enable the user to quickly answer their logistical questions and thereby focus on the more complex elements in their projects.
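The retrieval step can be illustrated with a deliberately simplified sketch. A real pipeline uses a trained embedding model and a vector database, but a bag-of-words cosine similarity shows the same ranking idea:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real RAG uses a trained model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Rank documentation snippets by similarity to the query. In the full
    pipeline, the top-k snippets are passed to the LLM as context, and
    their names are surfaced to the user as sources."""
    q = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(q, embed(docs[name])),
                    reverse=True)
    return ranked[:k]
```

Swapping the toy `embed` for a real embedding model and the sorted scan for a vector-database lookup gives the architecture described above; everything else (query in, ranked sources out, context to the LLM) stays the same shape.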

What we learned was that a RAG LLM has many moving parts, all of which can be independently tuned and modified. There is complexity at every stage of the pipeline, so we had to make critical judgements about which technologies to implement given our constraints, objectives and high-level context. We are excited to see how future cohorts will take the demos we have made and hopefully create a production-level RAG LLM solution that can be used by subsequent cohorts.

Quantum Computing

Coming into this internship with no prior experience in quantum computing, we spent 12 weeks exploring its applications in bioinformatics and machine learning, hoping we could contribute something to WEHI's medical research.

We worked extensively with Qiskit and PennyLane, two leading quantum programming frameworks, to understand quantum circuits and algorithms. This involved writing code, debugging quantum programs, and gaining exposure to cloud-based quantum computing. The nature of quantum computing required us to approach problem-solving in a completely new way, as errors and optimizations in quantum systems function differently from classical computing.
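As a framework-free illustration of the kind of circuit one first builds in Qiskit or PennyLane, the Bell-state circuit (a Hadamard followed by a CNOT) can be simulated directly on a four-amplitude statevector. This is a sketch of the underlying maths, not the Qiskit or PennyLane API:

```python
import math

def bell_state() -> list:
    """Apply H on qubit 0 then CNOT(control=0, target=1) to |00>.
    The result is the Bell state (|00> + |11>) / sqrt(2)."""
    # Amplitudes over the basis |00>, |01>, |10>, |11>,
    # with qubit 0 as the most significant bit.
    state = [1 + 0j, 0j, 0j, 0j]
    h = 1 / math.sqrt(2)
    # Hadamard on qubit 0 mixes |0x> with |1x> for each x.
    state = [h * (state[0] + state[2]),
             h * (state[1] + state[3]),
             h * (state[0] - state[2]),
             h * (state[1] - state[3])]
    # CNOT with qubit 0 as control flips qubit 1 when qubit 0 is |1>,
    # i.e. it swaps the |10> and |11> amplitudes.
    state[2], state[3] = state[3], state[2]
    return state
```

In Qiskit or PennyLane the same circuit is two gate calls; writing the amplitudes out by hand was how we convinced ourselves of what those calls actually do.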

However, since quantum computing is still at an early stage and remains highly theoretical, we realised that we had to shift our focus to educational resources rather than attempting immediate real-world applications. Recognising the steep learning curve associated with quantum computing, our team decided to develop an interactive educational platform: a website designed to help beginners navigate quantum concepts more effectively. The platform incorporates flip cards, quizzes, and curated learning resources to simplify complex topics and provide an engaging learning experience. This project was particularly meaningful because it allowed us to contribute something tangible to the field: an accessible tool that can help others get started with quantum computing.

Student Organiser Data Visualisation

This internship at the Walter and Eliza Hall Institute of Medical Research (WEHI) focused on enhancing the Student Intern Organiser, a Flask-based web application. Our goal was to improve user experience through interactive visualisations and better data management tools.

We developed an intuitive dashboard with filtering options, enabling users to track student intake, monitor hours, and analyse trends. Key visualisations, including line graphs and bar charts, provided insights into student distribution and project involvement. We also improved coordinator tools and refined the synthetic dataset for better usability.

Challenges included initially over-focusing on technical details, Git repository management issues, and communication gaps within the team. These were resolved by engaging with our supervisor, refining workflow practices, and improving collaboration through regular updates and discussions.

Key learnings included the importance of clear communication, structured documentation, and proactive problem-solving. We also developed teamwork skills, learned to balance independent work with seeking guidance, and understood the value of high-level planning before diving into technical tasks.

Future improvements could include additional data visualisations, predictive analytics, optimised data export formats, and enhanced mobile responsiveness. Exploring front-end libraries like React.js could further enhance the dashboard’s usability.

This internship provided valuable experience in data visualisation, software development, and teamwork. Overcoming challenges and refining our approach helped us develop crucial problem-solving and communication skills that will benefit future projects.


REDMANE Capacity Planning (not finalised)

The challenge that we were trying to solve was … ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare.

The way we tried to solve this was … lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros.

What we learned was … maximus metus id erat pharetra facilisis. Nullam ac urna ultricies diam volutpat faucibus. Sed feugiat placerat est nec scelerisque. Aenean a nisl sit amet ligula gravida fermentum eget in purus. Praesent a dui quis diam bibendum convallis vel in lacus.

To update this file

Go to Github and fork this repo, make changes, and then do a pull request back to the original repo. This is the file you need.