Logo

WEHI Research Computing Platform

Public page for the WEHI Research Computing Platform (RCP)

View this profile on Github

Genomics Quality Control Student Intern Project

Introduction

Post-sequencing quality control (QC) is an important step in ensuring that the quality of sequencing data meets expectations. Multiple tools exist to evaluate sequencing quality, such as FastQC[1], Fastq Screen[2] and Qualimap[3]. Core sequencing facilities, such as WEHI Genomics, process hundreds of sequencing runs every year. In order to troubleshoot sequencing issues and provide assurance of data quality to collaborators, sequencing facilities must implement robust data management workflows to ensure that QC data processing is automated where possible, is easily accessible and is appropriately archived for posterity. Given a large volume of runs, there exists the possibility of inter-run QC comparisons to better gauge run quality.

MultiQC[4] is a software tool that collates and displays QC information from numerous supported bioinformatics tools. MultiQC operates on a per-run basis, however, MegaQC[5] collates the data from these reports for comparison. However, MegaQC has not been in active development since 2020 and is considered “pretty unstable” by the developers. Additionally, the software lacks functionality that lists extensive project metadata, so that experiments can be more easily tracked and MultiQC reports can be easily located. Lastly, the tool lacks easy exporting of data, so that analyses can be further analysed programmatically.

Given that MegaQC overlaps with many of our use cases, it may be possible to modify and/or extend the software for our use cases. Alternatively, Shiny apps (shiny.rstudio.com) allow for interactive data analysis and provide an ideal platform for QC data collation. Integration with Monday.com and LabArchives used by the Genomics lab may also be of benefit for retrieving and/or depositing QC data and project metadata.

  1. FastQC. “FastQC: a quality control tool for high throughput sequence data.” (2016).
  2. Wingett, Steven W., and Simon Andrews. “FastQ Screen: A tool for multi-genome mapping and quality control.” F1000Research 7 (2018).
  3. Okonechnikov, Konstantin, Ana Conesa, and Fernando García-Alcalde. “Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data.” Bioinformatics 32.2 (2016): 292-294.
  4. Ewels, Philip, et al. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32.19 (2016): 3047-3048.
  5. https://megaqc.info/

Duties while on placement

The Research Software Engineer Intern role will:

There is also a fair amount of flexibility with this project to allow the student to inject their own ideas and introduce new features and functionality.

Skills and Pre-requisites

Skills and Pre-requisites:

Benefits for students

Benefits for students whilst undertaking the internship include: