Workflows with Snakemake
Researchers needing to implement data analysis workflows face a number of common challenges, including the need to organise tasks, make effective use of compute resources, handle any errors in processing, and document and share their methods. The Snakemake workflow system provides effective solutions to these problems. By the end of the course, you will be confident in using Snakemake to run real workflows in your day-to-day research.
Snakemake workflows are described by special scripts that define each step in the workflow as a rule. Snakemake uses these rules to construct and execute the sequence of shell commands needed to yield the desired output. Re-calculation of existing results is avoided where possible, so you can add or update input data, then efficiently generate an updated result. Workflows can be scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition.
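To make the idea of avoiding re-calculation concrete, here is a minimal Python sketch of one rule with inputs and an output that is re-run only when the output is missing or older than an input. It is a simplified illustration under stated assumptions, not Snakemake's actual implementation, which may consider more than file timestamps. The names needs_update, run_rule, count_lines and demo, and the file names reads.txt and line_count.txt, are hypothetical and chosen only for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def needs_update(output: Path, inputs: Sequence[Path]) -> bool:
    """Return True if the output is missing or older than any input."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(inp.stat().st_mtime > out_mtime for inp in inputs)


def run_rule(output: Path, inputs: Sequence[Path],
             action: Callable[[Sequence[Path], Path], None]) -> bool:
    """Run the action only when needed; report whether it ran."""
    if needs_update(output, inputs):
        action(inputs, output)
        return True
    return False


def count_lines(inputs: Sequence[Path], output: Path) -> None:
    """Toy analysis step (hypothetical): write the total line count."""
    total = sum(len(p.read_text().splitlines()) for p in inputs)
    output.write_text(f"{total}\n")


def demo() -> list:
    """Run the same rule three times and record whether it executed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "reads.txt"
        out = Path(tmp) / "line_count.txt"
        src.write_text("a\nb\nc\n")
        # Pin the input's timestamp well in the past so the comparison
        # does not depend on filesystem timestamp resolution.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        first = run_rule(out, [src], count_lines)   # output is missing
        second = run_rule(out, [src], count_lines)  # output is up to date
        # Simulate updated input data by making it newer than the output.
        newer = out.stat().st_mtime + 10
        os.utime(src, (newer, newer))
        third = run_rule(out, [src], count_lines)   # input changed
        return [first, second, third]
```

Calling demo() runs the rule when the output does not yet exist, skips it when nothing has changed, and runs it again once the input has been updated. A workflow system applies the same kind of check across a whole chain of rules rather than a single step.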
This course is primarily intended for researchers who need to automate data analysis tasks for biological research involving next-generation sequencing data, for example RNA-seq analysis, variant calling, ChIP-seq, bacterial genome assembly, etc. However, Snakemake has many uses beyond this and the course does not assume any specialist biological knowledge. The language used to write Snakemake workflows is Python-based, but no prior knowledge of Python is required or assumed either. Attendees must, however, be familiar with using the Linux command line (pipes, redirects, variables, …).
Ed-DaSH
Ed-DaSH is a Data Science training programme for Health and Biosciences. The team has developed new lessons using The Carpentries platform. See the workshops listing for dates and registration details. All Ed-DaSH workshops are delivered remotely.