Six-week sequence analysis bootcamp

Week 1: Unix environment


  1. Basic command line familiarity
  2. Redirecting standard input and output; Unix pipes
  3. File compression tools: tar and gzip
  4. Introduction to some well-known Unix tools: more, less, grep, etc.
  5. Case studies on FASTQ and SAM files: for example, how to extract only the nucleotide sequences from a FASTQ file, or how to select the entries that contain a particular adapter
  6. Understand how to transfer data across servers
  7. Network protocols and copy commands: SSH, FTP, SFTP, scp, cp, etc.
  8. How to customize bash, define paths, etc.
  9. Unix processes. How to kill a process, send it to background / foreground.
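
The two FASTQ questions in the case studies above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own solution: it assumes every record spans exactly four lines (header, sequence, "+" separator, quality) with no blank lines, and the function names, the two example reads, and the adapter string are all made up for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from FASTQ text, assuming strict 4-line records."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over four references to the same iterator groups lines in fours;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header, seq, qual


def sequences(lines: Iterable[str]) -> List[str]:
    """Return only the nucleotide sequences."""
    return [seq for _header, seq, _qual in read_fastq(lines)]


def with_adapter(lines: Iterable[str], adapter: str) -> List[Record]:
    """Return the records whose sequence contains the adapter."""
    return [rec for rec in read_fastq(lines) if adapter in rec[1]]


# Hypothetical input: two reads, the first one carrying the adapter.
ADAPTER = "AGATCGGAAGAGC"  # illustrative value only
example = [
    "@read1\n", "ACGT" + ADAPTER + "\n", "+\n", "I" * 17 + "\n",
    "@read2\n", "TTGACCA\n", "+\n", "I" * 7 + "\n",
]

print(sequences(example))
print([header for header, _seq, _qual in with_adapter(example, ADAPTER)])
```

To run it on a real file, pass an open file handle instead of the list, for example `sequences(open(path))`, where `path` points to an uncompressed FASTQ file.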

Week 2: Databases, annotations, genomic resources


  1. Understand basic file formats (FASTA, FASTQ, BED, GTF, SAM, BAM, WIG, bigWig, bedGraph)
  2. Genomic resources and databases: UCSC, Ensembl, NCBI
  3. Basic manipulation and visualization of these files:
     1. Overlaps, intersections, etc.
     2. Enrichment computations?
  4. File conversions: bedGraph to WIG, WIG to bigWig, SAM to BAM, etc.
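
As a rough sketch of what the overlap and intersection computations above involve, the Python below intersects two BED intervals. It assumes the standard BED convention of 0-based start and exclusive end, and reads only the first three columns; the `Interval` type and the function names are hypothetical, chosen for this example. In practice a dedicated tool such as bedtools would normally do this work.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def parse_bed_line(line: str) -> Interval:
    """Parse the first three columns of a BED line."""
    chrom, start, end = line.split()[:3]
    return Interval(chrom, int(start), int(end))


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of a and b, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    # With half-open coordinates, intervals that merely touch do not overlap.
    return Interval(a.chrom, start, end) if start < end else None


a = parse_bed_line("chr1\t100\t200")
b = parse_bed_line("chr1\t150\t250")
print(intersect(a, b))
```

Because the end coordinate is exclusive, `chr1 100 200` and `chr1 200 300` are adjacent rather than overlapping, so `intersect` returns `None` for that pair.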

Week 3: The Galaxy environment


  1. Getting data in and out (via FTP and by uploading through the UI)
  2. Doing analysis in Galaxy

Week 4: Putting it all together, step 1: RNA-Seq data processing


  1. Understanding sequence data
  2. Transferring and manipulating RNA-Seq data
  3. Full data processing
  4. Visualization of processed RNA-Seq data


Week 5: Introduction to R


  1. Understanding RStudio online
  2. Basic data analysis
  3. How to install and use packages locally
  4. A glimpse of Bioconductor?


Week 6: Putting it all together, step 2: RNA-Seq data analysis

Goal: use R to analyze RNA-Seq data

  1. Normalizing data
  2. Clustering expression data
  3. Gene ontology analysis
  4. Differential expression
  5. Principal component analysis