Introduction to Galaxy

Instructor: Stephanie Le Gras
Duration: 2 hours
Content: Description of the web interface of Galaxy (lecture); practical session on basic features of Galaxy (hands-on)
Prerequisites: None

Description of the training dataset

We are using RNA-seq and ChIP-seq data from:

Strub, T., Giuliano, S., Ye, T., Bonet, C., Keime, C., Kobi, D., Le Gras, S., Cormont, M., Ballotti, R., and Bertolotto, C. (2011). Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. Oncogene 30, 2319–2332.

RNAseq data

In this study, the authors compared the transcriptome of melanoma cell lines between cells treated with an siRNA against MITF and cells treated with an siRNA against Luciferase (used as a control). The data are provided as a TSV (Tab-Separated Values) file which contains the number of reads per gene. Genes are identified by their Ensembl gene IDs.

The data were analyzed using the human genome assembly hg19/GRCh37.

Here are the columns of the file to be analyzed:

Column name | Description
Ensembl gene id | Ensembl gene ID
siLuc2 (raw read counts) | Raw read counts - control - 1st biological replicate
siLuc3 (raw read counts) | Raw read counts - control - 2nd biological replicate
siMitf3 (raw read counts) | Raw read counts - siMITF - 1st biological replicate
siMitf4 (raw read counts) | Raw read counts - siMITF - 2nd biological replicate
siLuc2 (normalized) | Normalized read counts - control - 1st biological replicate
siLuc3 (normalized) | Normalized read counts - control - 2nd biological replicate
siMitf3 (normalized) | Normalized read counts - siMITF - 1st biological replicate
siMitf4 (normalized) | Normalized read counts - siMITF - 2nd biological replicate
siLuc2 (normalized and divided by gene length in kb) | RPK - control - 1st biological replicate
siLuc3 (normalized and divided by gene length in kb) | RPK - control - 2nd biological replicate
siMitf3 (normalized and divided by gene length in kb) | RPK - siMITF - 1st biological replicate
siMitf4 (normalized and divided by gene length in kb) | RPK - siMITF - 2nd biological replicate
log2(siMitf/siLuc) | Log2 fold change (siMitf/siLuc)
P-value (siMitf vs siLuc) | P-value (siMitf/siLuc)
Adjusted p-value (siMitf vs siLuc) | Adjusted p-value - FDR (siMitf/siLuc)
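
To make the log2(siMitf/siLuc) column easier to read, here is a minimal Python sketch of how a log2 fold change can be derived from replicate counts. It is only an illustration: the values in the file come from the authors' own analysis, which may use a different estimator. The function name, the pseudocount and the counts below are assumptions made up for this example.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of the mean treated count over the mean control count.

    The pseudocount is an assumption of this sketch; it avoids dividing
    by zero when a gene has no reads in the control condition.
    """
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Made-up normalized counts for one gene (two replicates per condition).
si_mitf = [30.0, 32.0]
si_luc = [126.0, 128.0]
print(log2_fold_change(si_mitf, si_luc))
```

A negative value means the gene has fewer reads in the siMITF cells than in the control, a positive value means it has more.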

ChIPseq data

In this study, the authors performed a ChIP-seq experiment on the transcription factor MITF. The data file we are using is a FASTQ file (see http://fr.wikipedia.org/wiki/FASTQ) which contains a selection of reads mapped to chromosome 2.
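
As a reminder of what a FASTQ file looks like, the following Python sketch reads records made of four lines each: a header starting with "@", the sequence, a separator line starting with "+", and the quality string. It assumes that every record uses exactly four lines, and the two reads in the example are invented.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+', ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"unexpected header line: {header!r}")
        yield header[1:], sequence, quality


# Two invented reads, only to show the layout of the format.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+\nHHHH\n")
records = list(read_fastq(example))
print(len(records), records[0])
```

Each base in the sequence line has one quality character at the same position in the quality line.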

Practical session

During the practical session, we are going to use the GalaxEast platform (http://use.galaxeast.fr).

1 Log in to GalaxEast

2 History

2.1 Create a new history

2.2 Change the name of the history

Change the name of the history to “prepare RNA-seq data”.

3 Import data into Galaxy

3.1 Import a file from your computer

Upload the file S12040_genesdiff.txt into Galaxy.

Hints:

  • There is no need to uncompress the file before uploading it to GalaxEast.
  • Format of the file: tabular

3.2 Import files from a data library

Import the dataset Homo_sapiens_genes_(GRCh37).txt from the Shared data menu (top menu) > Data libraries > Introduction 2 Galaxy (datasets).

4 Running tools

4.1 Join two files based on a common field

Annotate S12040_genediff.txt with the Associated Gene Name, the Gene Biotype and the Gene Description found in the file Homo_sapiens_genes_(GRCh37).txt. Use the common field « Ensembl gene id » to join the two tables.

Help:

  • Use the tool Join two datasets (section: Join, Subtract and Group)
  • Look at the numbers of the shared columns in both datasets
  • The first file to join is S12040_genediff.txt
  • Join the two files using the common field (Ensembl gene id).
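
To see what joining on a common field does, here is a minimal Python sketch on two tiny invented tables. It is not the code of the Galaxy tool: it assumes that only lines with a matching identifier are kept (the Galaxy tool has options for the other lines), and the function name and gene identifiers are made up for the example. Columns are numbered from 1, as in Galaxy.

```python
import csv
import io


def join_on_column(left_rows, right_rows, left_col, right_col):
    """Join two tables on 1-based column numbers, keeping only matching lines."""
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[right_col - 1], []).append(row)
    joined = []
    for row in left_rows:
        for match in lookup.get(row[left_col - 1], []):
            joined.append(row + match)  # all left columns, then all right columns
    return joined


# Invented tab-separated content standing in for the two datasets.
counts = list(csv.reader(io.StringIO("GENE_A\t10\nGENE_B\t0\nGENE_C\t7\n"), delimiter="\t"))
annotation = list(csv.reader(
    io.StringIO("GENE_A\tname_a\tprotein_coding\nGENE_C\tname_c\tlincRNA\n"),
    delimiter="\t",
))
result = join_on_column(counts, annotation, 1, 1)
for row in result:
    print("\t".join(row))
```

Note that the identifier appears twice in each joined line, once from each table.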

4.2 Select columns from a dataset

There are now two “Ensembl gene id” columns. Remove the one located right before the “Associated Gene Name” column.

Help:

  • Use the tool Cut (section: Text Manipulation)
  • Identify the number of the second “Ensembl gene id” column
  • Find the total number of columns in the dataset
  • You have to tell the tool which columns to keep (not which columns to remove)
  • Check out examples below the tool form
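
The idea of listing the columns to keep can be sketched in Python as follows. The column list is written as in the Cut tool (c1,c2,...), but the function name, the table and the column numbers are invented for the example; the real numbers have to be read from your own dataset.

```python
def cut_columns(rows, spec):
    """Keep the columns listed in a spec such as 'c1,c2,c4' (1-based, in that order)."""
    indices = [int(token.strip()[1:]) - 1 for token in spec.split(",")]
    return [[row[i] for i in indices] for row in rows]


# Invented table in which column 3 repeats the identifier of column 1.
rows = [
    ["GENE_A", "10", "GENE_A", "name_a"],
    ["GENE_C", "7", "GENE_C", "name_c"],
]
kept = cut_columns(rows, "c1,c2,c4")
print(kept)
```

Dropping a column therefore means listing every column number except that one.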

4.3 Find a pattern in a column

Are there any lincRNAs (long intergenic non-coding RNAs) in the data? If so, how many genes are annotated as lincRNA?

Help:

  • Use the tool Filter (section: Filter and sort)
  • Find the number of the column containing the Gene Biotypes
  • Gene Biotype should be exactly equal to “lincRNA”
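
The following Python sketch shows what an exact match on one column means. The table and the column number 3 are invented; in the Filter tool the condition would be written along the lines of c3=='lincRNA', with the real number of the Gene Biotype column in place of 3.

```python
def filter_equal(rows, column, value):
    """Keep rows whose 1-based column is exactly equal to value (case-sensitive)."""
    return [row for row in rows if row[column - 1] == value]


# Invented table: identifier, gene name, gene biotype.
rows = [
    ["GENE_A", "name_a", "protein_coding"],
    ["GENE_C", "name_c", "lincRNA"],
    ["GENE_D", "name_d", "LincRNA"],
    ["GENE_E", "name_e", "lincRNA"],
]
lincrna_rows = filter_equal(rows, 3, "lincRNA")
print(len(lincrna_rows))
```

Because the comparison is case-sensitive, the line with “LincRNA” is not counted in this toy table.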

5 Rename a dataset

Rename the last dataset to « final data ».

6 Create a workflow

We have ChIP-seq data that go with these RNA-seq data. The ChIP-seq data are raw sequencing data. Create a workflow which goes from raw data to peak annotation. Here are the steps to follow:

  1. Input dataset
  2. Map with Bowtie for Illumina
    1. “Select a reference genome” should be set at runtime
    2. output (sam) is the output to link to the next tool
  3. MACS14 Model-based Analysis of ChIP-Seq (1.4.2)
    1. There is no control file
    2. “Effective genome size” should be set at runtime
    3. “Tag size” should be set at runtime
    4. output_bed_file (bed) is the output to link to next tool
  4. homer_annotatePeaks
    1. “Genome version” should be set at runtime

nb:

  • The goal of a workflow is to link the output of one tool to the input of the next tool(s)
  • Tools and the input dataset are found in the left panel (use the search field to find tools faster)
  • Tool parameters are set in the right panel
  • Two tools can be linked if the output format of the first tool matches the input format of the next tool. In this case, the arrow linking the two turns green.
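
The linking rule can be pictured with a small Python sketch in which each step has one input format and one output format. This is a simplification and not how Galaxy is implemented: the class, the function and the input formats are assumptions for the example (in particular "fastq" for the input dataset and "tabular" for the last output). Only the sam and bed outputs are taken from the steps listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    input_format: Optional[str]  # None for the input dataset, which has no input
    output_format: str


def check_links(steps):
    """Return the (upstream, downstream) step-name pairs whose formats do not match."""
    mismatches = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_format != downstream.input_format:
            mismatches.append((upstream.name, downstream.name))
    return mismatches


# Formats below are simplified assumptions, see the paragraph above.
pipeline = [
    Step("Input dataset", None, "fastq"),
    Step("Map with Bowtie for Illumina", "fastq", "sam"),
    Step("MACS14", "sam", "bed"),
    Step("homer_annotatePeaks", "bed", "tabular"),
]
print(check_links(pipeline))

# Skipping the mapping step breaks the chain: the input dataset does not provide sam.
print(check_links([pipeline[0], pipeline[2]]))
```

An empty list means that every output can be connected to the next input, which is the situation where all arrows in the workflow editor are green.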

7 Run the workflow

Import the dataset MITF(2)-chr2.fastq from the Shared data menu (top menu) > Data libraries > Introduction 2 Galaxy (datasets). Run the workflow created in question 6. Here are the parameters to use:

  • Step 2: Select a reference genome: hg19
  • Step 3 (1): Effective genome size: 182400000
  • Step 3 (2): tag size: 54
  • Step 4: genome version: hg19

Can't run the workflow on MITF(2)-chr2.fastq?

Answer

training/introduction2galaxyphd.txt · Last modified: 2019/04/04 15:17 by slegras