| | |
|---|---|
|Instructor|Stephanie Le Gras|
|Content|Practical session on basic analysis using Galaxy (hands-on)|
Transcription factor PU.1 is a protein encoded by the SPI1 gene in humans. This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development (see the GeneCards entry for SPI1).
We are going to use ChIP-seq data for the PU.1 transcription factor in mouse. ChIP-seq experiments aim to identify the DNA regions bound by a protein of interest. Such proteins usually bind specific DNA sequence motifs, and the goal of this practical session is to identify the motif underlying PU.1 binding sites.
These data have been published in this study:
Heinz S, Benner C, Spann N, Bertolino E et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 2010 May 28;38(4):576-89. PMID: 20513432
This GEO dataset contains the peak coordinates of PU.1 binding sites. The original data are based on the mouse genome assembly mm8; the coordinates have been converted to mm9 to ease the analysis.
The dataset is of this form:
```
chr1	193580486	193580686	chr1-193457322-0	191	+
chr1	64972363	64972563	chr1-64860165-0	183	+
chr1	134238383	134238583	chr1-134169452-0	179	+
chr1	51991430	51991630	chr1-51879231-0	177	+
chr1	53880739	53880939	chr1-53768540-0	176	+
chr1	130487423	130487623	chr1-130418492-0	175	+
chr1	99556072	99556272	chr1-99490001-0	174	+
(...)
```
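This is a BED file with one line per identified PU.1 peak. The six columns follow the standard BED layout:

|Column|Description|
|---|---|
|1|Chromosome|
|2|Start position of the peak (0-based)|
|3|End position of the peak (not included in the peak)|
|4|Peak name|
|5|Score|
|6|Strand|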
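To make the column layout concrete, here is a minimal Python sketch that parses one line of the file into a record. It assumes the six tab-separated BED columns shown in the example (chromosome, start, end, name, score, strand) and integer scores, as in the lines above. `BedPeak` and `parse_bed_line` are hypothetical helper names, not part of Galaxy. Because BED coordinates are 0-based and half-open, the peak size is simply `end - start`.

```python
from typing import NamedTuple


class BedPeak(NamedTuple):
    """One line of a 6-column BED file (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    name: str
    score: int  # assumption: integer scores, as in the example lines
    strand: str

    @property
    def size(self) -> int:
        # Half-open interval [start, end): its length is end - start
        return self.end - self.start


def parse_bed_line(line: str) -> BedPeak:
    """Split a BED6 line on whitespace and convert the numeric fields."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedPeak(chrom, int(start), int(end), name, int(score), strand)


# First example line from the file
peak = parse_bed_line("chr1\t193580486\t193580686\tchr1-193457322-0\t191\t+")
print(peak.chrom, peak.start, peak.end, peak.size)
```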
During the practical session, we will use the GalaxEast platform (http://www.galaxeast.fr).
2.1 - Compute the size of each peak (end position minus start position) in the file GSM537989_Sample7.Bcell-PU.1.mm9.bed.
2.2 - Plot a histogram of the peak sizes.
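In Galaxy these two steps are done through the web interface (the Compute and Histogram steps mentioned later). The Python sketch below only illustrates the arithmetic behind them. It assumes tab-separated BED lines with the start and end in columns 2 and 3, and uses a hypothetical 50 bp bin width to print a text histogram. The input is the seven example peaks shown earlier, all of which are 200 bp long.

```python
from collections import Counter

# The seven example peaks from the file (tab-separated BED6)
BED_TEXT = (
    "chr1\t193580486\t193580686\tchr1-193457322-0\t191\t+\n"
    "chr1\t64972363\t64972563\tchr1-64860165-0\t183\t+\n"
    "chr1\t134238383\t134238583\tchr1-134169452-0\t179\t+\n"
    "chr1\t51991430\t51991630\tchr1-51879231-0\t177\t+\n"
    "chr1\t53880739\t53880939\tchr1-53768540-0\t176\t+\n"
    "chr1\t130487423\t130487623\tchr1-130418492-0\t175\t+\n"
    "chr1\t99556072\t99556272\tchr1-99490001-0\t174\t+\n"
)


def peak_sizes(bed_text):
    """Return end - start for every data line, skipping blank, comment and track/browser lines."""
    sizes = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        sizes.append(int(fields[2]) - int(fields[1]))
    return sizes


def text_histogram(sizes, bin_width=50):
    """Count sizes per bin of width bin_width and print one bar of '#' per bin."""
    counts = Counter((size // bin_width) * bin_width for size in sizes)
    for low in sorted(counts):
        bar = "#" * counts[low]
        print(f"{low}-{low + bin_width - 1} bp: {bar} ({counts[low]})")
    return counts


sizes = peak_sizes(BED_TEXT)
counts = text_histogram(sizes)  # prints: 200-249 bp: ####### (7)
```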
Randomly select 100 lines from the dataset GSM537989_Sample7.Bcell-PU.1.mm9.bed.
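Random selection without replacement can be sketched in Python with `random.Random.sample`. The `seed` parameter is a hypothetical addition: a fixed value makes the draw reproducible, while `None` gives a different subset on each run. The `peaks` list is made-up placeholder data standing in for the lines of the BED file.

```python
import random


def select_random_lines(lines, k=100, seed=None):
    """Return k distinct lines drawn uniformly at random (without replacement)."""
    if k > len(lines):
        raise ValueError(f"cannot select {k} lines from {len(lines)}")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(lines, k)


# Hypothetical placeholder peaks standing in for the lines of the BED file
peaks = [f"chr1\t{i * 1000}\t{i * 1000 + 200}\tpeak{i}\t100\t+" for i in range(1000)]
subset = select_random_lines(peaks, k=100, seed=42)
print(len(subset))
```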
Extract the nucleotide sequences corresponding to the genomic coordinates of the 100 PU.1 peaks selected in the previous step.
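Conceptually, this step slices each chromosome sequence with the peak coordinates. In the sketch below the genome is a hypothetical in-memory dictionary mapping chromosome names to sequences (a toy stand-in for mm9), and peaks on the `-` strand are reverse-complemented. Because BED intervals are 0-based and half-open, Python's `seq[start:end]` slice matches them exactly. The output is FASTA, one record per peak.

```python
# Complement table for DNA letters, including N and lowercase (soft-masked) bases
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_sequences(peaks, genome):
    """Build FASTA text from (chrom, start, end, name, strand) tuples and a chrom -> sequence dict."""
    records = []
    for chrom, start, end, name, strand in peaks:
        seq = genome[chrom][start:end]  # 0-based, half-open, like BED
        if strand == "-":
            seq = reverse_complement(seq)
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


# Hypothetical toy chromosome and peaks, for illustration only
toy_genome = {"chrT": "AAAACCCCGGGGTTTT"}
toy_peaks = [("chrT", 2, 8, "peak1", "+"), ("chrT", 2, 8, "peak2", "-")]
fasta = extract_sequences(toy_peaks, toy_genome)
print(fasta, end="")
```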
Detect de novo motifs within the nucleotide sequences extracted from the 100 PU.1 peaks.
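The history tags used later ("meme", "motif") point to MEME as the motif finder. MEME fits a statistical motif model and is far more sophisticated than what follows: this is a deliberately simpler stand-in, not MEME's algorithm. It counts, for each k-mer, how many sequences contain it on either strand (a k-mer and its reverse complement are merged into one canonical form). K-mers present in many sequences hint at a shared motif. The value of `k` and the toy sequences are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_presence(sequences, k=6):
    """Count, for each canonical k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                seen.add(canonical(kmer))
        counts.update(seen)  # each k-mer counted once per sequence
    return counts


# Hypothetical toy sequences; real input would be the 100 extracted peak sequences
toy = ["AGGAAGT", "CAGGAAG", "TTTTTTT"]
counts = kmer_presence(toy, k=5)
print(counts.most_common(2))
```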
Extract a workflow from your history and save it as “Motif Detection”. Do not include the “Compute” or “Histogram” steps.
Rename the history to “Motif Detection” and add the tags “chip-seq”, “meme” and “motif” to it.
Copy the file GSM537989_Sample7.Bcell-PU.1.mm9.bed to a new history called “Comparison of motif detection” and switch to this new history.
Run the workflow “Motif Detection” twice on the file GSM537989_Sample7.Bcell-PU.1.mm9.bed.
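Since the 100 peaks are drawn at random, the two runs will generally analyse different subsets of peaks (assuming the random selection is not given a fixed seed), which is one reason the detected motifs can differ. For two independent uniform draws of k peaks out of N, the expected number of shared peaks is k * k / N, the mean of a hypergeometric distribution. The sketch below computes that value and checks it by simulation; `N_PEAKS` is a hypothetical placeholder to replace with the number of lines in the BED file.

```python
import random
from statistics import mean


def expected_overlap(n_peaks, k=100):
    """Hypergeometric mean: k draws from n_peaks items, k of which were picked in the first run."""
    return k * k / n_peaks


def simulated_overlap(n_peaks, k=100, trials=2000, seed=0):
    """Average number of peaks shared by two independent random selections of k peaks."""
    rng = random.Random(seed)
    overlaps = []
    for _ in range(trials):
        first = set(rng.sample(range(n_peaks), k))
        second = set(rng.sample(range(n_peaks), k))
        overlaps.append(len(first & second))
    return mean(overlaps)


N_PEAKS = 1000  # hypothetical placeholder for the number of peaks in the file
print(expected_overlap(N_PEAKS), simulated_overlap(N_PEAKS))
```

With only a small expected overlap, comparing the motifs found in both runs tests how robust the motif is to the choice of peaks.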