转录组结题报告

Sample collection and preparation:

RNA quantification and qualification

1) RNA degradation and contamination was monitored on 1% agarose gels.
2) RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN, CA,USA).
3) RNA concentration was measured using Qubit® RNA Assay Kit in Qubit® 2.0 Flurometer (Life Technologies, CA, USA).
4) RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

Library preparation for Transcriptome sequencing

A total amount of 3 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) following manufacturer’s recommendations and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer(5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase(RNase H- ) . Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3’ ends of DNA fragments, NEBNext Adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was used with size-selected,adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95 °C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X)Primer. At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.

Clustering and sequencing

The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and 125 bp/150 bp paired-end reads were generated.

Data Analysis:

Quality control

Raw data (raw reads) of fastq format were filtered by CUTADAPT^[1]. File were then processed by FASTQC^[2] for quality control. Reads containing ploy-N and low quality reads from raw data. At the same time, Q20, Q30 and GC content the clean data were calculated. All the downstream analyses were based on the clean data with high quality. Quantification of gene expression level.

Reads mapping to the reference genome

Reference genome and gene model annotation files were downloaded from UCSC genome browser. Index of the reference genome was built using STAR^[3] and paired-end clean reads were aligned to the reference genome using STAR. HTSeq(v0.6.1)^[4] was used to count the reads numbers mapped to each gene.

Alternative splicing analysis

rMATS(v3.2.1 beta)^[5] was used for identification and differential expression analysis of alternative splicing events. We used hg19 genome as our reference genome ,and transcripts annotation from RefSeq for alternative splicing events annotation. Differential alternative splicing events were accepted if they could achieve an FDR lower than 5%. Functional interpretation of differential expression for alternative splicing events were performed by Database for Annotation, Visualization and Integrated Discovery (DAVID) program. Fisher's exact test p < 0.05 was chosen as the threshold of significant change functions and KEGG pathways.

Differential expression analysis

Differential expression analysis of two conditions/groups (two biological replicates per condition) was performed using the DESeq2^[6] R package(1.18.0). DESeq2 provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate . Genes with an adjusted P-value < 0.05 found by DESeq2 were assigned as differentially expressed.