### Onur Yukselen Notes for gtf files: May.28.2019

## GRCz11refSeqUcsc.gtf.gz

1. Removed transcripts whose transcript ID is defined on two different chromosomes from GRCz11refSeqUcsc.gtf.gz.
2. Removed duplicate lines from the GRCz11refSeqUcsc.gtf.gz file.

## GRCz11ensembl95ucsc.gtf.gz

The following gene names were replaced using the Vim substitution `:%s/ (1 of many)/oneOfMany/g` (for example, `gene_name "ABCD2 (1 of many)";` becomes `gene_name "ABCD2oneOfMany";`):

- `gene_name "ABCD2 (1 of many)";`
- `gene_name "ADAM12 (1 of many)";`
- `gene_name "ANO2 (1 of many)";`
- `gene_name "ANXA1 (1 of many)";`
- `gene_name "ARF5 (1 of many)";`
- `gene_name "ARHGAP22 (1 of many)";`
- `gene_name "CACNA2D3 (1 of many)";`
- `gene_name "CALHM2 (1 of many)";`
- `gene_name "CCR8 (1 of many)";`
- `gene_name "CDHR1 (1 of many)";`
- `gene_name "CELA1 (1 of many)";`
- `gene_name "CEP170B (1 of many)";`
- `gene_name "CMKLR1 (1 of many)";`
- `gene_name "COLGALT1 (1 of many)";`
- `gene_name "COQ8A (1 of many)";`
- `gene_name "COX7A2 (1 of many)";`
- `gene_name "CSDC2 (1 of many)";`
- `gene_name "DNAL1 (1 of many)";`
- `gene_name "DOCK4 (1 of many)";`
- `gene_name "DPYSL2 (1 of many)";`
- `gene_name "EBF1 (1 of many)";`
- `gene_name "EIF3J (1 of many)";`
- `gene_name "ERBB4 (1 of many)";`
- `gene_name "F13A1 (1 of many)";`
- `gene_name "F7 (1 of many)";`
- `gene_name "FAM107A (1 of many)";`
- `gene_name "FAM163B (1 of many)";`
- `gene_name "FERMT3 (1 of many)";`
- `gene_name "FITM1 (1 of many)";`
- `gene_name "FOXN2 (1 of many)";`
- `gene_name "GAA (1 of many)";`
- `gene_name "GABRB2 (1 of many)";`
- `gene_name "GABRR1 (1 of many)";`
- `gene_name "GBGT1 (1 of many)";`
- `gene_name "GPR62 (1 of many)";`
- `gene_name "HTRA2 (1 of many)";`
- `gene_name "ISCU (1 of many)";`
- `gene_name "LECT2 (1 of many)";`
- `gene_name "MAPK8IP1 (1 of many)";`
- `gene_name "MARCH4 (1 of many)";`
- `gene_name "MCHR2 (1 of many)";`
- `gene_name "MFAP4 (1 of many)";`
- `gene_name "MIR19B2 (1 of many)";`
- `gene_name "MIR22 (1 of many)";`
- `gene_name "MIRLET7A3 (1 of many)";`
- `gene_name "MPP4 (1 of many)";`
- `gene_name "NAMPT (1 of many)";`
- `gene_name "NAT16 (1 of many)";`
- `gene_name "NAV1 (1 of many)";`
- `gene_name "NCEH1 (1 of many)";`
- `gene_name "NPC2 (1 of many)";`
- `gene_name "NUP85 (1 of many)";`
- `gene_name "OSCP1 (1 of many)";`
- `gene_name "PAMR1 (1 of many)";`
- `gene_name "PARD3 (1 of many)";`
- `gene_name "PCMTD2 (1 of many)";`
- `gene_name "PLPP7 (1 of many)";`
- `gene_name "POLR2E (1 of many)";`
- `gene_name "PSMD4 (1 of many)";`
- `gene_name "RBP1 (1 of many)";`
- `gene_name "RFESD (1 of many)";`
- `gene_name "RGS13 (1 of many)";`
- `gene_name "RHOU (1 of many)";`
- `gene_name "RNF14 (1 of many)";`
- `gene_name "RNF180 (1 of many)";`
- `gene_name "SCCPDH (1 of many)";`
- `gene_name "SDK2 (1 of many)";`
- `gene_name "SERPINB8 (1 of many)";`
- `gene_name "SLA (1 of many)";`
- `gene_name "SLC16A6 (1 of many)";`
- `gene_name "SLC22A7 (1 of many)";`
- `gene_name "SLC29A4 (1 of many)";`
- `gene_name "SLC45A4 (1 of many)";`
- `gene_name "SLC46A3 (1 of many)";`
- `gene_name "SLC7A1 (1 of many)";`
- `gene_name "SLCO3A1 (1 of many)";`
- `gene_name "SP5 (1 of many)";`
- `gene_name "TCIM (1 of many)";`
- `gene_name "TMEM179 (1 of many)";`
- `gene_name "TMEM196 (1 of many)";`
- `gene_name "TMEM235 (1 of many)";`
- `gene_name "TSTA3 (1 of many)";`
- `gene_name "USP53 (1 of many)";`
- `gene_name "UST (1 of many)";`
- `gene_name "ZNF521 (1 of many)";`

# Rui notes 19/05/21:

- `mv Danio_rerio.GRCz11.95.ucsc.gtf.gz GRCz11ensembl95ucsc.gtf.gz`
- `mv GCF_000002035.6.106.ucsc.gtf.gz GRCz11refSeqUcsc.gtf.gz`

----------------

Alper,

We have a project that aims to improve the 3’ UTR transcript annotation in zebrafish. For this, we need to test RNA-seq mapping outcomes on several annotations that we generate and compare them with the original RefSeq and ENSEMBL annotations. Can we get some additional annotations put into DolphinNext so we can start mapping data onto them? Right now we want to start with our initial annotations. Rui and/or Julie (both cc’d) can coordinate this.

Julie and Rui - can you please make sure these files go into the following directory:

`/nl/umw_nathan_lawson/pub/UCSC_tracks/GRCz11/Improved3pUTRannotations/StartingAnnotations`

There seem to be a ton of different .gtf files with various names spread all over my project space, and I want to make sure we are all dealing with the same files. Can you please name them as follows:

- ENSEMBL annotation (looks like release 95 on UCSC): `GRCz11ensembl95ucsc.gtf`
- RefSeq (GCF_000002035.6): `GRCz11refSeqUcsc.gtf`

Let me know if there are any questions.

Thanks,
Nathan

---

Hello Julie,

The GTF files that I used to evaluate our annotation are:

- UCSC: https://www.dropbox.com/s/ses39glskbm28lu/GCF_000002035.6.106.ucsc.gtf.gz?dl=0
- ENSEMBL: https://www.dropbox.com/s/e75we5ku1wq0sjv/Danio_rerio.GRCz11.95.ucsc.gtf.gz?dl=0

Should I rename these two copies and put them in `/nl/umw_nathan_lawson/pub/UCSC_tracks/GRCz11/Improved3pUTRannotations/StartingAnnotations`?

Best regards,
Rui

---

Thanks Rui!

Nathan, do you need to have the transgenes added to the gtf files created by Rui? If yes, the following file should be used to replace the UCSC version: `/project/umw_nathan_lawson/Annotation/GRCz11/ucscPlus_cleangtf/GRCz11plus/genes/genes.gtf`

Best regards,
Julie

---

No. Just the native annotation will be fine.
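
The two clean-up steps in Onur's notes for GRCz11refSeqUcsc.gtf.gz can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure that was actually run: the helper names (`transcript_id`, `clean_gtf_lines`, `clean_gtf_file`) are hypothetical, "defined in two different chromosomes" is assumed to mean that the same `transcript_id` appears with more than one value in column 1 (seqname), in which case every record of that transcript is dropped, and duplicates are assumed to be byte-identical lines, keeping the first copy.

```python
import gzip
import re
from collections import defaultdict

# Matches the value of the transcript_id attribute in column 9 of a GTF line.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcript_id(line):
    """Return the transcript_id of a GTF record, or None (e.g. for '#' header lines)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    match = TRANSCRIPT_ID_RE.search(fields[8])
    return match.group(1) if match else None


def clean_gtf_lines(lines):
    """Drop exact duplicate lines and transcripts seen on more than one chromosome.

    Hypothetical helper: assumes "two chromosomes" means two seqname values.
    """
    # Step 2 of the notes: keep only the first copy of each identical line.
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)

    # Step 1 of the notes: collect the chromosomes (column 1) used by each transcript_id.
    chroms = defaultdict(set)
    for line in unique:
        tid = transcript_id(line)
        if tid is not None:
            chroms[tid].add(line.split("\t", 1)[0])
    multi_chrom = {tid for tid, seqs in chroms.items() if len(seqs) > 1}

    # Keep header lines (transcript_id is None) and single-chromosome transcripts.
    return [line for line in unique if transcript_id(line) not in multi_chrom]


def clean_gtf_file(in_path, out_path):
    """Read a gzipped GTF, clean it, and write a gzipped GTF (paths are placeholders)."""
    with gzip.open(in_path, "rt") as handle:
        lines = handle.readlines()
    with gzip.open(out_path, "wt") as handle:
        handle.writelines(clean_gtf_lines(lines))
```

For instance, given two identical `NM_1` exon lines on `chr1` plus one `NM_2` exon on `chr1` and another on `chr2`, `clean_gtf_lines` keeps a single `NM_1` line and drops both `NM_2` lines; header lines starting with `#` pass through unchanged.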
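
The gene-name replacement applied to GRCz11ensembl95ucsc.gtf.gz can also be reproduced outside Vim. The sketch below is an assumed equivalent rather than the command that was run, and the function names are hypothetical. In Vim's default magic mode the parentheses in `:%s/ (1 of many)/oneOfMany/g` are literal characters, so a plain string replacement over every line matches it; like the Vim command, it also rewrites any other attribute on the line that contains ` (1 of many)`.

```python
import gzip

OLD = " (1 of many)"  # literal text, including the leading space
NEW = "oneOfMany"


def rename_one_of_many(line):
    """Apply the same literal substitution as :%s/ (1 of many)/oneOfMany/g to one line."""
    # str.replace rewrites every occurrence, like the /g flag.
    return line.replace(OLD, NEW)


def rename_gtf_file(in_path, out_path):
    """Stream a gzipped GTF and rewrite every line (paths are placeholders)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            dst.write(rename_one_of_many(line))
```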