Run CamoTSS
CamoTSS has two modes: TC mode, which finds novel TSS clusters, and CTSS mode, which additionally detects CTSSs within each cluster.
The input files include:
alignment file (BAM)
annotation file (GTF)
reference genome file (FASTA)
cell barcode list file (CSV)
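Before a long run, it can help to confirm that each input file exists and is non-empty. The helper below is a minimal sketch; `check_camotss_inputs` is a hypothetical name and is not part of CamoTSS.

```shell
#!/bin/bash
# Hypothetical helper (not part of CamoTSS): report every input file that is
# missing or empty, and return non-zero if any such file was found.
check_camotss_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty input: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, with the variables set in the run script further down: `check_camotss_inputs "$bamFile" "$gtfFile" "$fastaFile" "$cellbarcodeFile" && echo "All inputs found"`.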
The output files include:
cell-by-TSS matrix for all TSSs (h5ad)
cell-by-two-TSSs matrix (h5ad)
cell-by-CTSS matrix (h5ad; CTSS mode only)
reference file listing genes and their TSSs (CSV)
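After a run, the matrices can be located by their extension. This is a minimal sketch: `list_camotss_matrices` is a hypothetical helper, and because the exact output file names are not given here, it simply searches the output directory (by default `CamoTSS_out`, as in the run script further down) for `.h5ad` files.

```shell
#!/bin/bash
# Hypothetical helper: list all .h5ad matrices under a CamoTSS output
# directory (default: CamoTSS_out), sorted by path.
list_camotss_matrices() {
  local outdir=${1:-CamoTSS_out}
  find "$outdir" -type f -name '*.h5ad' | sort
}
```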
A small test dataset is provided so you can try CamoTSS quickly.
Download test files
The large test files, genome.fa and possorted_genome_bam_filtered.bam, can be downloaded from OneDrive; the remaining test files are in the test/ folder of the CamoTSS repository.
Alternatively, you can download the reference genome FASTA file from Ensembl, GENCODE, or the 10x Genomics website.
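Reference files from different sources can name chromosomes differently (for example, Ensembl uses 1, 2, ... while GENCODE uses chr1, chr2, ...), so it may be worth checking that the GTF and FASTA agree before a run. The sketch below uses a hypothetical helper, `gtf_contigs_missing_from_fasta`, built on standard tools only; it prints sequence names used in the GTF that have no FASTA record, and prints nothing when every name matches.

```shell
#!/bin/bash
# Hypothetical helper: print GTF sequence names (column 1) that do not
# appear among the FASTA header names (text after ">" up to the first space).
gtf_contigs_missing_from_fasta() {
  local gtf=$1 fasta=$2
  comm -23 \
    <(grep -v '^#' "$gtf" | cut -f1 | sort -u) \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u)
}
```

The BAM file must use the same reference names as well, since it was aligned against a particular genome build.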
Run CamoTSS
The following script runs CamoTSS in TC mode on the test files.
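#!/bin/bash
# Placeholders: set these to your local copy of the CamoTSS repository
# and to the folder holding genome.fa and the downloaded BAM file.
CamoTSS=/path/to/CamoTSS
download=/path/to/download
gtfFile=$CamoTSS/test/Homo_sapiens.GRCh38.105.chr_test.gtf
fastaFile=$download/genome.fa
bamFile=$download/possorted_genome_bam_filtered.bam
cellbarcodeFile=$CamoTSS/test/cellbarcode_to_CamoTSS
# Run TC mode; results are written to CamoTSS_out
CamoTSS --gtf "$gtfFile" --refFasta "$fastaFile" --bam "$bamFile" -c "$cellbarcodeFile" -o CamoTSS_out --mode TC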
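To also detect CTSSs within each cluster, the same run can be done with --mode CTSS. The wrapper below is a sketch; `run_camotss_ctss` and the default output folder `CamoTSS_CTSS_out` are hypothetical names, and only options documented under Options are passed.

```shell
#!/bin/bash
# Hypothetical wrapper: run CamoTSS in CTSS mode.
# Arguments: GTF, FASTA, BAM, cell barcode file, optional output directory.
run_camotss_ctss() {
  local gtf=$1 fasta=$2 bam=$3 barcodes=$4 outdir=${5:-CamoTSS_CTSS_out}
  CamoTSS --gtf "$gtf" --refFasta "$fasta" --bam "$bam" \
    -c "$barcodes" -o "$outdir" --mode CTSS
}
```

With the variables from the script above: `run_camotss_ctss "$gtfFile" "$fastaFile" "$bamFile" "$cellbarcodeFile"`.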
Options
More parameters are available. Run CamoTSS -h to list the options for the
version you have installed:
Usage: CamoTSS [options]
Options:
-h, --help show this help message and exit
-g GTF_FILE, --gtf=GTF_FILE
The annotation GTF file for the species you are analysing.
-c CDRFILE, --cellbarcodeFile=CDRFILE
The file listing the cell barcodes to keep in the
downstream analysis.
-b BAM_FILE, --bam=BAM_FILE
The BAM file aligned by Cell Ranger or other
single-cell alignment software.
-o OUT_DIR, --outdir=OUT_DIR
The directory for output [default : $bam_file]
-r REFFASTA, --refFasta=REFFASTA
The path to the reference genome FASTA file
-m MODE, --mode=MODE Run mode: [TC] finds novel TSS clusters;
[CTSS] additionally detects CTSSs within each
cluster
Optional arguments:
--minCount=MINCOUNT
Minimum UMI counts for TC in all cells [default: 50]
-p NPROC, --nproc=NPROC
Number of subprocesses [default: 4]
--maxReadCount=MAXREADCOUNT
For each gene, the maximum read count kept for
clustering [default: 10000]
--clusterDistance=CLUSTERDISTANCE
The minimum distance between the transcription start
sites of two clusters [default: 300]
--InnerDistance=INNERDISTANCE
The resolution of each cluster [default: 100]
--windowSize=WINDOWSIZE
The width of sliding window [default: 15]
--minCTSSCount=MINCTSSCOUNT
The minimum UMI counts for each CTSS [default: 100]
--minFC=MINFC The minimum fold change for filtering CTSS [default:
6]
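Optional arguments are passed alongside the required ones, in either the --option value or --option=value form. The wrapper below is a sketch of a TC-mode run that overrides a few defaults; `run_camotss_tc_tuned` is a hypothetical name, and the values (8 subprocesses, --minCount 20, --clusterDistance 500) are illustrative only, not recommendations.

```shell
#!/bin/bash
# Hypothetical wrapper: TC-mode run with some defaults overridden.
# Arguments: GTF, FASTA, BAM, cell barcode file, optional output directory.
run_camotss_tc_tuned() {
  local gtf=$1 fasta=$2 bam=$3 barcodes=$4 outdir=${5:-CamoTSS_out}
  CamoTSS --gtf "$gtf" --refFasta "$fasta" --bam "$bam" -c "$barcodes" \
    -o "$outdir" --mode TC -p 8 --minCount 20 --clusterDistance 500
}
```

In CTSS mode, --minCTSSCount and --minFC can be adjusted the same way.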