BEDTools provides a fast and flexible way to compare large datasets of genomic features. Its utilities address common genomics tasks such as finding feature overlaps and computing coverage. The name is historical: the tools now process the most commonly used feature file formats, including BED, GFF, VCF, and SAM. The following are examples of common questions that one can address with BEDTools:
- Which SNPs are in a coding region?
- What is the coverage of the exons and of the introns?
- How many positions have a coverage greater than 8?
- Which SNPs are shared by two predictions done by two different SNP callers?
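To make the first and third questions concrete, here is a minimal pure-Python sketch of the underlying interval logic. It is not how BEDTools is implemented and it does not call BEDTools; the function names and the toy data are hypothetical. It assumes BED-style coordinates (0-based start, end not included) and is only practical for very small inputs.

```python
from collections import Counter

# Hypothetical toy data. Regions and reads are (chrom, start, end) with a
# 0-based start and an exclusive end, as in BED. SNPs are (chrom, position)
# with a 0-based position.
coding_regions = [("chr1", 100, 200), ("chr1", 500, 650)]
snps = [("chr1", 150), ("chr1", 200), ("chr1", 640), ("chr2", 150)]
reads = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 8, 12)]


def snps_in_regions(snps, regions):
    """Return the SNPs that fall inside at least one region."""
    return [
        (chrom, pos)
        for chrom, pos in snps
        if any(
            chrom == region_chrom and region_start <= pos < region_end
            for region_chrom, region_start, region_end in regions
        )
    ]


def positions_with_depth_above(reads, threshold):
    """Count positions covered by more than `threshold` reads."""
    depth = Counter()
    for chrom, start, end in reads:
        for pos in range(start, end):
            depth[(chrom, pos)] += 1
    return sum(1 for d in depth.values() if d > threshold)


coding_snps = snps_in_regions(snps, coding_regions)
print(coding_snps)
print(positions_with_depth_above(reads, 1))
```

The SNP at position 200 is not reported because the end coordinate of a BED interval is exclusive. On real data, the dedicated tools described in the rest of these notes are the appropriate choice, since this sketch compares every SNP with every region and stores one counter per covered base.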
The following notes are partially taken from the BEDTools manual. It is important to read the manual before applying BEDTools to a real problem.
Summary of the tools
Some of the tools included are:
- intersectBed: Returns overlapping features between two BED/GFF/VCF/BAM files.
- windowBed: Returns overlapping features between two BED/GFF/VCF/BAM files within a “window”.
- closestBed: Returns the closest feature to each entry in a BED/GFF/VCF file.
- coverageBed: Summarizes the depth and breadth of coverage of features in one BED/GFF/BAM file (e.g., aligned reads) relative to another (e.g., user-defined windows).
- genomeCoverageBed: Produces a histogram or a “per base” report of genome coverage.
- subtractBed: Removes the portion of an interval that is overlapped by another feature.
- mergeBed: Merges overlapping features into a single feature.
- fastaFromBed: Creates FASTA sequences from BED/GFF intervals.
- maskFastaFromBed: Masks a FASTA file based upon BED/GFF coordinates.
- overlap: Computes the amount of overlap (positive values) or distance (negative values) between genome features.
- Examples are here.
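The operations behind mergeBed, subtractBed and overlap can be illustrated with a few lines of pure Python. This is only a sketch of the interval arithmetic, not the BEDTools implementation: the function names are hypothetical, coordinates are assumed to be BED-style (0-based start, exclusive end), and the merge function assumes that intervals which merely touch are also merged.

```python
def merge_intervals(intervals):
    """Merge overlapping (or touching) intervals on the same chromosome.

    `intervals` is a list of (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the previous interval instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def subtract_interval(a, b):
    """Return the parts of interval `a` that are not covered by `b`.

    Both arguments are (start, end) pairs on the same chromosome.
    """
    a_start, a_end = a
    b_start, b_end = b
    if b_end <= a_start or b_start >= a_end:
        return [a]  # no overlap, `a` is kept whole
    pieces = []
    if b_start > a_start:
        pieces.append((a_start, b_start))
    if b_end < a_end:
        pieces.append((b_end, a_end))
    return pieces


def overlap_amount(a, b):
    """Positive result: bases shared by `a` and `b`. Negative: gap size."""
    return min(a[1], b[1]) - max(a[0], b[0])


features = [("chr1", 100, 200), ("chr1", 150, 300),
            ("chr1", 400, 500), ("chr2", 100, 200)]
print(merge_intervals(features))
print(subtract_interval((100, 200), (120, 150)))
print(overlap_amount((100, 200), (150, 300)))   # overlapping features
print(overlap_amount((100, 200), (250, 300)))   # features separated by a gap
```

The sign convention in `overlap_amount` matches the description of the overlap tool above: a positive value is the number of shared bases and a negative value is the distance between the two features.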