getCounts

Class: BioMap

Return count of read sequences aligned to reference sequence in BioMap object

Syntax

Count = getCounts(BioObj, StartPos, EndPos)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R)
... = getCounts(..., Name,Value)

Description

Count = getCounts(BioObj, StartPos, EndPos) returns Count, a nonnegative integer specifying the number of read sequences in BioObj, a BioMap object, that align to a specific range or set of ranges in the reference sequence. The range or set of ranges is defined by StartPos and EndPos. StartPos and EndPos can be two nonnegative integers such that StartPos is less than EndPos and both integers are smaller than the length of the reference sequence. StartPos and EndPos can also be two column vectors of equal length that together define a set of ranges (overlapping or segmented).

By default, getCounts counts each read only once. Therefore, if a read spans multiple ranges, that read instance is counted only once. When StartPos and EndPos specify overlapping ranges, the overlapping ranges are considered as one range.

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups) specifies Groups, a row vector of integers or strings with the same number of elements as StartPos and EndPos. This vector indicates the group to which each range belongs. GroupCount is a column vector with one element for each unique element in Groups. GroupCount specifies the number of reads that align to each group, in ascending order of the unique groups in Groups.

Each group is treated independently. Therefore, a read can be counted in more than one group.
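The grouping rule can be illustrated without a BioMap object. The following is a minimal sketch on made-up read coordinates, not the getCounts implementation: the variables readStart and readStop are hypothetical, and a read is assumed to hit a range when the two share at least one base position.

```matlab
% Hypothetical toy data: start and stop positions of four reads
readStart = [ 1;  8; 35; 55];
readStop  = [ 6; 32; 45; 70];

% Three ranges and their group labels, mirroring the Groups syntax
startPos = [ 1; 30; 50];
endPos   = [10; 60; 60];
groups   = [ 2   1   2];

groupIDs   = unique(groups);             % unique labels, ascending: [1 2]
groupCount = zeros(numel(groupIDs), 1);
for g = 1:numel(groupIDs)
    idx = find(groups == groupIDs(g));   % ranges that belong to this group
    inGroup = false(size(readStart));
    for j = idx
        % Assumed rule: a read hits range j if they share at least one base
        inGroup = inGroup | (readStart <= endPos(j) & readStop >= startPos(j));
    end
    groupCount(g) = sum(inGroup);        % each read counted once per group
end
disp(groupCount)
```

In this toy data the second and fourth reads touch ranges from both groups, so each of them contributes to both elements of groupCount.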

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R) specifies a reference for each of the segmented ranges defined by StartPos, EndPos, and Groups.

... = getCounts(..., Name,Value) returns counts with additional options specified by one or more Name,Value pair arguments.

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

Groups

Row vector of integers or strings with the same number of elements as StartPos and EndPos. This vector indicates the group to which each range belongs.

R

Vector of positive integers indexing the SequenceDictionary property of BioObj, or a cell array of strings specifying the actual names of references. R must be ordered and have the same number of elements as the unique elements in Groups. If R has the same number of elements as Groups, then all of the entries in R for each unique value in Groups must be the same.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Independent'

Logical that specifies whether to treat the ranges defined by StartPos and EndPos independently. If true, Count is a column vector containing the same number of elements as StartPos and EndPos. In this case, a read that spans multiple ranges is counted once in each range.

    Note:   This name-value pair argument is ignored when using the Groups input argument, because getCounts assumes that each group of ranges is independent.

Default: false
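The difference between the two settings can be sketched with plain arrays. This is an illustration on hypothetical read coordinates (readStart and readStop are made up), not the toolbox code, and it assumes that a read aligns to a range when they share at least one base position.

```matlab
% Hypothetical toy data: start and stop positions of five reads
readStart = [ 1; 40; 60;  75; 120];
readStop  = [35; 75; 68; 110; 150];

% Two segmented ranges, 1:50 and 71:100
startPos = [ 1;  71];
endPos   = [50; 100];

% hits(i,j) is true when read i shares at least one base with range j
hits = false(numel(readStart), numel(startPos));
for j = 1:numel(startPos)
    hits(:, j) = readStart <= endPos(j) & readStop >= startPos(j);
end

countOnce = sum(any(hits, 2));   % like 'Independent', false: one count per read
countEach = sum(hits, 1).';      % like 'Independent', true: one count per range
```

Here countOnce is 3 and countEach is [2; 2]. The per-range counts sum to 4 because the second read spans both ranges and is counted in each of them.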

'Overlap'

Specifies the minimum number of base positions that a read must overlap in a range or set of ranges, to be counted. This value can be any of the following:

  • Positive integer

  • 'full' — A read must be fully contained in a range or set of ranges to be counted.

  • 'start' — A read's start position must lie within a range or set of ranges to be counted.

Default: 1
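The three kinds of Overlap value can be compared on a single range with a small sketch. The read coordinates below are hypothetical, and the formulas reflect one reading of the descriptions above rather than the actual getCounts code.

```matlab
% Hypothetical toy data: five reads and one range, 20:50
readStart  = [ 5; 18; 25; 45; 49];
readStop   = [19; 30; 40; 60; 80];
rangeStart = 20;
rangeStop  = 50;

% Number of base positions each read shares with the range
overlap = max(0, min(readStop, rangeStop) - max(readStart, rangeStart) + 1);

countMin1  = sum(overlap >= 1);    % like 'Overlap', 1 (the default)
countMin10 = sum(overlap >= 10);   % like 'Overlap', 10
countFull  = sum(readStart >= rangeStart & readStop <= rangeStop);   % like 'full'
countStart = sum(readStart >= rangeStart & readStart <= rangeStop);  % like 'start'
```

With this data the four counts are 4, 2, 1, and 3, which shows how a stricter overlap requirement reduces the number of reads counted for the same range.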

'Spliced'

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

'Method'

String specifying the method to measure the abundance of reads. Choices are:

  • 'raw' — Raw counts

  • 'rpkm' — Counts of reads per kilobase pairs per million aligned reads

  • 'mean' — Average coverage depth computed base-by-base

  • 'max' — Maximum coverage depth computed base-by-base

  • 'min' — Minimum coverage depth computed base-by-base

  • 'sum' — Sum of all aligned bases in all the reads

Default: 'raw'
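As a rough illustration of what these measures mean, the sketch below computes them by hand for one range from made-up read coordinates. It is not the getCounts implementation. The names readStart, readStop, and totalAligned are hypothetical, and the RPKM normalization shown (range length in kilobases, total aligned reads in millions) is an assumption about how the definition above is applied.

```matlab
% Hypothetical toy data: three reads and one range, 11:20
readStart    = [ 5; 12; 18];
readStop     = [14; 19; 30];
rangeStart   = 11;
rangeStop    = 20;
totalAligned = 2e6;   % assumed total number of aligned reads in the data set

% Coverage depth at each base position of the range
positions = (rangeStart:rangeStop).';
depth = zeros(size(positions));
for i = 1:numel(readStart)
    depth = depth + (positions >= readStart(i) & positions <= readStop(i));
end

rawCount  = sum(readStart <= rangeStop & readStop >= rangeStart);  % 'raw'
meanDepth = mean(depth);   % 'mean'
maxDepth  = max(depth);    % 'max'
minDepth  = min(depth);    % 'min'
sumBases  = sum(depth);    % 'sum': aligned bases that fall inside the range

% 'rpkm' under the stated assumption: reads per kilobase per million reads
rangeLength = rangeStop - rangeStart + 1;
rpkm = rawCount * 1e9 / (rangeLength * totalAligned);
```

For this toy data the raw count is 3, the depth statistics are 1.5 (mean), 2 (max), and 1 (min), the sum is 15 bases, and the assumed RPKM value is 150.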

Output Arguments

Count

Either of the following:

  • When Independent is false, this value is a nonnegative integer. The integer specifies the number of reads that align to a range or set of ranges (overlapping or segmented) of the reference sequence in BioObj, a BioMap object. Each read is counted only once, even if the read spans multiple ranges.

  • When Independent is true, this value is a column vector of nonnegative integers. This vector indicates the number of reads that align to the independent ranges specified by StartPos and EndPos. This vector contains the same number of elements as StartPos and EndPos.

GroupCount

Column vector with one element for each unique element in Groups. The vector specifies the number of reads that align to each group, in ascending order of the unique groups in Groups. The groups of ranges are treated independently. Therefore, a single read can be counted in more than one group.

Examples

Construct a BioMap object, and then return the number of reads that align to at least one base position in two ranges of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to the segmented range 1:50 and 71:100
counts_1 = getCounts(BMObj1,[1;71],[50;100])
counts_1 =

    37
 

Construct a BioMap object, and then return the number of reads that align to at least one base position in two independent ranges of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to each of the ranges,
% 1:50 and 71:100, independent of each other
counts_2 = getCounts(BMObj1,[1;71],[50;100],'independent',true)
counts_2 =

    20
    21

Notice that the total number of reads reported in counts_2 is greater than the number of reads reported in counts_1. This difference occurs because there are four reads that span the two ranges, and are counted twice in the second example.

 

Construct a BioMap object, and then return the number of reads that align to two separate groups of ranges of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to a group containing range 30:60,
% and also the number of reads that align to a group containing range 1:10
% and range 50:60
counts_3 = getCounts(BMObj1,[1;30;50],[10;60;60],[2 1 2])
counts_3 =

    25
    22
 

Construct a BioMap object, and then return the total number of reads aligned to the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of sequences that align to the entire reference sequence
getCounts(BMObj1,min(getStart(BMObj1)),max(getStop(BMObj1)))
ans =

        1482
