CBS on real data


The file SCW-11_Bone-marrow.bedtools.counts contains real count data from a low-coverage whole-genome sequencing run. The counts were computed using ‘bedtools coverage’, with the bins described in the discussion of Ginkgo binning.
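For orientation, the loading code in this post treats the file as four tab-separated columns with no header row: chromosome, bin start, bin end, and read count. Here is a minimal sketch of that layout, assuming it holds for the real file. The rows are invented for illustration and are not taken from the real data.

```python
import io
import pandas as pd

# Hypothetical rows in the assumed layout: chr, start, end, counts
# (tab-separated, no header). The values are made up for illustration.
sample = (
    "chr1\t0\t500000\t212\n"
    "chr1\t500000\t1000000\t198\n"
    "chr2\t0\t500000\t240\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None,
                 names=["chr", "start", "end", "counts"])

# Select one chromosome and pull out the counts as a NumPy array.
chr1_counts = df.loc[df["chr"] == "chr1", "counts"].to_numpy()
print(chr1_counts)
```

Supplying the column names explicitly is what lets the later filtering steps refer to 'chr' and 'counts' by name.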

The resulting segmentation plot looks like this.

The code to produce this was:

# load the libraries and get set up
import cbs
import pandas as pd
import numpy as np
import seaborn as sns
sns.set_style('darkgrid')

# read in the data, focus on chromosome 1, and drop outliers (bins at or above the 95th percentile)
df = pd.read_table('SCW-11_Bone-marrow.bedtools.counts',header=None,names=['chr','start','end','counts'])
df1 = df[df['chr']=='chr1']
threshold = np.percentile(df1['counts'].values,95)
df1a = df1[df1['counts']<threshold]
data = df1a['counts'].values


# segment, validate, and draw the figure
L = cbs.segment(data)
S = cbs.validate(data,L)
ax = cbs.draw_segmented_data(data,S,title='Segmentation of counts from chromosome 1')
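To see what the percentile cutoff does on its own, here is a small sketch on synthetic counts. The Poisson background and the spike values are assumptions chosen for illustration, and the sketch does not use the cbs module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic counts: a Poisson background plus a few artificially
# inflated bins standing in for outliers (all values are made up).
counts = rng.poisson(lam=100, size=1000)
counts[::100] = 5000  # 10 spike bins

# Same rule as in the post: keep only bins strictly below the 95th percentile.
threshold = np.percentile(counts, 95)
kept = counts[counts < threshold]

print(len(counts), len(kept), threshold)
```

The filter removes roughly the top 5% of bins, so it discards some ordinary high-count bins along with the spikes. It also shortens the array, so positions in the filtered data no longer line up one-to-one with the original bins.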

Jeremy Teitelbaum
