CBS on real data


The file SCW-11_Bone-marrow.bedtools.counts contains real count data from a low-coverage whole-genome sequencing run. The counts were computed with ‘bedtools coverage’, using the bins described in the discussion of Ginkgo binning.

The resulting segmentation plot looks like this.

[Figure: segmented data]

The code to produce this was:

# load the libraries and get set up
import cbs
import pandas as pd
import numpy as np
import seaborn as sns
sns.set_style('darkgrid')

# read in the data, focus on chromosome 1, and drop outliers
df = pd.read_table('SCW-11_Bone-marrow.bedtools.counts', header=None, names=['chr', 'start', 'end', 'counts'])
df1 = df[df['chr'] == 'chr1']
# treat bins at or above the 95th percentile of counts as outliers
threshold = np.percentile(df1['counts'].values, 95)
df1a = df1[df1['counts'] < threshold]
data = df1a['counts'].values


# segment, validate, and draw the figure
L = cbs.segment(data)
S = cbs.validate(data,L)
ax = cbs.draw_segmented_data(data,S,title='Segmentation of counts from chromosome 1')
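
The outlier-trimming step above can be tried on its own, without the real file or the cbs module. The following is only a sketch on made-up data: the Poisson counts, the injected extreme bins, and the names chr1 and kept are assumptions for illustration, not values from SCW-11_Bone-marrow.bedtools.counts.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for a four-column, tab-separated counts file
# (chr, start, end, counts). All values here are synthetic.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=200)
counts[[10, 90, 150]] = [900, 1200, 750]  # inject a few extreme bins
rows = [f"chr1\t{i * 1000}\t{(i + 1) * 1000}\t{c}" for i, c in enumerate(counts)]
rows += ["chr2\t0\t1000\t48", "chr2\t1000\t2000\t52"]

df = pd.read_csv(io.StringIO("\n".join(rows)), sep="\t", header=None,
                 names=["chr", "start", "end", "counts"])

# Same steps as in the post: keep chromosome 1, then drop every bin
# whose count is at or above the 95th percentile.
chr1 = df[df["chr"] == "chr1"]
threshold = np.percentile(chr1["counts"].values, 95)
kept = chr1[chr1["counts"] < threshold]
data = kept["counts"].values

print(len(chr1), len(kept), threshold)
```

Two things are worth keeping in mind with this kind of filter. It always removes roughly the top 5% of bins, whether or not they are true artifacts, and once bins are dropped the index into data no longer corresponds directly to a bin position on the chromosome; the start and end columns of the filtered frame are needed to map segments back to genomic coordinates.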
