Using build GRCh37 (hg19)  Platform Documentation: Setting Up and Starting a CNV analysis

Setting Up and Running a CNV-analysis

Running a CNV analysis on this platform takes five steps, each described below. The instructions are similar whether you run the separate algorithms or a standard combined analysis.

Step 1: Generating the Input Files

Two files need to be created: first, a file containing sample genders, identifiers and some other information; second, a file containing the actual intensity and genotype data. If you have access to BeadStudio or GenomeStudio, these files can easily be exported. If you only have Full Call Reports available, you will need to reformat those using the provided scripts.

NOTICE: sample names should only contain alphanumeric characters and underscores (A-Z, 0-9, _)!
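
The naming rule above can be checked before export with a small script. This is a minimal sketch, not part of the platform: the function names are hypothetical, and it assumes that lower-case letters count as alphanumeric as well.

```python
import re

# Assumption: allowed characters are A-Z, a-z, 0-9 and underscore.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_sample_names(names):
    """Return the names that are empty or contain a disallowed character."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


def sanitize_sample_name(name):
    """Replace every disallowed character with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```

For example, `invalid_sample_names(["1234_DNA01", "1234-DNA 02"])` reports only the second name, and `sanitize_sample_name` turns it into `1234_DNA_02`.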

a) Extract From BeadStudio / GenomeStudio

Open the column chooser (Screenshot) for the 'Full Data Table' ('1' in the image on the left).
Retain only the following 'Displayed Columns':
- Name
- Chr
- Position
- Sample Names to be included
Next set the following 'Displayed Subcolumns':
- Log R Ratio
- B Allele Freq
- GType
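
After export, the header of the Full Data Table file can be checked against the columns and subcolumns listed above. The sketch below is only an illustration under two assumptions that are not stated in this document: that the export is tab-delimited, and that each subcolumn header has the form `<sample name>.<subcolumn>`. The function name is hypothetical.

```python
REQUIRED_COLUMNS = ("Name", "Chr", "Position")
SUBCOLUMNS = ("Log R Ratio", "B Allele Freq", "GType")


def missing_full_data_columns(header_line, sample_names):
    """Return the expected column headers that are absent from the header line.

    Assumes tab-delimited headers and '<sample>.<subcolumn>' naming.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    expected = list(REQUIRED_COLUMNS)
    for sample in sample_names:
        expected.extend(f"{sample}.{sub}" for sub in SUBCOLUMNS)
    return [col for col in expected if col not in columns]
```

An empty result means every expected column was found; otherwise the list names what still has to be enabled in the column chooser.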

Finally, set the following 'Displayed Columns' for the 'Samples Table' using the column chooser (Screenshot) (under '2'):
- SampleID
- Gender
- Call Rate
- Index
- Array Info.Sentrix ID
- Array Info.Sentrix Position

We suggest using a uniform sample ID format, consisting of the last four digits of the chip barcode, followed by an underscore and a sample/DNA identifier. The sample genders can be estimated from the context menu of the Samples Table. Call rates can be calculated using the calculator icon in the Samples Table toolbar.
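
The suggested sample ID format can be generated as follows. This is a minimal sketch; the function name and the example barcode are hypothetical.

```python
def make_sample_id(chip_barcode, dna_id):
    """Build '<last four barcode digits>_<sample/DNA identifier>'."""
    barcode = str(chip_barcode).strip()
    suffix = barcode[-4:]
    if len(suffix) < 4 or not suffix.isdigit():
        raise ValueError(f"barcode must end in four digits: {barcode!r}")
    return f"{suffix}_{dna_id}"


# Hypothetical barcode and DNA identifier, for illustration only.
print(make_sample_id("4321987654", "DNA0123"))
```

With the hypothetical values above, the resulting sample ID is `7654_DNA0123`.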

Once you have tables like the one shown on the left, use buttons '1' and '2' to select the entire table and export it to a flat text file. Do this for both the Samples Table and the Full Data Table. When prompted to export the entire table instead of only the selected rows and columns, select 'No'.

Make sure no single columns are selected, since this might lead to incomplete files. When a single column is selected, this is indicated by a darkened column header and yellow cells (Screenshot).

b) Compile from Full Call Reports

When you have received Full Call Reports (FCRs) from a core facility, you can reformat these results with a few simple scripts on a UNIX command line, as illustrated in the image on the left.

First, download the scripts here. Extract them into the folder containing the FCRs. Next, reformat each FCR to a tabular format using the step1 script (see syntax below). The 'NON-POLY.txt' file contains the non-polymorphic probes for the chip type used. These lists can be obtained here.

We will now combine these output files with the probe details. If this information is not provided with the FCRs, you can obtain it here. Use the step2 script (syntax below), where INfiles are the output files from step 1:

Finally, extract the samples table using the step3 script:

Notice: During the process, all input files should be converted to UNIX format. If your output files seem incorrect, use the dos2unix command or a similar tool to do this conversion manually!
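
Where dos2unix is not available, the same line-ending conversion can be done with a short script. This is a minimal sketch with a hypothetical function name; it rewrites the file in place, so keep a copy of the original if in doubt.

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Convert DOS (CRLF) and old Mac (CR) line endings to UNIX (LF).

    Returns True if the file was changed, False if it was already in UNIX format.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    converted = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if converted == data:
        return False
    file_path.write_bytes(converted)
    return True
```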

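Syntax for the three scripts (the script names are not shown here; put the name of the corresponding script directly after 'perl'):
Step 1: perl NON-POLY.txt INPUTfile OUTPUTfile
Step 2: perl positions.txt INfiles OUTfile
Step 3: perl INfile OUTfile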

Step 2: Uploading Your Data

To start an analysis, the data has to be sent to the server. If you have a high-speed internet connection and the project is small, you can do this through the standard analysis form, described below. If you have a slower connection or are uploading large datasets, first upload your data using the separate upload page, as described here.

Step 3: Setting Up Analysis Details

To start an analysis, click the 'New Analysis' entry in the left panel. You will then be presented with the option to run a combined analysis with preset parameter settings, or to run QuantiSNP, PennCNV or VanillaICE separately, which allows you to change certain parameters. We will briefly describe each option here.

It is important to note that only the results from the combined analysis are stored in the database for future reference, and only these results can be searched and shared. Since the combined analysis also presents the results of each separate algorithm as an individual file, we recommend using the combined analysis.

a) Combined analysis

First you need a project name, which is set by default to the current date and time. Next, define a group. The available options originated from projects we were working on when designing this page: Diagnostics (daily hospital routine), Research_MR (Mental Retardation), Research_EP (Epilepsy), Control (healthy / HapMap samples) and Guest (sporadic users). If you need another option, please feel free to suggest one.

Thirdly, the files created in step 1 need to be attached to the project. You can upload data from the 'choose file' fields, or select a previously uploaded data file from the dropdown lists. Pedigree information is optional and can be added at a later stage. More information on this can be found here.

Finally, you need to select the chip type. It is important to select the correct type, because chip-specific lists of non-polymorphic probes and population BAF tables are used. The asymmetric filter option allows duplications that are called only by PennCNV, but with high confidence (>20), to be retained in the majority vote.
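
The effect of the asymmetric filter can be illustrated with a small sketch. The platform's actual voting rule is not described in this document, so the code assumes that a call is retained when at least two of the three algorithms report it. The function name and the 'dup' / 'del' labels are hypothetical; only the confidence cut-off of 20 comes from the text above.

```python
PENNCNV_MIN_CONFIDENCE = 20.0  # threshold named in the text (>20)


def keep_call(callers, cnv_type, penncnv_confidence=None, asymmetric_filter=False):
    """Decide whether a CNV call survives the (assumed) majority vote.

    callers: names of the algorithms that reported the call
    cnv_type: 'dup' or 'del' (hypothetical labels)
    """
    callers = set(callers)
    # Assumption: majority means at least two of the three algorithms agree.
    if len(callers) >= 2:
        return True
    # Asymmetric filter: rescue duplications seen only by PennCNV,
    # provided the PennCNV confidence is above the threshold.
    return (
        asymmetric_filter
        and cnv_type == "dup"
        and callers == {"PennCNV"}
        and penncnv_confidence is not None
        and penncnv_confidence > PENNCNV_MIN_CONFIDENCE
    )
```

Under these assumptions, a duplication reported only by PennCNV with confidence 25 is kept when the filter is enabled and dropped when it is disabled; a deletion reported only by PennCNV is dropped in both cases.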

Step 4: Checking Submitted Data

When the data are successfully submitted, a page with an overview of the samples found is presented. Here you can specify for new samples whether they should be included in search results (combined analysis only!). We recommend excluding samples with low call rates, since the low data quality will produce many false positive calls.
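
Samples with a low call rate can be listed from the exported Samples Table before submission. The sketch below is an illustration only: it assumes a tab-delimited export with 'SampleID' and 'Call Rate' column headers, the function name is hypothetical, and the cut-off is left to the caller because this document does not prescribe one.

```python
import csv
import io


def low_call_rate_samples(samples_table_text, min_call_rate):
    """Return the SampleIDs whose call rate is below min_call_rate.

    Assumes a tab-delimited table with 'SampleID' and 'Call Rate' headers.
    """
    reader = csv.DictReader(io.StringIO(samples_table_text), delimiter="\t")
    return [
        row["SampleID"]
        for row in reader
        if float(row["Call Rate"]) < min_call_rate
    ]
```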

If a separate algorithm was started, an overview of the parameter settings will also be shown.

If problems were detected with the input files, you will receive an error with some hints on what might be wrong. If all is well, press 'start analysis' to proceed.

Step 5: Monitor Progress

When a project is running, you can follow the progress as shown on the left (outdated image). At the top of the page, the 'Results so far' button takes you to a graphical overview of the results for finished samples. There is also a link to check the output files of the individual jobs for each sample.

The output of the program itself is printed below. Here you can see which samples are finished, which needed GC correction (applied automatically when genomic waving is detected), and how many CNVs were found in each sample.

Until the analysis is finished, this page will be refreshed every 15 seconds. Once it's finished, you will be presented with links to the files containing the results and to pages containing detailed overviews of the results. These links will show up at the top of the page.

You can close CNV-WebStore while the analysis is running. An email will be sent to you once the analysis is finished.

Detailed tips and instructions on browsing the results can be found here.