Project Structure
When you download a project or a selection of samples from AmpliconRepository, you receive a compressed archive (.tar.gz). This archive is organized by the AmpliconSuiteAggregator into a standardized structure designed for both human readability and programmatic analysis.
Root Directory: results/
The top-level directory contains the following key files:
aggregated_results.csv: A flat, tabular summary of all samples and features in the project. This is the best starting point for analyzing the dataset in Excel or R.aggregated_results.html: A searchable, interactive HTML table of your results.run.json: A machine-readable JSON file containing all metadata and relative file paths for the project.
Sample Data: samples/
The samples/ directory contains a subdirectory for every sample in the project. Inside each sample folder (e.g., samples/sample1/), you will find:
sample1_AA_results/: An uncompressed directory containing the raw output from AmpliconArchitect (AA), including:*_summary.txt: A text summary of the amplicons found in the sample.*_cycles.txt&*_graph.txt: The bioinformatic reconstructions of each amplicon.*.pdf&*.png: Visualizations of the amplicon structures.
sample1_cnvkit_output.tar.gz: A compressed archive of the CNVkit results.sample1_CNV_CALLS.bed: An uncompressed BED file containing the copy number calls used by AA.- Metadata & Logs:
sample1_run_metadata.json/sample1_sample_metadata.jsonsample1.log: The pipeline execution log for this sample.sample1_timing_log.txt: Performance metrics for the run.
Consolidated Analysis: consolidated_classification/
This directory aggregates results from AmpliconClassifier (AC) across all samples in the project, using the project name as a prefix. More about these files is available from the AC GitHub Readme.
*_result_table.tsv: The authoritative list of all focal amplifications identified across the project.*_amplicon_classification_profiles.tsv: Detailed profiles for each identified amplicon.*_gene_list.tsv: A comprehensive list of genes associated with each identified feature.*_ecDNA_counts.tsv: Summary counts of ecDNA identified in the project.*_ecDNA_context_calls.tsv: Data regarding the genomic context of identified ecDNA.*_feature_basic_properties.tsv&*_feature_entropy.tsv: Metrics regarding the complexity and properties of identified features.- Subdirectories:
*_annotated_cycles_files/: Contains cycle files annotated with gene information.*_classification_bed_files/: Contains BED files for each identified feature.*_SV_summaries/: Summaries of structural variants associated with the features.
Other Files: other_files/
If you included supplementary data in an AUX_DIR during upload (e.g., FISH images, pathology reports), those files will be consolidated here.