strict-omics · project lane

LLM proposes. Deterministic gates decide.

Run the first four ingestion gates in your browser right now — parse, QC, species gate, trim. Production alignment, RO-Crate provenance, and the full pipeline run on our Nextflow / Snakemake backend.

Fail-closed ingestion gate · Browser-side QC + species · Container-pinned production · RO-Crate provenance
Up to 200K reads processed locally per run. No upload. No server pipeline call.
Have a CSV instead? Try the multi-omics tool →

Who it is for

Research teams that need a transcriptomics pipeline they can defend: deterministic species and platform gates, audit-grade provenance, and a clean handoff to downstream analysis or manuscript figures.

What we do

We run a two-branch transcriptomics factory (microarray vs RNA-seq) behind a fail-closed ingestion gate. The LLM proposes candidate datasets and metadata; a Pydantic-validated gate decides what enters the pipeline.

What you get

A versioned run manifest, an RO-Crate provenance bundle, MultiQC-aggregated QC, a species-verified sample list, container-pinned preprocessing outputs, and a decision-ready brief you can act on.

Operating steps

The workbench above covers steps 1 and 2. Steps 3, 4, and 5 (technology-specific preprocessing and alignment, containerised production QC, and provenance) run on the backend.

01

Ingestion gate

Repository metadata, platform/assay fields, publication cross-check. Fail-closed: conflicting evidence moves to manual review or rejection. The browser workbench above runs the first four gates locally.
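A minimal sketch of what a fail-closed gate decision can look like. It uses only the Python standard library so it runs anywhere; in production the same constraints would sit in a Pydantic schema. The field names, the allow-lists, and the `EXAMPLE-…` accessions are assumptions made for illustration, not our production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    MANUAL_REVIEW = "manual_review"
    REJECT = "reject"


# Hypothetical allow-lists, for illustration only.
ALLOWED_SPECIES = frozenset({"Homo sapiens", "Mus musculus"})
ALLOWED_ASSAYS = frozenset({"microarray", "rna-seq"})


@dataclass(frozen=True)
class Candidate:
    """One dataset proposed by the LLM (illustrative field names)."""
    accession: str
    repository_species: str   # species declared in repository metadata
    publication_species: str  # species stated in the linked publication
    assay: str


def ingestion_gate(candidate: Candidate) -> Decision:
    """Deterministic decision: anything missing or unknown is rejected."""
    if not candidate.accession.strip():
        return Decision.REJECT
    if candidate.assay not in ALLOWED_ASSAYS:
        return Decision.REJECT
    if candidate.repository_species not in ALLOWED_SPECIES:
        return Decision.REJECT
    # Repository and publication disagree: conflicting evidence, so a human decides.
    if candidate.repository_species != candidate.publication_species:
        return Decision.MANUAL_REVIEW
    return Decision.ACCEPT


proposal = Candidate("EXAMPLE-002", "Homo sapiens", "Mus musculus", "rna-seq")
print(ingestion_gate(proposal).value)
```

The gate has no "probably fine" path: a candidate is accepted only when every check passes, and the LLM's proposal never changes the allow-lists.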

02

Empirical species check

FastQ Screen / Kraken2 for RNA-seq; verifyBamID2 + CrosscheckFingerprints for human samples. The browser workbench uses a k-mer index and fails closed if the dominant species is not on the allow-list.
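To make the k-mer idea concrete, here is a toy sketch of a species gate that fails closed. It is not the workbench's implementation: the k-mer length, the 0.8 dominance threshold, the function names, and the two made-up reference strings are all assumptions chosen to keep the example small.

```python
from collections import Counter

K = 5  # toy k-mer length; a real index would use much longer k-mers


def kmers(seq, k=K):
    """Set of all k-mers in a sequence (empty if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=K):
    """Map each k-mer found in exactly one species to that species."""
    owners = {}
    for species, seq in references.items():
        for kmer in kmers(seq, k):
            owners.setdefault(kmer, set()).add(species)
    return {kmer: next(iter(s)) for kmer, s in owners.items() if len(s) == 1}


def top(counter):
    """Highest count first; ties broken by name so the result is deterministic."""
    return min(counter, key=lambda name: (-counter[name], name))


def species_gate(reads, index, allow_list, min_fraction=0.8, k=K):
    """Return (passed, dominant_species). No evidence means no pass."""
    votes = Counter()
    for read in reads:
        hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
        if hits:
            votes[top(hits)] += 1  # one vote per classifiable read
    if not votes:
        return False, None  # fail closed: nothing could be classified
    dominant = top(votes)
    fraction = votes[dominant] / sum(votes.values())
    return (dominant in allow_list and fraction >= min_fraction), dominant


# Made-up toy sequences, not real genome fragments.
TOY_REFERENCES = {
    "Homo sapiens": "ACGTACGTTTGACCA",
    "Mus musculus": "GGGCCCTTAAGGCCA",
}
INDEX = build_index(TOY_REFERENCES)
print(species_gate(["ACGTACGTT", "TTTGACCA"], INDEX, {"Homo sapiens"}))
```

Unclassifiable input, a dominant species outside the allow-list, and a dominant species below the threshold all produce the same outcome: the gate does not pass.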

03

Technology-specific branch

Microarray (RLE, NUSE, percent present) and RNA-seq (FastQC, RNA-SeQC 2, RSeQC) are never mixed. Batch effects are detected before they are corrected. Production runs only.
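One way to enforce "never mixed" is to route a study by its raw file types and refuse anything ambiguous. The sketch below assumes the raw formats named in our intake questions (FASTQ / SRA / BAM for RNA-seq, CEL / IDAT for microarray); the function names and suffix sets are hypothetical.

```python
from pathlib import PurePath

RNASEQ_SUFFIXES = {".fastq", ".fq", ".sra", ".bam"}
MICROARRAY_SUFFIXES = {".cel", ".idat"}


def branch_for(filename):
    """Return 'rna-seq', 'microarray', or None for an unrecognised file."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # look through gzip compression
    last = suffixes[-1] if suffixes else ""
    if last in RNASEQ_SUFFIXES:
        return "rna-seq"
    if last in MICROARRAY_SUFFIXES:
        return "microarray"
    return None


def route_study(filenames):
    """Pick exactly one branch for a study, or raise (fail closed)."""
    branches = {branch_for(name) for name in filenames}
    if len(branches) != 1 or None in branches:
        raise ValueError(f"cannot route study to a single branch: {sorted(map(str, branches))}")
    return branches.pop()


print(route_study(["s1.fastq.gz", "s2.bam"]))
```

An empty file list, an unknown file type, or a mix of CEL and FASTQ files all raise instead of guessing a branch.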

04

Containerised QC

Pinned Nextflow / Snakemake runs, MultiQC aggregation, batch-aware thresholds. ENCODE-style read depth and replicate standards where applicable. Production runs only.

05

Provenance & handoff

DataLad-versioned data, RO-Crate workflow-run provenance, Git-versioned code, and a decision-ready brief that links every output back to a study accession. Production runs only.
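For readers who have not seen one, the sketch below hand-builds a minimal `ro-crate-metadata.json` that ties inputs and outputs to a study accession through a `CreateAction`. It follows the general RO-Crate 1.1 layout, but it is a simplified illustration: a real Workflow Run RO-Crate carries additional profile requirements that are not reproduced here, and the function name, file paths, and accession are assumptions.

```python
import json


def build_crate(accession, inputs, outputs, workflow_file, date_published):
    """Assemble a minimal RO-Crate metadata document as a plain dict."""
    data_files = [{"@id": path, "@type": "File"} for path in [*inputs, *outputs]]
    graph = [
        {   # metadata descriptor required by RO-Crate 1.1
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset, linked back to the study accession
            "@id": "./",
            "@type": "Dataset",
            "name": f"Run outputs for {accession}",
            "description": "Illustrative crate; values are placeholders.",
            "datePublished": date_published,
            "identifier": accession,
            "hasPart": [{"@id": f["@id"]} for f in data_files] + [{"@id": workflow_file}],
            "mentions": [{"@id": "#run-1"}],
        },
        {"@id": workflow_file, "@type": ["File", "SoftwareSourceCode"], "name": "Pipeline entry point"},
        {   # the run itself: which tool consumed what and produced what
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": workflow_file},
            "object": [{"@id": path} for path in inputs],
            "result": [{"@id": path} for path in outputs],
        },
        *data_files,
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_crate(
    accession="EXAMPLE-001",                 # placeholder accession
    inputs=["raw/s1.fastq.gz"],
    outputs=["qc/multiqc_report.html"],
    workflow_file="main.nf",
    date_published="2024-01-01",
)
print(json.dumps(crate, indent=2)[:200])
```

Because every output appears under `result` of an action whose root dataset carries the accession, each file can be traced back to the study it came from.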

Stack

Production-grade tools, pinned by digest.

Every component is selected for portability, auditability, and the ability to rerun a clean pipeline and reproduce the output.

Nextflow

Production orchestration for HPC, cloud, and workstation.

Snakemake

Leaner alternative, especially for R-heavy custom workflows.

nf-core conventions

Style guide and quality floor for reusable pipelines.

MultiQC

Aggregated QC report across all modules.

DataLad

Git-annex versioning of large raw and derived datasets.

Workflow Run RO-Crate

Captures execution provenance with inputs and outputs.

Pydantic + XML prompts

Local schema validation so the LLM cannot relax scientific constraints.

FastQ Screen / Kraken2

Empirical species verification before alignment.

verifyBamID2

Human-sample contamination and identity check.

Standards we enforce

Repository-native metadata, minimum-information standards, and a bilingual controlled vocabulary. We use these as hard gates, not as guidelines.

MIAME / MINSEQE

Minimum information standards that pin what a usable study must report.

MAGE-TAB / SOFT

Repository-native formats for ArrayExpress, BioStudies, and GEO sample and platform metadata.

ENCODE bulk RNA-seq

Read length >= 50 bp, two or more replicates, ~30M aligned reads, Spearman >= 0.9 isogenic / >= 0.8 anisogenic.
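As a hard gate, the thresholds above reduce to a handful of comparisons. The sketch below is one possible encoding, with hypothetical names; it treats the approximate "~30M aligned reads" figure as a strict per-replicate floor, which is an assumption, and it takes the Spearman correlation as a precomputed input.

```python
from dataclasses import dataclass

MIN_READ_LENGTH_BP = 50
MIN_REPLICATES = 2
MIN_ALIGNED_READS = 30_000_000  # assumption: "~30M" applied as a strict floor
MIN_SPEARMAN = {"isogenic": 0.9, "anisogenic": 0.8}


@dataclass(frozen=True)
class Experiment:
    read_length_bp: int
    aligned_reads_per_replicate: tuple  # one count per replicate
    replicate_type: str                 # "isogenic" or "anisogenic"
    spearman: float                     # replicate concordance, computed upstream


def encode_failures(exp):
    """Return the names of failed checks; an empty list means the gate passes."""
    failures = []
    if exp.read_length_bp < MIN_READ_LENGTH_BP:
        failures.append("read_length")
    if len(exp.aligned_reads_per_replicate) < MIN_REPLICATES:
        failures.append("replicates")
    if any(n < MIN_ALIGNED_READS for n in exp.aligned_reads_per_replicate):
        failures.append("aligned_reads")
    floor = MIN_SPEARMAN.get(exp.replicate_type)
    if floor is None or exp.spearman < floor:  # unknown replicate type fails closed
        failures.append("spearman")
    return failures


print(encode_failures(Experiment(36, (10_000_000,), "anisogenic", 0.75)))
```

Returning the list of failed checks, instead of a bare boolean, keeps the rejection reason available for the brief.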

Bilingual controlled vocabulary

Canonical English ontology terms for tissue, disease, strain, and perturbation; Korean mirror for ops.

Boundaries

The LLM proposes candidate inclusions and never relaxes scientific constraints. Final inclusion is a deterministic validator decision.

Microarray and RNA-seq never share a preprocessing branch. Different metadata, raw files, QC, and batch behaviour.

Container digests, reference builds, and annotation releases are pinned in the run manifest. A clean rerun reproduces the output or the pipeline is not yet production.
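A pinning rule like this is easy to check mechanically. The sketch below assumes a manifest shaped as a dictionary with `containers`, `reference_build`, and `annotation_release` keys; that layout, the function name, and the example values are illustrative, not our manifest format. It accepts only image references pinned by a `sha256` digest and rejects tags such as `:latest`.

```python
import re

# image@sha256:<64 lowercase hex characters>
DIGEST_REF = re.compile(r"^[^\s@]+@sha256:[0-9a-f]{64}$")
REQUIRED_KEYS = ("containers", "reference_build", "annotation_release")


def unpinned_entries(manifest):
    """Return the manifest entries that are missing or not pinned."""
    problems = [key for key in REQUIRED_KEYS if not manifest.get(key)]
    for name, image in (manifest.get("containers") or {}).items():
        if not DIGEST_REF.match(str(image)):
            problems.append(f"containers.{name}")
    return problems


manifest = {
    # Placeholder registry path and digest, for illustration only.
    "containers": {"fastqc": "quay.io/example/fastqc@sha256:" + "a" * 64},
    "reference_build": "GRCh38",
    "annotation_release": "GENCODE v44",
}
print(unpinned_entries(manifest))
```

A run would be allowed to start only when the returned list is empty.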

Route preview

Request a Paid Brief

Send a short note and we will return a route preview, an owner, and a fit score. Project-tier engagements start at ₩8M.

Which organism and assay type are you working with? (Homo sapiens, Mus musculus, microarray, bulk RNA-seq, single-cell, spatial, etc.)
Do you have raw data, and in what format? (FASTQ / SRA / BAM for RNA-seq, CEL / IDAT for microarray, or processed matrices only.)
What decision does this pipeline need to support? (target ID, cohort selection, manuscript figure, audit-grade dataset, regulatory submission, etc.)
24h response target · Short intake, clear next step

Submissions are routed into the Brown Biotech Notion intake hub.

Triage preview

Send a concise project brief

Share just enough context to route the request well. You'll see the route, owner, approval gate, and next action after submit.

Evidence stack

Upload or describe the artifact(s) that support the brief.