Analyzer

The analyzer reverse-engineers Synode configs from existing event data. Point it at your CSV, JSONL, or JSON event logs and it produces a complete configuration with journeys, adventures, actions, datasets, and personas.

Getting Started

  1. Create a Synode project: synode init
  2. Place your event data in the input/ directory
  3. Run synode analyze

The analyzer will:

  • Auto-detect your event schema (Segment, Mixpanel, GA4, or custom)
  • Split events into sessions
  • Discover event patterns and field distributions
  • Build a complete Synode config

Supported Formats

Format | Extensions | Auto-detection
------ | ---------- | --------------
CSV    | .csv       | Column mapping wizard
JSONL  | .jsonl     | Segment, Mixpanel, GA4
JSON   | .json      | Segment, Mixpanel, GA4

For GA4 BigQuery exports, the analyzer automatically flattens event_params arrays and nested device/geo objects into a flat payload.
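The flattening step can be sketched as follows. The `Ga4Row` shape mirrors the public GA4 BigQuery export schema, but `flattenGa4Row` is an illustrative helper, not the analyzer's actual internals:

```typescript
// Sketch of the kind of flattening applied to GA4 BigQuery export rows:
// event_params entries become top-level payload keys, and nested
// device/geo objects become dotted keys like "device.category".
type Ga4ParamValue = {
  string_value?: string;
  int_value?: number;
  float_value?: number;
  double_value?: number;
};

interface Ga4Row {
  event_name: string;
  event_params: { key: string; value: Ga4ParamValue }[];
  device?: Record<string, unknown>;
  geo?: Record<string, unknown>;
}

function flattenGa4Row(row: Ga4Row): Record<string, unknown> {
  const payload: Record<string, unknown> = { event_name: row.event_name };
  for (const { key, value } of row.event_params) {
    // Pick whichever typed slot GA4 populated for this param.
    payload[key] =
      value.string_value ?? value.int_value ?? value.float_value ?? value.double_value;
  }
  // Prefix nested device/geo fields so the payload stays flat.
  for (const [k, v] of Object.entries(row.device ?? {})) payload[`device.${k}`] = v;
  for (const [k, v] of Object.entries(row.geo ?? {})) payload[`geo.${k}`] = v;
  return payload;
}
```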

How It Works

The analyzer runs a 9-stage pipeline:

  1. Ingestion — parse file, detect schema, normalize events
  2. Session detection — group events by session ID or time gap (default 30min)
  3. Action discovery — find unique event names and analyze field distributions
  4. Sequence mining — discover frequent contiguous event sequences (PrefixSpan)
  5. Adventure assembly — group sequences into adventures with bounce, wait, and timing stats
  6. Journey assembly — group adventures into journeys with weights and prerequisites
  7. Dataset extraction — find entity tables (products, categories) from payload ID patterns
  8. Persona extraction — detect user-level attribute distributions (locale, device, country)
  9. Compilation — generate TypeScript config files
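The sequence-mining stage above can be approximated with a plain contiguous n-gram count over sessions. This sketch is a simplification for illustration only; the analyzer itself uses PrefixSpan:

```typescript
// Count every contiguous n-gram of event names across sessions and keep
// those whose support (fraction of sessions containing them) meets a
// threshold. A simplified stand-in for PrefixSpan.
function mineContiguous(
  sessions: string[][],
  { minSupport = 0.1, maxLen = 4 } = {},
): Map<string, number> {
  const sessionCounts = new Map<string, number>();
  for (const session of sessions) {
    const seen = new Set<string>(); // count each pattern at most once per session
    for (let len = 2; len <= maxLen; len++) {
      for (let i = 0; i + len <= session.length; i++) {
        seen.add(session.slice(i, i + len).join(' > '));
      }
    }
    for (const pat of seen) sessionCounts.set(pat, (sessionCounts.get(pat) ?? 0) + 1);
  }
  const frequent = new Map<string, number>();
  for (const [pat, n] of sessionCounts) {
    const support = n / sessions.length;
    if (support >= minSupport) frequent.set(pat, support);
  }
  return frequent;
}
```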

What Gets Extracted

The analyzer computes all of these from your data:

Config element     | How it's detected
------------------ | -----------------
Journey weights    | Relative session frequency per journey
Bounce rates       | Drop-off rates at journey, adventure, and action level
Wait times         | Median time gaps between consecutive actions
Prerequisites      | Journey ordering patterns (>80% co-occurrence)
Cooloffs           | Minimum repeat interval per user
weighted() fields  | Categorical values with observed probabilities
oneOf() fields     | Uniform categorical distributions
range() fields     | Numeric min/max bounds
chance() fields    | Boolean true rates
Datasets           | Entity-like data identified by _id fields with attributes
Persona attributes | User-level fields (locale, device, country) with distributions
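The weighted() detection, for instance, amounts to turning observed value counts into probabilities. Here `toWeighted` is a hypothetical helper shown for illustration, not part of @synode/analyzer:

```typescript
// Sketch: derive weighted() options for a categorical field by counting
// observed values and normalizing to probabilities.
// `toWeighted` is illustrative, not an analyzer API.
function toWeighted(values: string[]): Record<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  const options: Record<string, number> = {};
  for (const [v, n] of counts) options[v] = n / values.length;
  return options;
}
```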

Intermediate Format

The analyzer produces an intermediate JSON file (input/analysis.json) before compiling to TypeScript. This file is a complete, serializable representation of the detected config:

json
{
  "version": "1.0",
  "meta": { "sourceFiles": ["events.csv"], "eventCount": 50000, "uniqueUsers": 1200 },
  "journeys": [{ "id": "...", "weight": 0.7, "bounceChance": 0.15, "adventures": [...] }],
  "datasets": [{ "id": "products", "count": 50, "fields": { "name": { "generator": "oneOf", "options": [...] } } }],
  "persona": { "attributes": { "locale": { "generator": "weighted", "options": { "en-US": 0.6, "de-DE": 0.25 } } } },
  "simulation": { "users": 1200, "lanes": 4, "tick": "1m" }
}

You can edit this file manually and re-compile with synode analyze --compile-only.
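Since the intermediate file is plain JSON, such edits can also be scripted. The helper below follows the analysis.json shape shown above but is not part of @synode/analyzer; it scales one journey's weight and renormalizes so all weights still sum to 1:

```typescript
// Illustrative helper for editing the intermediate file in code:
// scale one journey's weight, then renormalize the rest.
// Shapes follow the analysis.json example; not an official API.
interface Journey {
  id: string;
  weight: number;
}

interface Analysis {
  version: string;
  journeys: Journey[];
}

function rescaleJourney(analysis: Analysis, id: string, factor: number): Analysis {
  const scaled = analysis.journeys.map((j) =>
    j.id === id ? { ...j, weight: j.weight * factor } : j,
  );
  const total = scaled.reduce((sum, j) => sum + j.weight, 0);
  return {
    ...analysis,
    journeys: scaled.map((j) => ({ ...j, weight: j.weight / total })),
  };
}
```

A typical workflow would read input/analysis.json, apply a transform like this, write the file back, and then run synode analyze --compile-only.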

AI Enhancement (Optional)

The analyzer can optionally use AI (Claude or Gemini) to refine the generated config:

  • Name refinement — suggest human-readable journey/adventure/action names
  • Condition inference — detect logical prerequisites between flows
  • Dataset identification — find entity patterns in payloads
  • Config review — holistic quality check

Each AI call requires explicit user approval. A security notice is shown before any data is sent. No raw event data is transmitted — only aggregated structure and statistics.

Supported providers: Anthropic (Claude), Google (Gemini). Install the provider SDK you need:

bash
# For Anthropic
npm install ai @ai-sdk/anthropic

# For Google
npm install ai @ai-sdk/google

Set the API key via environment variable or enter it when prompted:

bash
export ANTHROPIC_API_KEY=sk-...
# or
export GOOGLE_API_KEY=...

CLI Reference

bash
# Full analysis with interactive wizard
synode analyze

# Re-compile from existing analysis.json
synode analyze --compile-only

Flag           | Description
-------------- | -----------
--compile-only | Skip analysis, re-compile from existing input/analysis.json

Programmatic API

All pipeline stages are available as standalone functions:

typescript
import {
  parseFile,
  detectSchema,
  normalizeEvents,
  detectSessions,
  discoverActions,
  mineSequences,
  assembleAdventures,
  assembleJourneys,
  extractDatasets,
  extractPersona,
  compileToTypeScript,
} from '@synode/analyzer';

// Parse and normalize
const rows = await parseFile('input/events.csv');
const schema = detectSchema(rows);
const events = normalizeEvents(rows, schema.mapping);

// Detect sessions and analyze
const sessions = detectSessions(events, { mode: 'timeGap', gapMs: 30 * 60 * 1000 });
const actions = discoverActions(sessions);
const sequences = mineSequences(sessions, { minSupport: 0.1 });

// Assemble structure
const adventures = assembleAdventures(sequences, sessions, actions);
const journeys = assembleJourneys(adventures, sessions);
const datasets = extractDatasets(sessions);
const persona = extractPersona(sessions);

// Compile to TypeScript files
const files = compileToTypeScript({ version: '1.0', meta: { ... }, journeys, datasets, persona, ... });

See the API Reference for detailed type information.