CleanStart Source Intelligence Core: The Security Data Engine

Internal system: Source Intelligence Core is the internal engine behind CleanStart Verified Source, CleanStart's external supply chain security offering. The API endpoints documented here are for internal use.

How Source Intelligence Identifies Supply Chain Threats

CleanStart Source Intelligence Core is the central security engine that powers supply chain threat detection. It monitors 24,013 repositories, tracks 809,425+ security advisories, analyzes 281M+ dependency relationships, and enables the four detection layers that identify threats from code injection to zero-day exploits.

Source Intelligence Core is the real-time data foundation that the rest of CleanStart's security capabilities build on.

The Source Intelligence Core

CleanStart Source Intelligence Core powers the Continuous Trust Loop, enabling real-time threat detection and automatic remediation across your supply chain.

What It Monitors

CleanStart Source Intelligence Core monitors 24,013 software repositories across GitHub, GitLab, Bitbucket, and open source platforms. It tracks 809,425+ security advisories including 739,944 CVE/NVD-sourced advisories and 9,998 OSV advisories, with real-time updates from the Open Source Vulnerabilities (OSV) database.

The system maintains intelligence on 281 million package-repository relationships across 7 ecosystems: 238M Go, 16.6M Crates, 12.5M npm, 7.2M Maven, 3.4M PyPI, 3.1M RubyGems, and 18.7K C++.

Weekly anomaly detection identifies suspicious activity including code injection patterns, maintainer behavior anomalies, confirmed malicious packages, and other security threats across monitored repositories.

Real-Time Alerting

# Subscribe to real-time intelligence feeds
cleanimg-init --subscribe-intelligence \
  --threat-categories "code-injection,typosquatting,zero-day" \
  --minimum-confidence 0.85

# Receive alerts like:
# [ALERT] Suspicious behavior detected
#   Repository: popular-library (npm)
#   Threat: Cryptominer injection attempt
#   Confidence: 96%
#   Action Recommended: Immediate review
#   Intelligence Link: https://...
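The two subscription flags above amount to a filter on threat category and confidence. A minimal sketch of that filter follows, assuming a hypothetical alert record with `category` and `confidence` fields; the actual feed schema is not documented here, so these field names are illustrative only.

```python
from typing import Iterable

# Mirrors the --threat-categories and --minimum-confidence flags shown above.
THREAT_CATEGORIES = {"code-injection", "typosquatting", "zero-day"}
MINIMUM_CONFIDENCE = 0.85


def filter_alerts(alerts: Iterable[dict],
                  categories: set = THREAT_CATEGORIES,
                  minimum_confidence: float = MINIMUM_CONFIDENCE) -> list:
    """Keep alerts in a subscribed category at or above the confidence floor."""
    return [
        alert for alert in alerts
        if alert["category"] in categories
        and alert["confidence"] >= minimum_confidence
    ]


# Hypothetical alerts; the field names are assumptions for illustration.
sample_alerts = [
    {"repository": "popular-library", "category": "code-injection", "confidence": 0.96},
    {"repository": "other-library", "category": "typosquatting", "confidence": 0.60},
    {"repository": "third-library", "category": "license-change", "confidence": 0.99},
]

kept = filter_alerts(sample_alerts)
```

Only the first sample alert passes: the second falls below the confidence floor and the third is outside the subscribed categories.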

The Four Detection Layers

Layer 1: Deep Code Analysis

The Code Analysis Pipeline analyzes every commit to every repository using multiple complementary techniques.

Pattern Matching detects suspicious cryptographic patterns used in mining or exfiltration operations, privilege escalation attempts, code obfuscation, and command injection patterns.

Abstract Syntax Tree (AST) Analysis performs data flow tracking through the code, control flow analysis to understand program logic, function call tracking to identify dangerous operations, and module dependency resolution to discover hidden dependencies.

Behavioral Analysis examines network access patterns, file system operations, and process spawning behavior, and detects unexpected or dangerous capabilities.

Example detection:

# Suspicious pattern detected (high confidence)
import hashlib
import subprocess
import base64

# Hidden mining loop (detected via AST analysis)
def update_cache():
    while True:  # unbounded loop
        # obfuscated_payload stands in for an attacker-supplied base64 string
        exec(base64.b64decode(obfuscated_payload))
        # This pattern: invisible loop + exec + obfuscation = FLAGGED
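The function call tracking described for AST Analysis can be illustrated with Python's standard `ast` module. This is a minimal sketch of one rule (an `exec` or `eval` call whose argument is a `base64.b64decode` call), not the pipeline's actual implementation; the function names are hypothetical. The sample source is only parsed, never executed.

```python
import ast

SUSPICIOUS_SOURCE = '''
import base64

def update_cache():
    while True:
        exec(base64.b64decode(obfuscated_payload))
'''


def _call_name(node: ast.Call) -> str:
    """Return a dotted name for a call target, e.g. 'base64.b64decode'."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_exec_of_decoded_payload(source: str) -> list:
    """Return line numbers where exec/eval is called on a base64-decoded value."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) in {"exec", "eval"}:
            for arg in node.args:
                if isinstance(arg, ast.Call) and _call_name(arg) == "base64.b64decode":
                    findings.append(node.lineno)
    return findings


flagged_lines = find_exec_of_decoded_payload(SUSPICIOUS_SOURCE)
```

A rule this narrow is easy to evade (for example by aliasing `b64decode`), which is one reason the text pairs pattern matching with data flow tracking and behavioral analysis.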

Layer 2: Stylometry and Maintainer Analysis

Stylometry Analysis tracks developer patterns to detect account compromise. It builds coding style fingerprints based on variable naming conventions, comment density and language, indentation and formatting patterns, function structure and complexity, and common keywords and libraries used. Commit pattern analysis examines the time of day when commits are made, geographic locations inferred from IP addresses, and frequency and volume of commits.

This behavioral profile allows detection of anomalous activity. For example, the system flags a likely account compromise if a maintainer who typically works 9-5 US Eastern suddenly commits from Beijing at 3:45 AM, switches from Python/Node.js to C++ extensions they have never used before, or releases three major versions in six hours with code that doesn't match their historical style. Anomaly scoring measures deviation from historical patterns, compares the activity to known attacker profiles, and produces a confidence score (0.0-1.0).

Example detection:

Alert: Maintainer anomaly detected (89% confidence)
  Library: crypto-utils (npm)
  Maintainer: alice@example.com

  Historical Pattern:
  - Works 9-5 US Eastern Time
  - Uses Python/Node.js exclusively
  - 15-20 commits/week

  Anomalous Activity:
  - Commit at 3:45 AM Beijing time
  - Includes C++ extension (never used before)
  - 3 major releases in 6 hours
  - Code style doesn't match historical pattern

  Action: Account likely compromised, review immediately
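Anomaly scoring of this kind can be sketched as a weighted sum over signals that deviate from the maintainer's history. The example below is an illustration under stated assumptions, not the production scoring model: the three signals, the weights, and the class names are all hypothetical, and the comparison against known attacker profiles is omitted.

```python
from dataclasses import dataclass


@dataclass
class MaintainerProfile:
    locations: frozenset          # countries seen in past commits
    languages: frozenset          # languages seen in past commits
    max_releases_per_day: int     # highest historical release rate


@dataclass
class Activity:
    location: str
    languages: frozenset
    releases_in_last_day: int


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"location": 0.3, "language": 0.3, "release_rate": 0.4}


def anomaly_score(profile: MaintainerProfile, activity: Activity) -> float:
    """Sum the weights of each signal that deviates from the profile (0.0-1.0)."""
    score = 0.0
    if activity.location not in profile.locations:
        score += WEIGHTS["location"]
    if not activity.languages <= profile.languages:
        score += WEIGHTS["language"]
    if activity.releases_in_last_day > profile.max_releases_per_day:
        score += WEIGHTS["release_rate"]
    return round(score, 2)


profile = MaintainerProfile(
    locations=frozenset({"US"}),
    languages=frozenset({"Python", "JavaScript"}),
    max_releases_per_day=1,
)
suspicious = Activity(location="CN", languages=frozenset({"C++"}), releases_in_last_day=3)
score = anomaly_score(profile, suspicious)
```

Here all three signals deviate, so the score reaches the maximum. An activity that differs only in language would score 0.3 under these weights.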

Layer 3: Sandbox Analysis

Sandbox Analysis executes code in an isolated environment to detect runtime threats. The Dynamic Analysis Pipeline sets up an isolated OS with no access to the real network or host filesystem, uses process monitoring (strace, ptrace) to track system calls, emulates network access so that all packets can be captured, and monitors file system operations inside the sandbox.

Execution scenarios test multiple situations: normal operation including imports and basic function calls, installation phase (npm install, pip install), build processes (npm run build, gradle build), and attack simulations such as credential injection. Threat Detection looks for unexpected outbound connections, cryptocurrency mining operations, rootkit installation attempts, credential theft mechanisms, and malware signatures.

Example detection:

Runtime Threat Detected (Critical)
  Package: logger-utility (npm)
  Version: 1.2.5

  Sandbox Execution Results:
  Normal imports and function calls execute without issue. However, the installation phase is flagged as malicious. During installation, the package:
  - attempts to connect to mining.attacker.com
  - tries to write to ~/.ssh/authorized_keys
  - spawns a background process (miner.sh)
  - deletes package.json.backup to remove evidence

  Confidence: 99.2%
  Severity: CRITICAL
  Action: Block immediately, contact package author
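Once the sandbox has recorded what a package did, threat detection reduces to matching those observations against rules. The sketch below assumes a hypothetical event record (`phase`, `kind`, `target`) of the sort a syscall or network monitor might emit; the record shape, the rules, and the path list are illustrative assumptions rather than the pipeline's real format.

```python
# Hypothetical event records, shaped like what a syscall/network monitor might emit.
OBSERVED_EVENTS = [
    {"phase": "import", "kind": "file_read", "target": "package.json"},
    {"phase": "install", "kind": "connect", "target": "mining.attacker.com"},
    {"phase": "install", "kind": "file_write", "target": "~/.ssh/authorized_keys"},
    {"phase": "install", "kind": "spawn", "target": "miner.sh"},
]

# Assumed examples of paths a package install has no reason to write to.
SENSITIVE_PATH_PREFIXES = ("~/.ssh/", "/etc/")


def flag_events(events, allowed_hosts=frozenset()):
    """Return (phase, reason) pairs for events that match a simple rule."""
    findings = []
    for event in events:
        kind, target = event["kind"], event["target"]
        if kind == "connect" and target not in allowed_hosts:
            findings.append((event["phase"], f"unexpected outbound connection to {target}"))
        elif kind == "file_write" and target.startswith(SENSITIVE_PATH_PREFIXES):
            findings.append((event["phase"], f"write to sensitive path {target}"))
        elif kind == "spawn" and event["phase"] == "install":
            findings.append((event["phase"], f"process spawned during install: {target}"))
    return findings


findings = flag_events(OBSERVED_EVENTS)
```

With the sample events, the import-phase read is ignored and the three install-phase events are flagged, matching the shape of the alert above.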

Layer 4: Registry Monitoring

Registry Monitoring continuously watches package registries for threats.

Registry monitoring involves three main detection techniques. Typosquatting Detection analyzes packages using Levenshtein distance analysis, identifies similar-looking Unicode characters, detects common misspellings, and spots homograph attacks like O vs 0. Version Analysis flags unusual version jumps (like 0.1.0 jumping to 99.0.0), package name changes, metadata discrepancies, and unusual build artifacts. Correlation Analysis detects multiple accounts uploading the same package, rapid dependency chains where new packages suddenly become popular, coordinated releases, and shared infrastructure like IP addresses or certificates.

Example detection:

Typosquatting Alert
  Original Package: lodash (npm) - 8.3M weekly downloads
  Typosquatter Found: lodash1 (npm) - Just uploaded

  Similarity Score: 97%
  Pattern: lodash[number] (known typosquatting pattern)
  Upload Metadata:
    - New account (1 hour old)
    - Same VPN as lodash2 uploader
    - Same build certificate as other typos

  Content Similarity: 94% (probably cloned)
  Malicious Additions: YES
    - Added cryptomining code
    - Steals npm tokens

  Confidence: 98.7%
  Action: Immediately remove from registry
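The Levenshtein distance analysis mentioned above can be sketched with the standard dynamic-programming algorithm. This is a minimal illustration: the `possible_typosquats` helper, its threshold, and the package list are hypothetical, and a real check would also normalize look-alike Unicode characters before comparing.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def possible_typosquats(candidate: str, popular_packages, max_distance: int = 1) -> list:
    """Return popular package names within max_distance edits of the candidate."""
    return [
        name for name in popular_packages
        if name != candidate and levenshtein(candidate, name) <= max_distance
    ]


# Hypothetical list of well-known package names.
matches = possible_typosquats("lodash1", ["lodash", "express", "react"])
```

A newly uploaded `lodash1` is one edit away from `lodash`, so it is returned as a candidate for closer inspection of upload metadata and content.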

API and Data Access

REST API for Intelligence

# Query vulnerabilities
curl -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  https://intelligence.cleanstart.io/api/v1/vulnerabilities \
  -d '{"cve": "CVE-2024-5678"}'

# Response:
{
  "cve": "CVE-2024-5678",
  "severity": "HIGH",
  "affected_packages": [
    {
      "ecosystem": "npm",
      "name": "express",
      "versions": ["<4.18.2"]
    }
  ],
  "detection_status": "publicly_disclosed",
  "exploited_in_wild": false,
  "repositories_affected": 1247
}
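The same query can be built from Python with the standard library. This sketch assumes the endpoint accepts a POST with a JSON body (inferred from the `-d` flag in the curl example, not confirmed); the function name is hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://intelligence.cleanstart.io/api/v1/vulnerabilities"


def build_vulnerability_request(cve_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a query for a single CVE."""
    body = json.dumps({"cve": cve_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_vulnerability_request("CVE-2024-5678", "YOUR_API_KEY")
# To send it: pass `request` to urllib.request.urlopen() and json.load() the response.
```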

Real-Time Streaming

# Subscribe to real-time threat stream
wss://intelligence.cleanstart.io/api/v1/stream?key=YOUR_API_KEY

# Receive events:
{
  "type": "zero_day_detected",
  "threat_id": "THREAT-2025-001",
  "severity": "critical",
  "threat_vector": "code_injection",
  "package": "popular-lib:2.1.0",
  "timestamp": "2025-10-04T14:30:45Z",
  "confidence": 0.97,
  "details": {...}
}
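A consumer of the stream needs to parse each message and decide how to route it. The sketch below handles one message shaped like the sample event; the `summarize_event` helper and its urgency rule (critical severity at or above a confidence floor) are assumptions for illustration, and the WebSocket connection itself is out of scope.

```python
import json

# Sample event from above; "details" is left empty because its schema is not shown.
SAMPLE_EVENT = """{
  "type": "zero_day_detected",
  "threat_id": "THREAT-2025-001",
  "severity": "critical",
  "threat_vector": "code_injection",
  "package": "popular-lib:2.1.0",
  "timestamp": "2025-10-04T14:30:45Z",
  "confidence": 0.97,
  "details": {}
}"""


def summarize_event(raw_event: str, minimum_confidence: float = 0.9) -> dict:
    """Parse one stream message and decide whether it needs urgent attention."""
    event = json.loads(raw_event)
    # "name:version" is split on the last colon.
    name, _, version = event["package"].rpartition(":")
    urgent = (event["severity"] == "critical"
              and event["confidence"] >= minimum_confidence)
    return {
        "threat_id": event["threat_id"],
        "package": name,
        "version": version,
        "urgent": urgent,
    }


summary = summarize_event(SAMPLE_EVENT)
```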

Dashboard and Reporting

Intelligence Dashboard

The CleanStart Intelligence Dashboard shows comprehensive threat intelligence. The Threat Summary (Last 24 Hours) reports 3 critical threats (cryptominer injection, zero-day, typosquatting), 12 high severity threats (version confusion, account compromise), 47 medium severity threats (suspicious patterns, anomalies), and 203 low severity threats (monitoring, trending).

The Detection Breakdown shows: Layer 1 (Code) with 34 threats (72% of detections), Layer 2 (Stylometry) with 8 threats (account compromises), Layer 3 (Sandbox) with 11 threats (malware), Layer 4 (Registry) with 5 threats (typosquatting), and overlap across multiple layers with 4 threats (highest confidence).

The Top Affected Ecosystems include npm (34 threats), PyPI (12 threats), Maven (5 threats), and other registries (7 threats).

Custom Reports

# Generate weekly threat summary
cleanimg-init --intelligence-report \
  --period weekly \
  --format pdf \
  --output threat-report-week-40.pdf

# Generate custom report
cleanimg-init --intelligence-report \
  --threat-types "code_injection,zero_day" \
  --confidence ">0.9" \
  --format json \
  --output critical-threats.json

Integration with Your Infrastructure

Continuous Supply Chain Monitoring

# Monitor all dependencies in your codebase
cleanimg-init --monitor-supply-chain \
  --sbom your-application.spdx \
  --alert-webhook https://your-slack-channel \
  --alert-email security@company.com

# Automatically alerts if:
# - Any dependency gets a new CVE
# - Maintainer account shows suspicious activity
# - Malicious code detected in transitive dependency
# - Typosquatting of your dependencies

Build-Time Intelligence Checks

# In your CI/CD pipeline
build-with-intelligence:
  stage: build
  script:
    - npm install
    # Fails build if intelligence detects high-severity threats
    - >-
      cleanimg-init --intelligence-check
      --sbom package-lock.json
      --fail-on-threat-level high
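The gate behind `--fail-on-threat-level` can be pictured as a comparison against an ordered list of severities. The sketch below assumes that `high` means "high or above" and reuses the four severity names from the dashboard; both points are assumptions, since the exact semantics of the flag are not spelled out here.

```python
# Assumed ordering, lowest to highest, using the dashboard's severity names.
THREAT_LEVELS = ["low", "medium", "high", "critical"]


def should_fail_build(detected_levels, fail_on: str = "high") -> bool:
    """Return True if any detected threat is at or above the fail_on level."""
    threshold = THREAT_LEVELS.index(fail_on)
    return any(THREAT_LEVELS.index(level) >= threshold for level in detected_levels)


# Hypothetical results from an intelligence check.
fails = should_fail_build(["low", "medium", "critical"])
passes = should_fail_build(["low", "medium"])
```

In a CI job, a True result would translate into a non-zero exit code so that the pipeline stage stops.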

Policy Enforcement Based on Intelligence

# OPA policy: Block deployments based on intelligence
package supply_chain_security

deny[msg] {
    input.image.vulnerabilities.detected_by_intelligence
    input.image.intelligence_threat_level == "critical"
    msg := "Cannot deploy: Critical threat detected by intelligence"
}

allow {
    not deny[_]
}

Threat Intelligence Feed Subscription

Data Feed Options

CleanStart offers three subscription tiers. The Free tier includes a 24-hour delayed threat feed, public CVE data only, monthly reports, and basic email alerts. The Professional tier ($5K/month) adds real-time threat alerts, zero-day early warnings, API access with 10K requests/day, custom reports, and Slack/Teams integration. The Enterprise tier (custom pricing) provides a dedicated threat intelligence team, unlimited API access, custom detections, on-premises Source Intelligence Core deployment, and SLA guarantees.
