Internal system — Source Intelligence Core is the internal engine behind CleanStart Verified Source, CleanStart's external supply chain security offering. The API endpoints documented here are for internal use.
How Source Intelligence Identifies Supply Chain Threats
CleanStart Source Intelligence Core is the central security engine that powers supply chain threat detection. It monitors 24,013 repositories, tracks 809,425+ security advisories, analyzes 281M+ dependency relationships, and enables the four detection layers that identify threats from code injection to zero-day exploits.
Source Intelligence Core is the real-time data foundation that the rest of CleanStart's threat detection builds on.
The Source Intelligence Core
CleanStart Source Intelligence Core powers the Continuous Trust Loop — enabling real-time threat detection and automatic remediation across your supply chain.
What It Monitors
CleanStart Source Intelligence Core monitors 24,013 software repositories across GitHub, GitLab, Bitbucket, and open source platforms. It tracks 809,425+ security advisories including 739,944 CVE/NVD-sourced advisories and 9,998 OSV advisories, with real-time updates from the Open Source Vulnerabilities (OSV) database.
The system maintains intelligence on 281 million package-repository relationships across 7 ecosystems: 238M Go, 16.6M Crates, 12.5M npm, 7.2M Maven, 3.4M PyPI, 3.1M RubyGems, and 18.7K C++.
Weekly anomaly detection identifies suspicious activity including code injection patterns, maintainer behavior anomalies, confirmed malicious packages, and other security threats across monitored repositories.
Real-Time Alerting
    # Subscribe to real-time intelligence feeds
    cleanimg-init --subscribe-intelligence \
      --threat-categories "code-injection,typosquatting,zero-day" \
      --minimum-confidence 0.85

    # Receive alerts like:
    # [ALERT] Suspicious behavior detected
    # Repository: popular-library (npm)
    # Threat: Cryptominer injection attempt
    # Confidence: 96%
    # Action Recommended: Immediate review
    # Intelligence Link: https://...

The Four Detection Layers
Layer 1: Deep Code Analysis
The Code Analysis Pipeline analyzes every commit to every repository using multiple complementary techniques.
Pattern Matching detects suspicious cryptographic patterns used in mining or exfiltration operations, privilege escalation attempts, code obfuscation, and command injection patterns. Abstract Syntax Tree (AST) Analysis performs data flow tracking through the code, control flow analysis to understand program logic, function call tracking to identify dangerous operations, and module dependency resolution to discover hidden dependencies. Behavioral Analysis examines network access patterns, file system operations, process spawning behavior, and detection of unexpected or dangerous capabilities.
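To make the AST analysis step concrete, here is a minimal sketch using Python's standard `ast` module. It is not the Code Analysis Pipeline itself: the rule (an `exec` or `eval` call whose argument contains a decode call), the names `DANGEROUS_CALLS`, `DECODE_ATTRS`, and `find_obfuscated_exec`, and the sample source are all assumptions made for illustration.

```python
import ast

# Hypothetical rule set: flag exec()/eval() whose argument contains a decode call.
DANGEROUS_CALLS = {"exec", "eval"}
DECODE_ATTRS = {"b64decode", "decompress", "unhexlify"}


def call_name(node: ast.Call) -> str:
    """Return the trailing name of a call, e.g. 'b64decode' for base64.b64decode(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_obfuscated_exec(source: str) -> list[int]:
    """Return sorted line numbers where exec/eval is fed a decoded payload."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or call_name(node) not in DANGEROUS_CALLS:
            continue
        for arg in node.args:
            # Search the whole argument expression for a nested decode call.
            if any(
                isinstance(inner, ast.Call) and call_name(inner) in DECODE_ATTRS
                for inner in ast.walk(arg)
            ):
                findings.append(node.lineno)
                break
    return sorted(findings)


# Illustrative input only; the source is parsed, never executed.
SAMPLE = '''
import base64

def update_cache(payload):
    while True:
        exec(base64.b64decode(payload))
'''

print(find_obfuscated_exec(SAMPLE))
```

Because the check works on the parsed tree rather than on raw text, it is not fooled by whitespace or line-wrapping changes. A real pipeline would likely combine it with data flow tracking to catch payloads that are decoded on one line and executed on another.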
Example detection:
    # Suspicious pattern detected (high confidence)
    import hashlib
    import subprocess
    import base64

    # Hidden mining loop (detected via AST analysis)
    def update_cache():
        for i in range(infinite):
            exec(base64.b64decode(obfuscated_payload))

    # This pattern: invisible loop + exec + obfuscation = FLAGGED

Layer 2: Stylometry and Maintainer Analysis
Stylometry Analysis tracks developer patterns to detect account compromise. It builds coding style fingerprints based on variable naming conventions, comment density and language, indentation and formatting patterns, function structure and complexity, and common keywords and libraries used. Commit pattern analysis examines the time of day when commits are made, geographic locations inferred from IP addresses, and frequency and volume of commits.
This behavioral profile allows detection of anomalous activity. For example, the system flags a likely account compromise if a maintainer who typically works 9-5 US Eastern suddenly commits from Beijing at 3:45 AM, switches from Python/Node.js to C++ extensions they have never used before, or releases three major versions in six hours with code that doesn't match their historical style. Anomaly scoring measures deviation from historical patterns, compares the activity to known attacker profiles, and produces a confidence score (0.0-1.0).
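The anomaly scoring idea can be sketched as a weighted sum of deviations from the maintainer's historical profile. This is an illustration under stated assumptions, not the scoring model CleanStart uses: the `MaintainerProfile` and `Activity` types, the four signals, and the weights are all hypothetical, and the comparison against known attacker profiles is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaintainerProfile:
    usual_local_hours: range       # local hours of day the maintainer usually commits
    usual_regions: frozenset[str]  # regions previously inferred from IP addresses
    languages: frozenset[str]      # languages seen in their commit history
    max_releases_per_day: int      # highest daily release count observed


@dataclass(frozen=True)
class Activity:
    local_hour: int
    region: str
    languages: frozenset[str]
    releases_last_day: int


# Illustrative weights (assumptions); they sum to 1.0 so the score stays in [0.0, 1.0].
WEIGHTS = {"hour": 0.2, "region": 0.2, "language": 0.3, "release_rate": 0.3}


def anomaly_score(profile: MaintainerProfile, activity: Activity) -> float:
    """Sum the weights of every signal that deviates from the historical profile."""
    score = 0.0
    if activity.local_hour not in profile.usual_local_hours:
        score += WEIGHTS["hour"]
    if activity.region not in profile.usual_regions:
        score += WEIGHTS["region"]
    if not activity.languages <= profile.languages:  # any never-before-seen language
        score += WEIGHTS["language"]
    if activity.releases_last_day > profile.max_releases_per_day:
        score += WEIGHTS["release_rate"]
    return round(score, 2)


profile = MaintainerProfile(range(9, 17), frozenset({"US"}),
                            frozenset({"python", "javascript"}), 1)
suspicious = Activity(3, "CN", frozenset({"python", "c++"}), 3)
print(anomaly_score(profile, suspicious))
```

A production scorer would presumably use graded deviations (how far outside the usual hours, how many new languages) rather than binary signals, but the shape is the same: independent signals combined into one bounded confidence value.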
Example detection:
    Alert: Maintainer anomaly detected (89% confidence)
    Library: crypto-utils (npm)
    Maintainer: alice@example.com
    Historical Pattern:
      - Works 9-5 US Eastern Time
      - Uses Python/Node.js exclusively
      - 15-20 commits/week
    Anomalous Activity:
      - Commit at 3:45 AM Beijing time
      - Includes C++ extension (never used before)
      - 3 major releases in 6 hours
      - Code style doesn't match historical pattern
    Action: Account likely compromised, review immediately

Layer 3: Sandbox Analysis
Sandbox Analysis executes code in an isolated environment to detect runtime threats. The Dynamic Analysis Pipeline sets up an isolated OS with no access to the real network or the host filesystem, uses process monitoring (strace, ptrace) to track system calls, emulates network access so that every packet the code sends can be captured, and monitors file system operations inside the sandbox.
Execution scenarios test multiple situations: normal operation including imports and basic function calls, installation phase (npm install, pip install), build processes (npm run build, gradle build), and attack simulations such as credential injection. Threat Detection looks for unexpected outbound connections, cryptocurrency mining operations, rootkit installation attempts, credential theft mechanisms, and malware signatures.
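The threat detection step can be pictured as rules applied to events recorded during each execution scenario. The sketch below assumes a hypothetical `SandboxEvent` record, host allowlist, and path watchlist. A real sandbox would derive its events from strace/ptrace output and packet captures rather than from hand-built objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SandboxEvent:
    """Hypothetical event record produced by one sandbox run."""
    phase: str    # "import", "install", or "build"
    kind: str     # "connect", "write", or "spawn"
    target: str   # hostname, file path, or command


ALLOWED_HOSTS = {"registry.npmjs.org"}               # assumed allowlist
SENSITIVE_PATH_PARTS = (".ssh/", ".aws/", ".npmrc")  # assumed watchlist


def classify(event: SandboxEvent) -> Optional[str]:
    """Return a finding label for a suspicious event, or None if it looks benign."""
    if event.kind == "connect" and event.target not in ALLOWED_HOSTS:
        return "unexpected outbound connection"
    if event.kind == "write" and any(p in event.target for p in SENSITIVE_PATH_PARTS):
        return "credential or key file tampering"
    # Simplification: legitimate installs also spawn processes (e.g. native builds).
    if event.kind == "spawn" and event.phase == "install":
        return "process spawned during install"
    return None


def findings_by_phase(events: list[SandboxEvent]) -> dict[str, list[str]]:
    """Group finding labels by the execution phase in which they occurred."""
    report: dict[str, list[str]] = {}
    for event in events:
        label = classify(event)
        if label is not None:
            report.setdefault(event.phase, []).append(f"{label}: {event.target}")
    return report


events = [
    SandboxEvent("import", "connect", "registry.npmjs.org"),
    SandboxEvent("install", "connect", "mining.attacker.com"),
    SandboxEvent("install", "write", "~/.ssh/authorized_keys"),
    SandboxEvent("install", "spawn", "miner.sh"),
]
print(findings_by_phase(events))
```

Grouping findings by phase matters because, as the scenarios above suggest, a package can behave normally on import while doing its damage only during installation.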
Example detection:
    Runtime Threat Detected (Critical)
    Package: logger-utility (npm)
    Version: 1.2.5
    Sandbox Execution Results: Normal imports and function calls execute without issue. However, the installation phase is flagged as malicious. During installation, the package attempts to connect to mining.attacker.com, tries to write to ~/.ssh/authorized_keys, spawns a background process (miner.sh), and deletes package.json.backup to remove evidence.
    Confidence: 99.2%
    Severity: CRITICAL
    Action: Block immediately, contact package author

Layer 4: Registry Monitoring
Layer 4 continuously monitors package registries for threats.
Registry monitoring involves three main detection techniques. Typosquatting Detection analyzes packages using Levenshtein distance analysis, identifies similar-looking Unicode characters, detects common misspellings, and spots homograph attacks like O vs 0. Version Analysis flags unusual version jumps (like 0.1.0 jumping to 99.0.0), package name changes, metadata discrepancies, and unusual build artifacts. Correlation Analysis detects multiple accounts uploading the same package, rapid dependency chains where new packages suddenly become popular, coordinated releases, and shared infrastructure like IP addresses or certificates.
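As an illustration of the typosquatting checks, the sketch below combines a standard two-row Levenshtein distance with a small homoglyph fold. The `HOMOGLYPHS` table, the `max_distance` threshold, and the function names are assumptions. A production check would use a much larger confusables list and would also weigh package popularity and upload metadata.

```python
# Assumed homoglyph map: digits and Cyrillic letters that resemble Latin ones.
HOMOGLYPHS = str.maketrans({
    "0": "o", "1": "l", "5": "s",
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
})


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def typosquat_candidates(name: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Return popular packages that `name` closely resembles without matching exactly."""
    hits = []
    lowered = name.lower()
    folded = lowered.translate(HOMOGLYPHS)
    for target in popular:
        if name == target:
            continue  # the genuine package is not a typosquat of itself
        if folded == target or levenshtein(lowered, target) <= max_distance:
            hits.append(target)
    return hits


POPULAR = ["lodash", "express", "requests"]
for candidate in ["lodash1", "l0dash", "expresss", "lodash"]:
    print(candidate, typosquat_candidates(candidate, POPULAR))
```

The homoglyph fold catches names such as `l0dash` that look identical at a glance, while the edit-distance check catches appended or dropped characters such as `lodash1`.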
Example detection:
    Typosquatting Alert
    Original Package: lodash (npm) - 8.3M weekly downloads
    Typosquatter Found: lodash1 (npm) - Just uploaded
    Similarity Score: 97%
    Pattern: lodash[number] (known typosquatting pattern)
    Upload Metadata:
      - New account (1 hour old)
      - Same VPN as lodash2 uploader
      - Same build certificate as other typos
    Content Similarity: 94% (probably cloned)
    Malicious Additions: YES
      - Added cryptomining code
      - Steals npm tokens
    Confidence: 98.7%
    Action: Immediately remove from registry

API and Data Access
REST API for Intelligence
    # Query vulnerabilities
    curl -H "Authorization: Bearer YOUR_API_KEY" \
      https://intelligence.cleanstart.io/api/v1/vulnerabilities \
      -d '{"cve": "CVE-2024-5678"}'

    # Response:
    {
      "cve": "CVE-2024-5678",
      "severity": "HIGH",
      "affected_packages": [
        {
          "ecosystem": "npm",
          "name": "express",
          "versions": ["<4.18.2"]
        }
      ],
      "detection_status": "publicly_disclosed",
      "exploited_in_wild": false,
      "repositories_affected": 1247
    }

Real-Time Streaming
    # Subscribe to real-time threat stream
    wss://intelligence.cleanstart.io/api/v1/stream?key=YOUR_API_KEY

    # Receive events:
    {
      "type": "zero_day_detected",
      "threat_id": "THREAT-2025-001",
      "severity": "critical",
      "threat_vector": "code_injection",
      "package": "popular-lib:2.1.0",
      "timestamp": "2025-10-04T14:30:45Z",
      "confidence": 0.97,
      "details": {...}
    }

Dashboard and Reporting
Intelligence Dashboard
The CleanStart Intelligence Dashboard shows comprehensive threat intelligence. The Threat Summary (Last 24 Hours) reports 3 critical threats (cryptominer injection, zero-day, typosquatting), 12 high severity threats (version confusion, account compromise), 47 medium severity threats (suspicious patterns, anomalies), and 203 low severity threats (monitoring, trending).
The Detection Breakdown shows: Layer 1 (Code) with 34 threats (72% of detections), Layer 2 (Stylometry) with 8 threats (account compromises), Layer 3 (Sandbox) with 11 threats (malware), Layer 4 (Registry) with 5 threats (typosquatting), and overlap across multiple layers with 4 threats (highest confidence).
The Top Affected Ecosystems include npm (34 threats), PyPI (12 threats), Maven (5 threats), and other registries (7 threats).
Custom Reports
    # Generate weekly threat summary
    cleanimg-init --intelligence-report \
      --period weekly \
      --format pdf \
      --output threat-report-week-40.pdf

    # Generate custom report
    cleanimg-init --intelligence-report \
      --threat-types "code_injection,zero_day" \
      --confidence ">0.9" \
      --format json \
      --output critical-threats.json

Integration with Your Infrastructure
Continuous Supply Chain Monitoring
    # Monitor all dependencies in your codebase
    cleanimg-init --monitor-supply-chain \
      --sbom your-application.spdx \
      --alert-webhook https://your-slack-channel \
      --alert-email security@company.com

    # Automatically alerts if:
    # - Any dependency gets a new CVE
    # - Maintainer account shows suspicious activity
    # - Malicious code detected in transitive dependency
    # - Typosquatting of your dependencies

Build-Time Intelligence Checks
    # In your CI/CD pipeline
    build-with-intelligence:
      stage: build
      script:
        - npm install
        # Literal block scalar (|) so the shell sees the backslash line continuations
        - |
          cleanimg-init --intelligence-check \
            --sbom package-lock.json \
            --fail-on-threat-level high
      # Fails build if intelligence detects high-severity threats

Policy Enforcement Based on Intelligence
    # OPA policy: Block deployments based on intelligence
    package supply_chain_security

    deny[msg] {
      input.image.vulnerabilities.detected_by_intelligence
      input.image.intelligence_threat_level == "critical"
      msg := "Cannot deploy: Critical threat detected by intelligence"
    }

    # Allow only when no deny rule fires (avoids an unbound wildcard under `not`)
    allow {
      count(deny) == 0
    }

Threat Intelligence Feed Subscription
Data Feed Options
CleanStart offers three subscription tiers. The Free tier includes a 24-hour delayed threat feed, public CVE data only, monthly reports, and basic email alerts. The Professional tier ($5K/month) adds real-time threat alerts, zero-day early warnings, API access with 10K requests/day, custom reports, and Slack/Teams integration. The Enterprise tier (custom pricing) provides a dedicated threat intelligence team, unlimited API access, custom detections, on-premises Source Intelligence Core deployment, and SLA guarantees.
See Also
Zero-Day Detection (zero-day-detection.md): How we detect undisclosed threats.
VEX Documents (../supply-chain-provenance/vex-documents.md): Contextual vulnerability status.
SBOM (../supply-chain-provenance/spdx-sbom.md): Component inventory.
Threat Detection Integration (../runtime-evidence/ebpf-falco-integration.md): Runtime monitoring.
