Security
Sensitive Data Scanner (SDS)
170-pattern catalogue, checksum-validated, with NER for free-form PII. Redact secrets before they hit disk.
▸ How it works
SDS sits inline in the ingestion pipeline shard. Every log message, span attribute, RUM event, metric attribute, and session replay string is run through a precompiled rule set. Each match triggers one of four actions: `alert` (recorded as a finding, source text untouched), `redact` (replaced with `[REDACTED:rule_id]`), `hash` (replaced with a 12-character SHA-256 prefix), or `drop` (entire event removed from the batch). The action runs in the same flush that would have persisted the data, so with `redact`, `hash`, or `drop` the raw secret never lands in storage.
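The four actions can be illustrated with a minimal Python sketch. It is not the SDS implementation: the `GITHUB_PAT` regex, the `apply_action` helper, and the choice of a hex digest for the 12-character hash prefix are all assumptions made for illustration.

```python
import hashlib
import re
from typing import Optional

# Hypothetical pattern; the real `github_pat` rule in the catalogue may differ.
GITHUB_PAT = re.compile(r"ghp_[A-Za-z0-9]{36}")


def apply_action(message: str, rule_id: str, pattern: "re.Pattern[str]",
                 action: str) -> Optional[str]:
    """Return the text that would be persisted, or None if the event is dropped."""
    if not pattern.search(message):
        return message  # no match: stored unchanged
    if action == "alert":
        return message  # a finding is recorded elsewhere; text is untouched
    if action == "redact":
        return pattern.sub(lambda m: f"[REDACTED:{rule_id}]", message)
    if action == "hash":
        # Assumption: the prefix is taken from the hex digest of the match.
        return pattern.sub(
            lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:12],
            message,
        )
    if action == "drop":
        return None  # the whole event is removed from the batch
    raise ValueError(f"unknown action: {action}")
```

Calling `apply_action("leaked token: ghp_" + "A" * 36, "github_pat", GITHUB_PAT, "redact")` returns `"leaked token: [REDACTED:github_pat]"`. With `hash`, the same secret always maps to the same 12 characters, so repeated leaks can be correlated without storing the value.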
Three layers of detection: (1) a **170-pattern library** spanning cloud providers, AI APIs, package registries, SaaS tokens, payment keys, database connection strings, international PII, and infrastructure markers; (2) **validators** (Luhn / IBAN ISO-7064 / ABA mod-10 / Shannon entropy / JWT shape) that filter false positives — a 16-digit order ID that fails Luhn won't get redacted as a credit card; (3) a **heuristic NER engine** for free-form PII (names, addresses) with confidence scoring, plus a pluggable `NerAdapter` for Bumblebee/BERT if you want true ML recall.
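To make the validator layer concrete, here is a small sketch of two of the checks named above: the Luhn checksum and Shannon entropy. The function names are illustrative, and any entropy threshold SDS applies is not specified here, so none is assumed.

```python
import math
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random tokens score higher than words."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

A regex hit on `1234567812345678` would be discarded because `luhn_valid` returns `False`, while the well-known test number `4111111111111111` passes. Entropy plays the same role for generic secrets, since a repeated character scores 0 bits and four distinct characters score 2 bits.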
Three production safeguards: **ReDoS protection** (static heuristic + 100ms stress test + 50ms runtime cap) on every user pattern; a **per-project CPU token bucket** so one noisy project can't starve the shard; a **cluster-wide kill switch** that propagates over `Phoenix.PubSub` in milliseconds.
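The per-project CPU budget follows the usual token-bucket shape. The sketch below is an assumption-heavy illustration: the class name, the capacity of 100 ms, and the refill rate of 50 ms per second are hypothetical values, not SDS defaults. Time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict


class CpuTokenBucket:
    """Token bucket where one token equals one millisecond of scan CPU."""

    def __init__(self, capacity_ms: float, refill_ms_per_sec: float, now: float):
        self.capacity = capacity_ms
        self.refill = refill_ms_per_sec
        self.tokens = capacity_ms
        self.last = now

    def try_spend(self, cost_ms: float, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        self.last = now
        if cost_ms <= self.tokens:
            self.tokens -= cost_ms
            return True
        return False


# One bucket per project, so budgets are independent of each other.
buckets: Dict[str, CpuTokenBucket] = {}


def allow_scan(project_id: str, cost_ms: float, now: float) -> bool:
    if project_id not in buckets:
        # Hypothetical limits: 100 ms burst, refilled at 50 ms per second.
        buckets[project_id] = CpuTokenBucket(100.0, 50.0, now)
    return buckets[project_id].try_spend(cost_ms, now)
```

Because each project has its own bucket, a project that exhausts its budget is refused further scan time until it refills, while other projects on the same shard keep their full allowance.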
▸ What this lets you do
- ✓ 170 built-in patterns spanning credentials, PII, financial, and infra
- ✓ Stop credentials, PII, and PHI from being persisted to your observability store
- ✓ Validator-aware redaction — false-positive 16-digit order IDs aren't touched
- ✓ NER for names + US addresses; pluggable ML adapter for true-ML recall
- ✓ Custom regex rules with ReDoS validation at config time, runtime timeout per call
- ✓ Live regex tester previews matches against pasted text before you enable a rule
- ✓ Cluster-wide kill switch flips in milliseconds via PubSub
- ✓ Per-project CPU budget keeps noisy regexes from starving other projects
- ✓ Findings are durable: each keeps a snapshot of the rule's name, pattern, and action, so it survives rule deletion
- ✓ GraphQL + REST APIs + live subscriptions for SIEM/SOAR integration
- ✓ Compliant by default — redaction happens before write, not after
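As an illustration of config-time ReDoS validation for custom rules, the sketch below rejects patterns that fail to compile or that contain a quantified group with an unbounded quantifier inside it. The heuristic and the function names are assumptions, not the checks SDS runs, and Python's `re` dialect may differ from the platform's regex engine. The stress test and the runtime cap are not shown.

```python
import re
from typing import List

# Hypothetical heuristic: a group containing `+` or `*` that is itself
# followed by a quantifier, e.g. (a+)+ or (\w*)*.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def looks_redos_prone(pattern: str) -> bool:
    """Conservative static check; it can flag some safe patterns too."""
    return _NESTED_QUANTIFIER.search(pattern) is not None


def validate_custom_rule(pattern: str) -> List[str]:
    """Return a list of problems; an empty list means the rule may be saved."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return [f"invalid regex: {exc}"]
    errors = []
    if looks_redos_prone(pattern):
        errors.append("nested quantifier may cause catastrophic backtracking")
    return errors
```

Here `validate_custom_rule(r"(a+)+$")` reports a nested quantifier, while a bounded pattern such as `ghp_[A-Za-z0-9]{36}` passes. A static check like this is deliberately cheap and over-cautious, which is why it would be paired with a timed stress test and a per-call timeout.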
▸ Get it running
- 1 Enabled by default for every project, subject to your plan tier
- 2 Open `SDS` → click `+ Seed built-in rules` to seed 170 patterns at once (high-noise ones default to disabled)
- 3 Use the **Live tester** panel to validate regex patterns against pasted text before saving
- 4 Add a custom regex or NER rule via `+ Custom rule` — pattern safety + scope toggles in the form
- 5 Hits land in `sds_findings` and are also exposed on `/v1/findings` for unified review
- 6 High/critical findings fan out to configured webhook destinations (Slack, PagerDuty, custom SIEM)
- 7 Subscribe to `findingCreated` over GraphQL for live streaming to security dashboards
▸ Code examples
# The GitHub PAT will be matched by the `github_pat` rule.
# Default action is `redact`, so the stored log message reads
#   "leaked token: [REDACTED:github_pat]"
# and an `sds.finding.created` event fires to your webhooks.
# Content-Type is set explicitly because `curl -d` defaults to form encoding.
curl -X POST https://funnel.example.com/v1/logs \
  -H "Authorization: Bearer st_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "logs": [{
      "severity": "error",
      "service_name": "api",
      "message": "leaked token: ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    }]
  }'
▸ What we detect: 170 patterns across 25 categories
AWS (6)
- · aws_access_key
- · aws_secret_key
- · aws_session_token
- · aws_mws_auth_token
- + 1 more
GCP (4)
- · gcp_service_account
- · gcp_api_key
- · gcp_oauth_client_id
- · google_api_key
Azure (3)
- · azure_storage_key
- · azure_subscription_id
- · azure_sas_token
Other clouds (7)
- · do_api_token
- · heroku_api_key
- · netlify_token
- · render_api_key
- + 3 more
Cloudflare (2)
- · cloudflare_api_token
- · cloudflare_global_api_key
AI / ML APIs (10)
- · openai_api_key
- · anthropic_api_key
- · huggingface_token
- · replicate_api_token
- + 6 more
Payments (9)
- · stripe_secret
- · stripe_publishable
- · stripe_restricted_key
- · stripe_webhook_secret
- + 5 more
VCS / Dev tools (8)
- · github_pat
- · github_app_token
- · github_refresh_token
- · gitlab_pat
- + 4 more
Communications (11)
- · slack_token
- · slack_webhook_url
- · slack_legacy_token
- · discord_bot_token
- + 7 more
SaaS / Productivity (9)
- · linear_api_key
- · notion_integration_token
- · notion_ntn_token
- · asana_pat
- + 5 more
Observability (14)
- · sentry_dsn
- · sentry_auth_token
- · new_relic_license_key
- · new_relic_user_key
- + 10 more
Databases (7)
- · mongodb_uri_with_password
- · postgres_uri_with_password
- · mysql_uri_with_password
- · redis_uri_with_password
- + 3 more
Identity / Auth (6)
- · jwt
- · okta_api_token
- · auth0_api_token
- · onepassword_service_account_token
- + 2 more
Crypto / Web3 (6)
- · ethereum_private_key
- · bitcoin_wif_key
- · ethereum_address
- · infura_project_id
- + 2 more
Containers / Hosted (5)
- · docker_registry_auth
- · dockerhub_pat
- · tailscale_auth_key
- · ngrok_auth_token
- + 1 more
Generic auth (4)
- · generic_secret
- · bearer_auth_header
- · basic_auth_header
- · api_key_in_url
Cryptography (2)
- · private_key_pem
- · k8s_service_account_token
Financial / Banking (5)
- · credit_card
- · iban
- · aba_routing
- · bic_swift_code
- + 1 more
PII: contact (7)
- · email_password
- · us_phone
- · uk_phone
- + 3 more
PII: national IDs (11)
- · ssn_us
- · itin_us
- · ein_us
- · sin_canada
- + 7 more
PII: physical / vehicles (2)
- · vehicle_vin
- · south_africa_id
Healthcare (US) (3)
- · icd10_code
- · dea_number
- · npi_us
Free-form PII (NER) (2)
- · person_name
- · address
Infrastructure (6)
- · ipv4_private
- · ipv4_loopback
- · ipv4_link_local
- · ipv6_address
- + 2 more
Mobile / Push (4)
- · firebase_fcm_server_key
- · firebase_database_url
- · onesignal_api_key
- · pusher_secret
Patterns are precompiled at module load · validators (Luhn / IBAN / ABA / entropy / JWT) filter false positives · NER catches free-form PII the regex layer misses · custom regex rules pass through ReDoS validation before they ever run.
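For the banking validators mentioned above, the standard checksums are short enough to sketch. These follow the public IBAN mod-97 and ABA routing-number algorithms; the function names are illustrative and the code is not taken from SDS.

```python
def iban_valid(iban: str) -> bool:
    """IBAN check: move the first four characters to the end, map letters to
    numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def aba_valid(routing: str) -> bool:
    """ABA routing check: weights 3, 7, 1 repeated; the sum must be 0 mod 10."""
    if len(routing) != 9 or not (routing.isascii() and routing.isdigit()):
        return False
    d = [int(c) for c in routing]
    total = (3 * (d[0] + d[3] + d[6])
             + 7 * (d[1] + d[4] + d[7])
             + (d[2] + d[5] + d[8]))
    return total % 10 == 0
```

The commonly cited example IBAN `GB82 WEST 1234 5698 7654 32` passes, and changing its last digit makes it fail. In the same way a nine-digit number that merely looks like a routing number is rejected unless its checksum holds.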
Where to find it
/app/p/:org/:project/sds