Security

Sensitive Data Scanner (SDS)

170-pattern catalogue, checksum-validated, with NER for free-form PII. Redact secrets before they hit disk.

How it works

SDS sits inline in the ingestion pipeline shard. Every log message, span attribute, RUM event, metric attribute, and session replay string is run through a precompiled rule set. Each match triggers one of four actions: `alert` (recorded as a finding, source text untouched), `redact` (replaced with `[REDACTED:rule_id]`), `hash` (replaced with a 12-char SHA-256 prefix), or `drop` (entire event removed from the batch). The action runs in the same flush that would have persisted the data, so with `redact`, `hash`, or `drop` no raw secret ever lands in storage.

Detection has three layers: (1) a **170-pattern library** spanning cloud providers, AI APIs, package registries, SaaS tokens, payment keys, database connection strings, international PII, and infrastructure markers; (2) **validators** (Luhn / IBAN ISO-7064 / ABA mod-10 / Shannon entropy / JWT shape) that filter false positives, so a 16-digit order ID that fails Luhn won't get redacted as a credit card; (3) a **heuristic NER engine** for free-form PII (names, addresses) with confidence scoring, plus a pluggable `NerAdapter` for Bumblebee/BERT if you want true ML recall.

There are three production safeguards: **ReDoS protection** (static heuristic + 100ms stress test + 50ms runtime cap) on every user pattern; a **per-project CPU token bucket** so one noisy project can't starve the shard; and a **cluster-wide kill switch** that propagates over `Phoenix.PubSub` in milliseconds.
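The per-project CPU token bucket can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the class name `CpuTokenBucket`, the millisecond units, the capacity and refill numbers, and the injectable `clock` are hypothetical, and what SDS does with an event once a project's budget is exhausted is not described on this page.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CpuTokenBucket:
    """Hypothetical per-project budget of scan time, refilled continuously."""

    capacity_ms: float                      # largest burst of scan time allowed
    refill_ms_per_s: float                  # budget restored per second of wall time
    clock: Callable[[], float] = time.monotonic
    tokens_ms: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start with a full bucket.
        self.tokens_ms = self.capacity_ms
        self.updated_at = self.clock()

    def try_spend(self, cost_ms: float) -> bool:
        """Deduct cost_ms if the project still has budget; otherwise refuse."""
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens_ms = min(self.capacity_ms,
                             self.tokens_ms + elapsed * self.refill_ms_per_s)
        self.updated_at = now
        if self.tokens_ms >= cost_ms:
            self.tokens_ms -= cost_ms
            return True
        return False


# One bucket per project: a noisy project drains only its own budget.
buckets = {
    "noisy": CpuTokenBucket(capacity_ms=100, refill_ms_per_s=50),
    "quiet": CpuTokenBucket(capacity_ms=100, refill_ms_per_s=50),
}
buckets["noisy"].try_spend(100)          # noisy project uses its whole burst
print(buckets["quiet"].try_spend(10))    # True: the quiet project is unaffected
```

Keeping a separate bucket per project is what isolates tenants: an expensive custom regex in one project exhausts only that project's tokens.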

What this lets you do

  • 170 built-in patterns spanning credentials, PII, financial, and infra
  • Stop credentials, PII, and PHI from being persisted to your observability store
  • Validator-aware redaction — false-positive 16-digit order IDs aren't touched
  • NER for names + US addresses; pluggable ML adapter for true-ML recall
  • Custom regex rules with ReDoS validation at config time, runtime timeout per call
  • Live regex tester previews matches against pasted text before you enable a rule
  • Cluster-wide kill switch flips in milliseconds via PubSub
  • Per-project CPU budget keeps noisy regexes from starving other projects
  • Findings are durable: each one stores a snapshot of the rule's name, pattern, and action, so it survives rule deletion
  • GraphQL + REST APIs + live subscriptions for SIEM/SOAR integration
  • Compliant by default — redaction happens before write, not after
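To make validator-aware redaction concrete, here is a minimal sketch of a Luhn check gating the `credit_card` rule. The candidate regex is deliberately simplified (16 contiguous digits only) and the function names are hypothetical; the built-in pattern and validator are likely broader than this.

```python
import re

# Simplified candidate pattern: exactly 16 contiguous digits.
CARD_CANDIDATE = re.compile(r"\b\d{16}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Redact only the candidates that pass the Luhn validator."""
    def replace(match: "re.Match[str]") -> str:
        value = match.group()
        return "[REDACTED:credit_card]" if luhn_valid(value) else value
    return CARD_CANDIDATE.sub(replace, text)


print(redact_cards("order 1234567890123456 paid with 4111111111111111"))
# order 1234567890123456 paid with [REDACTED:credit_card]
```

The order ID matches the digit pattern but fails the checksum, so it is left alone; the well-known test card number passes and is redacted.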

Get it running

  1. Enabled by default for every project (tier-bound)
  2. Open `SDS` → click `+ Seed built-in rules` to seed 170 patterns at once (high-noise ones default to disabled)
  3. Use the **Live tester** panel to validate regex patterns against pasted text before saving
  4. Add a custom regex or NER rule via `+ Custom rule` — pattern safety + scope toggles in the form
  5. Hits land in `sds_findings` AND fire on `/v1/findings` for unified review
  6. High/critical findings fan out to configured webhook destinations (Slack, PagerDuty, custom SIEM)
  7. Subscribe to `findingCreated` over GraphQL for live streaming to security dashboards

Code examples

# The GitHub PAT will be matched by the `github_pat` rule.
# Default action is `redact`, so the stored log message reads
#   "leaked token: [REDACTED:github_pat]"
# and an `sds.finding.created` event fires to your webhooks.
curl -X POST https://funnel.example.com/v1/logs \
  -H "Authorization: Bearer st_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "logs": [{
      "severity":     "error",
      "service_name": "api",
      "message":      "leaked token: ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    }]
  }'
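For comparison, the sketch below shows roughly what each of the four actions would leave in storage for that message. It is illustrative only: the `ghp_` regex is an assumed shape for the `github_pat` rule, `apply_action` is a hypothetical helper, and rendering the 12-char SHA-256 prefix as hex is a guess about the output encoding.

```python
import hashlib
import re
from typing import Optional, Tuple

# Assumed shape of a classic GitHub PAT: "ghp_" followed by 36 alphanumerics.
GITHUB_PAT = re.compile(r"ghp_[A-Za-z0-9]{36}")


def apply_action(message: str, action: str, rule_id: str = "github_pat",
                 pattern: re.Pattern = GITHUB_PAT) -> Tuple[Optional[str], bool]:
    """Return (message to persist, finding recorded?). None means the event is dropped."""
    if not pattern.search(message):
        return message, False
    if action == "alert":
        return message, True                      # finding only, text untouched
    if action == "redact":
        return pattern.sub(lambda m: f"[REDACTED:{rule_id}]", message), True
    if action == "hash":
        def digest(m: "re.Match[str]") -> str:
            # First 12 hex characters of the SHA-256 of the matched secret.
            return hashlib.sha256(m.group().encode("utf-8")).hexdigest()[:12]
        return pattern.sub(digest, message), True
    if action == "drop":
        return None, True                         # whole event removed
    raise ValueError(f"unknown action: {action}")


msg = "leaked token: ghp_" + "A" * 36
print(apply_action(msg, "redact")[0])
# leaked token: [REDACTED:github_pat]
```

Hashing keeps equal secrets correlatable across events without storing the secret itself, whereas redaction discards that link.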

What we detect

170 patterns across 25 categories

AWS (6)
  • aws_access_key
  • aws_secret_key
  • aws_session_token
  • aws_mws_auth_token
  • + 1 more

GCP (4)
  • gcp_service_account
  • gcp_api_key
  • gcp_oauth_client_id
  • google_api_key

Azure (3)
  • azure_storage_key
  • azure_subscription_id
  • azure_sas_token

Other clouds (7)
  • do_api_token
  • heroku_api_key
  • netlify_token
  • render_api_key
  • + 3 more

Cloudflare (2)
  • cloudflare_api_token
  • cloudflare_global_api_key

AI / ML APIs (10)
  • openai_api_key
  • anthropic_api_key
  • huggingface_token
  • replicate_api_token
  • + 6 more

Payments (9)
  • stripe_secret
  • stripe_publishable
  • stripe_restricted_key
  • stripe_webhook_secret
  • + 5 more

VCS / Dev tools (8)
  • github_pat
  • github_app_token
  • github_refresh_token
  • gitlab_pat
  • + 4 more

Communications (11)
  • slack_token
  • slack_webhook_url
  • slack_legacy_token
  • discord_bot_token
  • + 7 more

SaaS / Productivity (9)
  • linear_api_key
  • notion_integration_token
  • notion_ntn_token
  • asana_pat
  • + 5 more

Observability (14)
  • sentry_dsn
  • sentry_auth_token
  • new_relic_license_key
  • new_relic_user_key
  • + 10 more

Databases (7)
  • mongodb_uri_with_password
  • postgres_uri_with_password
  • mysql_uri_with_password
  • redis_uri_with_password
  • + 3 more

Identity / Auth (6)
  • jwt
  • okta_api_token
  • auth0_api_token
  • onepassword_service_account_token
  • + 2 more

Crypto / Web3 (6)
  • ethereum_private_key
  • bitcoin_wif_key
  • ethereum_address
  • infura_project_id
  • + 2 more

Containers / Hosted (5)
  • docker_registry_auth
  • dockerhub_pat
  • tailscale_auth_key
  • ngrok_auth_token
  • + 1 more

Generic auth (4)
  • generic_secret
  • bearer_auth_header
  • basic_auth_header
  • api_key_in_url

Cryptography (2)
  • private_key_pem
  • k8s_service_account_token

Financial / Banking (5)
  • credit_card
  • iban
  • aba_routing
  • bic_swift_code
  • + 1 more

PII — contact (7)
  • email
  • email_password
  • us_phone
  • uk_phone
  • + 3 more

PII — national IDs (11)
  • ssn_us
  • itin_us
  • ein_us
  • sin_canada
  • + 7 more

PII — physical / vehicles (2)
  • vehicle_vin
  • south_africa_id

Healthcare (US) (3)
  • icd10_code
  • dea_number
  • npi_us

Free-form PII (NER) (2)
  • person_name
  • address

Infrastructure (6)
  • ipv4_private
  • ipv4_loopback
  • ipv4_link_local
  • ipv6_address
  • + 2 more

Mobile / Push (4)
  • firebase_fcm_server_key
  • firebase_database_url
  • onesignal_api_key
  • pusher_secret

Patterns are precompiled at module load · validators (Luhn / IBAN / ABA / entropy / JWT) filter false positives · NER catches free-form PII the regex layer misses · custom regex rules pass through ReDoS validation before they ever run.
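As a rough idea of what a static ReDoS heuristic can look like, the sketch below flags one classic risk shape: a quantified group that itself contains a quantifier, such as `(a+)+`. This is a deliberately crude assumption, not the check SDS ships. It can flag harmless patterns, it ignores escaped characters and nested groups, and the 100ms stress test and 50ms runtime cap are not shown.

```python
import re

# A parenthesised group containing + or *, immediately followed by +, * or {.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def looks_redos_prone(pattern: str) -> bool:
    """Conservative static check: True if the pattern has a nested quantifier."""
    return NESTED_QUANTIFIER.search(pattern) is not None


for candidate in [r"(a+)+$", r"(\w+\s*)+", r"ghp_[A-Za-z0-9]{36}"]:
    verdict = "reject" if looks_redos_prone(candidate) else "accept"
    print(f"{verdict}: {candidate}")
# reject: (a+)+$
# reject: (\w+\s*)+
# accept: ghp_[A-Za-z0-9]{36}
```

A static pass like this is cheap enough to run at config time; patterns it accepts would still need the timed checks, since not every slow regex has this shape.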

Where to find it
/app/p/:org/:project/sds