Security

Sensitive Data Scanner (SDS)

170-pattern catalogue, checksum-validated, with NER for free-form PII. Redact secrets before they hit disk.

▸ How it works

SDS sits inline in the ingestion pipeline shard. Every log message, span attribute, RUM event, metric attribute, and session replay string is run through a precompiled rule set. Each match triggers one of four actions: `alert` (recorded as a finding, source text untouched), `redact` (replaced with `[REDACTED:rule_id]`), `hash` (replaced with a 12-char SHA-256 prefix), or `drop` (entire event removed from the batch). Every action runs in the same flush that would have persisted the data, so with `redact`, `hash`, or `drop` the raw secret never lands in storage.

Three layers of detection: (1) a **170-pattern library** spanning cloud providers, AI APIs, package registries, SaaS tokens, payment keys, database connection strings, international PII, and infrastructure markers; (2) **validators** (Luhn / IBAN ISO 7064 / ABA mod-10 / Shannon entropy / JWT shape) that filter false positives, so a 16-digit order ID that fails Luhn won't get redacted as a credit card; (3) a **heuristic NER engine** for free-form PII (names, addresses) with confidence scoring, plus a pluggable `NerAdapter` for Bumblebee/BERT if you want true ML recall.

Three production safeguards: **ReDoS protection** (static heuristic + 100ms stress test + 50ms runtime cap) on every user pattern; a **per-project CPU token bucket** so one noisy project can't starve the shard; a **cluster-wide kill switch** that propagates over `Phoenix.PubSub` in milliseconds.
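
To make the four actions concrete, here is a minimal Python sketch of what each one might do to a matched message. It is an illustration, not the SDS implementation: the simplified `ghp_` pattern, the `apply_action` helper, and the choice to take the 12 characters from the hex digest are all assumptions.

```python
import hashlib
import re
from typing import Optional

# Hypothetical rule: a simplified GitHub PAT pattern, for illustration only.
RULE_ID = "github_pat"
PATTERN = re.compile(r"ghp_[A-Za-z0-9]{36}")


def apply_action(message: str, action: str) -> Optional[str]:
    """Return the text that would be persisted, or None if the event is dropped."""
    if not PATTERN.search(message):
        return message  # no match: persist unchanged
    if action == "alert":
        return message  # a finding is recorded elsewhere; source text untouched
    if action == "redact":
        return PATTERN.sub(f"[REDACTED:{RULE_ID}]", message)
    if action == "hash":
        # Assumption: the 12-char prefix is taken from the hex digest.
        return PATTERN.sub(
            lambda m: hashlib.sha256(m.group().encode("utf-8")).hexdigest()[:12],
            message,
        )
    if action == "drop":
        return None  # the whole event is removed from the batch
    raise ValueError(f"unknown action: {action}")
```

Under these assumptions, `apply_action(msg, "redact")` swaps the token for `[REDACTED:github_pat]`, while `"hash"` keeps a stable 12-character stand-in so repeated leaks of the same value can still be correlated.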

▸ What this lets you do

✓ 170 built-in patterns spanning credentials, PII, financial, and infra
✓ Stop credentials, PII, and PHI from being persisted to your observability store
✓ Validator-aware redaction: 16-digit order IDs that fail the Luhn check aren't touched
✓ NER for names + US addresses; pluggable ML adapter for true-ML recall
✓ Custom regex rules with ReDoS validation at config time, runtime timeout per call
✓ Live regex tester previews matches against pasted text before you enable a rule
✓ Cluster-wide kill switch flips in milliseconds via PubSub
✓ Per-project CPU budget keeps noisy regexes from starving other projects
✓ Durable findings: each one snapshots the rule's name, pattern, and action, so it survives rule deletion
✓ GraphQL + REST APIs + live subscriptions for SIEM/SOAR integration
✓ Compliant by default — redaction happens before write, not after
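
The validator-aware behaviour above can be illustrated with the standard Luhn checksum. This is a generic sketch of the algorithm, not SDS code; the function name `luhn_valid` is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn check: True if the digits in `number` pass mod-10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A regex alone would flag any 16-digit run. Gating the match on `luhn_valid` means a well-known test card number such as `4111 1111 1111 1111` passes the check, while an arbitrary 16-digit ID such as `1234567890123456` fails it and would be left alone.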

▸ Get it running

1 Enabled by default for every project (tier-bound)
2 Open `SDS` → click `+ Seed built-in rules` to seed 170 patterns at once (high-noise ones default to disabled)
3 Use the **Live tester** panel to validate regex patterns against pasted text before saving
4 Add a custom regex or NER rule via `+ Custom rule`; the form includes pattern-safety checks and scope toggles
5 Hits land in `sds_findings` and are also exposed on `/v1/findings` for unified review
6 High/critical findings fan out to configured webhook destinations (Slack, PagerDuty, custom SIEM)
7 Subscribe to `findingCreated` over GraphQL for live streaming to security dashboards
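
For steps 3 and 4, the sketch below shows roughly what a config-time safety check and a match preview could look like. It is a simplified stand-in, not the SDS validator: the helper names are hypothetical, and the nested-quantifier heuristic is deliberately crude (it can misfire on patterns such as `([+-]\d)+`).

```python
import re
from typing import List

# Hypothetical static heuristic: flag a group that contains a quantifier and
# is itself quantified, e.g. (a+)+ or (\d*)*. Real ReDoS analysis goes further.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def validate_custom_rule(pattern: str) -> List[str]:
    """Return a list of problems; an empty list means the pattern looks safe."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return [f"invalid regex: {exc}"]
    if NESTED_QUANTIFIER.search(pattern):
        return ["nested quantifier may cause catastrophic backtracking"]
    return []


def preview_matches(pattern: str, sample: str) -> List[str]:
    """Return every substring of `sample` the pattern would match."""
    return [m.group() for m in re.finditer(pattern, sample)]
```

A pattern would only be previewed or saved when `validate_custom_rule` returns an empty list. The stress test and runtime cap described earlier are separate safeguards that this sketch does not attempt to reproduce.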

▸ Code examples

# The GitHub PAT will be matched by the `github_pat` rule.
# Default action is `redact`, so the stored log message reads
#   "leaked token: [REDACTED:github_pat]"
# and an `sds.finding.created` event fires to your webhooks.
curl -X POST https://funnel.example.com/v1/logs \
  -H "Authorization: Bearer st_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "logs": [{
      "severity":     "error",
      "service_name": "api",
      "message":      "leaked token: ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    }]
  }'

▸ What we detect

170 patterns across 25 categories

AWS

· aws_access_key
· aws_secret_key
· aws_session_token
· aws_mws_auth_token
+ 1 more

GCP

· gcp_service_account
· gcp_api_key
· gcp_oauth_client_id
· google_api_key

Azure

· azure_storage_key
· azure_subscription_id
· azure_sas_token

Other clouds

· do_api_token
· heroku_api_key
· netlify_token
· render_api_key
+ 3 more

Cloudflare

· cloudflare_api_token
· cloudflare_global_api_key

AI / ML APIs

· openai_api_key
· anthropic_api_key
· huggingface_token
· replicate_api_token
+ 6 more

Payments

· stripe_secret
· stripe_publishable
· stripe_restricted_key
· stripe_webhook_secret
+ 5 more

VCS / Dev tools

· github_pat
· github_app_token
· github_refresh_token
· gitlab_pat
+ 4 more

Communications

· slack_token
· slack_webhook_url
· slack_legacy_token
· discord_bot_token
+ 7 more

SaaS / Productivity

· linear_api_key
· notion_integration_token
· notion_ntn_token
· asana_pat
+ 5 more

Observability

· sentry_dsn
· sentry_auth_token
· new_relic_license_key
· new_relic_user_key
+ 10 more

Databases

· mongodb_uri_with_password
· postgres_uri_with_password
· mysql_uri_with_password
· redis_uri_with_password
+ 3 more

Identity / Auth

· jwt
· okta_api_token
· auth0_api_token
· onepassword_service_account_token
+ 2 more

Crypto / Web3

· ethereum_private_key
· bitcoin_wif_key
· ethereum_address
· infura_project_id
+ 2 more

Containers / Hosted

· docker_registry_auth
· dockerhub_pat
· tailscale_auth_key
· ngrok_auth_token
+ 1 more

Generic auth

· generic_secret
· bearer_auth_header
· basic_auth_header
· api_key_in_url

Cryptography

· private_key_pem
· k8s_service_account_token

Financial / Banking

· credit_card
· iban
· aba_routing
· bic_swift_code
+ 1 more

PII — contact

· email
· email_password
· us_phone
· uk_phone
+ 3 more

PII — national IDs

· ssn_us
· itin_us
· ein_us
· sin_canada
+ 7 more

PII — physical / vehicles

· vehicle_vin
· south_africa_id

Healthcare (US)

· icd10_code
· dea_number
· npi_us

Free-form PII (NER)

· person_name
· address

Infrastructure

· ipv4_private
· ipv4_loopback
· ipv4_link_local
· ipv6_address
+ 2 more

Mobile / Push

· firebase_fcm_server_key
· firebase_database_url
· onesignal_api_key
· pusher_secret

Patterns are precompiled at module load. Validators (Luhn / IBAN / ABA / entropy / JWT) filter false positives. NER catches free-form PII that the regex layer misses. Custom regex rules pass ReDoS validation before they ever run.
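
As an illustration of the entropy validator, the sketch below computes Shannon entropy in bits per character and applies a cutoff. The formula is standard; the `looks_like_secret` helper and the 3.5-bit threshold are assumptions chosen for the example, not documented SDS values.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(candidate: str, threshold: float = 3.5) -> bool:
    """Hypothetical gate: keep a generic match only if it looks random enough."""
    return shannon_entropy(candidate) >= threshold
```

A check like this is most useful for broad rules such as `generic_secret`, where the regex alone would also match ordinary words and placeholder values.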

Where to find it

/app/p/:org/:project/sds
