Exact match vs. vector similarity on a breach database
Enter a password to check
Traditional breach checking uses hash(password) for exact matching.
If your password is p@ssword1 and the breach contains password1,
exact matching reports "safe", even though an attacker can easily guess the variation.
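The gap can be reproduced in a few lines. This is a minimal sketch, not the demo's code: SHA-256 stands in for the unspecified hash(password), and the one-entry breach set and the helper names sha256_hex and is_breached_exact are hypothetical.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a password; SHA-256 is an assumption, the text only says hash(password)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical breach corpus, stored as hashes.
breached_hashes = {sha256_hex("password1")}


def is_breached_exact(password: str) -> bool:
    """Exact matching: flagged only if the hash is identical."""
    return sha256_hex(password) in breached_hashes


print(is_breached_exact("password1"))   # the leaked password itself is found
print(is_breached_exact("p@ssword1"))   # the one-character variation is not
```

Because a cryptographic hash changes completely when a single character changes, exact matching carries no notion of "close to a breached password".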
This demo uses a contrastive [2,3]-gram CNN encoder (89K params, 256-dim embeddings)
trained in two stages: (1) character-level rules (leetspeak, capitalization, digit append, etc.)
and (2) keyboard walk spatial patterns (qwerty, 1qaz2wsx).
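Contrastive training of this kind needs pairs of passwords that should land close together. The sketch below shows one way stage (1) pairs might be generated from a seed password. The substitution table, the rule functions, and positive_pairs are illustrative assumptions, not the demo's actual rule set, and stage (2) keyboard walks are not covered.

```python
# Hypothetical leetspeak substitutions; the demo's real table is not given.
LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def leetspeak(password: str) -> str:
    """Replace each character that has a leetspeak substitute."""
    return "".join(LEET.get(ch, ch) for ch in password)


def capitalize_first(password: str) -> str:
    """Uppercase the first character only."""
    return password[:1].upper() + password[1:]


def append_digit(password: str, digit: str = "1") -> str:
    """Append a single digit to the end."""
    return password + digit


def positive_pairs(seed: str) -> list[tuple[str, str]]:
    """Pair the seed with each rule-derived variant that differs from it."""
    rules = (leetspeak, capitalize_first, append_digit)
    return [(seed, rule(seed)) for rule in rules if rule(seed) != seed]


for anchor, variant in positive_pairs("password"):
    print(anchor, "->", variant)
```

Each (seed, variant) pair would serve as a positive example for the contrastive loss, with unrelated passwords acting as negatives.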
This cross-domain training achieves a 20x lift in catch rate over exact matching
at a similarity threshold of 0.60, with a catch rate above 97% at 0.75 on the evaluation seeds.
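The thresholding step can be sketched as follows. The trained CNN encoder is not available here, so plain character 2- and 3-gram counts stand in for the 256-dim embeddings. Cosine similarity is also an assumption, since the text does not name the metric. The names ngram_counts, cosine, and similar_breached are hypothetical, and scores from this stand-in are not comparable to the figures reported above.

```python
import math
from collections import Counter


def ngram_counts(password: str, sizes=(2, 3)) -> Counter:
    """Stand-in embedding: counts of character 2-grams and 3-grams."""
    counts = Counter()
    for n in sizes:
        for i in range(len(password) - n + 1):
            counts[password[i:i + n]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_breached(password: str, breached: list[str], threshold: float = 0.60):
    """Return breached entries whose similarity to the query meets the threshold."""
    query = ngram_counts(password)
    scored = [(entry, cosine(query, ngram_counts(entry))) for entry in breached]
    hits = [(entry, score) for entry, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


breached = ["password1", "correcthorse"]
print(similar_breached("p@ssword1", breached))
```

Raising the threshold trades catch rate for fewer false alarms, which is why results are quoted at specific thresholds.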
In production, homomorphic encryption (CKKS via enVector) enables this similarity search on encrypted vectors, so the server never sees the password or its embedding.