1. Introduction
The paper addresses a fundamental gap in password security discourse: the lack of a rigorous definition of "password strength." It argues that current approaches are often anecdotal and fail to account for the attacker's strategy. The authors propose a canonical measure based on the efficiency of potential guessing attacks, shifting the focus from password characteristics to attack characteristics.
2. State of the Art
The paper critiques the current state of password security as "grim as medieval medicine," citing Bruce Schneier's observation that much advice is based on anecdote rather than analysis. It highlights the absence of a satisfactory method to measure the strength of a password dataset, as noted in recent literature [3]. Common password strength meters are dismissed as measuring "mimicry" rather than true resistance to intelligent attacks.
3. Core Insight & Logical Flow
Core Insight: Password strength is not an intrinsic property of a string of characters; it is a relational property defined entirely by the attacker's guessing strategy. The defender's goal is not to create a "strong password" in a vacuum, but to create one that performs poorly against the set of feasible attack strategies a rational adversary might employ.
Logical Flow: The argument proceeds with formal precision:
- Define a guessing attack as an ordered list (dictionary) of candidate passwords.
- Prove that any two attacks differ only by the order of this list.
- Conclude that a password's strength against a specific attack is its position in that attack's dictionary.
- Since the defender cannot know the exact attack order, they must consider a set of plausible attacks.
- Therefore, the defender's measure of strength is the expected value of the password's position across this set of attacks.
4. Strengths & Flaws
Strengths:
- Conceptual Rigor: Provides the first formal, attack-centric definition of password strength, moving beyond heuristic rules-of-thumb.
- Game-Theoretic Foundation: Correctly frames password selection as a strategic interaction, aligning with modern security analysis like that found in Game Theory for Security research.
- Exposes Flawed Heuristics: Effectively debunks compliance-focused policies (e.g., "must include a number and symbol") that generate predictable patterns.
Flaws & Limitations:
- Computational Intractability: The core metric—calculating the expected rank across all plausible attacks—is computationally infeasible for large password spaces. It's a theoretical ideal, not a practical tool for real-time strength meters.
- Omits Key Realities: The model assumes an "offline guessing" attack with unlimited attempts, ignoring rate-limiting, account lockouts, and online detection systems which fundamentally alter the attacker's strategy.
- No Guidance on Attack Set: The paper's critical leap—defining the "set of feasible attacks"—is left underspecified. How does a defender practically model this set? This is the crux of the problem.
5. Actionable Insights
For security practitioners, this paper mandates a paradigm shift:
- Stop Measuring Mimicry: Discard password meters that only check for character classes. They train users to create passwords that are strong against the meter, not against an attacker.
- Think in Distributions, Not Rules: Instead of mandating symbols, encourage users to select passwords from a high-entropy distribution that is unlikely to align with common attack dictionaries (e.g., using diceware or password managers).
- Model Your Adversary: For critical systems, conduct threat modeling to define the plausible attack strategies (e.g., brute-force, dictionary based on past breaches, targeted personal info). Tailor password policies to disrupt those specific strategies.
- Embrace Uncertainty: Admit that perfect strength measurement is impossible. The goal is to increase the cost and uncertainty for the attacker, not to achieve a perfect score.
6. Technical Framework
6.1 Formal Attack Model
The paper models a guessing attack $A$ as an ordered sequence (dictionary) $D_A = (w_1, w_2, w_3, ...)$ of candidate passwords, where $w_i$ is a word from a finite alphabet. The attacker tries passwords in this order until success. The attack is "offline," meaning the interface provides immediate success/failure feedback with no limits.
6.2 Mathematical Formulation
Let $p$ be a specific password. For a given attack $A$, the strength of $p$ is defined as its rank in $D_A$: $$S_A(p) = \text{rank}_A(p)$$ where $\text{rank}_A(p) = i$ if $p = w_i \in D_A$.
Since the defender does not know the exact $A$, they consider a set $\mathcal{A}$ of possible attacks. The canonical password strength $C(p)$ is then the expected rank: $$C(p) = \mathbb{E}_{A \sim \mathcal{A}}[\,S_A(p)\,] = \sum_{A \in \mathcal{A}} P(A) \cdot \text{rank}_A(p)$$ where $P(A)$ is the probability (or likelihood) assigned to attack $A$ from the set $\mathcal{A}$. This formulation directly ties strength to the defender's belief about the attacker's strategy.
7. Experimental Results & Analysis
Conceptual Experiment & Implication: While the paper itself does not present empirical data from software runs, it logically demonstrates the necessity of its model through a thought experiment. It shows that two passwords, "Password123!" and "xQ37!z9pLm", might receive similar scores from a naive meter checking for length and character variety. However, "Password123!" will have a very low rank (high strength) in a brute-force attack ordering but an extremely high rank (low strength) in a dictionary attack that prioritizes common base words and patterns. The canonical measure $C(p)$, by averaging over both attack types, would reveal the true weakness of "Password123!" relative to the random string.
Chart Interpretation (Conceptual): Imagine a bar chart comparing three password evaluation methods for a sample of passwords:
- Method A (Naive Meter): Shows "Password123!" and "xQ37!z9pLm" as equally strong.
- Method B (Dictionary Attack Rank): Shows "Password123!" as very weak (low rank number) and "xQ37!z9pLm" as strong (high rank number).
- Method C (Canonical Measure $C(p)$): Shows a weighted average. "Password123!" score plummets due to its high probability in dictionary attacks, while the random string retains a high score. This chart would visually argue that $C(p)$ correlates better with real-world crackability.
8. Analysis Framework: Case Study
Scenario: A company's password policy requires: "At least 12 characters, including uppercase, lowercase, a number, and a symbol."
Traditional Analysis: A password like "Summer2024!$" passes the policy and gets a "Strong" rating from a typical meter.
Canonical Measure Analysis:
- Define Attack Set $\mathcal{A}$:
- $A_1$: Dictionary attack using common words ("Summer"), seasons, years, and common symbol suffixes ("!$"). Probability: High (0.7).
- $A_2$: Targeted attack using company name, employee info. Probability: Low for bulk attack (0.1).
- $A_3$: Full brute-force on 12-char space. Probability: Extremely Low (0.001).
- $A_4$: Attack using passwords from previous breaches of similar companies. Probability: Medium (0.199).
- Estimate Ranks:
- $\text{rank}_{A1}("Summer2024!$")$: Very low (e.g., in top 10 million).
- $\text{rank}_{A2}(p)$: Could be low if targeted.
- $\text{rank}_{A3}(p)$: Very high (~$95^{12}$).
- $\text{rank}_{A4}(p)$: Potentially low if pattern is common.
- Calculate $C(p)$: The expected rank is dominated by the high-probability dictionary attack $A_1$, resulting in a low canonical strength score, exposing the policy's failure.
9. Future Applications & Directions
- Adaptive Password Policies: Systems could use the canonical framework to create dynamic policies. Instead of static rules, a backend service could estimate $\mathcal{A}$ based on current threat intelligence (e.g., newly leaked dictionaries) and reject passwords with a low $C(p)$ score against that updated model.
- Password Manager Integration: Password managers are ideal for implementing this. They can maintain a local model of $\mathcal{A}$ (based on global breach data and heuristic rules) and use it to generate passwords that maximize $C(p)$. This turns the theoretical model into a practical, user-transparent security enhancement.
- Formal Security Proofs: The model provides a foundation for formally proving the security properties of password generation algorithms in academic literature, similar to how encryption algorithms are analyzed.
- Hybrid Threat Models: Future work must integrate the canonical measure with real-world constraints like rate-limiting. The attack set $\mathcal{A}$ would then include not just password orderings, but also strategies for distributing guesses across time and accounts.
- Machine Learning for $\mathcal{A}$: The major open problem—defining the attack set—could be addressed with ML. Systems could train models on actual cracking attempts and leaked passwords to continuously learn and update the probability distribution $P(A)$ over strategies, creating a moving target for attackers.
10. References
- Panferov, E. (2016). A Canonical Password Strength Measure. arXiv:1505.05090v4 [cs.CR].
- Schneier, B. (2007). Schneier on Security. Wiley.
- Bonneau, J. (2012). The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords. IEEE Symposium on Security and Privacy.
- Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal.
- Florêncio, D., & Herley, C. (2007). A Large-Scale Study of Web Password Habits. Proceedings of the 16th International Conference on World Wide Web.
- Ur, B., et al. (2015). Do Users' Perceptions of Password Security Match Reality? Proceedings of the 2015 CHI Conference on Human Factors in Computing Systems.
- NIST Special Publication 800-63B (2017). Digital Identity Guidelines: Authentication and Lifecycle Management.
- Wang, D., et al. (2016). The Tangled Web of Password Reuse. NDSS Symposium 2016.