Expectation Entropy: A Novel Metric for Password Strength Assessment
Analysis of Expectation Entropy, a new metric for evaluating password strength on a 0-1 scale, comparing it to classical entropy measures and NIST standards.
1. Introduction & Motivation
This paper introduces Expectation Entropy, a novel metric designed to estimate the strength of random or random-like passwords. The motivation stems from a practical gap in existing password strength assessment tools. Classical combinatorics-based formulas (e.g., $\log_2(\text{character space}^{\text{length}})$) output results in tens of bits, while the industry-standard NIST Entropy Estimation Suite provides a normalized min-entropy score between 0 and 1. This discrepancy makes direct comparison and intuitive interpretation challenging. Expectation Entropy bridges this gap by providing a strength estimate on the same 0-1 scale as NIST's tool, where a value of, for example, 0.4 indicates an attacker must exhaustively search at least 40% of the total possible guesses to find the password.
The work is contextualized within the "PHY2APP" project, focusing on generating strong symmetric passwords for Wi-Fi device provisioning (ComPass protocol) using Physical Layer Security methods, highlighting the need for a robust, scalable strength metric.
2. Various Definitions of Entropy
Entropy measures disorder, randomness, or uncertainty. Different definitions capture different aspects of password strength.
2.1 Min-Entropy
Defined as $H_{\infty} = -\log_2(\max(p_i))$, where $p_i$ is the probability of an element. It represents the worst-case scenario, measuring the difficulty of guessing the most likely outcome. This is the basis for the NIST suite's output.
2.2 Shannon Entropy
Defined as $H_1 = -\sum_{i=1}^{N} p_i \log_2 p_i$. It provides an average measure of information content but is criticized for being unrelated to actual guessing difficulty in password cracking contexts, as it ignores password length and the attacker's optimal strategy.
2.3 Hartley Entropy
Defined as $H_0 = \log_2 N$, it measures only the size of the distribution (alphabet size), completely ignoring character probabilities.
2.4 Guessing Entropy
Defined as $G = \sum_{i=1}^{N} p_i \cdot i$, where guesses are ordered by decreasing probability. This measures the expected number of guesses required by an optimal attacker. It is more directly related to practical cracking time but is not normalized.
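The four definitions above can be compared side by side on a toy distribution. The following is a minimal sketch (the example distribution and function names are illustrative, not from the paper):

```python
import math

def min_entropy(p):
    """H_inf = -log2(max p_i): difficulty of guessing the likeliest outcome."""
    return -math.log2(max(p))

def shannon_entropy(p):
    """H_1 = -sum p_i * log2 p_i: average information content."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def hartley_entropy(p):
    """H_0 = log2 N: depends only on the alphabet size."""
    return math.log2(len(p))

def guessing_entropy(p):
    """G = sum p_i * i, with guesses ordered by decreasing probability."""
    ordered = sorted(p, reverse=True)
    return sum(pi * i for i, pi in enumerate(ordered, start=1))

# A skewed 4-outcome distribution: the four measures disagree sharply.
p = [0.7, 0.1, 0.1, 0.1]
print(round(min_entropy(p), 3))       # 0.515
print(round(shannon_entropy(p), 3))   # 1.357
print(hartley_entropy(p))             # 2.0
print(round(guessing_entropy(p), 3))  # 1.6
```

Note how Hartley entropy (2.0 bits) sees only the alphabet size, while min-entropy (0.515 bits) reflects the dominant outcome — exactly the gap the paper is concerned with.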
3. Expectation Entropy
3.1 Definition & Formulation
Expectation Entropy is built upon the concept of Guessing Entropy but normalized to a [0, 1] scale. The core idea is to estimate the strength from a single password's composition. It considers disjoint character sets: lowercase letters $L$ ($|L| = 26$), uppercase letters $U$ ($|U| = 26$), digits $D$ ($|D| = 10$), and symbols $S$ ($|S| = 32$), forming a total character space $K$ of size 94 for English.
While the full mathematical derivation for a single password is implied rather than made explicit in the provided excerpt, the metric essentially normalizes the effort required by an optimal attacker relative to the total search space. If $G$ is the Guessing Entropy and $N$ is the total number of possible passwords (e.g., $94^{\text{length}}$ for the full space), a normalized form could be conceptually related to $H_E \approx G / N_{\text{eff}}$, where $N_{\text{eff}}$ is an effective search space size that accounts for the password's composition.
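The composition-dependent character space can be illustrated directly. This sketch relies on Python's `string.punctuation`, which happens to contain exactly the 32 printable ASCII symbols the paper counts as $S$:

```python
import string

# Disjoint character classes from the paper: |L|=26, |U|=26, |D|=10, |S|=32.
CLASSES = (
    set(string.ascii_lowercase),   # L
    set(string.ascii_uppercase),   # U
    set(string.digits),            # D
    set(string.punctuation),       # S: exactly 32 printable ASCII symbols
)

def effective_space(pw):
    """Size of the composition-dependent character space K for pw."""
    return sum(len(cls) for cls in CLASSES if any(c in cls for c in pw))

print(effective_space("abcdef"))        # 26 — lowercase only
print(effective_space("Abc123"))        # 62 — L + U + D
print(effective_space("k9$Lp@2W#r1Z"))  # 94 — all four classes
```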
3.2 Interpretation & Scale
The key innovation is its interpretable scale. An Expectation Entropy value of $\alpha$ (where $0 \le \alpha \le 1$) means an attacker must perform at least a fraction $\alpha$ of the total required guesses (in an optimal order) to crack the password. A value of 1 indicates ideal randomness where the attacker must perform a full brute-force search. This aligns intuitively with the NIST min-entropy scale, facilitating comparison and decision-making for system designers.
4. Core Insight & Analyst's Perspective
Core Insight: Reaz and Wunder aren't just proposing another entropy metric; they're attempting to solve a critical usability and interpretability gap in security engineering. The real problem isn't a lack of complexity measures, but the cognitive friction when a combinatorics tool screams "80 bits!" and NIST whispers "0.7". Expectation Entropy is a pragmatic translator, converting cryptographic strength into an actionable, probabilistic risk score on a unified dashboard.
Logical Flow: The argument is elegantly simple: 1) Existing metrics live on different planets (bits vs. normalized scores), causing confusion. 2) Guessing Entropy ($G$) is closer to an attacker's reality but isn't bounded. 3) Therefore, normalize $G$ relative to the effective search space to create a 0-1 score that directly maps to an attacker's required effort percentage. This bridges the theoretical (NIST's min-entropy) and the practical (password cracker's workload).
Strengths & Flaws: The strength is its elegant simplicity and immediate interpretability—a godsend for policy makers and system architects. However, the devil is in the distributional assumptions. The metric's accuracy heavily depends on correctly modeling the probability distribution $p_i$ of characters within a single password sample, which is a notoriously difficult statistical problem. Unlike NIST's suite which tests long bitstreams, applying this to a short 16-character password requires robust estimators that may be sensitive to biases. The paper, from the excerpt, doesn't fully detail this estimation process for a single instance, which is its Achilles' heel.
Actionable Insights: For security teams, this metric could be integrated into password creation APIs or Active Directory plugins to provide real-time, intuitive strength feedback ("Your password requires 60% of guesses to crack"). For researchers, the next step must be a rigorous, large-scale empirical validation against real-world cracking tools (like Hashcat or John the Ripper) to calibrate the model. Does an Expectation Entropy of 0.8 truly mean 80% of the search space? This needs proof against adversarial AI models, similar to how GANs are used to attack other security domains. The concept is promising, but its operational utility hinges on transparent, peer-reviewed validation beyond the controlled environment of machine-generated passwords.
5. Technical Details & Mathematical Formulation
Based on the concepts outlined, the Expectation Entropy $H_E$ for a password can be conceptually framed. Let a password of length $l$ be drawn from an alphabet $\mathcal{A}$ with an associated probability distribution for each character position (which may be estimated from the password itself or a reference corpus).
Ordered Probability Vector: For the entire password space of size $N = |\mathcal{A}|^l$, one can theoretically order all possible passwords by their descending probability of being chosen (according to the generative model).
Guessing Entropy: The expected number of guesses for an optimal attacker is $G = \sum_{i=1}^{N} p_i \cdot i$, where $p_i$ is the probability of the $i$-th most likely password.
Normalization: The maximum possible $G$ for a uniform distribution is $(N+1)/2$. A normalized measure of effort could be defined as:
$$ H_E \approx \frac{2 \cdot G - 1}{N} $$
Since a uniform distribution gives $G = (N+1)/2$, this maps perfect randomness to exactly $H_E = 1$, and a highly predictable password (where $G$ is small) to a value close to 0.
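A quick numerical check of this normalization over explicit distributions (`expectation_entropy` is an illustrative name, not the authors' code):

```python
def expectation_entropy(p):
    """H_E = (2G - 1) / N, with G the guessing entropy of distribution p."""
    ordered = sorted(p, reverse=True)
    G = sum(pi * i for i, pi in enumerate(ordered, start=1))
    return (2 * G - 1) / len(p)

N = 1000

# Uniform distribution: G = (N + 1) / 2, hence H_E = 1 exactly.
print(round(expectation_entropy([1 / N] * N), 6))  # 1.0

# Heavily skewed distribution: G is small, so H_E is close to 0.
skewed = [0.99] + [0.01 / (N - 1)] * (N - 1)
print(round(expectation_entropy(skewed), 3))       # 0.011
```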
Practical Estimation: For a single password, one must estimate its rank $r$, i.e., its position in the attacker's optimally ordered guess list, or equivalently the cumulative probability of all passwords more likely than it. The fraction of the guess space the attacker must traverse before reaching it is then $r/N$, so $H_E \approx r/N$. This aligns with the paper's description that a value of 0.4 means searching 40% of the space.
The precise, efficient algorithm for estimating this from a single sample is the core technical contribution implied by the authors.
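Since the excerpt does not specify that estimator, the rank-based reading can at least be sketched on a toy model, assuming $H_E$ is approximated by the password's normalized rank in the attacker's optimal guess order (the alphabet, probabilities, and `estimate_H_E` helper are all hypothetical):

```python
import itertools

# Hypothetical 3-character alphabet with skewed per-position probabilities.
CHAR_PROBS = {'a': 0.6, 'b': 0.3, 'c': 0.1}
LENGTH = 4

def pw_prob(pw):
    """Password probability under an independent-position model."""
    prob = 1.0
    for ch in pw:
        prob *= CHAR_PROBS[ch]
    return prob

def estimate_H_E(pw):
    """H_E ~ rank / N: fraction of the optimally ordered guess list an
    attacker traverses before reaching pw (brute-force, toy-sized only)."""
    space = [''.join(t) for t in itertools.product(CHAR_PROBS, repeat=LENGTH)]
    ordered = sorted(space, key=pw_prob, reverse=True)
    rank = ordered.index(pw) + 1
    return rank / len(space)

print(round(estimate_H_E('aaaa'), 3))  # 0.012 — the likeliest password
print(round(estimate_H_E('cccc'), 3))  # 1.0 — the least likely password
```

Enumerating the full space is obviously infeasible for real passwords, which is precisely why an efficient single-sample estimator would be the interesting contribution.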
6. Experimental Results & Chart Description
Note: The provided PDF excerpt does not contain specific experimental results or charts. The following is a description based on what a typical validation study for such a metric would involve.
A comprehensive evaluation of Expectation Entropy would likely involve the following charts:
Chart 1: Metric Comparison Scatter Plot. This chart would plot passwords on two axes: X-axis showing classical bit-strength (e.g., $\log_2(94^l)$), and Y-axis showing Expectation Entropy (0-1). A cloud of points would reveal the correlation (or lack thereof) between the two measures, highlighting passwords that are long (high bit-strength) but predictable (low Expectation Entropy).
Chart 2: Cracking Resistance Curve. This would show the actual fraction of the search space an attacker (using a tool like Hashcat with a rule-based attack) must traverse to crack passwords binned by their Expectation Entropy score (e.g., 0.0-0.1, 0.1-0.2...). An ideal metric would show a perfect diagonal line where predicted effort (Entropy) equals actual effort. Deviation from the diagonal indicates estimation error.
Chart 3: Distribution of Scores. A histogram showing the Expectation Entropy scores for different password types: machine-generated (e.g., from the ComPass protocol), human-generated with rules, and human-generated without rules. This would visually demonstrate the metric's ability to discriminate between password generation methods.
The key result to validate is the claim: "Having an Expectation entropy of a certain value, for example, 0.4 means that an attacker has to exhaustively search at least 40% of the total number of guesses." This requires empirical attack simulations.
7. Analysis Framework: Example Case
Scenario: Evaluating two passwords of similar length for a system using the 94-character printable ASCII space.
Password A (Human-chosen): Summer2024! (11 characters)
Password B (Machine-generated): k9$Lp@2W#r1Z (12 characters)
Classical Bit-Strength: The theoretical maxima are nearly identical: $\log_2(94^{11}) \approx 72.1$ bits for Password A and $\log_2(94^{12}) \approx 78.7$ bits for Password B.
Expectation Entropy Analysis:
Password A: The structure is common: a dictionary word ("Summer"), a predictable year ("2024"), and a common suffix symbol ("!"). A probabilistic model (like a Markov chain trained on leaked passwords) would assign a high probability to this pattern. Its rank in the ordered list of probable passwords would be very low, meaning the cumulative probability of more likely passwords is high. Therefore, its Expectation Entropy would be low (e.g., 0.05-0.2), indicating an attacker would likely find it in the first 5-20% of an optimized guess order.
Password B: It appears random, with no obvious pattern, mixing character sets per position. A probabilistic model would assign a very low, roughly uniform probability to this specific sequence. Its rank would be very high (close to the middle/end of the ordered list). Therefore, its Expectation Entropy would be high (e.g., 0.7-0.95), indicating an attacker must search most of the space.
This example demonstrates how Expectation Entropy provides a more nuanced and realistic risk assessment than the nearly identical bit-strengths from the classical formula.
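The contrast can be sketched numerically: a classical bit-strength computation, plus a deliberately tiny Laplace-smoothed bigram model standing in for a real leaked-password model (the corpus and `log2_prob` helper are made up for illustration; a production estimator would train on millions of samples):

```python
import math
from collections import Counter

PW_A = "Summer2024!"    # human-chosen pattern: word + year + symbol
PW_B = "k9$Lp@2W#r1Z"   # machine-generated

# Classical bit-strength: length * log2(94) over the full printable space.
for pw in (PW_A, PW_B):
    print(pw, round(len(pw) * math.log2(94), 1))

# Tiny Laplace-smoothed bigram model over a made-up "leaked corpus".
CORPUS = ["summer2023!", "winter2024!", "password1!", "qwerty2024!", "spring2022!"]
ALPHABET_SIZE = 94

bigrams, firsts = Counter(), Counter()
for pw in CORPUS:
    padded = "^" + pw  # '^' marks the start of a password
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        firsts[a] += 1

def log2_prob(pw):
    """Case-folded bigram log2-probability with add-one smoothing."""
    padded = "^" + pw.lower()
    return sum(
        math.log2((bigrams[(a, b)] + 1) / (firsts[a] + ALPHABET_SIZE))
        for a, b in zip(padded, padded[1:])
    )

# Near-identical bit-strengths, but wildly different model probabilities:
print(round(log2_prob(PW_A), 1))  # far less negative (predictable pattern)
print(round(log2_prob(PW_B), 1))  # close to the uniform-random floor
```

Even this crude model ranks Password A orders of magnitude more probable than Password B, which is the gap a single classical bit-strength number cannot express.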
8. Application Outlook & Future Directions
Immediate Applications:
Real-Time Password Strength Meters: Integrating Expectation Entropy into web and application sign-up flows to provide users with an intuitive, percentage-based strength indicator.
Security Policy Enforcement: Organizations could set minimum Expectation Entropy thresholds (e.g., 0.6) instead of just complexity rules, directly tying policy to estimated cracking effort.
Automated System Audits: Scanning existing password databases (hashed) to estimate the collective Expectation Entropy distribution and identify accounts with critically weak passwords.
Future Research Directions:
Robust Single-Sample Estimators: Developing and comparing statistical methods (e.g., using neural language models, n-gram models, or Bloom filters) to accurately estimate the probability/rank of a single password from which $H_E$ is derived.
Adversarial Evaluation: Testing the metric against state-of-the-art password cracking tools and AI models (e.g., PassGAN, an adaptation of the Generative Adversarial Network framework for passwords) to see if the predicted effort matches actual cracking times.
Beyond Passwords: Applying the normalized "effort fraction" concept to other secrets, such as cryptographic keys (where bits are standard) or biometric templates, to create a unified strength metric across different authentication factors.
Standardization Efforts: Proposing Expectation Entropy or its principles to bodies like NIST for inclusion in future revisions of digital identity guidelines (e.g., SP 800-63B).
9. References
German Federal Ministry of Education and Research (BMBF). Grant details for project PHY2APP.
M. Dell'Amico, P. Michiardi, Y. Roudier, "Password Strength: An Empirical Analysis," in Proceedings of IEEE INFOCOM, 2010. (Represents survey on password strength methods).
National Institute of Standards and Technology (NIST). SP 800-90B Entropy Assessment Tool. [Online]. Available: https://github.com/usnistgov/SP800-90B_EntropyAssessment
NIST Special Publication 800-90B. Recommendation for the Entropy Sources Used for Random Bit Generation.
J. Kelsey, K. A. McKay, M. Turan, "Predictive Models for Min-Entropy Estimation," in Proceedings of CHES, 2015.
K. Reaz, G. Wunder, "ComPass: A Protocol for Secure and Usable Wi-Fi Device Provisioning," in Proceedings of ACM WiSec, 2023. (Assumed from context).
C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
R. V. L. Hartley, "Transmission of Information," The Bell System Technical Journal, vol. 7, no. 3, pp. 535–563, 1928.
J. Bonneau, "The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords," in Proceedings of IEEE Symposium on Security and Privacy, 2012.
J. L. Massey, "Guessing and Entropy," in Proceedings of IEEE International Symposium on Information Theory (ISIT), 1994.
C. Cachin, Entropy Measures and Unconditional Security in Cryptography. PhD Thesis, ETH Zurich, 1997.
J. O. Pliam, "The Disparity between Work and Entropy in Cryptology," 1998. [Online]. Available: https://eprint.iacr.org/1998/024
B. Hitaj, P. Gasti, G. Ateniese, F. Perez-Cruz, "PassGAN: A Deep Learning Approach for Password Guessing," in Proceedings of ACNS, 2019. (External reference for adversarial AI evaluation).