
Improved Models for Password Guessing

Wesley Tansey
Dept. of Computer Science, The University of Texas at Austin

1 University Station C0500, Austin, TX, USA
[email protected]

ABSTRACT

One approach to measuring password strength is to assess the probability it will be cracked in a fixed number of guesses. The current state of the art in password guessing employs a first-order Markov model that makes several assumptions about the distribution of passwords. We present two novel approaches to modeling password distributions that remove some of these assumptions. First, a layered Markov model is developed that extends a first-order model with index-sensitive weights. This model learns a separate set of transition probabilities for every i-th character in a password. We validate our approach on a collection of publicly available datasets of real-world passwords. Our results indicate that the layered model increases performance by as much as 157%. Second, we present a technique that, given an approximate model, learns a better distribution of passwords on-line, while simultaneously cracking passwords. Our approach also has the appealing property that the number of guesses required to learn is inversely proportional to the number of passwords in the database. We demonstrate the utility of our algorithm with a realistic example of two distinct distributions and show that the model’s performance increases by 143% after the learning phase.

Categories and Subject Descriptors

D.4.6 [Security and Protection]: Authorization; K.6.5 [Security and Protection]: Authorization; E.3 [Data Encryption]: Code breaking

General Terms

Passwords, Markov Models, Evolutionary Algorithms, Cryptanalysis

1. INTRODUCTION

As tools and services continue to shift into the cloud, users are increasingly finding their data stored in remote databases with various levels of security. Access to these accounts is typically protected by a user-selected password that is then stored in the database. In the event of a database being compromised by an attacker, a user’s password is only secure if the stored passwords were properly hashed and salted and the user chose a password that is difficult for the attacker to guess. While modern web application authentication frameworks, such as AuthLogic for Ruby on Rails [1] and Forms Authentication for ASP.NET MVC [2], typically include hashing and salting of passwords, many organizations continue to rely on older, less-secure systems that may only hash passwords or even store them in plaintext. Correspondingly,

recent years have seen a rise in attacks on web application databases, resulting in a number of large databases of real-world passwords being made available online [4].

Even if a compromised database has been properly encrypted, the passwords are still vulnerable if they are chosen according to a predictable pattern or from a small distribution of possible passwords, such as a dictionary of words. Indeed, studies show that users frequently choose low-entropy passwords that contain only lowercase letters [6]. As a countermeasure, applications generally enforce a set of constraints that user passwords must satisfy. Often such constraints are encoded in simple rules such as “must contain at least one number” or “must be at least 8 characters long.” These constraints are designed to force a user to increase the entropy of their password, thereby making it more difficult to crack.
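Such constraints are straightforward to express programmatically. The following sketch encodes an illustrative rule set combining the two example rules above; the function name and exact rules are ours, not drawn from any particular system:

```python
def satisfies_rules(password: str) -> bool:
    """Illustrative creation rules: at least 8 characters
    and at least one digit."""
    return len(password) >= 8 and any(c.isdigit() for c in password)

# A user's first choice may fail, while a minimal edit passes:
rejected = satisfies_rules("password")    # fails: no digit
accepted = satisfies_rules("password1")   # passes both rules
```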

While the intention of rule sets is to force users to choose more random passwords, human users have consistently chosen to simply adapt their predictable patterns to match the minimum requirements of new rule sets [13, 5, 6]. For instance, a user who wants to enter “password” as their password may be thwarted by a rule requiring that a number be present. Rather than selecting a new password at random that satisfies the rule, the human user is more likely to make a small edit to their initial choice, such as using “password1” as their password. Although this change is valid according to the rule set, it is nonetheless not substantially more secure than the original password because the user is still following a predictable pattern.

These predictable patterns have been exploited effectively by multiple techniques. Modern dictionary attacks, such as those used by the cracking tool John the Ripper (JtR) [3], have adapted to this by including a mangling phase that perturbs the dictionary according to a set of hand-crafted rules that mimic how a human is likely to perturb their password. Probabilistic context-free grammars can be crafted to generate better guesses based on commonly observed password syntax [14] and can easily be reused to create implicit creation rules [13]. Alternatively, Markov models can be used to order guesses based on the probability of each sequence of characters being typed [10]. If the database is only hashed and individual records are not salted, then the hashed records are vulnerable to rainbow table attacks [11] that can be enhanced with a hybrid Markov model.


While highly effective compared to brute force methods, the Markov models described in [10] and implemented in JtR rely on two key assumptions. First, all characters in a password are assumed to be identically distributed after the initial character, and any additional information about the distribution of characters at each index of the password is discarded. Second, the distribution of passwords and character transition probabilities in the training set is assumed to be representative of passwords in any database.

In practice, neither of these assumptions holds true. The distribution of passwords is strongly affected by the creation rules applied to the database, and such rules vary across organizations, with some placing no constraints on passwords, others enforcing explicit rules about minimum lengths and different character types, and still others adopting recent implicit approaches [13] based on the likelihood of a password being guessed. Every new iteration of creation rules fundamentally alters the distribution of user-selected passwords in ways that may be unknown to the attacker, but are no doubt still driven by a predictable generative pattern optimized more for easy human recall than difficulty of cracking. Thus, improving the capability of Markov models to accurately represent known password distributions and to adapt to new, unknown distributions will enable us to better understand the objective strength of a specific password and may lead to a more precise measurement of guessability.

To this end, we present two novel approaches to modeling password distributions that remove the assumptions described above. First, a layered Markov model is developed that extends a first-order model with index-sensitive weights. This model learns a separate set of transition probabilities for every i-th character in a password. We validate our approach on a collection of publicly available datasets of real-world passwords. Our results indicate that the index-aware model can increase accuracy by as much as 157% over the model currently used in the popular cracking tool John the Ripper.

If a different set of creation rules were enforced between the training and target databases, they may have significantly different distributions of passwords. To handle this scenario, we present a technique that, given an approximate model such as one trained on a database with no creation rules, learns a better distribution of passwords on-line, while simultaneously cracking passwords. Our approach also has the appealing property that the number of guesses required to learn is inversely proportional to the number of passwords in the database. We demonstrate with a realistic example of training a model on a database with no creation rules and then evolving it on-line against an encrypted database with creation rules requiring passwords to be at least eight characters long and contain at least one number. Our results indicate that the model’s performance increases by 143% after the learning phase.

This paper makes the following novel contributions:

• An improved Markov model capable of more accurately representing a distribution of passwords.

• A comparative analysis of the current state-of-the-art Markov model with our improved model on several real-world datasets.

• A machine learning algorithm that takes a model trained on a similar distribution and can improve it on-line, while simultaneously guessing passwords, to better model the target distribution.

The rest of this paper is organized as follows. Section 2 presents background information and related work. Section 3 describes our improved Markov model. Section 4 presents our machine learning algorithm for improving a model on-line. Section 5 presents empirical results of our improved model on several real-world datasets and the results of our learning algorithm on a realistic example. In Section 6 we discuss insights gained from our analysis and outline future work. Finally, Section 7 presents our conclusions.

2. BACKGROUND

We next discuss relevant background work related to the state of the art in using Markov models for password cracking and the machine learning technique we use in our on-line learning algorithm.

2.1 First-Order Markov Models

Currently, the most advanced model, first presented in [10] and later implemented in JtR, is a first-order Markov chain, as shown in Figure 1a. The model generates passwords probabilistically according to the weighted, directed edges between nodes. Sampling a password begins at the origin, where each edge stochastically transitions to one of the 95 nodes in the main layer, each corresponding to a different ASCII password character.¹ The nodes in the main layer then transition between themselves in an analogous manner. At each transition, the character for that node is appended to the guess, and the model is iterated for a fixed password length. As the model contains only an origin node and a single layer of character nodes, it is called a first-order Markov model.
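To make the sampling process concrete, the following sketch walks a first-order chain with toy transition tables; the tables and three-character alphabet are illustrative only, whereas a real model, as noted, has 95 printable-ASCII states:

```python
import random

# Toy transition tables (illustrative values, not JtR's).
origin = {"p": 0.6, "a": 0.4}               # P(first character)
trans = {
    "p": {"a": 0.7, "p": 0.3},              # P(next char | current char)
    "a": {"s": 0.5, "p": 0.5},
    "s": {"s": 0.6, "a": 0.4},
}

def sample_password(length: int, rng: random.Random) -> str:
    """Generate one guess by stochastically walking the chain,
    appending one character per transition for a fixed length."""
    chars = rng.choices(list(origin), weights=list(origin.values()))
    while len(chars) < length:
        row = trans[chars[-1]]
        chars += rng.choices(list(row), weights=list(row.values()))
    return "".join(chars)

guess = sample_password(6, random.Random(0))
```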

To create a password cracking model, one first trains the Markov model over a known distribution. This distribution can simply be a publicly available set of passwords from a previously attacked dataset. For instance, JtR uses the rockyou [4] dataset for its Markov model. For efficiency purposes, the Markov model can be converted from a probabilistic graphical structure to an ordered mapping based on the probability of each path, in descending order. This enables the model to be used to structure the order of guesses and can also be applied to rainbow table attacks [10].
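The conversion from a probabilistic graph to a descending-probability ordering can be sketched as a best-first search over partial paths; this is our own illustrative reconstruction of the idea, not the actual JtR implementation:

```python
import heapq

def guesses_in_order(origin, trans, length, limit):
    """Yield up to `limit` fixed-length strings in descending path
    probability, using a heap keyed on negated probability so the
    most likely partial path is always expanded first."""
    heap = [(-p, c) for c, p in origin.items()]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < limit:
        neg_p, prefix = heapq.heappop(heap)
        if len(prefix) == length:
            out.append(prefix)          # complete guess, emit in order
            continue
        for c, p in trans[prefix[-1]].items():
            heapq.heappush(heap, (neg_p * p, prefix + c))
    return out

# Toy tables: path probabilities are aa=0.42, ab=0.28, ba=0.27, bb=0.03.
top3 = guesses_in_order(
    {"a": 0.7, "b": 0.3},
    {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.9, "b": 0.1}},
    length=2, limit=3,
)
```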

2.2 Evolutionary Algorithms

Evolutionary Algorithms (EAs) [7] are a policy search method based on the process of Darwinian evolution. A population of individuals is evaluated based on a user-defined fitness metric. After every individual has been evaluated, natural selection is performed on the population, with the fittest individuals surviving and creating randomly mutated offspring. This process is repeated either for a fixed number of

¹Valid password characters are assumed to be in the range [32, 126], but our approaches are trivially extensible to other character sets.


Figure 1: An example of (a) the configuration of the current state-of-the-art, first-order Markov model and (b) our layered Markov model with n = 4.

generations or until some other stopping condition, such as a global optimum, is reached. As the original application of EAs was to learn finite state machines, they seem well-suited to evolving Markov models.
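The generate-evaluate-select loop described above can be sketched generically; the toy fitness problem and mutation operator here are placeholders of our own choosing, not the password models evolved later in the paper:

```python
import random

def evolve(fitness, mutate, seed, pop_size=20, generations=50, rng=None):
    """Minimal EA: evaluate everyone, keep the fitter half as
    survivors, and refill the population with mutated offspring."""
    rng = rng or random.Random(0)
    pop = [mutate(seed, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in survivors]
    return max(pop, key=fitness)

# Toy problem: maximize the number of 1s in a 10-bit list.
best = evolve(
    fitness=sum,
    mutate=lambda ind, rng: [b ^ (rng.random() < 0.1) for b in ind],
    seed=[0] * 10,
)
```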

Specific to our implementation, we modify the NEAT algorithm [12] to support Markov model evolution. NEAT was originally created to evolve artificial neural networks, but has features useful for evolving any graphical structure, such as support for arbitrary topology, speciation by graphical similarity, and fitness sharing to support innovation and encourage diversity in the population.

In the next section, we present our new representation that builds on the first-order Markov model concept.

3. LAYERED MARKOV MODEL

When creating a first-order Markov model, much information is lost about the true generative model underlying the distribution of passwords. For instance, while the model is able to differentiate the first character in a word from all later characters (via the first set of transition probabilities from the origin node), the remaining characters are then treated as being identically distributed. Clearly this assumption is incorrect, since we know that when users add numbers to their alphabetic passphrases they overwhelmingly choose to append them to the end [13]. It seems intuitive that other index-dependent password rules may also exist.

To capture such rules, we extend the first-order Markov model into an n-layered model, where n is the length of the desired password(s) to crack. Figure 1b shows an example of our password model with n = 4. The model is composed of a set of layers that are trained with index-sensitive weights up to the n-th layer, at which point all remaining probabilities are grouped together.
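Training such index-sensitive weights amounts to maintaining a separate transition-count table per character position. The following is a minimal sketch of how the layered probabilities might be estimated; the function and variable names are ours:

```python
from collections import Counter, defaultdict

def train_layered(passwords, n):
    """Estimate index-sensitive transition probabilities.
    counts[i][prev] is a Counter over next characters, where i is the
    index of the next character; indices >= n share the final layer."""
    counts = defaultdict(lambda: defaultdict(Counter))
    starts = Counter()
    for pw in passwords:
        starts[pw[0]] += 1
        for i in range(1, len(pw)):
            layer = min(i, n - 1)   # collapse everything past layer n-1
            counts[layer][pw[i - 1]][pw[i]] += 1

    def normalize(c):
        total = sum(c.values())
        return {ch: k / total for ch, k in c.items()}

    start_probs = normalize(starts)
    layer_probs = {i: {prev: normalize(c) for prev, c in layer.items()}
                   for i, layer in counts.items()}
    return start_probs, layer_probs

start_p, layers = train_layered(["password1", "passw0rd"], n=4)
```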

It is worth noting that we have intentionally kept this model compact compared to other options. For instance, we could

have created an n-th-order Markov model in which each node transitions to its own set of nodes, effectively capturing both index and character information about all previous characters in the current state. However, such an approach would cause a prohibitively large expansion in the state space, likely making the model slow and requiring an unrealistic number of samples for adequate training. Additionally, converting the graph into a deterministic ordering for efficient guessing, as in [10], would be infeasible, as the number of paths would be too large. We therefore believe this layered approach achieves an important balance between granularity and combinatorial explosion.

This layered representation enables us to capture more information about a known distribution of passwords. In the next section, we present our approach for learning a model when the distribution is unknown.

4. ON-LINE LEARNING ALGORITHM

Most leaked or hacked password databases are heavily skewed towards weak passwords composed mostly of lowercase letters. This is especially true of the largest hacks, such as those analyzed in [13], which all enforced no password creation rules at all. This is most likely because the systems that are most vulnerable to attack are those which are outdated and fail to follow common practice among application developers. Rule sets employed at creation time, whether known or unknown to the attacker, also mean that the distribution of passwords may be specific to the database itself. Intuitively, training on a different database (e.g., the rockyou database that JtR relies on) may even be detrimental to performance if the set of creation rules for the target database used a similar model internally to reject predictable passwords [14, 13].

Mainstream adoption of creation rules is also constantly evolving. Common practice has moved from no requirements, to including one uppercase letter and a number, to now frequently also requiring a special character.² In the extreme case, one may not even know the rule set and may be faced with a target database of hashed and salted passwords of unknown distribution. In such a case, one is left with three options: 1) perform a brute force search under the worst-case assumption that the password rules forced users to choose truly random passwords, 2) use some combination of known password distributions to train a model in the hopes that the unknown distribution is sufficiently similar, or 3) attempt to approximate the unknown distribution by performing a search over the space of possible models. To the best of our knowledge, all previous work has focused on one of the first two options; our approach addresses the third.

We cast the password cracking challenge as a reinforcement learning problem and learn, via an EA, a policy for generating password guesses. At each step the agent guesses a password and receives a reward equivalent to the number of accounts that match the guess, excluding duplicates of previous guesses by that agent. Each guess is generated by stochastically traversing the Markov model; while this produces some guesses that have already been found in the database, they are necessary and even desirable, as we wish to compare models of the entire database, not just the portion that remains encrypted. The fitness score for an individual is equivalent to the sum of all the rewards it received from its guesses. The population of individuals is then evolved using a variant of the NEAT algorithm [12] adapted for Markov models.
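The reward computation for a single individual can be sketched as follows; the guess generator and hash function are abstracted as callables, and all names here are illustrative rather than taken from our implementation:

```python
def evaluate_individual(generate_guess, hashed_counts, hash_fn, n_guesses):
    """Fitness = total accounts matched across all guesses, where
    hashed_counts maps password hash -> number of accounts with that
    password. Duplicate guesses by this individual earn no extra reward."""
    seen = set()
    fitness = 0
    for _ in range(n_guesses):
        guess = generate_guess()
        if guess in seen:
            continue                    # duplicate guess: no reward
        seen.add(guess)
        fitness += hashed_counts.get(hash_fn(guess), 0)
    return fitness

# Toy example with an identity "hash": two accounts share "abc".
db = {"abc": 2, "xyz": 1}
guesses = iter(["abc", "abc", "qqq", "xyz"])
score = evaluate_individual(lambda: next(guesses), db, lambda s: s, 4)
```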

Our approach has several key benefits for password cracking. First, EAs are embarrassingly parallel, as each individual can be evaluated independently in a separate process. Our algorithm also operates on-line, learning the distribution while simultaneously cracking passwords, reducing the burden of the learning process. Since the number of valid guesses defines the granularity of the search space, increasing the number of passwords to crack actually smooths the fitness landscape and enables the algorithm to make more incremental progress. Thus, our approach also has the appealing property of scaling inversely proportional to the size of the database.

Next we describe the experimental setup and results that validate our n-layered Markov model and our learning algorithm.

5. EXPERIMENTS AND RESULTS

We conducted separate sets of experiments to evaluate our two approaches. To validate our layered Markov model, we ran it against a collection of real-world password datasets and compared its results to those of a first-order model. For our EA, we created two realistic datasets representing fundamentally different password distributions and showed that a simple first-order model trained on the first database can be drastically improved by evolving it on-line against the second database.

²This is the explicit creation rule set for UT Austin EIDs.

5.1 First-Order vs. 8-Layers

To evaluate the effectiveness of our n-layered model, we compared its performance to a first-order model on five real-world datasets: faithwriters, singles.org, phpbb, rockyou, and myspace. The first four have previously been analyzed in depth [13], and the last is a database of social networking passwords acquired via a phishing scam. All password datasets were acquired by unaffiliated third parties and are publicly available online [4]. In the case of the myspace dataset, the real database had creation rules in place that required a minimum of one numeric character; however, the phishing website simply accepted any input from the user, and thus only 85% of passwords in the raw dataset contain a number. We filtered out the 15% of passwords that could not be in the actual database and refer to the resulting cleaned dataset as the myspace dataset. A summary of the distributions of passwords in each database is presented in Table 1.

For each of the five databases, both a first-order and an 8-layered model were trained on the distribution of passwords in that database. Each trained model was subsequently run for 3×10⁸ guesses on the four other databases and the training database, for a total of 25 runs per model. Figure 2 shows a comparison of the performance of the first-order and 8-layered models on each database. The layered model clearly dominates the first-order model across all databases, outperforming the first-order model in all 25 tests by as much as 157.65%. Figure 2f shows the average performance gain of the layered model over the first-order model across each testing database.

Database    Passwords   L      U    N        O
training    354296      100%   0%   9.94%    1.71%
testing     354296      100%   0%   100%     1.71%

Table 2: Summary statistics for the realistic example databases. Both datasets are created from a perturbed English-language dictionary. The right four columns show the percentage of passwords that contain each type of character: L = lowercase, U = uppercase, N = number, and O = other character.

5.2 Learning an Unknown Distribution

Aside from the filtered myspace dataset,³ all of the real-world password databases contain no creation rules and are consequently composed of highly similar password distributions. Thus, to validate our EA we have chosen to create two realistic example databases, each a perturbed version of an English dictionary of 354K words.

In our training dataset, we randomly mutate 10% of the dictionary entries to contain a number; though this distribution does not match any of the benchmark datasets used in the last experiment, it is not unrealistic, as a recent study [6] showed that more than 90% of seven-character passwords contain only lowercase letters. The testing dataset simulates the same dictionary with creation rules enforced that require a password to be at least eight characters long and contain at least one number; for each entry in the dictionary that is less than seven characters, the length constraint is satisfied by adding multiple digits. In both datasets, the digit insertions are sampled from a non-uniform distribution similar to the pattern found in the rockyou database [13]. Table 2 summarizes the distribution of passwords in each of our example databases.

³We are currently running an evolution on the myspace dataset; however, due to its small size the run required a large number of guesses per individual and was not able to finish by the paper deadline.

Database                     faithwriters   singles.org   phpbb    rockyou    myspace
Unique Passwords             8347           12233         184389   14344388   31326
Contains Lowercase           91.55%         88.54%        86.46%   78.45%     95.82%
Contains Uppercase           9.94%          9.84%         10.10%   9.32%      6.62%
Contains Number              44.55%         38.11%        54.11%   68.08%     100.00%
Contains Non-alphanumeric    0.55%          0.25%         2.21%    7.10%      3.29%

Table 1: Summary statistics for the five real-world password databases used to validate our approach. The first four databases all contained no creation rules and thus have similar distributions. The myspace database had a creation rule requiring passwords contain at least one number, effectively skewing the distribution of passwords.
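The construction of the testing dataset can be sketched as follows; for simplicity, this sketch draws digits uniformly rather than from the non-uniform, rockyou-like distribution used in our experiments, and the function name is ours:

```python
import random

def enforce_rules(word, rng):
    """Perturb a dictionary word until it satisfies the testing
    dataset's creation rules: at least 8 characters and at least one
    digit. Digits are appended, as users overwhelmingly do [13];
    here they are drawn uniformly for simplicity."""
    pw = word
    while len(pw) < 8 or not any(c.isdigit() for c in pw):
        pw += str(rng.randrange(10))
    return pw

rng = random.Random(42)
testing = [enforce_rules(w, rng) for w in ["cat", "password", "sunshine"]]
```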

To evaluate the performance of our on-line learning algorithm, we first train a first-order Markov model on the training dataset. We intentionally chose the simpler first-order model instead of a layered model so as to make the comparison against the currently accepted state of the art. This model is run for 10⁸ guesses and serves as the baseline for comparison to our EA. The baseline model is then used as a seed for a population of 100 individuals, each evaluated on 10⁵ guesses per individual. The algorithm is run for 200 generations, producing 10⁷ guesses per generation.

Figure 3: Performance results comparing a static model to our evolving model. The evolving model’s ability to learn the distribution on-line enables it to quickly surpass the static model.

After 200 generations of evolution, the resulting model produces nearly 143% more correct passwords in 10⁸ guesses than the baseline. Furthermore, as shown in Figure 3, by generation four, after only 4×10⁷ guesses, the on-line evolutionary algorithm surpasses the number of correct guesses discovered by the baseline model in 10⁸ guesses. These results suggest that the algorithm is not only effective at learning the distribution, but also at speeding up password cracking while still in the process of learning.

In the next section, we present a brief discussion of our results and outline future work.

6. DISCUSSION AND FUTURE WORK

Both our layered model and learning algorithm have shown substantial performance improvements over the current approach. In this section we discuss the insights we gained from our analysis and describe potential future work.

Our current approach relies on stochastic activation to generate each guess for a model. While this method is likely a good approximation of the effectiveness of a Markov model, in practice these models are converted into precomputed tables based on the probability of each possible path. Large models pose the risk of a combinatorial explosion in path counts and may make the model infeasible to implement in tools such as JtR. Similarly, large lookup tables may not fit well into the processor’s cache and thus suffer a performance penalty that negates their accuracy gains over more compact models. We believe that our layered model fits well within these constraints, but plan to demonstrate this concretely by implementing our 8-layer model in JtR and running an in-depth comparison to its current model.

Conceptually, it is clear that creation rules affect the distribution of passwords in a database and that a variety of such policies are in use across different organizations. However, none of the larger databases that have been made publicly available, such as rockyou or phpbb, contained any creation rules. Although the myspace database did enforce (after filtering) a creation rule requiring at least one number, the relatively small size of the database means that our learning algorithm requires a large number of guesses to improve its model. Thus, we have started an experiment to learn a better first-order model by training on phpbb and testing on myspace. It is expected to take around 200 hours to complete, and the results are consequently still pending.

In addition to the biases our work seeks to address, other biases are frequently introduced into password guessing models. Most attackers follow an increasingly aggressive plan of attack, starting with a dictionary of common passwords, then mangling that dictionary to include numbers and non-alphanumeric characters, and finally relying on a Markov model to intelligently brute-force the attack. Consequently, Markov models, such as the one in JtR, that train on an entire database like rockyou are learning a distribution over all passwords rather than all important passwords. A better approach may be to first filter out passwords that an attacker would find by other means and then train on the resulting difficult passwords.


In a similar manner, the assumption that all passwords in a database were drawn from the same distribution and can best be modeled with a single Markov model seems naive. Rather, it is more likely that a database contains a mixture of samples from different, possibly even disjoint, distributions. It may therefore be possible to improve performance by first clustering passwords into different categories and then training a collection of models, or by training models using a technique like boosting [8] that prunes out the passwords a model can already generate easily.
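One simple, hypothetical clustering criterion is a password's character-class template, so that, for instance, letters-then-digits passwords train one model while symbol-heavy passwords train another. A minimal sketch, assuming this template choice:

```python
from collections import defaultdict

def structure(password):
    """Map a password to a coarse character-class template, e.g.
    'abc123' -> 'LLLDDD' (L=letter, D=digit, S=symbol). This template
    is one assumed way to split a database into groups that may follow
    different distributions."""
    classes = []
    for ch in password:
        if ch.isalpha():
            classes.append('L')
        elif ch.isdigit():
            classes.append('D')
        else:
            classes.append('S')
    return ''.join(classes)

def cluster_by_structure(passwords):
    """Group passwords by template; one model could then be trained
    per cluster."""
    clusters = defaultdict(list)
    for p in passwords:
        clusters[structure(p)].append(p)
    return clusters
```

Finer-grained clusterings (e.g. on n-gram statistics) are of course possible; the template merely illustrates the idea of partitioning before training.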

Finally, we plan to explore ways to enhance our learning algorithm. It may be possible to achieve better results by replacing the NEAT algorithm with an Estimation of Distribution Algorithm (EDA) [9] that explicitly tries to model the distribution of solutions in place of crossover and mutation. We also plan to investigate combining our layered model with our learning algorithm.

7. CONCLUSION
As modern password databases are often hashed and salted, more intelligent guessing is the primary avenue for increasing password cracking rates. Thus, better modeling of password predictability is crucial to increasing the precision of password strength estimates. We have presented two novel approaches that improve on the current state-of-the-art models for predicting passwords. We demonstrated that a layered, index-sensitive Markov model outperforms a first-order Markov model on a collection of real-world password datasets. We also showed that when the training and target databases are drawn from different distributions of passwords, it is not only possible but actually beneficial to learn a better model of the target database on-line, while simultaneously making guesses. We believe these two contributions represent significant improvements over existing techniques, and their success reveals a range of new possibilities for research in password strength analysis.

8. REFERENCES
[1] Authlogic. Binary Logic, December 2011. [online] https://github.com/binarylogic/authlogic.

[2] FormsAuthentication Class. Microsoft MSDN, December 2011. [online] http://msdn.microsoft.com/en-us/library/system.web.security.formsauthentication.aspx.

[3] John the Ripper password cracker. Openwall Project, December 2011. [online] http://www.openwall.com/john/.

[4] Passwords - SkullSecurity. Skull Security Password Database, December 2011. [online] http://www.skullsecurity.org/wiki/index.php/Passwords.

[5] M. Dell’Amico, P. Michiardi, and Y. Roudier. Password strength: an empirical analysis. In INFOCOM, 2010 Proceedings IEEE, pages 1–9. IEEE, 2010.

[6] D. Florencio and C. Herley. A large-scale study of web password habits. In Proceedings of the 16th International Conference on World Wide Web, pages 657–666. ACM, 2007.

[7] L. Fogel, A. Owens, and M. Walsh. Artificial Intelligence Through Simulated Evolution. Wiley, 1966.

[8] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23–37. Springer, 1995.

[9] P. Larranaga and J. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, volume 2. Springer Netherlands, 2002.

[10] A. Narayanan and V. Shmatikov. Fast dictionary attacks on passwords using time-space tradeoff. In Proceedings of the 12th ACM Conference on Computer and Communications Security, pages 364–372. ACM, 2005.

[11] P. Oechslin. Making a faster cryptanalytic time-memory trade-off. In Advances in Cryptology - CRYPTO 2003, pages 617–630, 2003.

[12] K. Stanley and R. Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.

[13] M. Weir, S. Aggarwal, M. Collins, and H. Stern. Testing metrics for password creation policies by attacking large sets of revealed passwords. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS ’10, pages 162–175, New York, NY, USA, 2010. ACM.

[14] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek. Password cracking using probabilistic context-free grammars. In Security and Privacy, 2009 30th IEEE Symposium on, pages 391–405. IEEE, 2009.


Figure 2: Performance results comparing a first-order Markov model to our 8-layer Markov model on the a) faithwriters, b) myspace, c) phpbb, d) rockyou, and e) singles.org datasets, as well as f) the overall average improvement per dataset using our approach.