15

I am trying to figure out whether a non-random password that relies on information an attacker cannot possibly know can be secure.

To give an example, let’s say that I generate my password by putting together the surnames of the first 5 people that I ever fancied. Let’s also assume that there is nobody in the world who knows those names apart from me. I can think of reasons for why this password can be considered both insecure and secure, and I am unable to determine which one is correct.

Reasons it might be insecure:

  1. The password entropy associated with this method for generating the password is 0. It is completely deterministic.
  2. All discussions I found on password security center around entropy, so this should be considered a weak password based on its entropy of 0.

Reasons it might be secure:

  1. The information required for generating a password using this method cannot be known by a potential attacker.
  2. The best an attacker can do is somehow figure out my method. Even then, the “word list” of all possible surnames would contain thousands of words, so perhaps the entropy should not be considered as 0 in practice?
  3. While it sounds like security by obscurity, I believe that it might not be, because this is a case where an attacker cannot possibly get to know the surnames.
  4. And all of this is assuming that the attacker can somehow figure out my method of generating the password, which they have no way of doing.

I went through lots of great questions on here regarding password entropy:

  1. XKCD #936: Short complex password, or long dictionary passphrase?
  2. Is "the oft-cited XKCD scheme [...] no longer good advice"?
  3. Why are passwords generated by a password generator a complicated mix of letters and numbers instead of a long phrase?
  4. Should passwords be truly random?
  5. What does "random" mean in the context of password creation?
  6. Confused about (password) entropy
  7. Why use entropy at all in considering password strength?
  8. How secure is Snowden's MargaretThatcheris110%SEXY password?

However, I am still unable to find the answer.

10
  • 17
    If you think the entropy is 0, I don't think you have understood those other questions you read. Commented Apr 30 at 19:18
  • 5
    I'd run your password through zxcvbn and cut your score by 30% for each "fancied" person you have any internet contact with. Then cut it in half if anyone can link this question to the account where you want to use the password. Commented Apr 30 at 19:33
  • 19
    The mental disconnect I think you have here is that "attacker doesn't know" != "attacker cannot guess". The reason discussion about strength centers around entropy is that entropy is, for our purposes here, another way to say "how hard is it to guess while knowing nothing about the generation rule". Additionally, while it would be hard, bordering on impossible, to figure out your rule without personal knowledge of you, I don't have to figure out the rule to correctly guess the password! Commented May 1 at 14:28
  • 1
    "The attacker cannot possibly know" is doing an incredible amount of heavy lifting and I don't think it's a safe assumption to make. Commented May 2 at 10:42
  • 6
    You know one of the biggest problems with passwords is password reuse, right? And you've proposed a mechanism by which you can generate one password. What are you proposing to do when you need a second, third, etc password? Commented May 3 at 5:57

4 Answers

31

The only thing "an attacker cannot possibly know" is something nobody knows.

Take your example, "the surnames of the first 5 people that I ever fancied". Your password would be something like RandolphBluntBrooksFerreraFoster (last year's Academy Award nominees for Best Supporting Actress). Since you must always assume your password scheme is known to your attackers (see below), that's just a dictionary attack away. Let's see how strong it is.

I have an old copy of names from the US census, which I use extensively when considering passwords and vetting gibberish detectors (remember DGAs?). This list has 88,799 entries. Let's err on the side of it being larger (and therefore harder to search), rounding that up to an even 100k (coincidentally the same size I round for estimating a standard dictionary size).

Assuming your five-name password scheme uses completely random names (which it does not!), it would have an entropy of log₂(100000⁵) = 83, equivalent to a fully random password of 12-13 characters; log₂(100000⁵)/log₂(94) = 12.67, so log₂(100000⁵) = log₂(94¹²·⁶⁷). However, because you're not picking randomly, your entropy is actually much lower.
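The arithmetic above can be checked with a short sketch. The function names and the 94-symbol printable-ASCII alphabet are assumptions for illustration, and the result is only an upper bound: it holds if the five names are picked uniformly at random, which they are not.

```python
import math

def bits_of_entropy(pool_size: int, picks: int) -> float:
    """Entropy in bits of `picks` independent, uniform choices from `pool_size` options."""
    return picks * math.log2(pool_size)

def equivalent_random_chars(bits: float, alphabet_size: int = 94) -> float:
    """Length of a uniformly random password over `alphabet_size` symbols with the same entropy."""
    return bits / math.log2(alphabet_size)

# Five names from a list rounded up to 100,000 entries (the figure used above).
name_bits = bits_of_entropy(100_000, 5)
name_chars = equivalent_random_chars(name_bits)

print(f"{name_bits:.0f} bits, about {name_chars:.2f} random characters")
```

This reproduces the roughly 83 bits and 12.67 characters quoted above.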

Take a basic dictionary attack, which would try the most common names first. My example's least-popular name, by 1990 census data, is Ferrera (16,546th most common). If we instead run that math as log₂(16546⁵), we get 70 (equivalent to 10-11 random characters). A more sophisticated attacker might figure out what city you're from and what names are in your high school's yearbook, putting those atop their dictionary list. This still isn't zero, but you should assume that somebody close to you can break it in a few weeks, including research and setup time.
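As a rough sketch of that bound: if the attacker tries names in popularity order, the search space is capped by the rank of the least common name in the password. The helper name below is hypothetical, and the 94-symbol alphabet is an assumption used only for the character-length comparison.

```python
import math

def guesses_upper_bound(worst_rank: int, picks: int) -> int:
    """Number of combinations of `picks` names whose popularity rank is at most `worst_rank`."""
    return worst_rank ** picks

# Ferrera is the 16,546th most common name in the example above.
bound = guesses_upper_bound(16_546, 5)
bound_bits = math.log2(bound)
bound_chars = bound_bits / math.log2(94)

print(f"{bound_bits:.1f} bits, about {bound_chars:.1f} random characters")
```

A targeted attacker who reorders the list around your city or school shrinks `worst_rank` further, so the real figure can be well below this.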

By all means, make your own scheme to remember passwords. Just make sure it's properly robust and you're not depending on the scheme's obscurity (this is more formally called Kerckhoffs's principle, which Claude Shannon summarized as "the enemy knows the system").

Secure passcodes and passphrases must be generated, not made up. You should use a password manager's generator or else Diceware to do this. Human-generated codes rely on obscurity.

My own system is to generate (with a computer!) a passphrase of five random words and put a generated random 8-character passcode between a pair of those random words chosen at random, like correct horse battery 6g:UgHTL staple erupt. Even using Diceware's tiny 7776-word list, this is pretty strong: log₂(7776⁵×94⁸×4) = 119 (≈18 characters).
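A minimal sketch of that scheme, assuming Python's `secrets` module as the random source. The function names are made up for illustration, and `demo_words` is a tiny placeholder list; a real run would load the full 7776-word Diceware list instead.

```python
import math
import secrets
import string

# 94 printable, non-space ASCII characters (52 letters + 10 digits + 32 punctuation).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_passphrase(wordlist, n_words=5, code_len=8):
    """Pick random words, then insert a random code into a randomly chosen gap between two words."""
    words = [secrets.choice(wordlist) for _ in range(n_words)]
    code = "".join(secrets.choice(ALPHABET) for _ in range(code_len))
    gap = secrets.randbelow(n_words - 1) + 1  # insert index 1..n_words-1, never first or last
    words.insert(gap, code)
    return " ".join(words)

def passphrase_bits(wordlist_size, n_words=5, code_len=8):
    """Entropy in bits: word choices, code characters, and the choice of gap."""
    return (n_words * math.log2(wordlist_size)
            + code_len * math.log2(len(ALPHABET))
            + math.log2(n_words - 1))

# Placeholder list for demonstration only; far too small for real use.
demo_words = ["correct", "horse", "battery", "staple", "erupt", "lamp"]
print(generate_passphrase(demo_words))
print(f"{passphrase_bits(7776):.0f} bits with a 7776-word list")
```

The entropy figure depends only on the list size and the lengths, not on which words come out, which is why the scheme stays strong even when fully published.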

This is hard! It's supposed to be hard! Memorability is inversely proportional to entropy; use a password manager. The only passwords you should need to actually remember are those you have to enter manually rather than with the password manager (such as the code that unlocks the password manager and the code that unlocks your computer so you can get to your password manager). Additionally, try to use multi-factor authentication wherever possible.

18
  • 4
    Why would I always assume that the password scheme is known to the attacker? There are several (an infinite number of?) possible schemes. Isn't there lots of extra security derived from the fact that the attacker would have to try all the schemes and for each scheme there are millions of possible passwords?
    – hb20007
    Commented Apr 30 at 21:04
  • 22
    It's a safer way to think about things. Too many people rely on "clever" schemes that turn out to be not at all clever. You certainly gain extra security by having a novel scheme, but attackers are pretty clever too, and people constantly underestimate their resources.
    – Adam Katz
    Commented Apr 30 at 21:21
  • 7
    +1. This principle is so important that it even has a name / handle in cryptography: en.wikipedia.org/wiki/Kerckhoffs%27s_principle Commented Apr 30 at 23:43
  • 26
    @hb20007 Adam is right. Another way of thinking about it is that if you give yourself any credit for being "clever," you're claiming to be more clever than a trained professional whose living depends on out-witting people who think they're clever.
    – Cort Ammon
    Commented May 1 at 4:11
  • 13
    @hb20007 using your crush as a password is pretty high up on the list of things everybody does. I used her phone number that I never dared to call for many years. (I don't any more, so don't bother trying)
    – Christian
    Commented May 1 at 11:05
12

The mental disconnect I think you (and tbf many many other people) have here is that the strength of the password is different than the strength of the rule used to generate it.

You are assuming that because the rule you use to generate the password could only be known to you that therefore the resulting password is strong. But this is a non-sequitur: the conclusion does not follow from the premise!

XKCD 221

"Random" function returns "4" because its author rolled a die

Even though the rule is good (fair dice roll), the "password" is easy for a completely naive attacker to guess. Your example suffers from the same problem: the problem isn't (only) that the rule is flawed; the problem is that it's a lot easier to guess the password than the rule, and the attacker only needs to guess the password. The conversation on this topic revolves around entropy because entropy in this context is another way of saying "how hard is this to guess for a naive attacker."

TL;DR: Don't fixate on the unknowability of your generation rule!

10

Trying to compute the entropy of a password and then evaluating the security of a password scheme based on it very often results in arguing about the correct method to calculate some entropy while ignoring what actually determines the security of the system. Instead of focusing on some theoretical entropy metric, one should consider how hard guessing the right password is for the attacker in reality, and how hard it needs to be.

How hard guessing is depends on what the attacker can determine about the password scheme. If it is known that you base your password on surnames of people you've fancied, then the attacker can simply make a list of common surnames. If the attacker knows your surname or where you live, they might reduce and prioritize this list of surnames, because some names are more common in specific areas. All of this filtering and prioritizing greatly reduces the average number of attempts needed to guess the right password. And it can even be different for different people (i.e. different information available to the attacker about the person). So just using some common "entropy" measurement does not describe the actual effort needed.

And then the security of the system depends on how hard password guessing must be in the first place. This depends on how many guesses the attacker can make within a reasonable time. For example, if the login system enforces an increasing wait between guesses or temporarily suspends the account, then the attacker cannot attempt too many guesses. In this case even simple 6-digit PIN codes might be reasonably secure.

But if the attacker can mount an offline attack against stolen password hashes then the speed of guessing is only limited by the computing power the attacker has and the (deliberate) slowness of the password hashing algorithms.
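To make the online versus offline difference concrete, here is a rough sketch. Both guess rates are assumptions chosen for illustration (one guess per hour against a throttled login, one billion guesses per second against fast stolen hashes); real rates vary widely with the lockout policy and the hashing algorithm.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365

def seconds_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a constant guess rate."""
    return keyspace / guesses_per_second

pin_space = 10 ** 6        # every 6-digit PIN
online_rate = 1 / 3600     # assumed: throttling allows one guess per hour
offline_rate = 1e9         # assumed: one billion hash guesses per second

online_years = seconds_to_exhaust(pin_space, online_rate) / SECONDS_PER_YEAR
offline_seconds = seconds_to_exhaust(pin_space, offline_rate)

print(f"online:  about {online_years:.0f} years")
print(f"offline: about {offline_seconds:.3f} seconds")
```

The same PIN is reasonable behind strict throttling and worthless once the hash leaks, which is why the required guessing difficulty cannot be read off the password alone.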

9
  • I know that there are other factors regarding the security of the system like increasing wait time between guesses etc., but I am focusing on the only one under my control, which is the actual password.
    – hb20007
    Commented Apr 30 at 19:50
  • @hb20007: Like I said, it does not depend on some theoretical entropy of the password but on how hard it is for the attacker to guess in reality. And in the case of surnames, this depends on how much information the attacker can get about you to filter and prioritize the list of existing surnames. Commented Apr 30 at 19:51
  • Which depends on both the security practices of the system as well as my password. The former is not in my control so I am asking about the one which is.
    – hb20007
    Commented Apr 30 at 19:52
  • 8
    @hb20007: "relies on information an attacker cannot possibly know" - the important thing is not to rely on information the attacker cannot possibly know, but on information the attacker cannot possibly guess within a reasonable number of attempts. Unless the password is random, how hard the guessing is depends on how much the attacker knows about you and your way of constructing passwords. Commented Apr 30 at 19:56
  • 2
    Let us continue this discussion in chat. Commented Apr 30 at 20:20
4

Adam's answer is better than this, but I'm going to elaborate on my initial comments anyway:

If you think the entropy is 0, I don't think you have understood those other questions you read.

Your password scheme does not have an entropy of 0 because no one knows who all your secret sweethearts were.¹ So even if someone knew your password concoction scheme, they couldn't deterministically derive your password. The link you cited as entropy 0 has no secret information. Once you know the system, the password is automatic.

This doesn't mean your password is good. It's not that good. Any cracker will use dictionaries before they use randomly generated password strings. Adam's arithmetic says your password is roughly equivalent to 10-11 random characters.

A cracker who is trying to target you might throw names and places familiar to you up at the top of those dictionaries. He or she might even see this question and concentrate specifically on names from people you're known to know. That brings your entropy way down, but still not to 0. Even knowing (hypothetically) SmithJonesJohnsonLee and having the last one narrowed down to Stevens or Kim is one bit of entropy.² The attacker would have to have your diary or be someone who knows you extremely well¹ for the scheme to have no entropy at all.

I'd run your password through zxcvbn and cut your score by 30% for each "fancied" person you have any internet contact with. Then cut it in half if anyone can link this question to the account where you want to use the password.

zxcvbn is an emulator for how a generic cracker might evaluate a password. My non-technical suggestion was to reduce its output for each person that a targeted attacker could link to you, or a lot more if this system was known to them. But even if the result is low, it's still not going to be 0.

With a password of SmithJonesJohnsonLeeStevens, zxcvbn gives me guesses_log10 = 16. That's 53 bits. Insufficient, but not the worst imaginable. But for each would-be sweetie known to be connected to you, let's cut that by 30%. Then cut it in half for knowing the system you used. Now we're down to 4.5 bits. That's about 22 password guesses, which is probably too low, but maybe within an order of magnitude.³

¹ I'm assuming here that the attacker is not your mom, and you didn't out yourself on LiveJournal.
² To be clear: One bit of entropy is extremely bad. This was an example, not a suggestion.
³ 22 guesses may not be enough for all five of your fancied people. But could someone do it in 300 guesses? That's 2 weeks of one guess an hour on a live system, or before you can blink for an offline crack.
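The back-of-the-envelope reduction in this answer can be written out as follows. The 30% and 50% cuts are this answer's own non-technical rule of thumb, not anything zxcvbn computes, and the variable names are made up for illustration; only the `guesses_log10 = 16` input comes from the zxcvbn result quoted above.

```python
import math

guesses_log10 = 16                        # zxcvbn result quoted above
bits = guesses_log10 * math.log2(10)      # convert base-10 exponent to bits

linked_people = 5                         # assumed: all five names can be linked to you
adjusted_bits = bits * (0.7 ** linked_people)  # cut 30% per linked person
adjusted_bits /= 2                        # halve again because the scheme is known

guesses = 2 ** adjusted_bits
days_for_300_guesses = 300 / 24           # one guess per hour on a live system

print(f"{bits:.0f} bits before, {adjusted_bits:.1f} bits after")
print(f"about {guesses:.0f} guesses; 300 guesses take {days_for_300_guesses} days at one per hour")
```

The exact multipliers are arbitrary; the point is how quickly a respectable-looking score collapses once the attacker knows both the scheme and the people.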
