Predictability of PHPs array_rand

Question

It's generally known information that methods such as the array_rand() method in PHP are not considered cryptographically secure. I'm trying to understand under what situations generated result may be predictable.

If I know the seed, the values in the array, a single generated value and how many times the method has been called before that value was generated, I know I can easily calculate all subsequently returned values. I can do that by writing my own script that uses the same seed and generates the same number of results, thus putting it in the same state.

Assuming I don't know the seed or the number of previously generated values, what are my chances/how low can I get the probability of predicting future values.

To define a more concrete example and make the question less theoretical, lets assume we're using array_rand to generate case-insensitive alphanumeric tokens with a length of 12. This is done by grabbing 12 values from a character array by calling array_rand 12 times thus making the token [A-Z\d]{12}. I know I have one of a thousand consecutively generated tokens, but not which position it was generated in.

Can I predict the next token (assume I've not got the last token generated)? I'm assuming this can't be predicted with 100% accuracy, but what are the chances of brute forcing all possibilities for the next token, and how many would there be?

Assuming I can validate if a token is valid, how much does knowing 2 consecutive tokens (24 values) narrow my chances of predicting the 3rd, etc.

I've seen some research on cracking the state of rand but the articles generally don't deal with constrained/truncated ranges.

P.S. I'm trying to understand the proof/math behind why it's insecure, not looking for suggestions of more secure approaches.

Perhaps a better fit for Crypto SE?
– Jedi
Commented Jul 30, 2016 at 14:22 — Jedi, Commented Jul 30, 2016 at 14:22
Quite possibly, I didn't even know such an SE existed.
– Peter O'Callaghan
Commented Jul 30, 2016 at 15:20 — Peter O'Callaghan, Commented Jul 30, 2016 at 15:20

Sjoerd · Accepted Answer · 2016-08-30 08:04:14Z

array_rand calls mt_rand internally. This is a Mersenne twister algorithm with a state size of 624 numbers. This means that if you get 624 consecutive outputs of mt_rand, you know the whole state and can predict all future numbers.

As you correctly indicate it is pretty hard to obtain the output of mt_rand in real-life cases, because it is typically limited to some range. This is not necessarly a problem for the attacker: if the application calls mt_rand(0, 8) the attacker only knows three bits of the state, but he also needs to predict only three bits to predict the algorithms output.

Another practical problem with cracking PRNGs is that the attacker needs to request tokens from the same process, because different processes have different PRNG states. Typically when you connect to a server the request gets handled by a random process. You can do up to 100 requests on one connection, but this is still short of the 624 requests you need to get the state of mt_rand.

In your example you call array_rand 12 times, so the attacker does not get any intermediate output of mt_rand. There are too many combinations here to brute-force.

So this could be practically secure, but it would be risky. One application I looked into also used mt_rand for token generation, and then had another page that called mt_srand(). This made it possible to seed the PRNG with a known value. I also wrote something about cracking the PHP PRNGs, although it does not go into mt_rand state cracking.

Julie Pelletier · Accepted Answer · 2016-07-30 15:09:16Z

You could build your data based on the more secure openssl_random_pseudo_bytes() from the OpenSSL library instead. That obviously implies a base conversion in order to get the required range but it doesn't rely on a seeded random table as other functions (such as rand()).

Note that PHP's built-in mt_rand() is actually much better than the classic rand() in terms of efficiency and randomness (more normal distribution), but is still seeded.

Both methods, when properly implemented, will require the same effort to brute-force. The main difference is that once the brute force is successful, it may be possible for an attacker to either find more details on your implementation patterns or find the seed which therefore make it easier for the next attack.

In practice, cases where it will actually make a real difference are extremely rare, but there have been cases where it has been abused. The best (and only) example I know of is at a casino slot machine in Montreal where this guy spent weeks trying to find patterns and eventually did because of the seed.

"but is still seeded" All random number generation algorithms are seeded. The difference between a PRNG and CSPRNG is not the fact that it's seeded, what it's seeded with or who seeds it, it's several other properties like "given N outputs from the CSPRNG, output N+1 cannot be computed." (Where N, as far as I know, is usually infinity.) — Luc, Commented Jul 30, 2016 at 15:25

Luc · Accepted Answer · 2016-07-30 15:37:38Z

Well if you want to know how to predict a PRNG, google it. Find out which PRNG is used for array_rand and google it, for example "predict mersenne twister" (without quotes) gives me two github links (in the top 3 results) to people who've managed to write a program to predict next outputs based on previous ones.

You ask specifically after predicting the PRNG when the raw output is not given, e.g. when it is used to generate random characters (which have only a limited range, e.g. 0-26 for lowercase letters). This makes it a lot harder but I could imagine it still being done with some guesswork if you don't have the source code (black box test).

In a white box test, where the source code is known, it should be fairly trivial. One might need more outputs to recover the state (if part of the PRNG's output is thrown away), but it should be a very similar process.

I'm not sure there is a mathematical proof, like you ask, that proves all non-CSPRNGs must be predictable. I think they are all predictable except the ones that are made (and perhaps proven) to have certain properties.

Stack Exchange Network

Predictability of PHPs array_rand

3 Answers 3

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
php
random
token
.

Hot Network Questions

Predictability of PHPs array_rand

3 Answers 3

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged phprandomtoken.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
php
random
token
.