Google OAuth “invalid_grant” nightmare

Published in

Inside Timekit

Jul 18, 2016

At Timekit, we use the Google Calendar API extensively. A big part of our business is basically built on top of Google, as many of our customers use it to sync bookings and availability to their Google calendar. So to say that the integration is key to our operational success is a grand understatement.

As with any software project, sometimes you run into edge cases that suck you into a black hole for days on end. It starts out innocent enough (“lemme just Google this real quick”) but quickly unfolds into a heroic detective novel with bad error messages as clues and the villain always lurking within arm’s reach (or so you think). In total, we spent a week’s worth of man-hours hunting down this particular “invalid_grant” problem 🙈

Some background facts worth mentioning:

We’re acquiring refresh tokens for offline access, syncing Google accounts when users are not actively logged in.
We’re using the Google Calendar API, so the integration is user-specific
We’re using the OAuth 2.0 protocol through Google’s PHP SDK

First clue

Over the course of time, we started to get reports from some users that their Google sync and event creation were failing. We had been monitoring our logs and had noticed a recurring error that we couldn’t quite explain. It checked out; many of the users that experienced problems got the same error message:

400 - Google_Auth_Exception
Error refreshing the OAuth2 token, message: '{
 "error" : "invalid_grant"
}'

Alright, let’s see what the RFC 6749 OAuth 2.0 spec has to say about it:

invalid_grant
The provided authorization grant (e.g., authorization code, resource owner credentials) or refresh token is invalid, expired, revoked, does not match the redirection URI used in the authorization request, or was issued to another client.

Right — so for literally any reason possible, our tokens are getting rejected by Google. Not too surprising, but there’s no additional error description/reason returned by Google, so that didn’t help us much.

Okay, let’s see what the rest of the world has to say about it:

Hey, at least we’re not alone here!

It’s worth noting that there was no apparent pattern in when the error occurred: it hit both Gmail and Google Apps accounts, both new and old users, at different times, and never all users at once. Sigh. Let the fun begin.

I’m gonna spare you the witch hunt and cheat a bit here by fast-forwarding countless hours of searching, testing, cursing and crying. Here’s a complete list we compiled of what might be wrong and what we did to rule it out:

⭐️ Hint: the solution for our case is number 10 & 11 in the list

By FAR the most popular answer, with tons of upvotes on StackOverflow, was that the server clock might be out of sync. Apparently if timestamps on our servers and at Google were misaligned, token validation would fail. A bit obscure, sure, but could it be that simple?

We installed ntp and checked whether there was a time offset. 5 ms. Fair enough, milliseconds do matter, so after it had synchronized, we tried to call Google with one of the problematic accounts again. Same result: “invalid_grant”. We double checked that ntp was indeed synchronizing on our API server, but to no avail. No simple fix this time.

Side note: if a large time offset actually is your problem, you should know that the default behaviour by ntp is to re-synchronize slowly over a few hours or days. This might give you a false-positive if you expect the change to happen instantly. It is possible to force the synchronization instantly though.
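If you want a rough look at your clock offset before touching ntp, one option is to compare your server time with the Date header of an HTTP response from a server you trust. Here is a minimal Python sketch of that comparison, assuming you already have the header value in hand; the function name and the sample values are made up for illustration. The header only has one-second resolution, so this can only reveal large drift, not the millisecond offsets ntp reports.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    `date_header` is the value of an HTTP `Date` response header and
    `local_now` must be a timezone-aware datetime.
    """
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


# Illustrative values only: a server that says 12:00:00 GMT while the
# local clock reads 12:00:05 UTC means we are 5 seconds ahead.
header = "Mon, 18 Jul 2016 12:00:00 GMT"
local = datetime(2016, 7, 18, 12, 0, 5, tzinfo=timezone.utc)
offset = clock_offset_seconds(header, local)
print(f"local clock is {offset:+.0f}s relative to the server")
```

In practice you would pass `datetime.now(timezone.utc)` as `local_now`, captured right when the response arrives.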

The next suggestion was that we hadn’t requested offline access. Kinda obvious and very reasonable, but some accounts were working fine while others weren’t, and we hadn’t changed our signup code for months. Surely all accounts would fail if we weren’t granted offline access for any of them, right? Let’s double check just to be sure:

$client->setAccessType("offline");

Yup, it’s there. Moving on.

Because the errors seemed to occur at random, throttling was a good candidate. We also knew that Google did in fact enforce usage limits, so we checked:

As evident from their dashboard, we were nowhere near the limit of 1,000,000 events per day. Same per user: we might hit 5–10 requests per hour, but definitely not 1 request per 200ms. We even tried raising the limits just to be sure, with no effect.

There is a limit of 25 on how many refresh tokens can be issued per account per client. I’m a bit confused about the terminology, but I assume client means a registered app client (essentially a Google API key) and accounts are users, so we would only run into this problem if we generated multiple refresh tokens per account, which we don’t. We only get a refresh token on first authorization and if, for some reason, Google hands us a new refresh token, we make sure to use that one in the future. So when Google writes “If the limit is reached, creating a new token automatically invalidates the oldest token without warning”, that shouldn’t be a problem.
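The “use the new one if Google hands us one” rule can be sketched in a few lines of Python. This is only an illustration, not our production code: the function name is hypothetical, and the dictionary mimics the standard OAuth 2.0 token response fields (“access_token”, “expires_in”, “refresh_token”) with placeholder values.

```python
from typing import Any, Dict


def refresh_token_to_store(stored_refresh_token: str, token_response: Dict[str, Any]) -> str:
    """Return the refresh token that should be persisted after a token response."""
    new_token = token_response.get("refresh_token")
    # When the response carries no refresh token, keep the stored one.
    return new_token if new_token else stored_refresh_token


# A response without a refresh token keeps the stored one.
kept = refresh_token_to_store(
    "stored-refresh-token",
    {"access_token": "placeholder-access-token", "expires_in": 3600},
)
# A response that includes a refresh token replaces the stored one.
replaced = refresh_token_to_store(
    "stored-refresh-token",
    {"access_token": "placeholder-access-token", "refresh_token": "new-refresh-token"},
)
print(kept, replaced)
```

Persisting the returned value after every token response means you never keep using a token that the provider has already superseded.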

A token that hasn’t been used for 6 months automatically expires. As many of the users that got the “invalid_grant” error were created recently (under 6 months ago), that couldn’t be it.

Many answers suggested that we might be using the wrong identifier when talking with Google. This might actually be true in cases where you are accessing the Google API in a context that is not user-specific, such as Google Analytics or Google Apps administration. In those cases you should use service accounts instead, as they are meant for server-to-server usage with API keys.

For us though, we were integrating other people’s specific accounts, so that was not it. On we go.

We had a lurking feeling that too many access tokens in a short time might trigger some temporary throttling by Google.

A few months earlier, we had found a side effect in the refresh token part of our code: we requested a new access token every time we talked with Google, even though previous access tokens were still valid (access tokens have an expiration of 1 hour).

Consider the following SDK method:

$client->refreshToken($myRefreshToken);

You can pass a refresh token to it. Would you expect it to:
1) simply set the passed refresh token as an internal variable and be used in future API calls or
2) call the Google API directly and retrieve a new access token using the passed refresh token?
It turns out it was the second option. We were calling the method every time we made a Google request instead of only doing it when the current access token had expired (the SDK’s isAccessTokenExpired() method checks that).

After we fixed the bug, we would request at most 24 new access tokens per day. We verified that the fix actually worked and could safely disregard this hypothesis too.
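The “only refresh when the access token has expired” pattern looks roughly like this, written as a language-neutral Python sketch rather than against the PHP SDK. Everything here is an assumption for illustration: the class name, the `fetch` callback (standing in for whatever actually calls the token endpoint) and the 60-second leeway.

```python
import time
from typing import Callable, Optional, Tuple


class AccessTokenCache:
    """Caches an access token and refreshes it only when it is about to expire."""

    def __init__(
        self,
        refresh_token: str,
        fetch: Callable[[str], Tuple[str, float]],
        leeway_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._refresh_token = refresh_token
        # Hypothetical callback: returns (access_token, expires_in_seconds).
        self._fetch = fetch
        self._leeway = leeway_seconds
        self._clock = clock
        self._access_token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._access_token is None or now >= self._expires_at - self._leeway:
            # This is the only place that talks to the token endpoint.
            self._access_token, expires_in = self._fetch(self._refresh_token)
            self._expires_at = now + expires_in
        return self._access_token


# Usage with a fake fetcher and a fake clock, so nothing touches the network.
calls = []
fake_now = [0.0]


def fake_fetch(refresh_token: str) -> Tuple[str, float]:
    calls.append(refresh_token)
    return f"access-{len(calls)}", 3600.0


cache = AccessTokenCache("my-refresh-token", fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # no token yet, so this fetches one
second = cache.get()   # still valid, so the cached token is reused
fake_now[0] = 3600.0   # an hour later
third = cache.get()    # expired, so this fetches again
print(first, second, third, len(calls))
```

With one-hour tokens, a cache like this tops out at roughly one refresh per hour per user no matter how many API calls are made in between.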

When we started the development of the Google integration about a year ago, the SDK was still in v1.x.x and since then, a v2.0.1 had been released. The changes in the major version bump were mostly cleanup of old methods (that we weren’t using) and there was no mention of anything related to refresh/access tokens, but hell; at this point we were getting desperate.

After a “composer update” and some method renaming to fit the changes, we tried again. Same friggin’ “invalid_grant” laughing in our faces.

Based on the error description of the OAuth2 spec, the error message would be returned in cases where the tokens were malformed, incomplete or simply non-existing. The refresh tokens followed the same pattern of starting with a “1/” and a long string of random numbers/characters, resulting in 44 or 62 characters in total.

Here’s an example:

1/BtXr3R4lfNdNx6s715Fz0PF2v8wj_aSwDcJTD2a4MWfBactUREfofsF9C7PrpE-j

If we sent gibberish to Google, it would trigger the same “invalid_grant”. So this was actually a good bet. But the failing accounts all looked to fit the same recipe as the working ones. We couldn’t find a pattern mismatch. And based on the fact that accounts had worked previously, with the same refresh token, and then suddenly stopped working, it contradicted this argument.
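A loose sanity check along these lines can at least catch truncated or whitespace-mangled tokens in your database before they reach Google. The Python sketch below is an assumption-heavy illustration: the pattern only encodes what we observed at the time (a “1/” prefix followed by URL-safe characters), the function name is made up, and Google does not document or guarantee this format, so treat a failed match as a hint and not a verdict.

```python
import re

# Observed shape only: "1/" followed by letters, digits, "_" or "-".
# This is NOT a documented format and may change at any time.
OBSERVED_REFRESH_TOKEN = re.compile(r"1/[A-Za-z0-9_-]+")


def looks_like_refresh_token(value: str) -> bool:
    """Return True if `value` matches the token shape observed above."""
    return OBSERVED_REFRESH_TOKEN.fullmatch(value) is not None


example = "1/BtXr3R4lfNdNx6s715Fz0PF2v8wj_aSwDcJTD2a4MWfBactUREfofsF9C7PrpE-j"
print(looks_like_refresh_token(example))         # the example token from above
print(looks_like_refresh_token(example + "\n"))  # stray newline from a bad save
print(looks_like_refresh_token("gibberish"))
```

Using `fullmatch` instead of `match` matters here, because a trailing newline or space is exactly the kind of damage this check is meant to flag.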

When a user revokes access from the apps permission page, the next request we make to Google gets the following response:

Error refreshing the OAuth2 token, message: '{
 "error" : "invalid_grant",
 "error_description" : "Token has been revoked."
}'

High five, Google: an error description!! So based on the existence of this error description, it seemed likely that Google would always supply it in cases of revoked access, right?

Well, good news, this is where our detective story takes a turn for the better. After getting in touch with a handful of users, we found that none of them had the “Timekit.io” app on their permission page. It wasn’t there — same as if they had revoked it manually. But it was our users that complained about the errors in the first place so surely they would know if they themselves had actively revoked access, right?

Here’s the kicker:

12 hours after access has been revoked, Google stops sending the “error_description” in the response. Yes, really.
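In code, that means you cannot rely on the description being there. Here is a small Python sketch of how the two response shapes shown in this post could be told apart; the function name and the returned labels are hypothetical, and the only inputs it knows about are the JSON bodies quoted above.

```python
import json


def classify_token_error(body: str) -> str:
    """Classify the JSON body of a failed token refresh.

    Returns one of: "revoked", "invalid_grant_unexplained", "other", "unparseable".
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable"
    if not isinstance(payload, dict):
        return "unparseable"
    if payload.get("error") != "invalid_grant":
        return "other"
    description = str(payload.get("error_description") or "")
    if "revoked" in description.lower():
        return "revoked"
    # No description: could be an older revocation or any other cause.
    return "invalid_grant_unexplained"


with_description = '{"error": "invalid_grant", "error_description": "Token has been revoked."}'
without_description = '{"error": "invalid_grant"}'
print(classify_token_error(with_description))
print(classify_token_error(without_description))
```

The important part is the last branch: an “invalid_grant” without a description still has to be treated as a possible revocation.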

So was this it? Would all our troubles be because some users had, by accident, revoked access to us? I mean, of course some users that actually intended to revoke access (they wouldn’t want to use our service anymore) would get the error, but they weren’t complaining either — it was expected behaviour.

It got us thinking though: maybe some process at Google automatically revokes access in certain conditions, without the user's knowledge? 🤔
And with that, we circled in on the real culprit.

In December 2015, Google announced that they changed the default behaviour for password resets:

In an effort to further increase account security for Google Apps users, a recent change has been made to our security policy, whereby OAuth2 tokens issued for access to certain products will now be revoked when a user’s password is changed. For example, if a user loses their device, and changes their Google password, their mail and other data will stop syncing to that device when the password is reset.

😂

We got in touch with some of the users, asked if they had changed their passwords recently, and yes they indeed had. Although a bit disappointing from a techy engineer’s perspective, this explanation actually did make the most sense. The token was indeed revoked and the grant became invalid, thus the error from Google. Also, as a token revocation in Google’s mind is the same no matter the action (password reset or manual), the 12-hour rule above also applies here. That was it. Finally.

The good news is that if the user simply re-authenticates with us, they get a new refresh token from Google and can continue uninterrupted from there on.

Side note: to make things trickier, the change was not rolled out to Google Apps accounts. This was part of the reason why password resets didn’t jump out at us: we were still getting “invalid_grant” on Google Apps accounts. As we couldn’t find any settings in the Google Apps admin dashboard about app revocation on password resets, we must assume that those users had simply revoked access manually. But the fact that the same “invalid_grant” error was returned in both cases was extremely misleading.

Update: as mentioned by Adam in the comments, the change also applies to Google Apps accounts now (since October 2016). See the announcement by Google.

So, let’s recap:

Access can be revoked manually (intentionally) or automatically by a password reset (the latter is true only for non-Google Apps accounts!).
For the first 12 hours, you’ll get the error description “Token has been revoked” in both cases.
After 12 hours, the message disappears and you’re left with the generic “invalid_grant” without any error description.
There doesn’t seem to be any way of knowing whether the revocation was due to a password reset. You gotta assume it can be either.
To be sure, we handle all “invalid_grant” cases by sending an automatic one-time email to the user with descriptions on how to reconnect.
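
The “one-time email” part of that last point can be sketched as follows. This is a simplified Python illustration, not our actual implementation: the class, its method names and the `send_email` callback are hypothetical, and the in-memory set stands in for what would normally be a flag stored on the user record.

```python
from typing import Callable, Hashable, Set


class ReconnectNotifier:
    """Sends at most one reconnect email per user until they reconnect."""

    def __init__(self, send_email: Callable[[Hashable], None]) -> None:
        self._send_email = send_email  # hypothetical mail-sending callback
        self._notified: Set[Hashable] = set()

    def on_invalid_grant(self, user_id: Hashable) -> bool:
        """Handle an invalid_grant for a user; return True if an email was sent."""
        if user_id in self._notified:
            return False
        self._send_email(user_id)
        self._notified.add(user_id)
        return True

    def on_reconnected(self, user_id: Hashable) -> None:
        """Clear the flag so a future revocation triggers a new email."""
        self._notified.discard(user_id)


sent = []
notifier = ReconnectNotifier(sent.append)
notifier.on_invalid_grant("user-1")   # first failure: email goes out
notifier.on_invalid_grant("user-1")   # repeated sync failures: no spam
notifier.on_reconnected("user-1")     # user re-authenticated
notifier.on_invalid_grant("user-1")   # revoked again later: email again
print(sent)
```

Clearing the flag on reconnect is the design choice that matters: without it, a user who resets their password a second time would never hear from you again.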

Google, if you’re listening, a little wish:

A) Please include an “error_description” at all times, for any reason. It can reduce troubleshooting from days to minutes.

B) Make sure that the “error_description” for reset password revocation is different from manual/intentional revocation.

I hope you found this helpful if you’ve run into “invalid_grant” headaches yourself. Feel free to reach out with any questions or new findings you might have, I’m all ears.

Shameless plug: want to be the main protagonist in our next detective story? 🤓 Pop us an email at yourfriends@timekit.io
