Disclaimer:
- I am not a real web dev.
- I think these behaviors are a bit surprising at first, but they indeed make sense for performance and security reasons.
- This blog post is mostly written in memory of the couple of hours I wasted.
I have a web service where users can log in and manage their data stored in this service. One feature is that users can download their data: basically, by clicking a button on the website, a new page will pop up with the URL for downloading the data, and the browser will handle the rest. Pretty standard.
As you might expect, this is pretty straightforward to implement. I guess the only non-trivial part is that the backend is basically a REST API server, and authentication is done by the frontend setting the Authorization header with the user’s token on every single request.
However, for the download GET request, I don’t think there is a way to open a URL in a new page and ask the browser to set that header. So what I ended up doing is splitting it in two: a POST request that creates a download token (it receives all the arguments and goes through the usual authentication), followed by the download GET request, which carries the download token as part of its URL. Download tokens are short-lived, so the implementation can be really simple.
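Here is a minimal sketch of the token half of that flow, in Python just for illustration (the post does not depend on any particular backend language). `DownloadTokenStore`, `create`, `redeem`, the payload shape, and the 60-second lifetime are all made-up names and numbers, not the real service's code. The idea is that the authenticated POST handler would call `create`, and the GET handler, which sees no Authorization header, would call `redeem` with the token taken from the URL.

```python
import secrets
import time


class DownloadTokenStore:
    """In-memory map from short-lived download tokens to what they unlock."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._tokens = {}    # token -> (expires_at, payload)

    def create(self, payload):
        """For the POST handler, after the Authorization header has been checked."""
        token = secrets.token_urlsafe(32)  # unguessable and safe to embed in a URL
        self._tokens[token] = (self._clock() + self._ttl, payload)
        return token

    def redeem(self, token):
        """For the GET handler: the payload, or None if the token is unknown or expired."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._tokens[token]  # expired tokens are dropped lazily
            return None
        return payload


# Hypothetical usage: the POST response hands this URL back to the frontend.
store = DownloadTokenStore(ttl_seconds=60)
token = store.create({"user_id": 42, "format": "csv"})
download_url = f"/download/{token}"
```

Note that the sketch is not thread-safe and only forgets expired tokens when someone tries to redeem them, so a real server would probably want a lock and a periodic cleanup on top.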
I am aware of some other approaches, but I found them either overkill or a bit weird.
Anyway, the approach works fine and the service is running in production nicely.
So why is there a blog post
Well, one day I started adding a new feature to this service. It is very similar, still just a button for users to download files. The only difference is that the server needs to do a non-trivial computation to generate the file. The computation is not that huge though: a few seconds in the worst case, not long enough to cause an HTTP timeout or anything. Not sure what you think about this 🤔, but at least I was not expecting any pitfalls. Indeed it worked fine locally (famous last words?), but then when I deployed it to production …
Pitfall 1
I noticed that every time a user clicks the download button, the backend receives two identical GET requests, so the server is doing this somewhat expensive computation twice. SAD.
Why? It turns out that the browser (I am using Chrome) has this optimization: if a GET does not get any response within a few seconds, it will start to wonder if there is anything wrong with that request and just send another one.
This shouldn’t affect the correctness because GET requests should be idempotent.
As a quick hack, I tried making the download token valid only once (goodbye idempotence). Of course it didn’t work: once the second request is made, the browser just closes the first one, so the only request it still cares about is the one carrying an already-used token. Maybe I could also make the server respond with something first and produce the rest later, but I found that a bit weird.
Pitfall 2
The next thing I tried: move the expensive computation from the second GET request to the first POST request, so token creation may take a few seconds and the server just maintains a mapping from the download token to some binary data. What could go wrong with this? Well, the browser will just consider the “open new page” part a pop-up and block it (there is a small UI component indicating that this happened and the user can unblock it, but that is not user-friendly). Apparently, browsers have very complicated rules on what kind of pop-up is allowed and what is not. For example, opening a new page directly in response to a user interaction (e.g. clicking a button) is fine, but you can’t do it a few seconds later, after the POST request has finished.
What’s next
I hope that’s all. What I will try next:
- Make the POST request do the computation, show a spinner on the frontend during that request, then display another download button once the data is ready for download.
- Or: the GET request does the computation, but only the first time (with a lock to avoid a race); later requests will use the cached result.
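The second option could look roughly like the sketch below, again in Python purely for illustration. `ComputeOnceCache` and its `get` method are hypothetical names, and the cache key is assumed to be the download token. It uses one lock per key, so a duplicate GET for the same token waits for the first computation and reuses its result instead of starting another one, while downloads for different tokens do not block each other.

```python
import threading


class ComputeOnceCache:
    """Runs an expensive computation at most once per key, even under concurrent requests."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two dicts below
        self._key_locks = {}            # key -> lock held while that key is being computed
        self._results = {}              # key -> finished result

    def get(self, key, compute):
        with self._guard:
            if key in self._results:
                return self._results[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one caller per key gets past this line at a time; a duplicate
        # request waits here instead of starting a second computation.
        with key_lock:
            with self._guard:
                if key in self._results:
                    return self._results[key]
            result = compute()  # if this raises, nothing is cached and a later call retries
            with self._guard:
                self._results[key] = result
            return result
```

The sketch never evicts anything, so in practice the cached result would have to be dropped when the download token expires, otherwise the mapping just keeps growing.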