Batch Process Brainstorming
When you’re building a big Web app, oftentimes you get to a point when you need to run some asynchronous batch process. You need to do something at, say, 2 a.m. that doesn’t involve a request from a browser.
I ran into this problem the other day, and I tossed around some of the more obvious ideas for how to handle it:
A Standalone Script
The simplest method is obviously to just build the script as a standalone program. This is fine if it’s a simple script, and it has the benefit of being self-contained – you can run the script on a different machine than the one the app is actually running on.
But remember that you’re running this script outside the context of the rest of the app. The problem comes when you need to share code. Sooner or later, you’re going to want to instantiate an object, and your options narrow quickly. You can either duplicate the code, directly include it from the Web app if they share a file system, or…what?
Jumping through hoops to get all your necessary code isn’t too bad for one script, but what if you have 20? All that duplication would get unmanageable pretty quickly.
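For what it’s worth, the “directly include it” option can be pretty small when the script and the app share a file system. Here is a minimal sketch in Python (my pick for illustration; the idea isn’t tied to a language). The function name `import_from_app` and the path and module names in the usage comment are made up for the example.

```python
import importlib
import sys
from pathlib import Path


def import_from_app(app_root, module_name):
    """Import a module straight out of the Web app's source tree.

    Assumes the batch script can see the app's files on disk.
    """
    root = str(Path(app_root).resolve())
    if root not in sys.path:
        # Put the app's code ahead of anything else with the same name.
        sys.path.insert(0, root)
    return importlib.import_module(module_name)


# Hypothetical usage from a standalone script:
#   models = import_from_app("/var/www/myapp", "models")
#   invoice = models.Invoice()
```

This avoids duplicating code, but every one of those 20 scripts still has to know where the app lives, which is its own kind of hoop.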
Make the Script URL Addressable
The simplest way to keep the script in context is to make it URL-addressable, then “run” it by scheduling an HTTP request using Wget or something. This has the advantage of being run completely in context: all the necessary code is available to be included in the script.
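If you’d rather not lean on Wget, the “schedule an HTTP request” part can be a few lines of Python run from cron. This is only a sketch: the function name `trigger_batch`, the URL in the comment, and the 30-minute timeout are my assumptions, not anything a particular app requires.

```python
import urllib.request


def trigger_batch(url, timeout=1800.0):
    """Request the batch URL and return the HTTP status code.

    urlopen raises HTTPError on 4xx/5xx responses, so a failed run
    surfaces as an exception rather than a quiet return value.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # wait for the whole response before returning
        return response.status


# Hypothetical call from a small script scheduled for 2 a.m.:
#   trigger_batch("https://example.com/batch/nightly")
```

The long timeout is there because the request stays open for as long as the batch runs.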
The problem with this is that if the script is big and ugly, you’re going to pound on your Web server. A big batch process could peg your Web server for 30 minutes or so, until it finishes up.
With a standalone process, by contrast, you “go around” the Web portion of it, and usually just access the datastore directly (assuming the Web portion is presentation-related only). If you use a client-side cursor, you get all the records in one shot, “drag” them back to your command line script, and your database can relax. None of this applies if you’re running it in-context: as far as the app is concerned, the script is just like a really greedy user.
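To make the “one shot” idea concrete, here is a rough sketch of the standalone side. It uses SQLite only because it ships with Python. A real setup would presumably talk to a database server, and the function name and query are invented for the example.

```python
import sqlite3


def fetch_all_then_release(db_path, query):
    """Pull every row into memory, then close the connection.

    After this returns, the slow processing happens in the script
    and the database is no longer involved.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical usage:
#   rows = fetch_all_then_release("app.db", "SELECT id, total FROM orders")
#   for order_id, total in rows:
#       ...  # the expensive work, done without holding a connection
```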
Additionally, you need to have a different security setup for this method. Since a single HTTP request isn’t going to maintain a cookie, session state isn’t an option. Your security credentials will need to be embedded in the URL itself. This may or may not make you cringe.
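One way to cringe a little less, offered as a sketch rather than a recommendation: instead of putting a raw password in the URL, put in a short-lived signed token. This assumes the scheduler and the app share a secret. The names `SECRET`, `sign_url`, and `verify_url` and the five-minute lifetime are all made up for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"change-me"  # hypothetical shared secret, kept out of the URL
LIFETIME_SECONDS = 300  # assumed validity window


def _signature(path, expires, secret):
    message = f"{path}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url, secret=SECRET, now=None):
    """Scheduler side: append an expiry time and an HMAC token."""
    issued = time.time() if now is None else now
    expires = int(issued) + LIFETIME_SECONDS
    token = _signature(urlparse(base_url).path, expires, secret)
    return f"{base_url}?{urlencode({'expires': expires, 'token': token})}"


def verify_url(url, secret=SECRET, now=None):
    """App side: reject missing, expired, or tampered tokens."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        token = params["token"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path, expires, secret)
    # Constant-time comparison on bytes.
    return hmac.compare_digest(token.encode(), expected.encode())
```

The token still travels in the URL, so it can end up in server logs, but an expired token is useless to anyone who finds it there.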
Build a Command Line Wrapper
eZ publish does this, and it’s a pretty good idea if you’re going to do a lot of this stuff. They have a specific class designed to help you run scripts outside the context of the Web application.
A way to do this would be to “wrap” command line processes in another script. Call that script with the “target” script as an argument. The wrapper would either include the target or evaluate the code in it, thus providing you with a consistent sandbox in which you run your command line stuff. You can import whatever you need into this sandbox so it’s available to all the scripts that are run “within” it.
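Here is roughly what I have in mind, sketched in Python. To be clear, this is not how eZ publish does it. It is just the “include the target” idea using the standard `runpy` module, and `build_sandbox`, `run_in_sandbox`, and `app_config` are names I invented for the example.

```python
import runpy
import sys


def build_sandbox():
    """Names every batch script gets for free (hypothetical contents)."""
    return {
        "app_config": {"db_dsn": "sqlite:///app.db"},
        "log": print,
    }


def run_in_sandbox(target_path, argv=()):
    """Run the target script with the sandbox names preloaded.

    Returns the target's globals once it has finished.
    """
    saved_argv = sys.argv
    # Let the target see its own arguments, not the wrapper's.
    sys.argv = [target_path, *argv]
    try:
        return runpy.run_path(
            target_path,
            init_globals=build_sandbox(),
            run_name="__main__",
        )
    finally:
        sys.argv = saved_argv


# Hypothetical usage from the wrapper's entry point:
#   run_in_sandbox("nightly_report.py", ["--dry-run"])
```

A target script can then use `app_config` or `log` without importing anything, and changes to the sandbox happen in one place instead of 20.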
Am I missing some good ideas here? I know a lot of you guys out there have run into this same issue. How did you handle it?