Manually Monitoring Jobs: A Makeshift Approach
Manual monitoring of your job schedule is not ideal; after all, you implemented automated job scheduling for a reason. With manual monitoring, you’re dedicating a valuable team member to watching job statuses come in all day and maybe all night long. Ultimately, most of your jobs should work successfully, so manually monitoring your job stream for the rare occurrence of something going wrong is a waste of your team’s time. Manual monitoring might be worthwhile during a trial phase of a product as you learn how it works, but beyond a trial phase, you should not rely on manual job monitoring.
Automated Job Monitoring: A Nuanced Look at Your Schedule
Automated job monitoring is the solution you need, and it comes in three types: job status monitoring, job runtime monitoring, and job submission monitoring.
The first type is job status monitoring, which lets you set conditions on a job so that you are notified when it reaches a certain state. For example, you might want to know when a particular job enters the job queue for the agent. You can set a job status monitor so that you’ll be notified when this happens. This saves you from having to watch the job stream manually to see when this particular job arrives at a certain status.
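To make the idea concrete, here is a minimal sketch of a job status monitor in Python. It is not taken from any particular scheduler: the StatusMonitor class, the job name nightly_extract, and the status names QUEUED and RUNNING are all hypothetical, and a real product would expose this through its own configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class StatusMonitor:
    """Hypothetical helper: notify when a watched job reaches a watched status."""
    notify: Callable[[str], None]
    watches: Dict[str, Set[str]] = field(default_factory=dict)

    def watch(self, job_name: str, status: str) -> None:
        # Register interest in one (job, status) pair.
        self.watches.setdefault(job_name, set()).add(status)

    def on_status_change(self, job_name: str, status: str) -> bool:
        # Called whenever a job changes status; returns True if a notification was sent.
        if status in self.watches.get(job_name, set()):
            self.notify(f"Job {job_name} entered status {status}")
            return True
        return False


# Collect notifications in a list instead of sending real email or push messages.
messages: List[str] = []
monitor = StatusMonitor(notify=messages.append)
monitor.watch("nightly_extract", "QUEUED")

monitor.on_status_change("nightly_extract", "QUEUED")   # watched: notifies
monitor.on_status_change("nightly_extract", "RUNNING")  # not watched: silent
```

Only the status you asked about produces a notification, so nobody has to watch the job stream for it.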
The second type is job runtime monitoring, which boils down to detecting job overruns and underruns.
An overrun means that a job runs significantly longer than you expected. For example, you might have a job that typically runs in half an hour. If this job runs for more than an hour, you will want to be notified because some necessary resources might be locked, in which case a system may need to be shut down so that your job can access the resources it needs to move forward again. Underruns are a bit more insidious.
An underrun tells you that part of your process didn’t work correctly. For example, data may not have been brought from the database and put into a flat file that your job needs to process, so the file is just sitting there empty. When your process runs, it takes only a few seconds, but you know that this job usually takes 45 minutes to complete. This means that an upstream process didn’t work. In this way, underrun monitoring is extremely important for helping you to debug common problems in your job stream.
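The overrun and underrun checks above can be sketched as a simple comparison against the expected runtime. This is an illustration only: the function name classify_runtime and the two threshold factors are assumptions chosen to match the examples in this article (twice the expected time for an overrun, a tenth of it for an underrun), and you would tune them per job.

```python
from datetime import timedelta


def classify_runtime(
    actual: timedelta,
    expected: timedelta,
    overrun_factor: float = 2.0,    # assumed threshold: twice the expected runtime
    underrun_factor: float = 0.1,   # assumed threshold: a tenth of the expected runtime
) -> str:
    """Return 'overrun', 'underrun', or 'normal' for a job's runtime."""
    if actual > expected * overrun_factor:
        return "overrun"
    if actual < expected * underrun_factor:
        return "underrun"
    return "normal"


# A half-hour job that has been running for 65 minutes.
half_hour_job = classify_runtime(timedelta(minutes=65), timedelta(minutes=30))

# A 45-minute job that finished in 5 seconds, e.g. because its input file was empty.
empty_input_job = classify_runtime(timedelta(seconds=5), timedelta(minutes=45))
```

In this sketch the first job is classified as an overrun and the second as an underrun; a job that finishes close to its expected time is classified as normal.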
The third type of automated monitoring is job submission monitoring for late starts. This applies to jobs that don’t have a set schedule (for example, purely event-driven jobs that nonetheless generally run at 1400 hours) and to jobs that have a schedule but also have event-driven components. Job submission monitoring also comes into play when you have a busy job queue: your job could be submitted to the queue but not have started running yet. Often, such jobs are critically important and take a fixed amount of time to complete. For your job schedule to proceed smoothly, these jobs need to start at a certain time or within a certain amount of time of being submitted.
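A late-start check for a job sitting in a busy queue can be sketched as follows. The function is_late_start and its parameters are hypothetical, and the 15-minute allowance is an assumed value; the point is only that the monitor compares the time since submission against a limit you choose.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_late_start(
    submitted_at: datetime,
    started_at: Optional[datetime],
    now: datetime,
    max_wait: timedelta,
) -> bool:
    """Return True if the job did not start within max_wait of being submitted."""
    # If the job has not started yet, measure how long it has waited so far.
    reference = started_at if started_at is not None else now
    return reference - submitted_at > max_wait


submitted = datetime(2024, 1, 15, 14, 0)   # submitted at 1400 hours
allowance = timedelta(minutes=15)          # assumed allowance

# Still waiting in the queue at 1420 hours: a late start.
still_queued = is_late_start(submitted, None, datetime(2024, 1, 15, 14, 20), allowance)

# Started at 1405 hours: within the allowance.
started_on_time = is_late_start(
    submitted, datetime(2024, 1, 15, 14, 5), datetime(2024, 1, 15, 14, 20), allowance
)
```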
Reacting to Job Monitors
So how do you react to these monitors? In reaction to some of the above events, you’ll want to kick off another job that cleans up the problem and restarts your job, or that sends a third-party notification your scheduling tool can’t handle on its own.
The other way that your enterprise job scheduler can react is to notify someone. Many job schedulers include email notifications, and it’s becoming more and more common to send push notifications via the cloud to a cell phone. Being alerted instantly to important job statuses is critical to maintaining an efficient and reliable job schedule for your organisation. The ability to react to a problem before it affects the whole job stream or cross-department processes is very helpful.
Finally, you can kill the job if it’s running amok. In most cases, you would want to kill an overrunning job that has run well past the boundary you’ve set. For example, a set of database jobs must be done by 1400 hours because a whole load of higher-priority jobs is coming in from the finance department. If the job is not complete by 1400 hours, you can kill it so that the higher-priority jobs can continue seamlessly. Killing overrun jobs is a reasonable expectation for your job scheduling software.
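The three kinds of reaction described above (running another job, notifying someone, and killing the job) can be tied to monitor events with a simple lookup table. This is a sketch under stated assumptions: the event names, the reaction functions, and the mapping between them are all hypothetical, and the functions only return a description of what they would do rather than calling a real scheduler.

```python
from typing import Callable, Dict, List

Reaction = Callable[[str], str]


def run_cleanup_job(job: str) -> str:
    # Placeholder for submitting a follow-up job that cleans up and restarts.
    return f"submitted cleanup job for {job}"


def notify_operator(job: str) -> str:
    # Placeholder for an email or push notification.
    return f"notified operator about {job}"


def kill_job(job: str) -> str:
    # Placeholder for terminating a job that has run past its boundary.
    return f"killed {job}"


# Assumed policy: which reactions fire for which monitor event.
REACTIONS: Dict[str, List[Reaction]] = {
    "failed": [run_cleanup_job, notify_operator],
    "underrun": [notify_operator],
    "late_start": [notify_operator],
    "overrun": [notify_operator, kill_job],
}


def react(event: str, job: str) -> List[str]:
    """Run every reaction configured for the event, in order."""
    return [reaction(job) for reaction in REACTIONS.get(event, [])]


actions = react("overrun", "db_reorg")
```

Keeping the policy in one table makes it easy to see, and change, what happens for each kind of event.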
Conclusion
Automated job monitoring is a key part of managing your job schedule in a dynamic environment. If you use your reactions appropriately, you will save a lot of time when it comes to troubleshooting issues with your job streams.
Automate Schedule provides you with a central interface from which you can monitor your job streams. Automate also provides automatic notification, which means that you can react to job statuses instantly as needed, keeping your schedule up and running for your important processes.
Skybot Scheduler is a system that addresses the above issues, and SPI, the African distributor for utility software products and services to the Open Systems segment of the IT industry, is the sole sub-Saharan Africa distributor for Help/Systems and Skybot Software.
For more information, please contact Chris Anderson of SPI Group Pty at chris@spi.co.za or visit our website at www.spi.co.za.