BPI Information Technology Operations Guide – Build Phase
Description
- An operations guide that provides technical information to IT system administrators and staff regarding installation, set-up, operations and troubleshooting procedures.
Organisational Value
- The IT System Operations Guide ensures that all IT staff have consistent, well-documented and controlled procedures for all required system activities. A well-controlled operations guide also provides system administrators with an effective avenue for implementing changes to operational procedures as the IT system evolves.
- If the system or application is deployed without a well-written, comprehensive and formally-documented operations guide, inconsistencies in the management of the IT system will arise and key technical procedures may be lost over time.
Approach
The IT System Operations Guide outlines the procedures that explain how system operations are performed. The procedures developed should be clear, easy-to-follow and readily accessible. They should cover not only normal operation of the system, but the handling of exceptional or crisis situations as well.
- Determine the requirements of the IT System Operations Guide, including:
- Design, amount of detail, and format standards
- Type and level of user (i.e. audience of the IT System Operations Guide)
- Technique for recording and controlling user manuals (e.g. online hypertext, paper manuals, workflow, computerized help 'wizards')
- Determine requirements for controlling distribution and modification of documentation.
- Draft individual procedures:
- System installation and set-up
- System operations
- Hardware and software system standards
- Back-up and disaster recovery
- Security.
- Compile a draft IT System Operations Guide (on paper or in electronic format).
- Review guide with the appropriate decision-makers and revise, as appropriate.
- Test procedures with selected IT staff using the draft documentation.
- Finalize guide.
Guidelines
Problems/Solutions
- Reference documentation provided by the vendor is usually written in generic terms and is limited to the standard system functions and operations. Consider developing additional reference material to cover client-specific manual procedures, workflow and operations. However, it is important to balance the value added from custom-developed reference documentation with the ongoing cost of maintaining it.
Tactics/Helpful Hints
- The following table illustrates the issues that the systems operations guide may need to cover. It also indicates whether these are normal (A), optional (O), or unlikely (X) for two example types of system: a centralized system and a local-office or medium system. It is assumed, for this example, that standard operational functions will have been fully defined and/or automated at centralized mainframes, but not at local medium systems. Note that there may be exceptions to this, particularly where processes have been fully automated.
Issue | Central System | Local Medium System |
Turning on and operating equipment (e.g. changing paper, loading and unloading tapes and cartridges) | X | O |
Loading the real-time application(s) | A | A |
Running batch processes, e.g. reporting runs, interfaces | A | A |
Running backup routines | O | A |
Shutdown procedures | X | A |
Actions to take on expected messages or error conditions, i.e. routine interaction between the operator and the computer | A | A |
Running database recovery | A | A |
Run scheduling, dependencies with other systems, and timing | A | O |
Linking in vendor's remote diagnostics or maintenance service | X | O |
How to report unresolvable errors, e.g. contacting local support, contacting the supplier, escalation procedures etc. | O | O |
When and how to invoke disaster recovery procedure | O | O |
Magnetic media rotation requirements | X | O |
Output control and distribution procedures | A | X |
Off-site security backups | X | O |
Setting system-level access security | X | A |
Environmental needs, e.g. temperature, ventilation, clean power supply | X | O |
Archiving of data and file retention requirements | X | O |
Procedural control requirements | O | O |
Physical access to the computer equipment | X | O |
Procedures for applying software updates (and falling back to the old system if necessary) | X | O |
Report Distribution Management (manual or by Report Distribution Management System - RDMS) | O | O |
Handling of special stationery (especially regarding financially valuable documents or stationery) | O | O |
- Although primarily focused on procedures, this process will involve the definition of an 'operational environment'. It may, therefore, require a prototyping approach to investigate the most effective ways of running and controlling the system before procedures can be drafted.
Resources/Timing
Be prepared to address the following issues in the development of an IT system operations guide:
- Job Control Language (JCL)
- Job Control Language (commonly known as JCL) is the generic name for pre-coded instructions that control how the overall computer system processes the various applications and other tasks that it runs. Some computer systems use different names for this (e.g. DCL, DEC's DIGITAL Command Language, or SCL, ICL's System Control Language); however, the expression JCL is widely understood.
- JCL is used to control most activities on the computer, although, in some systems, there may also be a higher level control program effectively replacing many of the operator's tasks and decisions. Some examples of the functions of JCL and associated software are:
- Chain programs or routines together into an overall suite, e.g. main batch processing, interfaces, reporting runs etc.
- Routine backing up of data
- Loading and unloading the Transaction Processing service
- Running recovery procedures (normally on special request following a problem)
- Automatic detection of errors in the processing cycles
- Automatic recovery from errors in the processing cycle
- Routine housekeeping (e.g. disk reorganization / optimization)
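The chaining, error detection and recovery functions listed above can be illustrated outside any particular job control language. The following is a minimal Python sketch, not real JCL; the names `Step`, `run_chain` and the example step names are hypothetical and chosen only for illustration. It runs steps in order, records each outcome, and either applies a step's recovery routine or aborts the chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]                    # raises an exception on failure
    recover: Optional[Callable[[], None]] = None  # optional recovery routine


def run_chain(steps: List[Step]) -> List[str]:
    """Run steps in order; abort at the first failure that has no recovery."""
    log: List[str] = []
    for step in steps:
        try:
            step.action()
            log.append(f"{step.name}: ok")
        except Exception as exc:
            # Automatic detection of an error in the processing cycle
            log.append(f"{step.name}: failed ({exc})")
            if step.recover is None:
                log.append("chain aborted")
                break
            # Automatic recovery, then continue with the next step
            step.recover()
            log.append(f"{step.name}: recovered")
    return log


def failing_interface() -> None:
    # Hypothetical failure used only to exercise the recovery path
    raise RuntimeError("feeder file missing")


example_log = run_chain([
    Step("backup", lambda: None),
    Step("interface", failing_interface, recover=lambda: None),
    Step("reports", lambda: None),
])
```

In this sketch the returned log plays the role of the operator's record of the run. A real operations guide would also document what the operator should do when a chain aborts.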
- Production Schedules
- The definition of operational procedures may include the setting up of regular run cycles. The scheduling of runs can be a complex activity. Most large organizations with a centralized mainframe will have a specialist section responsible for scheduling the overall system. The project team should work with them to establish a suitable schedule. With smaller, local systems, it will often be the sole responsibility of the project team. Factors may include:
- Processing needs of the new system and its users
- Requirements for routine housekeeping, backups, file reorganization etc.
- Dependencies with feeder systems
- Capability of the computer to handle concurrent loads (including demands from other systems running at the same time)
- Required availability of the system for real-time usage
- Feasibility of running batch updates and real-time processing simultaneously
- Cyclical and special run requirements (e.g. month end, year end)
- Normal and peak run times for the various processing cycles.
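Two of the factors above, dependencies between runs and run times, lend themselves to a simple check. The following Python sketch (Python 3.9 or later, for the standard `graphlib` module) assumes a hypothetical set of jobs, dependencies and durations. It derives a valid run order and checks whether the runs, executed one after another, fit in an assumed batch window.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical jobs: each job maps to the set of jobs that must finish first
dependencies: Dict[str, Set[str]] = {
    "feeder_interface": set(),
    "batch_update": {"feeder_interface"},
    "backup": {"batch_update"},
    "month_end_reports": {"batch_update"},
}

# Hypothetical peak run times in minutes
peak_minutes: Dict[str, int] = {
    "feeder_interface": 20,
    "batch_update": 90,
    "backup": 45,
    "month_end_reports": 60,
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every job runs after its prerequisites."""
    # static_order() raises graphlib.CycleError if the dependencies are circular
    return list(TopologicalSorter(deps).static_order())


def fits_window(durations: Dict[str, int], window_minutes: int) -> bool:
    """Check whether the jobs, run sequentially, fit in the batch window."""
    return sum(durations.values()) <= window_minutes


order = run_order(dependencies)
fits_overnight = fits_window(peak_minutes, window_minutes=240)
```

The sequential total is a deliberately pessimistic assumption. Where the computer can handle concurrent loads, independent jobs such as the backup and the reports might overlap, and the check would need to reflect that.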
- Disaster Recovery
- Disaster planning should cover any system that is significant to the successful operation of the organization. The plan should identify the vital requirements to recover the system in the event that the normal facilities have become unusable. Put in place routine procedures to ensure that the plan could be operated successfully if required. Considerable thought and preparation may be required to ensure that the plan is foolproof. (It has been found that disaster recovery plans almost always fail unless they have been tested.)
- Some key factors to consider are:
- Access to appropriate replacement equipment (frequently using a specialist bureau, or similar equipment with adequate capacity at another 'friendly' user organization, possibly on a reciprocal basis).
- Access to telecommunications lines to connect into office networks as required.
- Access to basic system software, configured correctly to run the applications, e.g. transaction processing and database environments properly set up.
- Access to the software to run the applications
- Access to parameter files, databases etc.
- Access to a recent copy of master file data and transaction data.
- Method of identifying lost transactions when the system is restarted from the backup data (e.g. reprocessing of forms or log files if available)
- Access to any special media, special stationery etc.
- Details of how to contact key personnel required to set up the system
- Secure offsite storage for the items required to set up the system. Note that the location should not suffer the same risks, e.g. it should be physically isolated, should not be on the banks of the same river etc.
- Lists showing what these items are and how to access them
- Access security set up for the emergency system
- Accommodation for vital staff at the backup site
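The factor above on identifying lost transactions can be made concrete where a transaction log survives the disaster. The following Python sketch assumes that every transaction carries a unique identifier and that both the surviving log and the restored backup can be read as lists of those identifiers. The function name and the sample identifiers are hypothetical.

```python
from typing import Iterable, List


def lost_transactions(logged_ids: Iterable[str],
                      restored_ids: Iterable[str]) -> List[str]:
    """Return logged transaction IDs missing from the restored data, in log order."""
    restored = set(restored_ids)
    return [tid for tid in logged_ids if tid not in restored]


# Hypothetical example: the backup was taken after T2, the log runs to T4
to_reprocess = lost_transactions(
    logged_ids=["T1", "T2", "T3", "T4"],
    restored_ids=["T1", "T2"],
)
```

Keeping the result in log order matters if the transactions must be reprocessed in the sequence in which they originally occurred. Where no log survives, the same comparison would have to be made against the source forms.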