| Introduction
|
Aims
| To provide a brief introduction to the programming language COBOL. To provide a context in which its uses might be understood. To introduce the Metalanguage used to describe syntactic elements of the language. To provide an introduction to the major structures present in a COBOL program.
|
Objectives
| By the end of this unit you should -
|
Prerequisites
| None. This is the first unit in the course.
|
| What is COBOL?
|
Introduction
| COBOL is a high-level programming language first developed by the CODASYL Committee (Conference on Data Systems Languages) in 1960. Since then, responsibility for developing new COBOL standards has been assumed by the American National Standards Institute (ANSI). Three ANSI standards for COBOL have been produced: in 1968, 1974 and 1985. A new COBOL standard introducing object-oriented programming to COBOL, is due within the next few years. The word COBOL is an acronym that stands for COmmon Business Oriented Language. As the the expanded acronym indicates, COBOL is designed for developing business, typically file-oriented, applications. It is not designed for writing systems programs. For instance you would not develop an operating system or a compiler using COBOL.
|
How widely used is COBOL?
| For over four decades COBOL has been the dominant programming language in the business computing domain. In that time it it has seen off the challenges of a number of other languages such as PL1, Algol68, Pascal, Modula, Ada, C, C++. All these languages have found a niche but none has yet displaced COBOL. Two recent challengers though, Java and Visual Basic, are proving to be serious contenders. COBOL's dominance in underlined by the reports from the Gartner group.
Surprised by
COBOL's success? |
| People are often surprised when presented with the evidence for COBOL's dominance in the market place. The hype that surrounds some computer languages would persuade you to believe that most of the production business applications in the world are written in Java, C, C++ or Visual Basic and that only a small percentage are written in COBOL. In fact, the reverse is actually the case. One reason for this misconception lies in the difference between the vertical and the horizontal software markets. In the vertical software market (sometimes called "bespoke" software) applications cost many millions of dollars to produce, are tailored to a specified company, encapsulate the business rules of that company, and only a limited number of copies of the software may be in use. A good example of this kind of application is the DoD MRP II system. This system is "used to manage almost 550,000 spare and repair parts and equipment items with an inventory value of $28 billion. The system runs on Amdahl mainframes at multiple locations throughout the U.S. and contains over 4,000,000 lines of COBOL code." [http://www.uppermohawkinc.com/corporate_capabilities.htm] In the horizontal software market, applications may still cost millions of dollars to produce but thousands, and in some cases millions, of copies of the software are in use. As a result, these applications often have a very high profile, a short life span, and a relatively low per-copy replacement cost. The Microsoft Office suite (Word, Excel, Access) is an example of an application in the horizontal software market. Because of the highly competitive nature of this marketplace considerations of speed, size and efficiency often make languages like C or C++ the language of choice for creating these applications. Applications written for the vertical market, on the other hand, often have a low profile (because they are usually written for use in one particular company), a very high per-copy replacement cost, and consequently, a very long life span. For example, the cost of replacing COBOL code has been estimated at approximately twenty five dollars ($25) per line of code. At this rate, the cost of replacing the DoD MRP II system mentioned above, with a system written in some other language, would be some one hundred million dollars ($100,000,000). The importance of ease of maintenance often makes COBOL the language of choice for these applications. The high visibility of horizontal applications like Microsoft Word or Excel persuades people that the languages used to write these applications are the market leaders. But however many copies of Excel are sold, it is just a single application produced by a limited number of programmers. Many more programmers are involved in coding or maintaining one off, "bespoke", applications. And these programmers generally write their programs in COBOL.
|
Some characteristics of COBOL applications
| As exemplified by the DoD MRP II example above, COBOL applications are often very large. Many COBOL applications consist of more than 1,000,000 lines of code - with 6,000,000+ line applications not considered unusually large in many shops. COBOL applications are also very long-lived. The huge investment in creating a software application consisting of some millions of lines of COBOL code means that the application cannot simply be discarded when some new programming language or technology appears. As a consequence business applications between 10 and 30 years-old are common. This accounts for the predominance of COBOL programs in the year 2000 problem (12,000,000 COBOL applications vs 375,000 C and C++ applications in the US alone - [Capers Jones]). Twenty years ago when programmers were writing these applications they just didn't anticipate that they would last into this millennium. COBOL applications often run in critical areas of business. For instance, over 95% of finance–insurance data is processed with COBOL [In Cobol’s Defense]. The serious financial and legal consequences that can result from an application failure is one reason for the near panic over the year 2000 problem. COBOL applications often deal with enormous volumes of data. Single production files and databases measured in terabytes are not uncommon. Some characteristics that contribute to COBOL's
success |
| COBOL is self-documenting It is easy for programmers unused to the business programming paradigm, where programming with a view to ease of maintenance is very important, to dismiss the advantage that COBOL's readability imparts. Not only does this readability generally assist the maintenance process but the older a program gets the more valuable this readability becomes. When programs are new, both the in-program comments and the external documentation accurately reflect the program code. But over time, as more and more revisions are applied to the code, it gets out of step with the documentation until the documentation is actually a hindrance to maintenance rather than a help. The self-documenting nature of COBOL means that this problem is not as severe with COBOL programs as it is with other languages Readers who are familiar with C or C++ or Java might want to consider how difficult it becomes to maintain programs written in these languages. C programs that you have written yourself are difficult enough to understand when you come back to them six months later. Consider how much more difficult it would be to understand a program that had been written fifteen years previously, by someone else, and which had since been amended and added to by so many others that the documentation no longer accurately reflects the program code. This is a nightmare still awaiting maintenance programmers of the future COBOL is simple We noted above that COBOL is a simple language with a limited scope of function. And that is the way it used to be but the introduction of OO-COBOL has changed all that. OO-COBOL retains all the advantages of previous versions but now includes -
COBOL is non-proprietary (portable)
One reason for the maintainability of COBOL programs has been given above - the readability of COBOL code. Another reason is COBOL's rigid hierarchical structure. In COBOL programs all external references, such as to devices, files, command sequences, collating sequences, the currency symbol and the decimal point symbol, are defined in the Environment Division. When a COBOL program is moved to a new machine, or has new
peripheral devices attached, or is required to work in a different
country; COBOL programmers know that the parts of the program that
will have to be altered to accommodate these changes will be
isolated in the Environment Division. In other programming
languages, programmer discipline could have ensured that the
references liable to change were restricted to one part of the
program but they could just as easily be spread throughout the
program. In COBOL programs, programmers have no choice. COBOL's
rigid hierarchical structure ensures that these items are restricted
to the Environment Division.
|
References and further reading
| In this semi-formal document I have not been fastidious in referencing all the souces I have used. Because of that I would like to acknowledge that most of factual material and some of the commentary was gleaned from the following sources. These may also serve as further reading. Most of this material is available on the web but as the web is dynamic and links are all to easily broken I won't give the URL's here. You will have to search for them. ResourcesAnkrum,T. Scott - COBOL -- A Best Practice (Sept, 2001)- COBOLReport.com Arranga,Edmund C. - The Viagrazation of COBOL - COBOLwebler.com Arranga, Edmund C. & Price, Wilson - Fresh from Y2K, What's next for COBOL? (March/April 2000) - IEEE Software Arranga et al - In COBOL's Defense : Roundtable Discussion (March/April 2000) - IEEE Software Badower, Justin - COBOL: Foundation of the future - COBOLwebler.com Brown, Gary DeWard - COBOL: The failure that wasn't - COBOLReport.com Burger,Thomas Wolfgang - COBOL in an open source future (May 2000) - IBM developerWorks : Linux : Linux articles Carr, Donald & Kizior, Ronald J. - The Case for Continued COBOL Education (March/April 2000) - IEEE Software Feiman, J. - The Gartner Programming Language Survey (October 2001) - Gartner Advisory Glass, Robert L. - Cobol - A Contradiction and an Enigma - COMMUNICATIONS OF THE ACM September 1997/Vol. 40, No. 9 Jones, Capers - The global economic impact of the year 2000 software problem (Jan, 1997) Kappelman, Leon A. - Some Strategic Y2K Blessings (March/April 2000) - IEEE Software Kizior, Dr. Ronald J. & Carr, Donald & Halpern, Dr. Paul - What Professionals think of the Future of COBOL? Murach, Mike - Is COBOL Dying ... or Thriving? (February 2001) - The Cobol Newswire Pagnan, Martin - Can A Java Programmer Be Transitioned To Cobol? (Feb, 2002) - COBOLReport.com Reimann, Artur -COBOL, Language of Choice - Then and Now (January, 2001) - COBOLReport.com Sayles, Jonathan - COBOL and the Enterprise Business Application Programming Legacy - MicroFocus Ltd. Silverberg, Fred, COBOL and the Business Programming Paradigm (1996) Sneed, Harry M.- The Evolution of COBOL - COBOLReport.com Wilkinson,Stephanie - From the Dustbin, Cobol Rises (May, 2001)- eWeek
|
| Introduction to Programming
|
Introduction
| In this section a gentle introduction to programing in general, and to programming in COBOL in particular, is provided. This is done by writing some simple COBOL programs that use the three main programming constructs - Sequence, Iteration and Selection. Don't worry if you don't understand these programs at this point. The main purpose of this section is to give you a first look at some simple COBOL programs. Let's write a
program |
| A program is a collection of statements written in a
language the computer understands. A computer executes program statements one after another in sequence until it reaches the end of the program unless some statement in the program alters the order of execution. Computer Scientists have shown that any program can be written using the three main programming constructs; This section introduces COBOL programming by writing some simple COBOL programs using these constructs.
|
Sequence Program Specification
| We want to write a program which will accept two
numbers from the users keyboard, multiply them together and display
the result on the computer screen. Any program consists of three main things;
|
Program Statements and Data items
| What COBOL program statements will we need to do the
job specified above and what data items will we need to
access?
Getting the Algorithm right |
| Now all we need to do to have a working program is to declare the items needed to store the data and to place the statements shown above in the correct order. Click on these animations to see our attempts to write a program that produces the correct answer. These attempts illustrate importance of arranging the statements in a program so that they are executed in the correct order.
Because this program is short, simple and easy to understand, you may think that programming mistakes like these could never happen. But don't be deceived - when writing a larger program it is all to easy to make the mistake of trying to use, or output, the contents of a data item before you have assigned it a value.
|
An annotated look at the COBOL program
| Click on the animation below for an annotated version of the program. Click anywhere in the animation to see the first and subsequent annotations.
|
Selection Program Specification
| Write a program that accepts two numbers and an operator from the user and then performs the appropriate calculation for that operator. The operator must be either the addition (+) or the multiplication (*) operator. In this example run it is assumed that the user enters an addition sign (+) as the operator
|
Iteration Program Specification
| Write a program that accepts two numbers and an operator from the user and then performs the appropriate calculation for that operator. The operator must be either the addition (+) or the multiplication (*) operator. This is only a partial example run. It follows the flow of control through the program for only one and a half iterations.
|
| COBOL basics
|
Introduction
| This section presents the fundamentals of constructing COBOL
programs. It explains the notation used in COBOL syntax diagrams and
enumerates the COBOL coding rules. It shows how user-defined names
are constructed and examines the structure of COBOL programs.
|
COBOL idiosyncrasies
| COBOL is one of the oldest programming languages in use. As a result it has some idiosyncrasies which programmers used to other languages may find irritating. When COBOL was developed (around the end of the 1950's) one of the design goals was to make it as English-like as possible. As a result, COBOL uses structural concepts normally associated with English prose such as section, paragraph and sentence. It also has an extensive reserved word list with over 300 entries and the reserved words themselves, tend to be long. COBOL programs tend to be verbose especially when compared to languages like C. When COBOL was designed, programs were written on coding forms (see below) , punched on to punch cards, and loaded into the computer using a punch card reader. These media (coding forms and punch cards) required adherence to a number formatting restrictions that some COBOL implementations still enforce today, long after the need for them has gone. Although modern COBOL (COBOL 85 and OO-COBOL) has introduced many of the constructs required to write well structured programs it also still retains elements which, if used, make it difficult, and in some cases impossible, to write good programs.
|
COBOL syntax
| COBOL syntax is defined using particular notation sometimes called the COBOL MetaLanguage. In this notation, words in uppercase are reserved words. When underlined they are mandatory. When not underlined they are "noise" words, used for readability only, and are optional. Because COBOL statements are supposed to read like English sentences there are a lot of these "noise" words. Words in mixed case represent names that must be devised by the programmer (like data item names). When material is enclosed in curly braces { }, a choice must be made from the options within the braces. If there is only one option then that item in mandatory. Material enclosed in square brackets [ ], indicates that the material is optional, and may be included or omitted as required. The ellipsis symbol ... (three dots), indicates that the preceding syntax element may be repeated at the programmer's discretion.
|
Some notes on syntax diagrams
| To simplify the syntax diagrams and reduce the number of rules
that must be explained, in some diagrams special operand endings
have been used (note that this is my own extension - it is not
standard COBOL). These special operand endings have the following meanings:
|
An example syntax diagramNote that for clarity data items may be separated from one another by means of an optional comma. This has been done in the COMPUTE statement opposite
| In COBOL, evaluating an arithmetic expression and assigning the result to a data item is achieved by means of the COMPUTE statement. The syntax diagram for the COMPUTE is shown below.
This syntax diagram may be interpreted as follows; We must start a COMPUTE statement with the keyword COMPUTE. We must follow the keyword with the name(s) of the numeric data item (or items - note the ellipsis symbol (...)) to be used to receive the result of the expression. The #i suffix at the end of word Result tells us that a numeric identifier/data item must be used. Since the ellipsis symbol is placed outside the curly
brackets we can interpret this to mean that each result field can
have its own ROUNDED phase. In other words we
could have a COMPUTE statement like
- where Result1 would be assigned a value of 18 and Result2 would be assigned a value of 17.8. The square brackets after the Arithmetic Expression indicate that the next items are optional but if used we must choose between the ON SIZE ERROR or NOT ON SIZE ERROR phrases. Because the END-COMPUTE is contained within the square brackets it must only be used when a SIZE ERROR or NOT SIZE ERROR phrase is used.
|
COBOL coding rules
| Traditionally, COBOL programs were written on coding forms and then punched on to punch cards. Although nowadays most programs are entered directly into a computer, some COBOL formatting conventions remain that derive from its ancient punch-card history. On coding forms, the first six character positions are reserved for sequence numbers. The seventh character position is reserved for the continuation character, or for an asterisk that denotes a comment line. The actual program text starts in column 8. The four positions
from 8 to 11 are known as Area A, and positions from 12 to 72 are
Area B. Although many COBOL compilers ignore some of these formatting restrictions, most still retain the distinction between Area A and Area B. When a COBOL compiler recognizes the two areas, all division names, section names, paragraph names, FD entries and 01 level numbers must start in Area A. All other sentences must start in Area B. In our example programs we use the compiler directive (available
with the NetExpress COBOL compiler) - $ SET
SOURCEFORMAT"FREE" - to free us from these formatting
restrictions. Ancient COBOL coding form
|
Name construction
| All user-defined names, such as data names, paragraph names, section names condition names and mnemonic names, must adhere to the following rules:
|
The structure of COBOL programs
| COBOL programs are hierarchical in structure. Each element of the hierarchy consists of one or more subordinate elements. The hierarchy consists of Divisions, Sections, Paragraphs, Sentences and Statements. A Division may contain one or more Sections, a Section one or more Paragraphs, a Paragraph one or more Sentences and a Sentence one or more Statements. We can represent the COBOL hierarchy using the COBOL metalanguage as follows;
Divisions
Section names are devised by the programmer, or
defined by the language. A section name is followed by the word
SECTION and a period.
Paragraphs A paragraph name is devised by the programmer or defined by the
language, and is followed by a period.
Sentences and
statements
|
| The Four Divisions
|
Introduction
| At the top of the COBOL hierarchy are the four divisions. These divide the program into distinct structural elements. Although some of the divisions may be omitted, the sequence in which they are specified is fixed, and must follow the order below.
|
The IDENTIFICATION DIVISION
| The IDENTIFICATION DIVISION supplies information about the program to the programmer and the compiler. Most entries in the IDENTIFICATION DIVISION are directed at the programmer. The compiler treats them as comments. The PROGRAM-ID clause is an exception to this rule. Every COBOL program must have a PROGRAM-ID because the name specified after this clause is used by the linker when linking a number of subprograms into one run unit, and by the CALL statement when transferring control to a subprogram. The IDENTIFICATION DIVISION has the following structure:
The keywords - IDENTIFICATION DIVISION - represent the division header, and signal the commencement of the program text. PROGRAM-ID is a paragraph name that must be specified immediately after the division header. NameOfProgram is a name devised by the programmer, and must satisfy the rules for user-defined names. Here's a typical program fragment:
|
The ENVIRONMENT DIVISION
| The ENVIRONMENT DIVISION is used to describe the environment in which the program will run. The purpose of the ENVIRONMENT DIVISION is to isolate in one place all aspects of the program that are dependant upon a specific computer, device or encoding sequence. The idea behind this is to make it easy to change the program when it has to run on a different computer or one with different peripheral devices. In the ENVIRONMENT DIVISION, aliases are assigned to external devices, files or command sequences. Other environment details, such as the collating sequence, the currency symbol and the decimal point symbol may also be defined here.
|
The DATA
DIVISION
| As the name suggests, the DATA DIVISION provides descriptions of the data-items processed by the program. The DATA DIVISION has two main sections: the FILE SECTION and the WORKING-STORAGE SECTION. Additional sections, such as the LINKAGE SECTION (used in subprograms) and the REPORT SECTION (used in Report Writer based programs) may also be required. The FILE SECTION is used to describe most of the data that is sent to, or comes from, the computer's peripherals. The WORKING-STORAGE SECTION is used to describe the general variables used in the program.
|
The PROCEDURE DIVISION
| The PROCEDURE DIVISION contains the code used to manipulate the data described in the DATA DIVISION. It is here that the programmer describes his algorithm. The PROCEDURE DIVISION is hierarchical in structure and consists of sections, paragraphs, sentences and statements. Only the section is optional. There must be at least one paragraph, sentence and statement in the PROCEDURE DIVISION. Paragraph and section names in the PROCEDURE DIVISION are chosen by the programmer and must conform to the rules for user-defined names.
Some COBOL compilers require that all the divisions be present in a program while others only require the IDENTIFICATION DIVISION and the PROCEDURE DIVISION. For instance the program shown below is perfectly valid when compiled with the Microfocus NetExpress compiler. Minimum COBOL program
| Copyright NoticeThese COBOL course materials are the copyright property of Michael Coughlan. All rights reserved. No part of these course materials may be reproduced in any form or by any means - graphic, electronic, mechanical, photocopying, printing, recording, taping or stored in an information storage and retrieval system - without the written permission of the author. (c) Michael Coughlan Last updated : April
2002 mailto:michael.coughlan@ul.ie | ||||||||||||||||||||||||||||||||||||||||||||||