Chapter 1 Introduction
In this chapter you will be introduced to the history of the C language
and where C falls in the hierarchy of higher-level languages. Also, the
operating system and compiler the student will be working with will be
discussed. In addition, the concept of structured programming and problem
solving will be introduced.
1.1 History of C
Dennis Ritchie of Bell Labs created C in 1972. He and Ken Thompson worked
on designing the UNIX operating system. C came from Thompson's B language.
C was created as a tool for working systems programmers that needed a more
readable programming language than assembler but still needed the low level
access capabilities of an assembler.
C has rapidly become one of the most important and popular programming
languages. Most of the UNIX operating system, and MS-DOS are written in
C as are most compilers and other systems and applications software.
1.2 Higher Level Languages
C is often called a middle-level computer language. Middle-level does not
mean C is less powerful, harder to use, or less developed than high level
languages such as BASIC or Pascal; nor is C similar to a low-level language
such as assembly language. C combines elements of a high-level language
with the functionalism of an assembler.
-
High Level
-
ADA, BASIC, COBOL, FORTRAN, Pascal, PL/I, Algol
-
Middle Level
-
C, FORTH, C++
-
Low Level
-
Assembler
A middle level language gives programmers a minimal set of control and
data-manipulation statements that they can use to define high-level constructs.
A high-level language is designed to try to give programmers everything
they could possibly want already built into the language. A low-level language
forces programmers to define all program functions directly because nothing
is built in. Middle-level languages are sometimes thought of as building
block languages, because the programmer first creates the routines to perform
all the program's necessary functions and then puts them together. C and
C++ allows a programmer to define routines to perform high-level commands.
These routines are called functions and are very important to C and C++.
You can tailor a library of C and C++ functions to perform tasks that are
carried out by your program.
C and C++ manipulates the bits, bytes and addresses with which the computer
functions. Unlike BASIC which operates on strings of characters, C and
C++ operates on characters. In BASIC there are built-in read and write
statements. In C and C++ these procedures are performed by functions that
are not part of the C and C++ language itself. These input-output functions
are written in C and C++ to perform these operations.
C and C++ has very few statements to remember and only 60 keywords as
opposed to 159 in BASIC. This means that a C and C++ compiler can be written
quite easily. Since C and C++ operates on the same data types as the computer,
the code output by C and C++ is efficient and fast. C and C++ can be used
in place of assembler for most tasks.
C and C++ were first used for systems programming. Systems programming
refers to a class of programs that either are part of or work closely with
the operating system of the computer.
C and C++ is used for systems programming when:
-
The program must run quickly; C and C++ programs run almost as fast as
ones in assembler.
-
C and C++ is a programmers language, it lacks restrictions and easily manipulates
bits, bytes, and addresses.
-
A programmer needs direct control of I/O and memory management functions
that C and C++ gives.
1.3 Operating Systems
The C and C++ language is available on every operating system in use today.
The two most widely popular operating systems in use have the majority
of their code written in the C language, UNIX and MS-DOS or PC-DOS.
This class is conducted on a mini-computer using the UNIX operating
system. The UNIX operating system is a multi-user, multi-tasking operating
system. The term multi-user means that multiple people can access the system
at the same time. The term multi-tasking means that each user can have
the system running more than one task or job or program at the same time.
Most modern day operating systems, those developed in the 1970's and later
are patterned after the UNIX operating system.
The student in this class needs to know how to logon to the UNIX system,
use a text editor, use system utility programs, compile C and C++ programs,
access the host system over a network for terminal emulation and file transfer.
See Appendix A for details on how to use a UNIX system.
1.4 Structured Programming and Problem Solving
C is a structured language as are ADA and PASCAL. BASIC, COBOL, and FORTRAN
are non-structured languages. The most distinguishing feature of a structured
language is that it uses blocks. A block is a set of statements that are
logically connected.
A structured language supports the concept of subroutines with local
variables. A local variable is simply a variable that is known only to
the subroutine or block in which it is declared. A structured language
also supports several loop constructs, such as the while, do-while,
and for. A structured language allows separately compiled subroutines
or blocks to be used without being in the same program source file. This
means that a library of useful, tested blocks or subroutines or functions
can be accessed by any program written. A structured language is usually
free form.
People learn programming languages so they can use the computer
as a problem-solving tool. At least four steps can be identified in the
computer-aided problem-solving process:
-
Problem analysis and specification.
-
Algorithm development. >LI>Program coding.
-
Program execution and testing.
1.4.1 Problem Analysis and Specification
Most problems that are to be solved with a computer usually break down
to an input component, a process component and an output component. Because
the initial description of a problem may be somewhat vague and imprecise,
the first step in the problem- solving process is to review the problem
carefully in order to determine its input - what information is
given and which items are important in solving the problem - and its output
- what information must be produced to determine that the problem was solved.
The process identifies what actions must be performed on the input
inorder to produce the output. Input, process and output are the major
parts of the problem's specification, and for a problem that appears
in a programming text, they are usually not too difficult to identify.
In a real-world problem encountered by a professional programmer, however,
the specification of the problem often includes other items and considerable
effort may be required to formulate it completely.
1.4.2 Algorithm Development
Once a problem has been specified, a procedure or process to produce the
required output from the given input must be designed. Since the computer
is a machine possessing no inherent problem-solving capabilities, this
procedure must be formulated as a detailed sequence of simple steps. Such
a procedure is called an algorithm.
The steps that comprise an algorithm must be organized in a logical,
clear manner so that the program that implements this algorithm is similarly
well structured. Algorithms and programs are designed using three
basic methods of control:
-
Sequential:
-
Steps are performed in a strictly sequential manner, each step being executed
exactly once.
-
Selection:
-
One of several alternative actions is selected and executed.
-
Repetition:
-
One or more steps is performed repeatedly.
These three structures appear to be very simple, but in fact they are sufficiently
powerful that any algorithm can be constructed using them.
Programs to implement algorithms must be written in a language that
the computer can understand. It is natural, therefore, to describe algorighms
in a language that resembles, the language used to write computer programs,
or as it is more commonly called, pseudocode.
Unlike high-level programming languages such as Pascal or C, there is
not a set of rules that defines precisely what is and what is not pseudocode.
It varies from one programmer to another. Pseudocode is a mixture of natural
language, such as English, and symbols, terms, and other features commonly
used in one or more high-level languages. The following features are common
to most pseudocodes:
-
The usual computer symbols are used for arithmetic operations: + for addition,
- for subtraction, * for multiplication, and / for division.
-
Symbolic names (identifiers) are used to represent the quantities being
processed by the algorithm.
-
Some provision is made for including comments. This is usually done by
enclosing each comment between a pair of special symbols such as /* and
*/.
-
Certain key words that are common in high-level languages may be used:
for example, read or enter to indicate an input operation;
display, print, or write for output operations.
-
Indentation is used to set off certain key blocks of instructions.
The structure of an algorithm can be displayed in a structure diagram
or flowchart that shows the various tasks that must be performed and their
relation to one another. These diagrams are especially useful in describing
algorithms for more complex problems.
1.4.3 Program Coding
The third step in using the computer to solve a problem is to express the
algorithm in a programming language. In the second step, the algorithm
may be described in English or pseudocode, but the program that implements
that algorithm must be written in the vocabulary of a programming language
and must conform to the syntax of that language. The major portion of this
text is concerned with the vocabulary and syntax of the programming languages
C and C++.
In any programming language, names are used to identify various quantities.
These names are called variables. In C and C++ variable names must
begin with a letter or underscore character, which may be followed by any
number of letters, digits and underscores up to a maximum length of thirty
characters. This allows us to choose names that suggest what the variable
represents.
In the pseudocode description of an algorithm, words such as "enter"
and "read" are used for input operations and "display", "print", and "write"
are used for output operations. One C or C++ statement that may be used
for input is gets, and one that may be used for output is printf.
These two statements are not really part of the C and C++ language definition,
but are functions that exist in external libraries provided by the compiler
manufacturer. The C and C++ languages actually have no syntax for input
and output statements, but through out the years people developing C compilers
and now C++ compilers have reached agreement on certain external functions
that are to be provided to support input and output along with other functions
needed to develop meaningful programs.
1.4.4 Program Execution and Testing
The fourth step in using the computer to solve a problem is to execute
and test the program. The procedure for entering a program into the computer
varies from one machine to another. Additional details about input of program
statements is provided by the instructor. Usually, a text editor is used
to input statements. The program source file must be compiled to produce
an object module, sometimes called a relocatable binary module(RBM). The
RBM is then given to a linkage editor utility which binds the various RBMs
together to make a load module. The load module can then be executed under
control of the host operating system.
If the load module fails to produce the desired results, there could
be some type of logic fault in the algorithm, a poor implementation of
the algorithm has been done, or simply a typing mistake has caused the
meaning of some statement or statements to change. The program source file
can be modified, compiled, linked and executed again for another test run.
This process continues until the load module delivers the desired results.
1.4.5 Software Engineering
Programming and problem solving is an art in that it requires a good deal
of imagination, ingenuity, and creativity. But it is also a science in
that certain techniques and methodologies are commonly used. The term software
engineering has come to be applied to the study and use of these techniques.
The life cycle of software, that is programs, consists of five
basic phases:
-
Problem analysis and specification.
-
Algorithm development.
-
Program coding.
-
Program execution and testing.
-
Program maintenance.
This book will deal with algorithm development and program coding. The
initial step in the software life cycle, problem analysis and specification,
will be left to another class and another book.