Properties of the C Programming Language
and their Implications for Software Quality
Prof. David Bernstein
James Madison University
Computer Science Department
bernstdh@jmu.edu
Motivation
- C is Very Popular:
- An enormous amount of code has been and continues to be
written in C (in the TIOBE index it was number 1 for Jan.
1988, number 1 for Jan. 1998, and number 2 for Jan. 2008)
- C is Prone to Defects:
- Especially when programmers are familiar with "heavyweight"
languages that protect them where C does not
- The Questions:
- Why is C prone to defects?
- Why is C popular despite these drawbacks?
Overview
- Consider the guiding principles used by the standards
committee
- Discuss the motivations of both the implementers of C compilers
and C programmers
- Identify the resulting characteristics/properties of C
- Consider the implications for software quality
What is an Implementation?
- Formal Definition:
- Particular software, running in a particular environment,
under particular control options
- A Loose Definition:
- A compiler command including flags/options
The Guiding Principles
- Existing code is important:
- The bulk of existing code should be acceptable to any
implementation
- C code can be portable:
- The language and library should be as widely implementable
as possible
The Guiding Principles (cont.)
- C code can be non-portable:
- The ability to write machine-specific code is one of the
strengths of C
- Avoid "quiet changes":
- Avoid changes to the language which cause a working program to
work differently without notice
The Guiding Principles (cont.)
- A standard is a "treaty" between implementer and programmer:
- Implementers and programmers have different objectives
- Keep the spirit of C:
- (a) Trust the programmer.
- (b) Don't prevent the programmer from doing what needs
to be done.
- (c) Keep the language small and simple.
- (d) Provide only one way to do an operation.
- (e) Make it fast, even if it is not guaranteed to be portable.
The Spirit of C Revisited
- A Recognized Shortcoming:
- C code is often not safe/secure
- A New Facet for the C1X Revision:
- (f) Make support for safety and security demonstrable
Resulting Characteristics of C
- C is Lightweight:
- Many things are the responsibility of the programmer,
not the language
- C is Permissive:
- The language does not prevent the programmer from doing almost
anything
- C is Close to the Machine:
- Many operations are defined in terms of how the target machine's
hardware does them, not by a general abstract rule
(e.g., whether char values widen to signed or unsigned values
depends on which byte operation is more efficient on the
target machine)
Some Recent History - C9X (1994)
- Support international programming
- Codify existing practice; try not to invent
- Minimize incompatibilities with C90
- Minimize incompatibilities with C++ (but don't try to become C++)
- Maintain conceptual simplicity
Some Recent History - C1X (2007)
- Programmers need the ability to check their work
(for security and safety reasons)
- No invention (under no circumstances should the standard
be used to invent new concepts)
- The ability to mix and match code from different standards
is important
Kinds of Behavior
- Locale-Specific Behavior:
- Defined: Behavior that depends on local conventions of
nationality, culture, language, etc. that each implementation
documents
- Example: Whether islower() returns true for characters other
than the 26 lowercase Latin letters
- Unspecified Behavior
- Defined: A behavior for which the standard provides two
or more possibilities and imposes no further requirement on
which is chosen
- Example: The order in which the arguments (which may be
expressions) of a function are evaluated
(e.g., in f(g(i), h(i)) the order in which g(i) and h(i)
are evaluated is unspecified)
Kinds of Behavior (cont.)
- Implementation-Defined Behavior:
- Defined: A behavior that is unspecified in the standard
but that each implementation must choose and document
- Example: Propagation of the high-order bit when a signed
integer is shifted right
- Undefined Behavior
- Defined: A behavior that violates a "shall" or "shall
not" requirement, a behavior that is noted as undefined
in the standard, or a behavior that is not discussed in
the standard
Why Allow for Explicitly Undefined Behaviors?
- So the implementer need not catch program errors that are
difficult to diagnose.
- To avoid defining edge cases that would favor one implementation
strategy over another.
- To identify possible extensions.
Some Implications
- Implication of All Four Kinds of Behaviors:
- Portability problems often arise
- Implications Of Undefined Behaviors:
- Anything can happen when the compiled program
is executed
- Optimizing compilers are not obligated to generate code for
undefined behaviors
Levels of Portability
- Strictly Conforming Programs:
- Use only those features of the language and library
specified in the C Standard
- Do not produce output that is dependent on any unspecified,
undefined, or implementation-defined behavior
- Conforming Programs:
- Are acceptable to a conforming implementation (i.e., may depend
on nonportable features of a conforming implementation)
Type Safety
- Defined:
- Preservation: If a variable x has a type t and x evaluates
to a value v then v has type t
- Progress: Evaluation of an expression either results in
a value or there is another way to proceed
- The Type Safety of C:
- Most people consider C to be weakly typed
- Examples in C:
- If you cast a pointer to an entity of type t to a pointer
to an entity of type s and dereference it, then the result
is, in general, undefined
- If you perform an operation on signed and unsigned
integers of differing lengths using implicit conversion, then
the result can be unrepresentable
There's Always More to Learn