Systems Analysis Fundamentals

MIS 524

MIS 300

How Are Systems Described and Depicted from a User Viewpoint?

(A sixty-minute systems analysis course!)

Introduction

A better way to phrase this question would be "How can we describe our information-using activities so that we can see our way through to improvements?" This, in turn, implies several things:

1. A description which is complete enough so that we don't miss important activities and information,

2. A description which can be used to find and diagnose shortcomings or opportunities for improvement, and

3. A description which can be used to specify what those improvements might look or operate like.

Based on these criteria, we quickly come to a set of specifics:

a. The description should involve information and should describe the activities that we engage in while we use that information.
b. The activities should be important rather than trivial.
c. Hence, the activities should relate to the goals of our job in important, tangible ways.
d. The information should be critical to the performance of those activities, so that improvement can be considered an important goal of the exercise.
e. The quality of the information, as well as the quality of the activities, should be an important consideration in the description, so that we can tell when improvement is necessary.

Techniques

A number of techniques have been developed over the past forty years to determine information needs and to describe related information activities. In no special order, systems analysts have used methods that focus on any or all of the following:

i. Business processes
ii. Information use activities
iii. Decisions
iv. Business-related objects
v. Critical factors in business or job performance
vi. Information-related events

In this brief introduction, we will focus on a venerable but easy-to-understand technique for analyzing business information usage. This technique is called "information activity analysis" and consists of describing the following:

A. Sources of data
B. Ultimate destination of information
C. Processes that change or use information
D. Data movement
E. Aggregations of data (equivalent to "files")
F. Detailed description of the data used and information produced.

This technique makes no hard distinction between "data" and "information". A soft distinction is this: things look more like data when they represent raw facts and have not been processed very much; things look more like information when they have been processed to tailor them to specific decision-making or conclusion-drawing activities. Thus, an invoice is generally considered data because it represents or depicts the results of a routine business activity and is very close to the activities that generate it (i.e., customer purchases, clerical recording). A sales report is generally considered information because it has been processed (aggregated, averaged, formatted into rows, columns, pages, etc.) to facilitate decision making about sales activities, salesperson performance, product performance, and so forth.

In this technique, data is seen to move or "flow" from sources through processes to destinations. En route, it may be aggregated and stored. The job of an information "system" is to improve the quality of data received ("input") from outside the system to become information ("output") for decision makers or agents also outside the system.

Thus the general model is this:

Outside source ... provides data as input ... to the system ... which processes it ... to create output ... from the system ... to destinations.

Data Flow Diagrams

The picture that tells this story is called a Data Flow Diagram (DFD).

DFDs have four types of objects that constitute a description of an information system:

1. Data flows
2. Data stores, which represent aggregations or files of data collected together
3. Processes, which calculate, format, or manipulate information
4. External entities (which are the sources and destinations).
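
As a rough illustration, the four object types can be written down in a few lines of Python. This is only a sketch: the class names, the `kind` labels, and the little sales example are all invented here for illustration and are not part of any standard DFD notation.

```python
from dataclasses import dataclass

# Hypothetical labels for the three kinds of node; data flows are the arrows between them.
EXTERNAL, PROCESS, STORE = "external entity", "process", "data store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of EXTERNAL, PROCESS, STORE


@dataclass(frozen=True)
class DataFlow:
    label: str    # what is moving, e.g. "transaction-data"
    source: Node  # where the data comes from
    target: Node  # where the data goes


# An invented example: a customer's purchase ends up in a sales report.
customer = Node("Customer", EXTERNAL)
manager = Node("Sales Manager", EXTERNAL)
record_sale = Node("Record Sale", PROCESS)
make_report = Node("Prepare Sales Report", PROCESS)
sales_file = Node("Sales File", STORE)

flows = [
    DataFlow("transaction-data", customer, record_sale),
    DataFlow("sale-record", record_sale, sales_file),
    DataFlow("sale-record", sales_file, make_report),
    DataFlow("sales-report", make_report, manager),
]

# Print the diagram as a list of labelled arrows.
for f in flows:
    print(f"{f.source.name} --[{f.label}]--> {f.target.name}")
```

Reading the printed arrows from top to bottom tells the same story as the general model: source, input, processing, output, destination.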

Data flows are things like "account number", "employee name", "price", "salary"; these are elementary data items collected by businesses. Data flows also include more complex things like "employee-information" (which might include name, employee number, hiring date, etc.) or "transaction-data" (which might include item sold, number of items, date of sale, etc.) and even more complex structures. As you can see, there is no real limit to how complex data flows can become and much of the challenge is describing these structures and the characteristics of the individual items within them. For example, a typical data description might look like this:

Employee-information:
    Employee-name:
        Employee-surname [alphabetic, max 30 characters]*
        Employee-initial [alphabetic, max 2 characters]
        Employee-first-name [alphabetic, max 30 characters]
        Honorific [2 characters (MR,MS,DR)**]
    Employee-hire-date:
        Hire-year [nnnn*** (1900-2099)****]
        Hire-month [nn (01-12)]
        Hire-day [nn (01-31)*****]
    Employee-position-code [alphabetic, one character (A,B,C,D,E,F,G)]
    Employee-job-description [alphabetic, max 60 characters]

Etc.

* This indicates what the data appear like; the actual language used to describe the data varies from IT shop to IT shop, although there are some standards industry-wide.
** This indicates that the two characters must be either "MR", "MS" or "DR"; note that in this webpage we are not indicating that all honorifics must be one of these, only that in this example, someone has decided that this is the case. Of course, this kind of specification is often the source of error or annoyance for users!
*** There are various ways of indicating numerical formats; this is merely one example. The COBOL programming language uses the cryptic code of "9999" to indicate a four-digit number. Again, these conventions vary from place to place.
**** Another example of specification, in this case indicating a range from 1900 to 2099.
***** In very complex examples, there will be some dependency among data items in a structure. For example, we know that June (month 06) will always have 30 days in it, so a combination of month and day of 0631 would be illegal. And February makes things even more complicated. Often these dependencies are difficult to specify and this is another source of error in information systems.
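
To make the footnotes concrete, here is a minimal sketch of how a few of these rules might be checked in Python. The field names (`surname`, `hire_year`, and so on) and the wording of the messages are assumptions made for this example; only the rules themselves come from the description above. The month/day dependency is delegated to the standard `calendar` module, which knows about June and about February in leap years.

```python
import calendar

ALLOWED_HONORIFICS = {"MR", "MS", "DR"}
ALLOWED_POSITION_CODES = set("ABCDEFG")


def validate_employee(record):
    """Return a list of problems found in one employee record (empty if none)."""
    problems = []

    surname = record.get("surname", "")
    # str.isalpha() rejects spaces, hyphens and apostrophes: exactly the kind
    # of strict specification that annoys users with names like "O'Brien".
    if not (surname.isalpha() and len(surname) <= 30):
        problems.append("surname must be alphabetic, max 30 characters")

    if record.get("honorific") not in ALLOWED_HONORIFICS:
        problems.append("honorific must be MR, MS or DR")

    if record.get("position_code") not in ALLOWED_POSITION_CODES:
        problems.append("position code must be one of A-G")

    year = record.get("hire_year")
    month = record.get("hire_month")
    day = record.get("hire_day")
    if not (isinstance(year, int) and 1900 <= year <= 2099):
        problems.append("hire year must be 1900-2099")
    elif not (isinstance(month, int) and 1 <= month <= 12):
        problems.append("hire month must be 01-12")
    else:
        # The legal range for the day depends on both the month and the year.
        days_in_month = calendar.monthrange(year, month)[1]
        if not (isinstance(day, int) and 1 <= day <= days_in_month):
            problems.append(f"hire day must be 01-{days_in_month} for month {month:02d}")

    return problems


record = {
    "surname": "Smith", "honorific": "MS", "position_code": "C",
    "hire_year": 2001, "hire_month": 6, "hire_day": 31,
}
print(validate_employee(record))  # ['hire day must be 01-30 for month 06']
```

Notice that the simple range check "01-31" from the description would have accepted this record; only the dependency check catches it.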

Data Stores contain information aggregations and can be thought of as a set of records in a file. A data store is usually described in exactly the same terms as a data flow and in fact is really nothing more than a data flow "at rest". One important thing to remember is that a secure system doesn't allow information to flow from data stores to external entities (destinations) or from external entities (sources) to data stores directly without vetting by some sort of authentication, verification, and validation process. This is often another major source of error or problems in information systems.
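
The "no direct flow" rule is simple enough to check mechanically. The following Python sketch assumes a diagram recorded as a table of node kinds plus a list of (source, target) pairs; that representation, the function name, and the example names are all invented for illustration.

```python
def unvetted_flows(kinds, flows):
    """Return flows that link a data store and an external entity directly."""
    suspect = []
    for source, target in flows:
        # A flow is suspect when one end is a store and the other is external,
        # in either direction, with no process in between to vet the data.
        if {kinds[source], kinds[target]} == {"store", "external"}:
            suspect.append((source, target))
    return suspect


kinds = {
    "Customer": "external",
    "Validate Order": "process",
    "Orders File": "store",
    "Auditor": "external",
}
flows = [
    ("Customer", "Validate Order"),
    ("Validate Order", "Orders File"),
    ("Orders File", "Auditor"),  # no process in between, so this one is flagged
]
print(unvetted_flows(kinds, flows))  # [('Orders File', 'Auditor')]
```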

Processes are ways of changing data into information or information into "higher quality" information. The ways of doing this are generally by computations, selection (data reduction), and reformatting. Typical information system processes calculate averages, format reports, divide data into classes, produce copies, and direct traffic (sending some information to one process and other information to another process). Processes don't really create information; they simply take lower "quality" information and make it more useful. Processes that introduce genuinely new information all by themselves are rogue processes; imagine clerks forging information onto sales receipts as an example. When a process isn't working correctly, we say it has a "bug" because it's introducing information into the system that wasn't intended to be there, information that seemingly comes from "nowhere."

Unmentioned above is the fact that every process has an agent, a person or machine that does the work. This is an essential part of every system story, because it is often the case, especially where people are concerned, that mistakes arise through inability to carry out the process correctly or quickly enough (remember the timing in the DFD based on "logical dependency").

External entities are sources and destinations. They are considered to be outside the system. Sources may be other systems, people, or processes. Destinations may be the same kinds of objects. When we perform systems analysis (who knows where the second "s" came from in "systems"?) we do not concern ourselves with the "behavior" of external entities. We identify them and the data they create or the information they need (they are often users, after all) but we are not really concerned with what they do with the information or how they produce the data. Of course, a major system study will involve a number of interlocking systems and may then actually become concerned with what to any one of these systems might appear to be external. Please note that the goal of any system is to satisfy the destinations! So if the destinations are unhappy, it's useful to trace backwards through the system to the process, data store or even source that is causing the problem. For improvement, it's always best to start with the goals.

What Data Flow Diagrams Tell

DFDs are a relic from an older time in Systems Analysis, but because they are easy to understand, draw and read, they are very useful in having discussions with non-technical users. Perhaps the hardest problem in systems analysis is the communication gap between those who know the technology -- and the latest technology is always incomprehensible to the uninitiated -- and those who know their jobs. Both sets of individuals are experts, both are valued by their organizations, and the tragedy is that there is often little in the way of a common language in which they can communicate. The benefit of IT is, after all, in the appropriate use of it, but how to determine that appropriate use?

Anyway, DFDs were developed by a former social worker, Larry Constantine, who felt this need. While there are numerous alternative diagrams (including the aptly named "Rich Picture" of the so-called "Soft Systems" school in the UK and the more modern "use case" diagram) to represent user experience, the advantage of DFDs is that they are useful in all stages of system improvement: description, diagnosis, design and debate. Few other diagrams capture the breadth of user concerns: sequence, complexity, data, processing, responsibility, volume, etc.

First, a DFD neatly partitions what is a system's responsibility from what isn't. The system boundary serves this function. Of course if the boundary is incorrectly intuited (and this is unfortunately an easy mistake to make), analysts and users can spend a lot of time investigating things that really can't be known or shouldn't be of concern (like what a busy manager is going to do with a report received from a system) or might miss an important process. However, the system boundary is useful in addressing concerns such as privacy, security, and integrity of data: if information is going missing, it must be crossing this boundary unchecked.

Second, a DFD handily provides a "logical" view of what is happening because of the dependency relationships hinted at above. For example, because each output depends on some combination of the inputs, it is possible to note errors, omissions, inconsistencies, etc. in the output and actually trace back logically to those processes or sources that are causing or initiating the errors. In complex systems this isn't such an easy task, but that's what systems analysts are paid the big bucks to find out, after all.
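
Tracing back is really just walking the arrows in reverse. Here is a minimal Python sketch, assuming the diagram is stored as a list of (source, target) pairs; the function name and the node names in the example are made up for illustration.

```python
def trace_back(flows, output):
    """Return every node that the given node depends on, directly or indirectly."""
    # Index the flows by target so we can ask "what feeds this node?"
    feeds = {}
    for source, target in flows:
        feeds.setdefault(target, []).append(source)

    seen = set()
    stack = [output]
    while stack:
        node = stack.pop()
        for upstream in feeds.get(node, []):
            if upstream not in seen:  # also protects against loops in the diagram
                seen.add(upstream)
                stack.append(upstream)
    return seen


flows = [
    ("Customer", "Record Sale"),
    ("Price List", "Record Sale"),
    ("Record Sale", "Sales File"),
    ("Sales File", "Prepare Sales Report"),
    ("Prepare Sales Report", "Sales Manager"),
    ("Payroll Clerk", "Compute Pay"),  # unrelated to the sales report
]
suspects = trace_back(flows, "Sales Manager")
print(sorted(suspects))
```

If the Sales Manager complains about the report, the suspects are everything in the returned set; the payroll process is not among them, so nobody needs to spend time looking there.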

Third, because DFDs involve data flows and data stores, it is possible to calculate volumes and even rates of information flow over time. It is relatively easy to estimate, because of the logical structure, how much information is to be processed and how busy a system or its agents will be. A major source of human error in systems is the pressure of time and volume. So we can tell what volume of output requires what volume of input (and thus what kind of checking, verification, possibility for error and reentry, etc.).
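
A back-of-the-envelope version of this calculation might look like the following Python sketch. Every number in it (240 sales slips a day, a minute and a half each, a 5% re-entry rate, a 420-minute working day) is invented purely to show the arithmetic, and the function name is hypothetical.

```python
def utilization(items_per_day, minutes_per_item, reentry_rate, minutes_available):
    """Fraction of an agent's working day consumed by one process."""
    # Items that fail checking must be entered again, which inflates the volume.
    effective_items = items_per_day * (1 + reentry_rate)
    return effective_items * minutes_per_item / minutes_available


load = utilization(
    items_per_day=240,      # assumed volume on the incoming data flow
    minutes_per_item=1.5,   # assumed handling time per item
    reentry_rate=0.05,      # assumed share of items that must be redone
    minutes_available=420,  # assumed length of the clerk's working day
)
print(f"Clerk is busy {load:.0%} of the day")
```

An agent who is this close to fully loaded has very little slack, which is precisely the situation in which time pressure starts producing errors.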

Fourth, because DFDs are logical, it becomes almost natural to ask, given an information need, "Where will the data come from?" and lay out a path for processing raw data into useful information. This is the essence of the design problem.

Fifth, DFDs can be used for diagnosis. For instance, it should be easy to locate bottlenecks (processes where volume and time considerations can prove critical), potential slowdowns (processes upon which many other processes depend), redundancy (processes that aren't necessary), loopholes and omissions (not testing for certain conditions or allowing for certain combinations), useless procedures that are never in fact performed and so on.
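
Two of these warning signs can be spotted with very little code. The Python sketch below assumes the diagram is a list of (source, target) pairs plus a set of process names; the function name and the example are invented, and a "dead end" here is only a candidate for redundancy, not proof of it.

```python
from collections import Counter


def diagnose(processes, flows):
    """Report two simple warning signs for the named processes."""
    # How many flows leave each process: a high count means much depends on it.
    fan_out = Counter(source for source, _ in flows if source in processes)
    # Processes whose results go nowhere may be redundant or never really used.
    has_output = {source for source, _ in flows}
    dead_ends = sorted(p for p in processes if p not in has_output)
    return fan_out.most_common(1), dead_ends


processes = {"Record Sale", "Summarize Sales", "Print Labels", "Archive Slips"}
flows = [
    ("Customer", "Record Sale"),
    ("Record Sale", "Summarize Sales"),
    ("Record Sale", "Print Labels"),
    ("Record Sale", "Archive Slips"),
    ("Summarize Sales", "Sales Manager"),
    ("Print Labels", "Shipping Clerk"),
]
busiest, dead_ends = diagnose(processes, flows)
print("Most depended upon:", busiest)
print("Output goes nowhere:", dead_ends)
```

In this made-up system, "Record Sale" feeds three other processes, so a slowdown there is felt everywhere, while "Archive Slips" produces nothing that anyone receives.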

Sixth, DFDs have a property that hasn't been mentioned yet: decomposition. Imagine you have a process called "Prepare Annual Sales Report" that is part of a sales management system. Now this is not a simple process, is it? You have sales information for various products and sales people, information about anomalies, data concerning geographic regions and, as well, the report has to appear in a number of formats for individuals such as sales people, sales managers, marketing managers, production managers, division vice presidents, and so forth. Let's focus on the various reports and suppose there are six different formats (different ways of sorting and aggregating the data, for example, by person, product, region, date, etc.). Each format will require a special process to create. In other words, the process "Prepare Annual Sales Report" really consists of many sub-processes including the six just mentioned and perhaps many others.

We might represent this structure like this:

Prepare Annual Sales Report =

    Calculate Sales Information,
    Prepare Salesperson Report,
    Prepare Sales Manager Report,
    Prepare Marketing Manager Report,
    Prepare Product Manager Report,
    Prepare Division VP Report,
    ...

In this way, we can "decompose" the various processes into a structure of simpler subprocesses. They are simpler in many ways, but the most important simplicity here is that they involve fewer agents, fewer data items, simpler processes and fewer data stores. That makes them easier to discuss and helps bridge the communication gap. It's still in the realm of the user: these processes are familiar to them. But because they are simpler, it's easier to find places for improvement. Of course, if the source of improvement is between the subprocesses, decomposition may serve to point this out, too.
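
A decomposition like this is naturally a tree, and a short Python sketch can walk it down to the simplest subprocesses. The top level below repeats the subprocesses named above (the original list continues beyond them); the second level under "Calculate Sales Information" is hypothetical, added only to show that decomposition can be repeated.

```python
decomposition = {
    "Prepare Annual Sales Report": [
        "Calculate Sales Information",
        "Prepare Salesperson Report",
        "Prepare Sales Manager Report",
        "Prepare Marketing Manager Report",
        "Prepare Product Manager Report",
        "Prepare Division VP Report",
    ],
    # Hypothetical further breakdown, invented for this example.
    "Calculate Sales Information": [
        "Total Sales by Product",
        "Total Sales by Region",
    ],
}


def leaf_processes(name, tree):
    """Return the simplest subprocesses under a process, in order."""
    children = tree.get(name)
    if not children:
        return [name]  # not decomposed any further
    leaves = []
    for child in children:
        leaves.extend(leaf_processes(child, tree))
    return leaves


for process in leaf_processes("Prepare Annual Sales Report", decomposition):
    print(process)
```

Each name printed is a process small enough to discuss with the person who actually performs it.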

Summary

The above is only intended to give you an idea of how systems are analyzed with an eye to improvement. There are literally hundreds of other tools, techniques and technologies useful to professional systems analysts to do the job. I've selected DFDs because they are useful, easy to read, and easy to draw and because they represent the major elements you as a user need to be concerned with: where the data are coming from, what system products are being used for, and what the system is doing to turn the incoming data into productive information.
