Users' Evaluations of Packages: Demonstrations Versus Hands-On Use
Kieran Mathieson
Department of Decision and Information Sciences
Oakland University
Rochester, MI 48309-4401
(248) 370-3507
Email: mathieso@oakland.edu
URL: http://www.sba.oakland.edu/faculty/mathieson/mathieson.htm
Terence Ryan
Division of Business and Economics,
Indiana University, South Bend
1700 Mishawaka Avenue,
South Bend, IN 46634
Internet: tryan@iusb.edu
August 14, 1997
Users' Evaluations of Packages: Demonstrations Versus Hands-On Use
Abstract
Users have an important role in evaluating the fit between their tasks and software packages. Users' time is not free, however, and should be used carefully. An experiment compared different types of experience with an information system: direct (such as hands-on testing) and indirect (such as watching a demonstration). Both allowed subjects to distinguish good systems from poor systems. However, direct experience resulted in more extreme beliefs. This suggests that inexpensive demonstrations should be used initially, since they can help users discriminate between packages. Hands-on testing, though more expensive, can be used when demonstrations do not yield a clear preference.
Keywords: Package, users, selection, belief, evaluation, experiment
Although hardware prices are falling, software development costs remain high. Many firms are therefore buying packages rather than building their own information systems (IS). These packages range from microcomputer-based word processors costing hundreds of dollars to mainframe-based application packages costing millions of dollars.
Choosing the wrong package can have serious consequences (Galletta, King and Rateb, 1993). A key issue is matching the package's capabilities to user requirements. As Keider (1984) succinctly put it: "User effectiveness is ... the only lasting measure of success."
Some authors suggest that users should be heavily involved in evaluating different options (e.g., TenEyck, 1990). Users' main contribution is their knowledge of the problem the package is supposed to solve. This is particularly important when selecting a complex system that performs mission-critical tasks. MIS professionals might not be familiar with all of the nuances of the tasks and the environments in which they are performed.
Users can be involved in a number of ways. Two common activities are attending demonstrations and hands-on testing. Both activities allow users to see whether a package will make their jobs easier. However, user involvement is not free. Demonstrations take time, as much as a full day or two for a complex system. Hands-on testing is even more expensive, since users must learn how to apply the package before they can evaluate its effect on their tasks.
It is important to make good use of people's time during package selection, while allowing them to provide useful input on different alternatives. Demonstrations are cheaper than hands-on testing, so it is tempting to rely solely on demonstrations. However, research in social psychology suggests that direct experience with an object provides more information than indirect experience, leading to stronger beliefs about the object (Fazio, 1989). If this applies in package selection, it is possible that direct experience with a package (such as hands-on testing) results in stronger beliefs about task support than indirect experience (such as a demonstration).
This research examines the effect of direct and indirect experience on users' perceptions of a system's level of task support. Note that the focus is on task support rather than the system itself, since the task is the user's true domain of expertise. The next section outlines the theoretical basis of the study. The method and results are then described, and some conclusions drawn.
When examining a package, users are less interested in its technical attributes than in its effect on their jobs. The easier a system makes their tasks, the more attractive it will be. Further, users should be able to judge the effect of a package on task difficulty more accurately than they can judge its technical aspects.
The relationship between task and system has been captured in the notion of "fit." Vessey (1991) and Vessey and Galletta (1991) use the term "cognitive fit," which results "from matching the characteristics of the problem representation to those of the task" (Vessey and Galletta, 1991, p. 65). When there is a mismatch between the representation given by the system and that required by the task, users spend effort either transforming the system's representation to that of the task or vice versa. Similarly, Miller (1989) examined "the discrepancy (or fit) between perceived job needs and IS capabilities" (p. 277). Goodhue (1986) discussed "satisfactoriness," an individual's subjective belief about "the correspondence between job requirements and IS functionality" (p. 191).
This suggests that a package can support one task well while supporting another poorly. Vessey and Galletta (1991) found empirical support for this idea in their comparison of the effectiveness of graphs and tables for different tasks. Further, Todd and Benbasat (1991) studied DSS users' choices when there is a mismatch between problem representations. They operationally defined fit as the cognitive effort required by various task/system combinations. They suggested that "decision makers tend to adapt their strategy selection to the type of decision aids available in such a way as to reduce effort" (p. 87).
Therefore, when judging a package, it is the combination of task and system that is important, not the system by itself. Further, when users judge a system, they are usually not interested in the system per se, but in its support for their tasks. The better a system supports a task, the easier the task will seem. A system can make one task easier than another. Conversely, the same task can seem easier when completed with one system than another. Of course, in most cases users only have one task/system combination, and may not be aware of other possibilities. When asked "How easy is this task?", they will reply in the context of the tools they have.
Now consider the effect of experience type on judgments of task support. Regan and Fazio (1977) distinguish between direct and indirect experience with a psychological object. Direct experience involves personal contact with the object, such as using a system to complete a task. Indirect experience involves either a description of another person's contact with the object or inference from experience with a similar object; watching a demonstration of a system is an example of indirect experience.
Direct experience has a number of effects on beliefs, where a belief is an association between an object and an attribute (Fishbein and Ajzen, 1975). Beliefs formed from direct experience are held more confidently, are more resistant to change, and influence behavior more than beliefs formed from indirect experience (Fazio, 1989). In other words, beliefs derived from direct experience are "stronger" in a number of respects than beliefs derived from indirect experience (see Raden, 1985, for a discussion of the various dimensions of strength). It has been suggested that direct experience provides more information about an object than indirect experience (Fazio and Zanna, 1981).
The most important aspect of strength, as far as a firm's package adoption decisions are concerned, is its extremity. Extremity is a belief's distance from a midpoint of indifference (Raden, 1985). For instance, the belief "the task is very difficult with this system" is more extreme than the belief "the task is somewhat difficult with this system." Variations in extremity directly affect users' evaluations, and could influence decisions about IS based on those evaluations. For example, an IS might be adopted because it makes a task "very easy" for the user, compared to another system that makes the task "somewhat easy." Other aspects of strength, while potentially important in some contexts, have less direct impacts on system design and adoption decisions.
This leads to the hypothesis:
H1. Direct experience with task/system interaction leads to more extreme beliefs about task difficulty than indirect experience.
Notice that while direct experience should make beliefs stronger, the direction of the change should depend on fit. When fit is good, the task should be perceived as relatively easy, and direct experience should make it seem even easier than indirect experience would. When fit is poor, the task should be perceived as relatively difficult, and direct experience should make it seem even more difficult than indirect experience would. So experience type by itself should have no effect on perceived difficulty; it should affect perceived difficulty only through its interaction with fit.
H2. Direct experience by itself should not affect evaluations.
These predictions are shown graphically in Figure 1.
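One way to restate these predictions (an interpretation in analysis-of-variance terms, not notation used in the original analysis) is as a two-way model for perceived task difficulty $D$:

$$D = \mu + \beta_T T + \beta_E E + \beta_{TE}(T \times E) + \varepsilon$$

H1 corresponds to a non-zero interaction coefficient $\beta_{TE}$ (direct experience pushes judgments further from the neutral midpoint, in whichever direction fit dictates), while H2 corresponds to $\beta_E = 0$ (no main effect of experience type).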
Note that care must be taken in evaluating hypotheses such as H2, which propose a lack of effect. Statistical insignificance is not enough. One must ask the question: what is the probability of finding an effect if it had existed? This issue is addressed later.
The 220 subjects were undergraduate students in an introductory MIS class at an American university. All of the subjects had already passed a course in basic computer applications, covering word processing, spreadsheets and databases. Subjects had received 5 hours of formal instruction in SQL and had completed a take-home exercise before the experiment.
The two independent variables were Task (easy or hard) and Experience (direct or indirect). The dependent variable was perceived task difficulty. Two control measures were used to ensure subjects (1) were responding accurately and honestly and (2) understood the tasks.
Subjects were given two database query tasks to solve with the same database. The database had already been designed and data entered, that is, the subjects were presented with a prepackaged solution, with tables, interface and data. The database management system used in the experiment was XDB. Each task consisted of five questions. Each question could be answered by a single SQL query. Task Te (easy) involved fewer multi-table queries (joins) than Th (hard). Boehm-Davis, Holt, Koll, Yastrop and Peters (1989) have shown that queries involving multiple tables are more difficult than those involving single tables, affecting both subjects' performance and their preferences for database formats.
The system and the tasks are shown in the appendix. Notice that it is the interaction of task and system that determines difficulty. The tables could be rearranged to make Te difficult and Th easy. For example, moving the column ARTIST_STYLE from table ARTIST1 to ARTIST2 would make the first query in the appendix more complex. So, the designation of a task as "easy" or "hard" is purely relative to this particular database structure.
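To illustrate (a sketch of the hypothetical rearrangement described above, written in the same SQL style as the queries in the appendix), the first query of the good-match task can currently be answered from a single table:

SELECT ARTIST_STYLE FROM ARTIST1
WHERE ARTIST_NAME = "YASHAMITA"

If ARTIST_STYLE were moved to ARTIST2, the same question would require a join of the two artist tables:

SELECT ARTIST_STYLE FROM ARTIST2, ARTIST1
WHERE ARTIST_NAME = "YASHAMITA"
AND ARTIST1.ARTIST_ID = ARTIST2.ARTIST_ID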
For each task, subjects either (1) developed and entered SQL queries themselves (direct experience), or (2) read a description of another person's queries (indirect experience). The subjects were told that the person who developed the queries was a student in the previous semester. Figure 2 shows a sample output screen.
Figure 3 shows the items used to measure perceived task difficulty. To provide a comparison level, subjects were asked to rate the task relative to an SQL assignment they had completed shortly before participating in the experiment. The responses to the three items were averaged, giving a range of 1 (very easy) to 7 (very difficult).
A self-rated computing ability scale adapted from Cheney and Nelson (1988) was administered. The scores on the ability scale were not of interest in themselves, but served as a cover for a validity scale designed to detect whether subjects were (1) carefully reading each item and (2) giving honest responses. Subjects were asked to rate their skills in three non-existent areas, as shown in Figure 3. The validity score for each subject was the average of these three items, giving a range of 1 to 5.
Figure 3 shows the items used to measure the clarity of each task, that is, how well the task was understood by the subjects. Clarity was measured to ensure that only subjects who understood the task were included in the sample. The responses were averaged, giving a range of 1 (strongly disagree that the task was clear) to 7 (strongly agree that the task was clear), with 4 as a neutral midpoint. The inter-item reliabilities were 0.73 for the first trial and 0.89 for the second.
A number of experimental sessions were conducted in a PC laboratory. A monitor was present in each session to ensure that subjects did not work together. After selecting a PC, subjects were given a description of the database and, to remind them of SQL syntax, a sheet listing valid queries that had been used as examples in class. The experiment consisted of two trials. For the first trial, either Te or Th was distributed. The choice was random. The experience type for the trial, direct or indirect, was chosen randomly. Subjects in the direct groups used their workstations to enter queries. Their machines automatically shut off after 20 minutes. Subjects in the indirect groups ran a program that displayed each question along with a query a fictitious individual had developed to answer the question. Subjects then rated the difficulty of the task. The sequence was repeated for the second trial, although the task was the one that had not been used for the first trial. That is, if a subject received Te for the first trial, he or she received Th for the second, and vice versa. Again, experience type for the second trial, direct or indirect, was chosen randomly. Thus, some subjects received direct experience in both trials, others received indirect experience in both trials, while the rest received direct experience in one trial and indirect in the other. Task difficulty was measured at the end of each trial. Finally, the clarity and validity instruments and a demographic questionnaire were administered. All items were administered by a program developed for this study.
The inter-item reliability (Cronbach's alpha) of the perceived task difficulty scale was 0.88 for the first trial and 0.81 for the second. The average validity score was 1.43, with a standard deviation of 0.62, showing that most subjects paid attention to the items and were honest about their skills. Subjects scoring above 2 on the validity scale, corresponding to a self-rating of greater than "Low" on the non-existent skills, were eliminated from the analysis.
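For reference, Cronbach's alpha for a scale of $k$ items is

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right)$$

where $\sigma^2_{Y_i}$ is the variance of the $i$th item and $\sigma^2_X$ is the variance of the summed score; here $k = 3$ for the difficulty scale.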
The average score for the task clarity instrument was 5.38 with a standard deviation of 1.07 for the Te group. The mean and standard deviation for Th were 5.15 and 1.27. Subjects with a clarity score of less than 4 (the midpoint on the scale) for either task were eliminated from the analysis. These subjects may have been unsure what the task required, and their evaluations of the system’s ability to support the task may be suspect. These criteria eliminated 49 subjects from the sample, leaving 171 subjects.
The results for both trials are shown in Tables 1 and 2. The mean task difficulty for Te was 2.97 across both trials, while the mean for Th was 4.31. For both trials, the main effect of Task was significant. This suggests that the task difficulty manipulation was effective.
It was predicted that perceptions of task support would be affected by the interaction of fit and experience type, with direct experience generating stronger perceptions than indirect experience. As Tables 1 and 2 show, the interaction was not significant for the first trial, but was significant for the second. Figure 4 graphically depicts the interaction for the second trial. It matches the prediction shown in Figure 1. Perceived task difficulty was more extreme for direct experience than for indirect experience. H1 was supported for the second trial, but not for the first.
The main effect of Experience was not significant in either trial, suggesting that experience type by itself does not influence perceived difficulty. As noted above, a null result cannot be accepted based simply on its lack of significance. Harcum (1990) provides guidelines for testing whether a null result can be accepted, all of which are met by this study. First, the effects were not significant in either trial. The p values were not marginal, but were well above conventional significance levels in both cases. Second, statistical power was calculated for each test, using the procedure outlined by Cohen and Cohen (1983). R² was 0.46 for the first test and 0.17 for the second. Setting α to 0.05 (by convention) and using the observed values of n and R², the overall power of both analyses is greater than 0.99. This is not too surprising, given the relatively large number of subjects and the size of the effects. Third, the hypothesis is part of a set of logically consistent hypotheses, which were based on the results of earlier research in social psychology rather than being created for an exploratory study. It is therefore reasonable to suggest that an effect for Experience would have been found if it had existed.
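A sketch of that power calculation, following Cohen and Cohen's (1983) effect-size index for multiple regression/correlation (the intermediate figures below are reconstructed from the reported values and are illustrative rather than taken from the original analysis): from the squared multiple correlation R², the effect size and noncentrality index are

$$f^2 = \frac{R^2}{1 - R^2}, \qquad L = f^2 (n - k - 1)$$

For the weaker of the two analyses, $R^2 = 0.17$, $n = 171$, and $k = 3$ sources (Task, Experience, and their interaction), giving $f^2 \approx 0.20$ and $L \approx 0.20 \times 167 \approx 34$, well beyond the value needed for power of 0.99 at $\alpha = 0.05$.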
The results showed that direct experience with an IS can lead to more extreme beliefs about task support than indirect experience. This effect did not occur immediately, but only on a second trial. It appeared that experience type had no direct effect, but acted only through its interaction with fit. That is, direct experience did not lead to more positive or more negative evaluations than indirect experience. Instead, it magnified the differences in evaluations that existed because of differences in task/system fit.
Information overload might explain why the effects of experience type appeared only in the second trial. It has been suggested that direct experience provides more information than indirect experience (Fazio and Zanna, 1981). However, people's ability to assimilate new information is limited (Markus and Zajonc, 1985). During the first trial, subjects were presented with a large amount of new information. They were exposed to a database they had never seen before, as well as the task itself. Their capacity to absorb new information may have been overloaded. If indirect experience already gave subjects more information than they could handle, direct experience would have had no more effect on beliefs than indirect experience. During the second trial, however, subjects were using a database they had seen before in a context they were already familiar with. Only the task was new. In this case, the extra information provided by direct experience might have had an observable effect. This explanation is consistent with current thinking and with the data gathered here, but it is not proven. Conclusions drawn from this suggestion are speculative.
The main limitation of the study is that it is a laboratory experiment. This was necessary given the tight control over task/system fit required to test the hypotheses. Using students provided a large pool of homogeneous subjects, also necessary to achieve the degree of statistical power needed to test H2.
The question still remains: are the results generalizable? Clearly, there is no one task and no one group of subjects that represents all real-world situations. Researchers could study one task context per month for the next century and not cover them all. It is impractical to insist that every possible circumstance be studied. The central question is: are there theoretically important aspects of the research context that do not apply in the real world, or vice versa? The hypotheses tested here do not have many contextual elements. The subjects were not technical experts, completed a task they understood with a computer system, and were asked for their assessment during an evaluation session. They were human, and were therefore subject to memory load limitations. The context seems to include the main elements of real evaluation situations that are relevant to the questions being asked in this study. Perhaps the most serious issue is the relative lack of expertise of the subjects. They may have been relatively susceptible to memory load effects. However, even given this problem, there are still many real situations to which the results would apply.
The results have implications for package selection. Recall that demonstrations are relatively inexpensive, but provide only indirect experience with a package. Hands-on testing gives users direct experience, but is more costly. The results of the study show that both indirect and direct experience allow users to detect differences in task support. However, direct experience appears to magnify differences in perceptions, at least under some conditions.
This suggests a cost-effective package evaluation strategy. First, demonstrations should be used to eliminate options that clearly do not fit the firm's needs. Users should be able to distinguish these packages from demonstrations alone. This might be enough to identify users' favorite option. When it is not, hands-on testing of the remaining packages might yield a clear preference. Fewer alternatives will be involved in hands-on testing, reducing the overall costs of the process.
Finally, recall that direct experience appeared not to provide more information than indirect experience on the subjects' first exposure to a system. Therefore, an initial demonstration might be useful before hands-on testing of a package. This would be an inexpensive way of introducing users to the system. Their subsequent testing of the system would provide more detailed information on the package.
References
Boehm-Davis, D A, Holt, R W, Koll, M, Yastrop, G and Peters, R Effects of Different Data Base Formats on Information Retrieval Human Factors, Vol 31 (1989) pp 579-592.
Cheney, P H and Nelson, R R A Tool for Measuring and Analyzing End-User Computing Abilities Information Processing and Management Vol 24 (1988) pp 199-203.
Cohen, J and Cohen, P Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 2nd ed, Erlbaum, USA (1983).
Fazio, R H On the Power and Functionality of Attitudes: The Role of Attitude Accessibility In Pratkanis, A R, Breckler, S J and Greenwald, A G (eds.) Attitude Structure and Function, Erlbaum, USA (1989) pp 153-179.
Fazio, R H and Zanna, M P Direct Experience and Attitude-Behavior Relationship In Berkowitz, L (ed.) Advances in Experimental Social Psychology, Vol 14, Academic Press, USA (1981) pp 162-202.
Fishbein, M and Ajzen, I Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research, Addison-Wesley, USA (1975).
Galletta, D F, King, R C and Rateb, D The Effect of Expertise on Software Selection Database (1993) pp 7-20.
Goodhue, D. IS Attitudes: Towards Theoretical Definition and Measurement Clarity Proceedings of the Seventh International Conference on Information Systems, San Diego, USA (1986) pp 181-194.
Harcum, E R Guidance From the Literature for Accepting a Null Hypothesis When Its Truth is Expected The Journal of General Psychology Vol 117 (1990) pp 325-344.
Keider, S P Managing Systems Development Projects Journal of Information Systems Management, Summer 1984, pp 33-38.
Markus, H and Zajonc, R B The Cognitive Perspective in Social Psychology In Lindzey, G and Aronson, E Handbook of Social Psychology, Volume 1: Theory and Method, Random House, USA (1985) pp 137-230.
Miller, J Information Systems Effectiveness: The Fit Between Business Needs and Information Systems Capabilities Proceedings of the Tenth International Conference on Information Systems, Boston, USA (1989) pp 273-288.
Raden, D Strength-Related Attitude Dimensions Social Psychology Quarterly Vol 48 (1985) pp 312-330.
Regan, D T and Fazio, R H On the Consistency Between Attitudes and Behavior: Look to the Method of Attitude Formation Journal of Experimental Social Psychology Vol 13 (1977) pp 28-45.
TenEyck, G Software Purchase: A to Z Personnel Journal Vol 69 (1990) pp 72-79.
Todd, P and Benbasat, I An Experimental Investigation of the Impact of Computer Based Decision Aids on Decision Making Strategies Information Systems Research Vol 2 (1991) pp 87-115.
Vessey, I Cognitive Fit: A Theory-Based Analysis of the Graphs Versus Tables Literature Decision Sciences Vol 22 (1991) pp 219-240.
Vessey, I and Galletta, D Cognitive Fit: An Empirical Study of Information Acquisition Information Systems Research Vol 2 (1991) pp 63-84.
Biographies
Kieran Mathieson is Associate Professor of MIS at Oakland University. He received his doctorate from Indiana University. His research focuses on the manner in which beliefs about information systems are formed, and marketing applications of the World Wide Web.
Terence Ryan is Assistant Professor of Management Information Systems at Indiana University, South Bend. He received a Ph.D. in management information systems from Indiana University. Dr. Ryan is a member of the Decision Sciences Institute and The Institute of Management Sciences. His research interests involve the assessment of information systems and systems development methods.
Perceived Task Difficulty Items
1. I think that the question set was (Much harder / Much easier) than the assignment I turned in earlier.
2. How easy were the questions in this question set, compared to the assignment you turned in earlier? (Very difficult / Very easy)
3. How easy was it to extract the information needed for the questions from the database, compared to the assignment you did earlier? (Very difficult / Very easy)
Validity Items
1. Rate your ability to use dynamic compression software (e.g., FAS, DyComp II). (Very high / Very low)
2. Rate your ability to use network recognition systems (e.g., NWork, XWR). (Very high / Very low)
3. Rate your ability to use division access software (e.g., DivInd3, DAPS). (Very high / Very low)
Task Clarity Items
1. It was clear to me what information the questions in the question set were asking for, even if I did not necessarily know how to write a query to get the information. (Strongly agree / Strongly disagree)
2. Even though I might not have been sure how to answer the questions in the question set, I was (Very sure / Very unsure) what information the questions were asking for.
3. I knew what information the questions were asking for. (Think about the questions themselves, not the queries. Even if you were not sure how to formulate the queries, the questions might still have been clear.) (Strongly agree / Strongly disagree)
Figure 3. Instruments
All items used fully-anchored Likert scales. For brevity, only the end points are shown. The difficulty and clarity items used 7-point scales, while the validity items used 5-point scales.
Source | DF | Sum of Squares | Mean Square | F | P
E | 1 | 0.55 | 0.55 | 0.56 | 0.45
T | 1 | 140.07 | 140.07 | 142.66 | 0.00
E * T | 1 | 0.08 | 0.08 | 0.08 | 0.77
Error | 167 | 163.97 | 0.98 | |
Table 1. Effect of Task, Experience Type, and Their Interaction on Perceived Task Difficulty for the First Task. T = task; E = experience.
Source | DF | Sum of Squares | Mean Square | F | P
E | 1 | 0.28 | 0.28 | 0.25 | 0.62
T | 1 | 31.70 | 31.70 | 27.95 | 0.00
E * T | 1 | 5.48 | 5.48 | 4.83 | 0.03
Error | 167 | 189.37 | 1.13 | |
Table 2. Effect of Task, Experience Type, and Their Interaction on Perceived Task Difficulty for the Second Task. T = task; E = experience.
Appendix
Database and Tasks
The following set of tables was accompanied by descriptions of each field, along with valid values for enumerated types, and a general description of the firm using the database.
ARTIST1
ARTIST_ID | ARTIST_NAME | ARTIST_STYLE
21 | KAISER, K | BLUES

ARTIST2
ARTIST_ID | ARTIST_LABEL | SIGN_DATE | INSTRUMENT
21 | BLUENOTE | 12/10/86 | PIANO

OUTLET1
OUTLET_ID | MANAGER | OUTLET_LOC
7 | DOUG AULTS | WESTPORT

OUTLET2
OUTLET_ID | OPEN_DATE | NUMBER_EMPL | OUTLET_AREA
7 | 10/01/88 | 14 | 975.00

RELEASE1
RELEASE_ID | RELEASE_TIME | NUM_TRACKS | ARTIST_ID
COL421 | 40 | 11 | 26

RELEASE2
RELEASE_ID | RELEASE_NAME | RELEASE_TYPE
COL421 | OLLIE NORTH AND MOTHER | CD1

INVENTRY
RELEASE_NAME | OUTLET_ID | AMT_IN_STOCK
OLLIE NORTH AND MOTHER | 1 | 5
Appendix (cont.)
Tasks
The first three questions from the good-match task, with queries.
1. What style does artist Yashamita play?
SELECT ARTIST_STYLE FROM ARTIST1
WHERE ARTIST_NAME = "YASHAMITA"
2. What is the location of the outlet where Doug Aults is manager?
SELECT OUTLET_LOC FROM OUTLET1
WHERE MANAGER = "DOUG AULTS"
3. What are the release id codes for artist #22?
SELECT RELEASE_ID FROM RELEASE1
WHERE ARTIST_ID = 22
The first three questions from the poor-match task, with queries.
1. What is the location of outlet number 5?
SELECT OUTLET_LOC FROM OUTLET1
WHERE OUTLET_ID = 5
Note: This easy problem was included so subjects in the direct groups could initially focus on the details of the software, without being concerned about a complex query.
2. What are the id codes for releases stocked by the outlet located at "CENTRAL"?
SELECT RELEASE_ID FROM RELEASE2, INVENTRY, OUTLET1
WHERE OUTLET_LOC = "CENTRAL"
AND OUTLET1.OUTLET_ID = INVENTRY.OUTLET_ID
AND INVENTRY.RELEASE_NAME = RELEASE2.RELEASE_NAME
3. What release types exist for artist Cash?
SELECT RELEASE_TYPE FROM RELEASE1, ARTIST1, RELEASE2
WHERE ARTIST_NAME = "CASH"
AND ARTIST1.ARTIST_ID = RELEASE1.ARTIST_ID
AND RELEASE1.RELEASE_ID = RELEASE2.RELEASE_ID