|
|||||
General Instructions For each project, the instructor will assign you a
team and data to analyze. Delegate tasks and collaborate as seems
appropriate, based on your various skills (submit only one report). For
each project, submit a concise (5-10 page) report as a Microsoft Word document
(not a spreadsheet or PowerPoint) that
answers the questions posed. Strive for effective writing (see the textbook Appendix
I). Creativity and initiative will be
rewarded. Include a very short Executive Synopsis. Avoid careless spelling
and grammar errors. Paste graphs and computer tables or output into your written
report. It may be easier to format tables and charts in Excel and then use Paste
Special > Picture to avoid weird
formatting and permit re-sizing within Word. |
|||||
P-1 Random teams are assigned on Moodle (submit only one
report). Data:
Download Big Dataset 02 - Crime in Major Cities from Moodle. Your team is assigned one crime category
(you can change it if you wish). Copy the city names and the chosen crime
data column to a new spreadsheet. Delete lines (if any) with missing data. Analysis:
(a) Sort the observations (with city names). (b) List the top 10 and bottom
10 data values (with city names). (c) For the entire data set, calculate the
mean and median. What do they tell you about center? Would the mode be
helpful for this type of data? Explain. (d) Calculate the standard deviation.
(e) Calculate the standardized z-value for each observation. (f) Are
there outliers or unusual data values (see p. 137)? Discuss. (g) Use MegaStat (or Minitab or Excel) to make a histogram.
Describe its shape. (h) Calculate the quartiles. Make a boxplot and describe
it. (i) Make a scatter plot of your chosen crime category
versus a different crime category. What does it show? (j) Ambitious
students: Sort the database in random order (see bottom of page 36) using
Excel’s function =RAND(). Copy and paste the first
few sorted lines into your report to illustrate your sorting method. Comment
on anything unusual (or interesting things that you might find on the web). |
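For teams that want to cross-check their spreadsheet results for steps (a) through (j) of P-1, the calculations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the city names and rates below are invented (they are not from the Moodle dataset), the |z| > 2 "unusual" cutoff is a common rule of thumb that may differ from the criteria on p. 137, and the quartile method used by `statistics.quantiles` may not match Excel or MegaStat exactly.

```python
import random
import statistics

# HYPOTHETICAL data: invented city names and crime rates per 100,000
# residents, used only to illustrate the calculations.
rates = {
    "City A": 310.0, "City B": 420.0, "City C": 455.0, "City D": 480.0,
    "City E": 505.0, "City F": 530.0, "City G": 560.0, "City H": 610.0,
    "City I": 640.0, "City J": 1890.0,
}
values = list(rates.values())

# (a)-(b) Sort the observations, keeping each city name with its value.
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
top3, bottom3 = ranked[:3], ranked[-3:]

# (c)-(d) Center and spread.
mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

# (e) Standardized value for each observation: z = (x - mean) / s.
z = {city: (x - mean) / sd for city, x in rates.items()}

# (f) Assumed rule of thumb: flag |z| > 2 as unusual.
unusual = [city for city, zc in z.items() if abs(zc) > 2]

# (h) Quartiles (default "exclusive" method; other software may differ).
q1, q2, q3 = statistics.quantiles(values, n=4)

# (j) Random order: attach a random number to each row and sort by it,
# which mirrors sorting on a column of =RAND() values in Excel.
rng = random.Random(42)  # fixed seed only so the example is repeatable
shuffled = sorted(rates, key=lambda city: rng.random())

print(f"mean={mean:.1f} median={median:.1f} sd={sd:.1f}")
print("unusual:", unusual)
print("quartiles:", q1, q2, q3)
print("random order:", shuffled[:3])
```

In this made-up sample the mean sits well above the median because one city is far larger than the rest, which is the kind of contrast step (c) asks you to discuss.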
|||||
|
|||||
P-2 The instructor will assign you a team and a company. Data: Download the company’s quarterly revenue spreadsheet from Moodle. Analysis: (a) Briefly describe the company’s history, its products and services, competition, and market conditions (supply, demand, substitutes). Possible sources include Wikipedia, Yahoo biz, or the Mergent database from Kresge Library (Find Articles > Databases A to Z > M > Mergent). (b) Fit several trends (e.g., linear, quadratic, exponential) using Excel, Minitab, or MegaStat. (c) Interpret each fitted trend equation. (d) Forecast the next 4 quarters (t = 29, 30, 31, 32) based on trend alone using each fitted trend model (i.e., plug in the time index for periods n+1, n+2, n+3, n+4). (e) Use MegaStat (or Minitab) to calculate quarterly seasonal factors. Is there noticeable seasonality? Explain. (f) Multiply each quarterly trend forecast by its seasonal factor. Make a chart showing the seasonally adjusted forecasts (see the annotated Amazon case posted on Moodle). (g) Using the four criteria for assessing forecasts (see p. 614), which trend model (if any) would yield credible forecasts? If none, then what? (h) As an added perspective, have someone on your team create a chart of the stock price for your company’s ticker symbol over the same period covered by your revenue data. Does it track the company’s revenue? Hint: For historical stock prices, try Yahoo Finance or BigCharts. (i) If you had more time, what might you do? |
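Steps (b), (d), (e), and (f) of P-2 can be sketched in Python as a way to check the logic of your spreadsheet work. Everything here is an assumption made for illustration: the 28 quarters of revenue are synthetic, period t = 1 is assumed to be a first quarter, only the linear and exponential trends are fitted (the quadratic is omitted for brevity), and the seasonal factors come from a simplified ratio-to-trend average rather than the centered-moving-average procedure that MegaStat or Minitab may use.

```python
import math

def fit_line(t, y):
    """Ordinary least squares for y = a + b*t; returns (a, b)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sxy / sxx
    return y_bar - b * t_bar, b

# SYNTHETIC revenue: a rising trend times an invented seasonal pattern.
# Not real company data.
true_pattern = [0.90, 1.00, 0.95, 1.15]
t = list(range(1, 29))
revenue = [(100 + 5 * ti) * true_pattern[(ti - 1) % 4] for ti in t]

# (b) Linear trend: y = a + b*t.
a, b = fit_line(t, revenue)

def linear(ti):
    return a + b * ti

# (b) Exponential trend: y = c * exp(g*t), fitted as ln(y) = ln(c) + g*t.
ln_c, g = fit_line(t, [math.log(yi) for yi in revenue])

def exponential(ti):
    return math.exp(ln_c + g * ti)

# (e) Simplified seasonal factors: average ratio of actual to linear trend
# for each quarter, rescaled so the four factors average 1.
ratios = [[] for _ in range(4)]
for ti, yi in zip(t, revenue):
    ratios[(ti - 1) % 4].append(yi / linear(ti))
raw = [sum(r) / len(r) for r in ratios]
seasonal = [r * 4 / sum(raw) for r in raw]

# (d) Trend-only forecasts for periods n+1 .. n+4.
future = [29, 30, 31, 32]
trend_forecast = [linear(ti) for ti in future]

# (f) Multiply each trend forecast by the factor for its quarter.
adjusted = [f * seasonal[(ti - 1) % 4] for ti, f in zip(future, trend_forecast)]

for ti, f, adj in zip(future, trend_forecast, adjusted):
    print(f"t={ti}: trend={f:.1f} with seasonal factor={adj:.1f} "
          f"(exponential trend={exponential(ti):.1f})")
```

The slope `b` is the estimated revenue change per quarter, while `g` approximates the quarterly growth rate under the exponential model; comparing the two sets of forecasts is one input to step (g).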
|||||
|
|||||
P-3 You will be assigned team members and a dependent variable (see Moodle) from the 2010 state database. The team may change the assigned dependent variable (the instructor assigned a safe one just to give you a quick start). Data: Download the state database posted on Moodle. Analysis: (a) Propose a reasonable model of the form Y = f(X1, X2, ... , Xk) using up to 12 predictors. (b) Use regression to investigate the hypothesized relationship. (c) Try deleting poor predictors until you feel that you have a parsimonious model, based on the t-values, p-values, standard error, and adjusted R². (d) For the preferred model only, obtain a list of residuals and request residual tests and VIFs. (e) List the states with high leverage and/or unusual residuals. (f) Make a histogram and/or probability plot of the residuals. Are the residuals normal? (g) For the predictors that were retained, analyze the correlation matrix and/or VIFs. Is multicollinearity a problem? If so, what could be done? (h) If you had more time, what might you do? |
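To make the adjusted R² in step (c) and the VIFs in steps (d) and (g) of P-3 less of a black box, here is a minimal Python sketch of how they are computed. It is an illustration under assumptions, not a substitute for MegaStat or Minitab output: the eight observations are invented (not from the state database), the helper names `solve`, `ols_r2`, `adjusted_r2`, and `vifs` are hypothetical, and the "VIF above 10" warning level is a common rule of thumb rather than a fixed standard.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_r2(columns, y):
    """Regress y on the predictor columns plus an intercept.

    Returns (coefficients, R-squared); the intercept is coefficient 0.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k1 = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k1)]
           for p in range(k1)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - sse / sst

def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the others."""
    out = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        _, r2_j = ols_r2(others, col)
        out.append(1 / (1 - r2_j))
    return out

# HYPOTHETICAL data: x3 is built to be almost exactly 2 * x1, so x1 and x3
# are nearly collinear by construction.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.2, 15.9]
y = [5.0, 6.0, 9.0, 9.0, 13.0, 18.0, 13.0, 18.0]

predictors = [x1, x2, x3]
beta, r2 = ols_r2(predictors, y)
r2_adj = adjusted_r2(r2, n=len(y), k=len(predictors))
v = vifs(predictors)

print(f"R2={r2:.3f} adjusted R2={r2_adj:.3f}")
for name, vif in zip(["x1", "x2", "x3"], v):
    print(f"VIF({name}) = {vif:.1f}")
```

In this constructed example the VIFs for `x1` and `x3` are very large because one is nearly a multiple of the other; dropping one of the pair is one possible remedy to discuss in step (g).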
|||||
|
|||||