Excel Project

Individual Project Using Excel for Hypothesis Testing

Due date: December 20th, 2023 by 11:59pm

Electronic Submission Only

Instructions

1) Submit your Excel file along with your answers to receive full credit.

2) Do not email me your work.

3) Submit both your Excel file and your Word file with answers to the folder named “Excel Project Submission” under the Assignment tab in Classes.

4) If the Excel file is not submitted, only half credit will be given.

5) No late work is accepted.

Scenario #1

Please go to the website and scroll down to the bottom, to “items of interest.” Enter your birth year and choose top 1000 popularity in the “Population Names by Birth Year” column.

A. Male students should use the male name column and female students should use the female name column. Find the most popular name and calculate its proportion of the total. Construct a 95% confidence interval for the proportion of people in your cohort who share that name.

B. Now move to the “Popularity of a Name” column. For the name you chose in A, determine whether there is sufficient evidence to conclude that the name is more popular in the year 2000 (if that year does not work, try 2001, 2002, 2003, etc.) than it was in your cohort. Use the 5% level of significance. (This question requires all five steps of hypothesis testing that we discussed, including the null and alternative hypotheses.)

Note: For this question, use Excel for all the numerical work and clearly state your answers.
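The arithmetic behind parts A and B can be sketched as follows. This is only an illustration in Python, not the required Excel workbook, and it assumes the usual normal-approximation interval for a proportion and a pooled two-proportion z test; check these against the method covered in class. The function names and all counts are hypothetical placeholders, not real name data.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(count, total, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = count / total
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / total)
    return p_hat - margin, p_hat + margin


def two_proportion_z(count_new, total_new, count_old, total_old):
    """Right-tailed test of H0: p_new = p_old against Ha: p_new > p_old."""
    p_new = count_new / total_new
    p_old = count_old / total_old
    # Under H0 the two proportions are equal, so the counts are pooled.
    pooled = (count_new + count_old) / (total_new + total_old)
    se = sqrt(pooled * (1 - pooled) * (1 / total_new + 1 / total_old))
    z = (p_new - p_old) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts, for illustration only.
low, high = proportion_ci(count=300, total=10_000)
z, p_value = two_proportion_z(360, 10_000, 300, 10_000)
reject_null = p_value < 0.05  # 5% level of significance
```

In Excel, the same critical value and tail probability should be obtainable with NORM.S.INV and NORM.S.DIST.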

Scenario #2

Please go to the website popproj.html and download Table 1, “Projected Population by Single Year of Age, Sex, Race, and Hispanic Origin for the United States: 2016 to 2060,” as an Excel file. Among all the rows, use only those where sex, origin, and race are all “0.” Then use the values in the POP_1 and POP_2 columns.

A. Construct a 95% confidence interval for each of POP_1 and POP_2, using the mean and standard deviation calculated in Excel.

B. With the data on file, is there sufficient evidence that the population mean of POP_1 is lower than that of POP_2 (sex, origin, and race all “0”) at the 4% level of significance? (Show all five steps.)

C. How about POP_3 and POP_4? Is there sufficient evidence that the population mean of POP_3 is greater than that of POP_4 (sex, origin, and race all “0”) at the 5% level of significance? (Show all five steps.)
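As a rough sketch of the calculations in parts A through C, the Python below builds a confidence interval for a mean and runs a one-sided two-sample test. It assumes the two columns are treated as independent samples and uses a normal critical value; with few rows, a t critical value would give a wider interval, so follow whichever method was taught in class. The function names and data values are hypothetical, not taken from Table 1.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Confidence interval for a mean using a normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(values) / sqrt(len(values))
    return mean(values) - margin, mean(values) + margin


def two_sample_test(sample_a, sample_b, tail):
    """z statistic and one-sided p-value for H0: mean_a = mean_b.

    tail="left" tests Ha: mean_a < mean_b (as in part B);
    tail="right" tests Ha: mean_a > mean_b (as in part C).
    """
    # Unpooled standard error of the difference in sample means.
    se = sqrt(stdev(sample_a) ** 2 / len(sample_a)
              + stdev(sample_b) ** 2 / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / se
    cdf = NormalDist().cdf(z)
    p_value = cdf if tail == "left" else 1 - cdf
    return z, p_value


# Hypothetical values standing in for two population columns.
col_a = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
col_b = [13.0, 15.0, 14.0, 16.0, 15.0, 17.0]

ci_low, ci_high = mean_ci(col_a)
z, p_value = two_sample_test(col_a, col_b, tail="left")
reject_null = p_value < 0.04  # 4% level of significance, as in part B
```

In Excel, AVERAGE and STDEV.S should give the same mean and sample standard deviation that `mean` and `stdev` compute here.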

Scenario #3

Please go to the website index/ and check “USCRN” for the datasets from 1895 to 2013. Use those data sets in Excel. You can click on CSV to download the data sets for both USCRN and CLIMDIV.

A. Construct a 90% confidence interval for temperature in the USCRN dataset for the 1895 to 2013 time period.

B. Now check “CLIMDIV” for the same time period. Construct a 93% confidence interval for temperature in CLIMDIV.

C. Now, I am told that the average temperature reported by CLIMDIV is higher than the one reported by USCRN. Do the datasets provide sufficient evidence to support this claim at the 4% level of significance?
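The 93% confidence level and the 4% significance level used here rarely appear in printed tables, so the critical values have to be computed. The sketch below shows one way to do that in Python, assuming normal critical values are appropriate for these samples; the helper names are made up for illustration.

```python
from statistics import NormalDist


def two_sided_critical_z(confidence):
    """Critical value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def right_tail_critical_z(alpha):
    """Critical value that leaves alpha in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha)


z_90 = two_sided_critical_z(0.90)   # about 1.645, for part A
z_93 = two_sided_critical_z(0.93)   # about 1.812, for part B
z_04 = right_tail_critical_z(0.04)  # about 1.751, for part C
```

The Excel counterpart should be NORM.S.INV with the same cumulative probabilities, for example 0.965 for the 93% interval.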

Scenario #4

Please go to the website and, on the “View Selected Records” tab, set the conditions as follows: “Daily Records” for timescale, “Highest Max Temperature” for parameter, “March 14, 2018” as the Starting Date and “April 12, 2018” as the End Date in the Date Range, “All” for record type, “Country” for location category, and “United States” for country. Then click on “Show Records.”

A. Find the station name with the median value of the “record.” (Look for the “record” column.)

B. Construct a 95% confidence interval for the mean of the “record” values.

C. How many stations fall outside the range of 2 standard deviations from the mean? If the same station appears more than once, count it only once.
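The three parts above can be sketched in Python as follows. The sketch reads “the range within 2 standard deviations” as the mean plus or minus two sample standard deviations, uses a normal critical value for the interval, and takes the lower middle value as the median when the row count is even; all three are assumptions to check against the class method. The station names and record values are invented placeholders, not downloaded data.

```python
from math import sqrt
from statistics import NormalDist, mean, median_low, stdev

# Hypothetical (station, record) rows; "Station X" appears twice on purpose.
rows = [(f"Station {i}", 70.0 + i % 3) for i in range(12)]
rows += [("Station X", 95.0), ("Station X", 96.0)]

values = [record for _, record in rows]

# Part A: stations whose record equals the median value. median_low
# returns an actual data value, so at least one station matches.
middle = median_low(values)
median_stations = sorted({name for name, record in rows if record == middle})

# Part B: 95% confidence interval for the mean record.
center, spread = mean(values), stdev(values)
margin = NormalDist().inv_cdf(0.975) * spread / sqrt(len(values))
ci_low, ci_high = center - margin, center + margin

# Part C: distinct stations beyond two standard deviations of the mean.
# A set keeps each station name once even if it has several rows.
lower, upper = center - 2 * spread, center + 2 * spread
outside = {name for name, record in rows if record < lower or record > upper}
outside_count = len(outside)
```

Collecting names in a set is what enforces the "count it only once" rule; in Excel, removing duplicate station names from the filtered rows would serve the same purpose.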