SAS, first long homework, due 30 September, worth 25% of ...james/STAT579-F18/hw1-long.pdf · grade,...


SAS, first long homework, due 30 September, worth 25% of overall grade, all subproblems weighted equally

Turn in a total of 1 zipped file only, which should have exactly 8 files in it. It would be easiest for us if your file is named X.zip, where X is your student number. Answers to questions should be in English in complete sentences. You should not say that the answer is in the output or in the following plot, and so forth. You can say things like, “based on Figure 3 (page 4), there appears to be a positive relationship between X and Y”. But don’t just say that the relationship between X and Y is shown in the plot without identifying in English what that relationship is.

1. In your solutions, write down the day and month of your birthday (you don’t need to include the year). Go to http://quakesearch.geonet.org.nz/ to download earthquake data. Download one month of earthquake data starting on your birthday in 2013. This should be one calendar month. For example, if your birthday is September 9th, download earthquakes in the range of September 9th, 2013 to October 8th, 2013. If your birthday is December 15th, you would use December 15th, 2013 to January 14th, 2014. Note that the dates used by this website are in the order year-month-day. Because months have different numbers of days, the number of days in your range will vary between 28 and 31. Since the number of earthquakes per day is also variable, the number of observations in each student’s dataset will probably be different (with the possible exception of two or more people having the same birthday).
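The date rule above sets the end date to the day before the same day of the month in the following month. You can check it with a short sketch. The assignment itself must be done in SAS. This Python helper, `quake_window`, is a hypothetical illustration. Its handling of birthdays whose day does not exist in the next month (for example, January 31) is an assumption and is not part of the assignment.

```python
import calendar
from datetime import date, timedelta

def quake_window(month, day, year=2013):
    """Return (start, end) dates for one calendar month starting on a birthday.

    The end date is the day before the same day-of-month in the following
    month. ASSUMPTION: if that day does not exist in the following month
    (e.g. a January 31 birthday), it is clamped to that month's last day.
    A February 29 birthday raises ValueError, since 2013 is not a leap year.
    """
    start = date(year, month, day)
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    last_day = calendar.monthrange(next_year, next_month)[1]  # days in next month
    same_day_next_month = date(next_year, next_month, min(day, last_day))
    return start, same_day_next_month - timedelta(days=1)

start, end = quake_window(9, 9)
print(start.isoformat(), end.isoformat())  # 2013-09-09 2013-10-08
```

`isoformat()` prints dates in year-month-day order, which matches the order the website uses.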

The website looks like this:

[Screenshot of the GeoNet quake search page; image not reproduced.]

Don’t specify a region, and leave the button labeled “map extent” as it is, which means that the search will list seismic events detected within the map shown. From this website, you can narrow down the search by region within New Zealand (I used to live in the Canterbury region), or by specifying latitude and longitude ranges.

To turn in your work, you will need to generate four files: (1) a Word or other word-processed document describing your answers, (2) a .sas file which runs and generates relevant answers, (3) a .log file, and (4) a .csv file which has your NZ earthquake data. Please call your .sas file X.sas, where X is your student number, and call your .csv file X.csv, where X is your student number. Please DON’T print any of these files. Instead, send them electronically, zipped together with your answers to problem 2. There would be too many files for everyone to send separate files of everything. Note that using SAS, it will be relatively straightforward for me to generate individualized answer keys for all homeworks by reading in your .csv file and analyzing it with my own SAS code. If I (or the grader) have questions about your answers, we should also be able to run your SAS code.


Note that all questions only involve the variables origintime, longitude, latitude, magnitude (Richter scale), and depth (which is measured in kilometers), so you might want to initially subset your data on just those variables to make it easier to work with. Once you have generated the .csv file, write SAS code to read in the data and answer the following questions. Make sure that your answers are based on earthquakes and not on quarry blasts.
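Here is a Python sketch of the subsetting and quarry-blast filtering described above. It only illustrates the logic and is not a substitute for the required SAS code. It assumes the exported CSV has a header row that includes the columns `origintime`, `longitude`, `latitude`, `magnitude`, `depth`, and `eventtype`, and that quarry blasts are labeled `quarry blast`. Check the header of your own file. The function name and the sample rows are made up for illustration.

```python
import csv
import io

# Variables of interest named in the assignment.
KEEP = ("origintime", "longitude", "latitude", "magnitude", "depth")

def earthquakes_only(csv_text):
    """Return rows whose eventtype is 'earthquake', reduced to the KEEP columns.

    ASSUMPTION: the CSV has a header row with these column names plus an
    'eventtype' column; values are left as strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: row[name] for name in KEEP}
        for row in reader
        if row["eventtype"].strip().lower() == "earthquake"
    ]

# Made-up two-row file: one earthquake, one quarry blast.
sample = (
    "origintime,longitude,latitude,magnitude,depth,eventtype\n"
    "2013-09-09T01:00:00Z,172.6,-43.5,3.1,5.0,earthquake\n"
    "2013-09-09T02:00:00Z,175.0,-41.0,2.0,0.0,quarry blast\n"
)
print(len(earthquakes_only(sample)))  # 1
```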

1. How many detectable earthquakes were there in your data? Be sure that this corresponds to eventtype being earthquake and not quarry blast.

2. Using proc corr, make a correlation matrix of the variables. What is the relationship between magnitude and depth in your data?

3. Is there a correlation between latitude and longitude for this data? How can you interpret a correlation between these variables?

4. Is there a correlation between latitude and magnitude for this data? If so, what might this mean for the two main islands of New Zealand (North Island and South Island)?

5. Describe missingness in the data. For the entire dataset, how many observations had at least one missing value for the variables of interest? What percentage of the data is missing (e.g., if each row has five variables, two of which are missing, this percentage would be 40%)?

6. Create a dataset that counts the number of earthquakes on each day and records the largest magnitude of the earthquakes that occurred that day. For example, for the earthquake data in class, there were 25 observations on 03/31/2011, but one of them was a quarry blast, so the answer would be 24 for that day, and the largest magnitude was 6.9356. This dataset should have between 28 and 31 observations (one for each day of the month). Sort this dataset by date from earliest to most recent.

7. Make three plots which are time series of the earthquake depths, magnitudes, and latitudes. (Here, you can plot the variable of interest against time.) Connect the points with lines, as is typically done in time series plots.

8. Make a plot of the latitude (y-axis) against longitude (x-axis) of the earthquakes and describe the plot.

9. Make a plot of the magnitude (y-axis) against depth (x-axis) of the earthquakes and describe the plot.

10. Make a variable which is the time between successive earthquakes. Make a histogram of the times between earthquakes and describe the distribution.
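Question 5 counts missingness per cell: missing values divided by the total number of values in the five variables of interest. One row with two of five values missing therefore gives 2/5 = 40%. A Python sketch of both counts follows. It illustrates the arithmetic only, because the homework must be done in SAS. Two things here are assumptions: the function name `missingness`, and the representation of missing values as `None`. The rows are hypothetical.

```python
# The five variables of interest named in the assignment.
VARIABLES = ("origintime", "longitude", "latitude", "magnitude", "depth")

def missingness(rows):
    """Return (number of rows with at least one missing value,
    percentage of all cells in the five variables that are missing).

    ASSUMPTION: each row is a dict keyed by VARIABLES, and a missing
    value has been parsed as None. An empty list raises ZeroDivisionError.
    """
    rows_with_missing = 0
    missing_cells = 0
    for row in rows:
        n_missing = sum(1 for v in VARIABLES if row.get(v) is None)
        missing_cells += n_missing
        if n_missing > 0:
            rows_with_missing += 1
    total_cells = len(rows) * len(VARIABLES)
    return rows_with_missing, 100.0 * missing_cells / total_cells

# Hypothetical rows: one with two of five values missing, one complete.
partial = {"origintime": "2013-09-09T01:00:00Z", "longitude": 172.6,
           "latitude": -43.5, "magnitude": None, "depth": None}
complete = {"origintime": "2013-09-09T02:00:00Z", "longitude": 176.0,
            "latitude": -38.7, "magnitude": 2.4, "depth": 8.0}

print(missingness([partial]))            # (1, 40.0)
print(missingness([partial, complete]))  # (1, 20.0)
```

Adding the complete row halves the percentage but leaves the count of incomplete observations at 1. This is why the question asks for both numbers.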

2. Recall the crime data in Freedman.txt. You will compare this data to more recent crime rates listed on Wikipedia at http://en.wikipedia.org/wiki/United_States_cities_by_crime_rate.

For this problem, copy and paste the crime table from the Wikipedia site into a .txt file to be read into SAS. Note that the data should be tab-delimited and that numerical entries for large numbers (e.g., population sizes) contain commas. You can deal with this either by using a comma format in SAS or by removing the commas from your .txt file. Either way is fine. For this problem, create the .sas file, the .txt file from Wikipedia, the .log file, and a Word or other word-processed document with your commentary and answers to questions. All files should be zipped together with your answers to problem 1. You can name your files anything you want for problem 2.
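You can keep the commas and read the field in SAS with the `COMMAw.d` informat, which accepts values such as `1,234,567`. If you choose to strip the commas instead, the cleaning step amounts to the Python sketch below. The helper name `parse_count` and the sample row are hypothetical, and the column order shown is an assumption about your pasted table.

```python
def parse_count(field):
    """Convert a number written with thousands separators, e.g. '1,234,567', to an int."""
    return int(field.replace(",", "").strip())

# Hypothetical tab-delimited row: city, state, population
row = "CityA\tStateA\t1,234,567"
city, state, population = row.split("\t")
print(city, state, parse_count(population))  # CityA StateA 1234567
```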

1. Merge your Wikipedia data with Freedman.txt by city name. You will work only with the cities that these two data sets have in common. The resulting data set should have city, state, two variables for population (one from Freedman and one from Wikipedia), and variables for crime rate from both Freedman.txt and Wikipedia. The only crime rate used from the Wikipedia data will be the violent crime rate.

2. Describe the missingness in the data for the variables of interest.


3. Create a variable which is the change in population size from 1975 to 2012. Make the values positive for cities that have increased in population size.

4. Have any cities decreased in population size? If so, which ones?

5. Create a variable which is the change in violent crime rate (assume Freedman’s crime rate is for violent crime). Plot the change in population versus the change in crime rate. Is there a relationship between the change in crime rate and the change in population size?

6. Determine the ranks of the cities in terms of their crime rate for both the Freedman and Wikipedia data, for the cities that both data sets have in common. Plot the ranks against each other (rank in Wikipedia versus rank in Freedman). Do the ranks correlate well? Describe what you see.

7. For each state, determine the city with the highest crime rate and the city with the lowest crime rate using the Wikipedia data. Print out a dataset where each state is in only one row, and the row also has the city with the highest crime rate, the city with the lowest crime rate, and the high and low crime rates. (Each row should have 5 variables.)
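The last question is a group-wise maximum and minimum. In SAS, one option is BY-group processing on sorted data. The Python sketch below shows only the logic, producing one 5-field row per state. The function name, the tie-breaking rule (keep the first city seen), and the sample rows are hypothetical.

```python
def state_extremes(rows):
    """Group (city, state, violent_crime_rate) rows by state.

    Returns one 5-field tuple per state, sorted by state name:
    (state, highest-rate city, lowest-rate city, highest rate, lowest rate).
    ASSUMPTION: on ties, the first city encountered is kept.
    """
    extremes = {}  # state -> [high_city, low_city, high_rate, low_rate]
    for city, state, rate in rows:
        if state not in extremes:
            extremes[state] = [city, city, rate, rate]
            continue
        current = extremes[state]
        if rate > current[2]:
            current[0], current[2] = city, rate
        if rate < current[3]:
            current[1], current[3] = city, rate
    return [(state, *extremes[state]) for state in sorted(extremes)]

# Hypothetical input rows (not real Wikipedia values).
sample_rows = [
    ("CityA", "StateA", 500.0),
    ("CityB", "StateA", 900.0),
    ("CityC", "StateA", 300.0),
    ("CityD", "StateB", 700.0),
]
print(state_extremes(sample_rows))
# [('StateA', 'CityB', 'CityC', 900.0, 300.0), ('StateB', 'CityD', 'CityD', 700.0, 700.0)]
```

A state with a single city in the merged data reports that city as both its highest and lowest, as `StateB` shows.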
