2016-2017 Individual Coursework
MAT013 Coursework
Deadline: May 4, 2017
Instructions
The outputs of this coursework will be:
- A written report in doc, pdf or html format that describes your code (SAS and R) and includes commented screenshots, to be handed in to Lauren Trundle.
- A file containing the required SAS code. Name this file STUDENTNUMBER-SAS-lastname (e.g. 123456-SAS-Evans) and email it to Lauren Trundle with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the SAS code.
- A file containing the required R code. Name this file STUDENTNUMBER-R-lastname (e.g. 123456-R-Evans) and email it to Lauren Trundle with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the R code.
Coursework
- Consider the data set two_thirds.csv, which contains two columns of numerical data. These represent the 1st and 2nd guesses for the game “Two Thirds of the Average”, the rules of which are explained here: vknight.org/two_thirds_of_the_average_game/two_thirds_of_average_game_fill_in_sheet.pdf
Using R:
- Draw two histograms: one for each set of guesses.
- Identify the winning guess for each set of guesses.
Attempt to do this in a generic way (so that your code would work for a different data set).
[15]
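As a rough illustration of the winning rule only (the coursework itself asks for R), the sketch below is written in Python and assumes the usual rule of the game: the winner is the guess closest to two thirds of the mean of all guesses. The function name `winning_guess` and the sample guesses are hypothetical and are not taken from two_thirds.csv.

```python
def winning_guess(guesses):
    """Return the guess closest to two thirds of the mean of all guesses."""
    target = 2 / 3 * sum(guesses) / len(guesses)
    # min() returns the first such guess if several are equally close
    return min(guesses, key=lambda g: abs(g - target))


# Hypothetical guesses: mean is 23, so the target is about 15.33
sample = [50, 33, 22, 10, 0]
print(winning_guess(sample))  # prints 10
```

Because the function takes any list of numbers, the same call works for each column of guesses, which is the sense of "generic" intended above.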
- A perfect number is a natural number that is equal to the sum of its divisors (excluding itself). For example, \(1,2,4,7\) and \(14\) divide \(28\), and \(28=1+2+4+7+14\).
Write code in SAS that writes to a csv file a data set containing all natural numbers up to and including a given parameter \(N\), together with a boolean variable indicating whether each number is perfect. For example, for \(N=6\) the csv file would contain the following:
1, False
2, False
3, False
4, False
5, False
6, True
[30]
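To make the definition concrete, here is a minimal Python sketch (not the SAS code that the question requires) that applies the definition literally by summing the divisors of each number. The helper names `is_perfect`, `perfect_table` and `to_csv` are illustrative assumptions, and the range is taken to include \(N\) itself, as in the \(N=6\) example.

```python
import csv
import io


def is_perfect(n):
    """True if n equals the sum of its divisors, excluding n itself."""
    return sum(d for d in range(1, n) if n % d == 0) == n


def perfect_table(limit):
    """Rows (n, is n perfect?) for n = 1, ..., limit (limit included)."""
    return [(n, is_perfect(n)) for n in range(1, limit + 1)]


def to_csv(rows):
    """Render the rows as csv text; a real script would write to a file."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


print(to_csv(perfect_table(6)))
```

Trial division over every candidate divisor is slow for large \(N\), but it mirrors the definition directly and is easy to check by hand.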
- Using R, create a function which has two arguments and returns the product of two matrices without using the operator %*%. To compute the product, use nested for loops and the functions nrow(..), ncol(..). Generate 2 random matrices and verify the output of the function by comparing with the operator %*%.
[10]
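The triple-loop structure behind a matrix product can be sketched as follows. This is a Python illustration under stated assumptions, not the R function the question asks for: matrices are plain lists of rows, `len(..)` plays the role of nrow(..) and ncol(..), and the names `mat_mul` and `random_matrix` are hypothetical. An independent row-by-column dot product stands in for the built-in operator as the reference.

```python
import random


def mat_mul(a, b):
    """Product of matrices a (n x m) and b (m x p) using nested loops."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("number of columns of a must equal number of rows of b")
    product = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):          # rows of a
        for j in range(n_cols):      # columns of b
            for k in range(n_inner):  # shared inner dimension
                product[i][j] += a[i][k] * b[k][j]
    return product


def random_matrix(n_rows, n_cols, rng):
    """Matrix with entries drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)  # fixed seed so the check is reproducible
a = random_matrix(2, 3, rng)
b = random_matrix(3, 4, rng)

# Independent reference: dot product of each row of a with each column of b
reference = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
product = mat_mul(a, b)
print(all(abs(p - r) < 1e-12
          for p_row, r_row in zip(product, reference)
          for p, r in zip(p_row, r_row)))
```

Comparing with a small tolerance rather than exact equality allows for floating-point rounding differences between the two computations.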
- Consider the data set jokes.csv, which contains jokes that have been ranked for the Edinburgh Fringe Festival for the years 2009-2015.
Using R:
- Identify the effect of joke length on the performance of a joke.
- Identify if authors who have repeated entries seem to do better than authors who do not.
(Note that this question is not asking for sophisticated sentiment analysis.)
[25]
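- Suppose that we want to compute the integral
\[I=\int_{-\infty}^{\infty} f(x)\,p(x)\,dx,\]
where \(p(x)\) is the density of the standard normal distribution. Consider the sum
\[S_N=\frac{1}{N}\sum_{i=1}^{N} f(z_i),\]
where \(z_i,i=1,...,N,\) are independent identically distributed random variables with the standard normal distribution. By the Central Limit Theorem, we have
\[\sqrt{N}\,(S_N-I)\to \mathcal{N}(0,\sigma^2)\quad\text{in distribution as } N\to\infty,\quad\text{where } \sigma^2=\operatorname{Var}\big(f(z_1)\big).\]
Thus, \(S_N\) is an estimator of \(I\).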
Consider the function \(f(x)=\cos(x)/(1+x^2)\). Using R, write a function with argument \(N\) which returns \(S_N\). Write a file with 50 evaluations of this function for \(N=1000\). What can you say statistically about \(I\)?
[20]
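A minimal Python sketch of the estimator (again, not the R code requested) is given below. It assumes that \(S_N\) is the sample mean of \(f(z_i)\) over \(N\) standard normal draws and summarises 50 evaluations by their mean and an approximate 95% interval; the seed and the names `s_n` and `estimates` are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def f(x):
    """Integrand from the question: cos(x) / (1 + x^2)."""
    return math.cos(x) / (1 + x * x)


def s_n(n, rng):
    """Sample mean of f over n independent standard normal draws."""
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


rng = random.Random(2017)  # arbitrary seed, for reproducibility only
estimates = [s_n(1000, rng) for _ in range(50)]

mean = statistics.mean(estimates)
# Standard error of the mean of the 50 evaluations
std_error = statistics.stdev(estimates) / math.sqrt(len(estimates))
low, high = mean - 1.96 * std_error, mean + 1.96 * std_error
print(f"mean = {mean:.4f}, approximate 95% interval = ({low:.4f}, {high:.4f})")
```

The normal-based interval relies on the 50 evaluations being independent and approximately normally distributed, which is the kind of statement the Central Limit Theorem supports for large \(N\).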