MAT013 Coursework

Deadline: May 4, 2017

Instructions

The outputs of this coursework will be:

  • A written report in doc, pdf or html format describing your code (SAS and R), including screenshots with comments, to be handed in to Lauren Trundle.
  • A file containing the required SAS code. Name this file STUDENTNUMBER-SAS-lastname (e.g. 123456-SAS-Evans) and email it to Lauren Trundle with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the SAS code.
  • A file containing the required R code. Name this file STUDENTNUMBER-R-lastname (e.g. 123456-R-Evans) and email it to Lauren Trundle with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the R code.

Coursework

  1. Consider the data set two_thirds.csv which contains two columns of numerical data. These represent 1st and 2nd guesses for the game: “Two Thirds of the Average”.

    The rules of the game are explained here: vknight.org/two_thirds_of_the_average_game/two_thirds_of_average_game_fill_in_sheet.pdf

    Using R:

    1. Draw two histograms: one for each set of guesses.
    2. Identify the winning guess for each set of guesses.

    Attempt to do this in a generic way (so that your code would work for a different data set).
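
    As a rough illustration of what "generic" could mean for the second task, the sketch below assumes the usual rule of the game: the winning guess is the one closest to two thirds of the mean of all guesses. The function name winning_guess and the example vector are hypothetical and are not taken from two_thirds.csv.

```r
# Hypothetical helper: the guess closest to two thirds of the mean wins.
# Assumes `guesses` is a numeric vector (e.g. one column of a data frame).
winning_guess <- function(guesses) {
  guesses <- guesses[!is.na(guesses)]         # drop missing entries
  target <- 2 / 3 * mean(guesses)             # two thirds of the average
  distance <- abs(guesses - target)
  unique(guesses[distance == min(distance)])  # may return more than one value on ties
}

# Made-up guesses, for illustration only
example_guesses <- c(10, 20, 30, 40, 50)
winning_guess(example_guesses)
```

    Because the function only takes a numeric vector, it could be applied to each column of any data set with the same layout.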

    [15]

  2. A perfect number is a natural number that is equal to the sum of its positive divisors (excluding itself). For example, \(1,2,4,7\) and \(14\) divide \(28\) and \(28=1+2+4+7+14\).
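
    The definition can be checked directly with a short sketch. It is written in R purely to illustrate the definition (the task itself asks for SAS), and the function name is_perfect is hypothetical.

```r
# Hypothetical helper: TRUE when n equals the sum of its divisors other than n itself.
is_perfect <- function(n) {
  if (n < 2) return(FALSE)                      # 1 has no divisors other than itself
  candidates <- seq_len(n - 1)                  # 1, 2, ..., n - 1
  divisors <- candidates[n %% candidates == 0]  # keep those that divide n
  sum(divisors) == n
}

is_perfect(6)   # divisors 1, 2, 3
is_perfect(28)  # divisors 1, 2, 4, 7, 14
is_perfect(12)  # divisors 1, 2, 3, 4, 6 sum to 16
```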

    Write code in SAS that allows one to write to a csv file a data set with all natural numbers up to and including a given parameter \(N\), as well as a boolean variable indicating whether each number is perfect. For example, for \(N=6\) the csv file would contain the following:

     1, False
     2, False
     3, False
     4, False
     5, False
     6, True
    

    [30]

  3. Using R, create a function which takes two matrices as arguments and returns their product without using the operator %*%. To compute the product, use nested for loops and the functions nrow() and ncol(). Generate two random matrices and verify the output of the function by comparing it with the result of the operator %*%.
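
    For reference, this is the standard definition of the matrix product that the nested loops would compute. The symbols \(A\), \(B\), \(m\), \(n\) and \(p\) are introduced here only for illustration: \(A\) is an \(m\times n\) matrix with entries \(a_{ik}\) and \(B\) is an \(n\times p\) matrix with entries \(b_{kj}\).

    \[(AB)_{ij}=\sum_{k=1}^{n} a_{ik}\,b_{kj},\qquad i=1,\ldots,m,\quad j=1,\ldots,p.\]

    The product is only defined when the number of columns of \(A\) equals the number of rows of \(B\), and the result has \(m\) rows and \(p\) columns.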

    [10]

  4. Consider the data set jokes.csv which contains jokes that have been ranked for the Edinburgh Fringe Festival for the years 2009-2015.

    Using R:

    1. Identify the effect of joke length on the performance of a joke.
    2. Identify whether authors who have repeated entries seem to do better than authors who do not.

    (Note that this question is not asking for sophisticated sentiment analysis.)

    [25]

  5. Suppose that we want to compute the integral

    \[I=\int_{-\infty}^{\infty} f(x)\,p(x)\,dx\]

    where \(p(x)\) is the density of the standard normal distribution. Consider the sum

    \[S_N=\frac{1}{N}\sum_{i=1}^{N} f(z_i)\]

    where \(z_i,\ i=1,\ldots,N,\) are independent identically distributed random variables with the standard normal distribution. By the Central Limit Theorem, we have

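    \[\sqrt{N}\,(S_N-I)\to\mathcal{N}(0,\sigma^2)\quad\text{in distribution as } N\to\infty,\qquad \sigma^2=\operatorname{Var}\bigl(f(z_1)\bigr).\]

    Thus, \(S_N\) is an estimator of \(I\).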

    Consider the function \(f(x)=\cos(x)/(1+x^2)\). Using R, write a function with argument \(N\) which returns \(S_N\). Write a file containing 50 evaluations of this function for \(N=1000\). What can you say statistically about \(I\)?
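
    One possible shape for such a function is sketched below, assuming \(S_N\) is the sample mean of \(f\) over \(N\) standard normal draws. The names s_n, estimates, centre and half_width are hypothetical, the seed is arbitrary, and writing the results to a file is left out. The interval at the end is only an approximate 95% confidence interval based on the normal approximation.

```r
# Integrand from the question
f <- function(x) cos(x) / (1 + x^2)

# Hypothetical helper: estimate of I from N standard normal draws
s_n <- function(N) {
  z <- rnorm(N)   # z_1, ..., z_N, i.i.d. standard normal
  mean(f(z))      # sample mean of f(z_i)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
estimates <- replicate(50, s_n(1000))

# Approximate 95% confidence interval for I from the 50 evaluations
centre <- mean(estimates)
half_width <- qnorm(0.975) * sd(estimates) / sqrt(length(estimates))
c(lower = centre - half_width, upper = centre + half_width)
```

    The spread of the 50 values gives a direct, empirical view of how variable \(S_N\) is for \(N=1000\).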

    [20]