MAT013 Coursework

Deadline: May 4, 2018

Instructions

The outputs of this coursework will be:

  • A written report in doc, pdf or html format describing your code (SAS and R), including screenshots with comments, to be handed in to Andrey Pepelyshev.
  • A file containing the required SAS code. Name this file STUDENTNUMBER-SAS-lastname (e.g. 123456-SAS-Evans) and email it to Andrey Pepelyshev with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the SAS code.
  • A file containing the required R code. Name this file STUDENTNUMBER-R-lastname (e.g. 123456-R-Evans) and email it to Andrey Pepelyshev with MAT013 as the subject. Note that all operations needed to complete the coursework should be included in the R code.

Coursework

  1. Consider the data set two_thirds.csv, which contains two columns of numerical data. These represent the first and second guesses for the game “Two Thirds of the Average”.

    The rules of the game are explained here: vknight.org/two_thirds_of_the_average_game/two_thirds_of_average_game_fill_in_sheet.pdf

    Using R:

    1. Draw two histograms: one for each set of guesses.
    2. Identify the winning guess for each set of guesses.

    Attempt to do this in a generic way (so that your code would work for a different data set).

    [15]
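    The winning-guess step can be sketched as follows. This is a Python illustration of the logic only; the coursework itself asks for R. The sketch assumes the usual rule of the game: the target is two thirds of the mean of all guesses, and the guess closest to the target wins. Ties are resolved here by taking the first such guess, which is an arbitrary choice. The function name `winning_guess` and the sample guesses are hypothetical.

```python
from statistics import mean


def winning_guess(guesses):
    """Return (target, winner) for one set of guesses.

    Assumed rule: target = 2/3 * mean(guesses); the winner is the guess
    closest to the target (ties resolved by taking the first such guess).
    """
    if not guesses:
        raise ValueError("need at least one guess")
    target = 2 * mean(guesses) / 3
    winner = min(guesses, key=lambda g: abs(g - target))
    return target, winner


# Hypothetical first and second guesses (NOT taken from two_thirds.csv).
first_guesses = [10, 20, 30, 66]
second_guesses = [5, 12, 14, 30]
for label, guesses in [("1st", first_guesses), ("2nd", second_guesses)]:
    target, winner = winning_guess(guesses)
    print(f"{label} guesses: target = {target:.2f}, winning guess = {winner}")
```

    The function only takes a list of numbers, so the same code applies to any column of any data set. This is the kind of generic approach the question asks for.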

  2. A perfect number is a natural number that is equal to the sum of its proper divisors (its positive divisors excluding the number itself). For example, \(1,2,4,7\) and \(14\) are the proper divisors of \(28\), and \(28=1+2+4+7+14\).

    Write SAS code that writes to a csv file a data set containing all natural numbers less than or equal to a given parameter \(N\), together with a boolean variable indicating whether each number is perfect. For example, for \(N=6\) the csv file would contain the following:

     1, False
     2, False
     3, False
     4, False
     5, False
     6, True
    

    [30]
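    The perfect-number test is sketched below in Python, as an illustration of the logic rather than the required SAS code. It sums the proper divisors by checking only candidates \(d\) up to \(\sqrt{n}\) and pairing each divisor \(d\) with \(n/d\). The table includes \(N\) itself, which matches the \(N=6\) example. The names `is_perfect` and `write_perfect_table` are hypothetical, and an in-memory buffer stands in for the csv file.

```python
import io


def is_perfect(n):
    """True if n equals the sum of its proper divisors (all divisors except n)."""
    if n < 2:
        return False  # 1 has no proper divisors, so it is not perfect
    total = 1  # 1 is a proper divisor of every n >= 2
    d = 2
    while d * d <= n:  # divisors come in pairs (d, n // d)
        if n % d == 0:
            total += d
            if n // d != d:  # avoid counting a square root twice
                total += n // d
        d += 1
    return total == n


def write_perfect_table(N, stream):
    """Write 'k, True/False' rows for k = 1, ..., N, as in the N = 6 example."""
    for k in range(1, N + 1):
        stream.write(f"{k}, {is_perfect(k)}\n")


buffer = io.StringIO()  # stands in for a real csv file
write_perfect_table(6, buffer)
print(buffer.getvalue(), end="")
print([n for n in range(1, 10_000) if is_perfect(n)])  # [6, 28, 496, 8128]
```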

  3. Using R:

    Write a function that will return the \(n\)th Fibonacci number, \(F(n)\).

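    Modify the function so that it returns the \(n\)th number of the sequence defined by:

    \[ K(n) = \begin{cases} a & \text{if } n = 1,\\ b & \text{if } n = 2,\\ \alpha K(n-1) + \beta K(n-2) & \text{if } n > 2, \end{cases} \]

    where \(a,b,\alpha\) and \(\beta\) are input parameters.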

    Adapt your function so that it writes all numbers of the form \(K(n)\) that are less than some number \(k\) to a csv file. The name of the csv file must not be an input parameter to the function; instead, it must include the parameters \(a,b,\alpha\) and \(\beta\) as well as the date on which the code was run. For example: general_fib_a=2_b=3_alpha=10_beta=2_2018-04-24.csv.

    [10]
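    Here is a Python sketch of the generalised sequence and the file-naming rule, illustrating the logic rather than giving the required R code. It ASSUMES the recurrence \(K(1)=a\), \(K(2)=b\), \(K(n)=\alpha K(n-1)+\beta K(n-2)\). With \(a=b=\alpha=\beta=1\) this gives the Fibonacci numbers with \(F(1)=F(2)=1\). All function names are hypothetical. `max_terms` is a guard against sequences that never reach \(k\). `directory` is a convenience parameter and does not form part of the file name.

```python
import csv
import os
import tempfile
from datetime import date


def general_fib(n, a=1, b=1, alpha=1, beta=1):
    """Return K(n), n >= 1, for the ASSUMED recurrence
    K(1) = a, K(2) = b, K(n) = alpha*K(n-1) + beta*K(n-2).
    The defaults give the Fibonacci numbers with F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return a
    prev, curr = a, b
    for _ in range(n - 2):
        prev, curr = curr, alpha * curr + beta * prev
    return curr


def terms_below(k, a, b, alpha, beta, max_terms=10_000):
    """K(1), K(2), ... up to (not including) the first term >= k.

    For an increasing sequence this is every term less than k; max_terms
    guards against sequences that never reach k.
    """
    terms = []
    prev, curr = a, b
    while prev < k and len(terms) < max_terms:
        terms.append(prev)
        prev, curr = curr, alpha * curr + beta * prev
    return terms


def csv_name(a, b, alpha, beta, run_date=None):
    """File name built from the parameters and the run date (today by default)."""
    run_date = run_date or date.today()
    return f"general_fib_a={a}_b={b}_alpha={alpha}_beta={beta}_{run_date.isoformat()}.csv"


def write_terms_below(k, a, b, alpha, beta, directory="."):
    """Write one term per row to a csv file whose name is derived, not passed in."""
    path = os.path.join(directory, csv_name(a, b, alpha, beta))
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for value in terms_below(k, a, b, alpha, beta):
            writer.writerow([value])
    return path


print(general_fib(10))                 # Fibonacci F(10)
print(terms_below(1000, 2, 3, 10, 2))  # terms below 1000 for a=2, b=3, alpha=10, beta=2
with tempfile.TemporaryDirectory() as tmp:
    path = write_terms_below(1000, 2, 3, 10, 2, directory=tmp)
    with open(path) as handle:
        contents = handle.read()
print(os.path.basename(path))
print(contents)
```

    The run date is read inside `csv_name`, so the caller never supplies the file name. Passing an explicit `run_date` is only useful for checking the naming rule against the example in the question.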

  4. Consider the data set jokes.csv which contains jokes that have been ranked for the Edinburgh Fringe Festival for the years 2009-2015.

    Using R:

    1. Identify the effect of joke length on the performance of a joke.
    2. Identify whether authors with repeated entries seem to do better than authors without.

    (Note that this question is not asking for sophisticated sentiment analysis.)

    [25]

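  5. Suppose that we want to compute the integral

    \[ I = \int_0^\infty f(x)\,p(x)\,dx, \]

    where \(p(x)=e^{-x}\), \(x\ge 0\), is the density of the exponential distribution with mean 1. Consider the sum

    \[ S_N = \frac{1}{N}\sum_{i=1}^{N} f(z_i), \]

    where \(z_i, i=1,\dots,N,\) are independent identically distributed random variables with density \(p(x)\). By the Central Limit Theorem (provided \(\sigma^2=\operatorname{Var}(f(z_1))\) is finite), we have

    \[ \sqrt{N}\,(S_N - I) \xrightarrow{d} \mathcal{N}(0,\sigma^2) \quad \text{as } N\to\infty. \]

    Thus, \(S_N\) is an estimator of \(I\).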

    Consider the function \(f(x)=\cos(x^2)/(1+x)\). Using R, write a function GetSN with argument \(N\) that returns \(S_N\). Write a file containing 50 evaluations of GetSN for \(N=1000\). What can you say statistically about \(I\)?

    [20]
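    Here is a minimal Monte Carlo sketch in Python, illustrating the idea rather than giving the required R code. `get_sn` is a hypothetical analogue of GetSN. The seed 2018 is arbitrary, and the file-writing step is left out. By the Central Limit Theorem, each \(S_N\) is approximately normal with mean \(I\), so the 50 values can be summarised by their mean \(\bar S\) and an approximate 95% confidence interval \(\bar S \pm 1.96\,s/\sqrt{50}\), where \(s\) is their sample standard deviation. A \(t\)-quantile with 49 degrees of freedom (about 2.01) would be slightly more conservative.

```python
import math
import random
import statistics


def f(x):
    """Integrand f(x) = cos(x^2) / (1 + x); note |f(x)| <= 1 for x >= 0."""
    return math.cos(x ** 2) / (1 + x)


def get_sn(N, rng=random):
    """Monte Carlo estimate S_N = (1/N) * sum f(z_i), z_i ~ Exponential(mean 1)."""
    # expovariate takes the rate parameter; rate 1.0 gives mean 1.
    return sum(f(rng.expovariate(1.0)) for _ in range(N)) / N


def summarize(estimates, z=1.96):
    """Mean of the estimates and an approximate 95% normal confidence interval for I."""
    m = statistics.mean(estimates)
    se = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return m, (m - z * se, m + z * se)


rng = random.Random(2018)  # fixed seed so the run is reproducible
estimates = [get_sn(1000, rng) for _ in range(50)]
mean_est, (lo, hi) = summarize(estimates)
print(f"mean of 50 estimates: {mean_est:.4f}, approx. 95% CI: ({lo:.4f}, {hi:.4f})")
```

    The spread of the 50 values also estimates \(\sigma/\sqrt{1000}\), which is the standard error of a single \(S_N\).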