MAT013, MA4513 Class Test 2021

Instructions:

Once you have finished the class test:

  1. Name the file containing your R code STUDENTNUMBER-lastname.R (e.g. 123456-Evans.R).
  2. Email the file to Andrey Pepelyshev with ‘MAT013-STUDENTNUMBER-lastname’ as the subject.
  3. The duration of the class test is one hour.
  4. The email with the R code must be sent before 12:20 on May 14, 2021 [plus the approved extra time].

The class test contains 4 questions.
Each question contains a few tasks.
In all questions, d[k] is the k-th digit of your student number. For example, d[2]=9 for the student number 1934867.
Write the description of figures as comments in your R program.
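
The d[k] notation above can be set up once at the top of the R program. The following is a minimal sketch: the variable names `student_number` and `d` are assumptions for illustration, and the number is the example number from the instructions, not a real one.

```r
# Example student number from the instructions (replace with your own)
student_number <- "1934867"

# Split the string into single characters and convert each to an integer,
# so that d[k] is the k-th digit
d <- as.integer(strsplit(student_number, "")[[1]])

d[2]          # second digit of the example number
d[5] + 1      # an expression of the kind used in the questions
```

Keeping the number as a character string avoids losing a leading zero, which would happen if it were stored as a numeric value.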

Questions for the class test:

Question 1

  1. Write code that generates \(70\) random points \((x,y)\), where \(x\) is sampled uniformly between 0 and d[5]+1, and \(y\) is sampled from the exponential distribution with parameter d[6]+1, where d[k] is the k-th digit of your student number. Export the points to a CSV file named "rpoints.csv". Plot and describe the scatterplot of these points. Write the description of the figure as comments in your R program.

    [10]

  2. Write a function that will return the \(n\)th Fibonacci number, \(F(n)\).
    Modify the function so that it returns the \(n\)th number of the sequence defined by:

    where \(\alpha\) and \(\beta\) are input parameters and d[k] is the k-th digit of your student number.
    Export 20 numbers of the form \(K(n)\) with \(\alpha=2\) and \(\beta=1\) to a CSV file named "fib_numbers.csv".

    [10]

Question 2

    The Iris dataset iris.csv was used in R.A. Fisher's classic 1936 paper, "The Use of Multiple Measurements in Taxonomic Problems". It includes three iris species with 50 samples each as well as some properties about each flower. The columns in this dataset are: Id, SepalLength, SepalWidth, PetalLength, PetalWidth and Species.

    1. Create two new variables: SepalArea, defined as SepalLength*SepalWidth/2, and PetalArea, defined as PetalLength*PetalWidth/2. Plot and describe the scatterplot of SepalArea and PetalArea, using a different color for each species.

      [10]

    2. Find the three flowers that are closest to the flower in row d[3]+d[4]+d[5]+d[6], using the Euclidean distance on the variables SepalLength, SepalWidth, PetalLength and PetalWidth, where d[k] is the k-th digit of your student number. Print row d[3]+d[4]+d[5]+d[6] and the three rows of the Iris dataset corresponding to the most similar flowers.

      [10]
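
    For reference, the Euclidean distance used in the task above has the standard form below. The symbols \(a_j\) and \(b_j\) are introduced here only for illustration: they stand for the four measurements (SepalLength, SepalWidth, PetalLength, PetalWidth) of two flowers \(a\) and \(b\).

    \[
    \operatorname{dist}(a,b) = \sqrt{\sum_{j=1}^{4} (a_j - b_j)^2}
    \]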

Question 3

    Download the file temp2021-3.csv to your PC. This file contains monthly time series of anomaly temperature for different parts of Earth. The first column is the year and the second column is the month of observations. Other columns correspond to some parts of Earth. The first row contains the column names.

    1. Import the dataset temp2021-3.csv.
      Plot and describe the time series in column d[3]+d[4]+3 against time, with a title containing the column index and the column name, where d[k] is the k-th digit of your student number.
      Write the description of the figure as comments in your R program.

      [10]

    2. Compute the correlation matrix between columns: d[3]+d[4]+3, d[3]+d[4]+4, d[3]+d[4]+5 and d[3]+d[4]+6.
      Identify and state the two most correlated columns among these four.
      Plot and describe the scatterplot of these two columns, using the column names as axis labels.
      Estimate and describe the linear model between these two most correlated columns.
      Write the descriptions as comments in your R program.

      [30]
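
    One possible way to locate the most correlated pair in a correlation matrix is sketched below. It is an illustration only: the data frame `toy` and its columns `a`, `b`, `c` are made up and are not taken from temp2021-3.csv, and "most correlated" is assumed here to mean the largest absolute correlation.

```r
# Toy data only: three made-up columns for illustration
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 11),
  c = c(5, 3, 4, 1, 2)
)

cm <- cor(toy)        # correlation matrix of the columns
diag(cm) <- NA        # ignore each column's correlation with itself

# Row and column position of the largest absolute off-diagonal entry;
# the matrix is symmetric, so the first match is sufficient
idx <- which(abs(cm) == max(abs(cm), na.rm = TRUE), arr.ind = TRUE)[1, ]
best_pair <- colnames(cm)[idx]
best_pair             # names of the two most correlated columns
```

    Setting the diagonal to NA before searching prevents the trivial correlations of 1 from being selected.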

Question 4

    Download the file champ-16-17.csv. This file contains the football games played in the Championship league in the 2016/2017 season. Values in the file are separated by commas. The meaning of column names is given here. Each row contains detailed information about one football game, including the names of the HomeTeam and AwayTeam and the goals scored.

    1. Let the variable MyTeam be the name of HomeTeam from the row d[4]+d[5]+d[6], where d[k] is the k-th digit of your student number.
      Print the variable MyTeam.
      Compute the average number of goals scored by MyTeam when it played at home.
      Compute the number of wins, draws and losses of MyTeam when it played at home.
      Write these computed numbers as comments in your R program.

      [20]