- The MATH3714 and MATH5714M modules are assessed by an examination (80%) and a practical (20%). This is the practical, worth 20% of your final module mark.
- You must hand in your solution via Gradescope by Thursday, 5th December 2023, 5pm.
- Reports must be typeset (not handwritten) and should be no more than 6 pages in length (8 pages for MATH5714M).
- Within reason you may talk to your friends about this piece of work, but you should not send R code (or output) to each other. Your report must be your own work.
The Met Office historic station dataset contains long-term weather measurements from a network of weather stations across the UK. The full dataset includes records from 37 stations, with some series extending more than 100 years – the oldest stations (Oxford and Armagh) began recording in 1853. Each station is identified by its name and precise geographical coordinates.
The dataset contains five key monthly measurements:
- Mean daily maximum temperature
- Mean daily minimum temperature
- Days of air frost
- Total rainfall
- Total sunshine duration
For the practical we will consider a subset of this dataset. Please download the practical data from here:
This practical is deliberately open-ended: there is no single right or wrong answer. In places you will need to choose your own approach and justify your decisions.
Include relevant R code in your report. The reader should be able to replicate your analysis and get the same results.
Use the page limit to guide you in deciding how much detail to include.
Only include code, plots and output that are relevant to your discussion.
Where multiple plots demonstrate similar points, include only the most illustrative example.
Task 1 (15 marks)
Fit a linear model relating average maximum temperature (`tmax`) to year. Create a scatter plot showing the data and fitted line. Test whether there is a significant relationship between year and maximum temperature at the 5% significance level. State your conclusion in context.
Marking criteria:
- Correct model fitting
- Appropriate scatter plot
- Correct hypothesis test
- Clear conclusion
- Presentation of results
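As a rough illustration of the workflow in this task, the sketch below fits and tests a simple linear regression in base R. It runs on simulated stand-in data, not the practical data: the column names `year` and `month` and all numerical values are assumptions made for illustration, and only `tmax` is named in the task text.

```r
# Simulated stand-in for the practical data (assumption: the real file
# and its column names may differ).
set.seed(1)
d <- expand.grid(month = 1:12, year = 1960:2020)
d$tmax <- 13 + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)

# Simple linear regression of tmax on year
m1 <- lm(tmax ~ year, data = d)

# Scatter plot of the data with the fitted line
plot(d$year, d$tmax, pch = 16, cex = 0.4,
     xlab = "Year", ylab = "Mean daily maximum temperature")
abline(m1, col = "red", lwd = 2)

# t-test of H0: slope = 0 against H1: slope != 0 at the 5% level
slope_row <- coef(summary(m1))["year", ]
p_value <- unname(slope_row["Pr(>|t|)"])
reject_h0 <- p_value < 0.05
```

The same p-value appears in the `year` row of `summary(m1)`. The written conclusion should still interpret the slope estimate in terms of temperature change per year.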
Task 2 (25 marks)
The data shows strong seasonal variation. Explain how this affects the results from task 1. Fit an improved model for maximum temperature, using the year and month as inputs. Using plots and numerical summaries, demonstrate that the new model is an improvement over the original model.
Marking criteria:
- Brief explanation of how seasonality affects task 1 results
- Correct model specification with month as a factor
- Appropriate model diagnostics
- Clear explanation of model improvements
- Presentation of results
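One possible way to set up the comparison between the two models is sketched below, again on simulated stand-in data. The column names `year` and `month` and the simulated values are assumptions, and the particular diagnostics shown are one choice among several.

```r
# Simulated stand-in for the practical data (assumption: the real file
# and its column names may differ).
set.seed(1)
d <- expand.grid(month = 1:12, year = 1960:2020)
d$tmax <- 13 + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)

m1 <- lm(tmax ~ year, data = d)                  # model from task 1
m2 <- lm(tmax ~ year + factor(month), data = d)  # month as a 12-level factor

# Numerical summaries: adjusted R^2 and residual standard error
adj_r2 <- c(m1 = summary(m1)$adj.r.squared, m2 = summary(m2)$adj.r.squared)
rse <- c(m1 = sigma(m1), m2 = sigma(m2))

# F-test for the nested models (H0: all month effects are zero)
cmp <- anova(m1, m2)

# Residuals by month: seasonal structure left in m1 should vanish in m2
par(mfrow = c(1, 2))
boxplot(resid(m1) ~ factor(d$month), xlab = "Month",
        ylab = "Residual", main = "Year only")
boxplot(resid(m2) ~ factor(d$month), xlab = "Month",
        ylab = "Residual", main = "Year + month")
```

Wrapping `month` in `factor()` matters here: a numeric month would be fitted as a single linear trend across the year instead of twelve separate levels.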
Task 3 (20 marks)
Using the weather station as another categorical variable, fit a model for maximum temperature which uses the year, month and station as inputs. Discuss whether there is any evidence that the temperature trend varies between stations. Discuss your results in context.
Marking criteria:
- Correct model specification including station effects
- Appropriate analysis and discussion
- Presentation of results
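A minimal sketch of how station effects, and station-specific trends, might be specified is given below. It uses simulated stand-in data with three hypothetical stations labelled A, B and C. The column names `year`, `month` and `station` are assumptions, and comparing models with and without a `year:station` interaction is one way, not the only way, to look for differing trends.

```r
# Simulated stand-in data with three hypothetical stations (assumption:
# the real station names and column names may differ).
set.seed(2)
d <- expand.grid(month = 1:12, year = 1960:2020,
                 station = c("A", "B", "C"))
station_offset <- c(A = 0, B = 1.5, C = -2)[as.character(d$station)]
d$tmax <- 13 + station_offset + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)

# Common trend, station-specific intercepts
m3 <- lm(tmax ~ year + factor(month) + station, data = d)

# Station-specific trends via a year:station interaction
m3i <- lm(tmax ~ year * station + factor(month), data = d)

# F-test of H0: the year slope is the same at every station
cmp <- anova(m3, m3i)

# Estimated interaction coefficients (slope differences from station A)
slope_diffs <- coef(m3i)[grep("^year:station", names(coef(m3i)))]
```

The interaction coefficients are differences in slope relative to the baseline station, so their size should be discussed on the scale of degrees per year as well as through the p-value.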
Task 4 (20 marks)
Modify the model from task 2 (using the year and month as inputs) to include a quadratic term for the year. Explain why this might be a good idea. Using plots and numerical summaries, discuss whether the quadratic term improves the model fit.
Marking criteria:
- Correct model specification with clear justification
- Appropriate model diagnostics
- Presentation of results
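The following sketch shows one way to add a quadratic term in year to the model from task 2, using simulated stand-in data. The column names, the simulated values and the centred variable `yc` are assumptions introduced for illustration. Centring the year is a design choice that reduces the correlation between the linear and quadratic terms; `poly(year, 2)` would be an alternative.

```r
# Simulated stand-in for the practical data (assumption: the real file
# and its column names may differ).
set.seed(1)
d <- expand.grid(month = 1:12, year = 1960:2020)
d$tmax <- 13 + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)

# Centre the year so that yc and yc^2 are less strongly correlated
d$yc <- d$year - mean(d$year)

m2 <- lm(tmax ~ yc + factor(month), data = d)            # linear trend
m4 <- lm(tmax ~ yc + I(yc^2) + factor(month), data = d)  # quadratic trend

# t-test for the quadratic coefficient and the equivalent nested F-test
quad_row <- summary(m4)$coefficients["I(yc^2)", ]
cmp <- anova(m2, m4)

# Information criteria for the two models
aic <- AIC(m2, m4)

# Residuals against year: curvature in m2 would suggest a missing term
plot(d$year, resid(m2), pch = 16, cex = 0.4,
     xlab = "Year", ylab = "Residual (linear-trend model)")
lines(lowess(d$year, resid(m2)), col = "red", lwd = 2)
```

Because the two models differ by a single coefficient, the t-test and the F-test give the same p-value, so only one of them needs to be reported.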
Task 5 (MATH5714M only, 10 marks)
Use kernel density estimation to estimate the distribution of the residuals for the model from task 2. Based on your results, discuss whether the residuals are normally distributed.
Marking criteria:
- Correct use of kernel density estimation
- Appropriate choice of bandwidth
- Clear discussion of normality
- Presentation of results
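As a hedged sketch of this task, the code below estimates the residual density with base R's `density()` and overlays a normal density for comparison. The data are simulated, the column names are assumptions, and the Sheather-Jones bandwidth (`bw = "SJ"`) is just one defensible choice that would need justifying in the report.

```r
# Simulated stand-in for the practical data (assumption: the real file
# and its column names may differ).
set.seed(1)
d <- expand.grid(month = 1:12, year = 1960:2020)
d$tmax <- 13 + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)

m2 <- lm(tmax ~ year + factor(month), data = d)
r <- resid(m2)

# Kernel density estimate with a Gaussian kernel and Sheather-Jones bandwidth
kde <- density(r, bw = "SJ")

# Compare with a normal density of matching standard deviation
plot(kde, main = "Residual density", xlab = "Residual")
curve(dnorm(x, mean = 0, sd = sd(r)), add = TRUE, lty = 2, col = "red")
legend("topright", legend = c("KDE", "Normal"),
       lty = c(1, 2), col = c("black", "red"))

# A normal Q-Q plot is a useful complement for judging the tails
qqnorm(r)
qqline(r)
```

Trying a few bandwidths (for example `bw = "nrd0"` or a fixed numeric value) shows how sensitive the visual impression of normality is to the amount of smoothing.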
Task 6 (MATH5714M only, 10 marks)
The column `af` in the dataset records the number of days with air frost. Using either the Nadaraya-Watson kernel regression or local polynomial regression, fit a model for `af` as a function of `tmax`. Create a plot showing both the data and the fitted regression curve. Justify your choice of regression method. Discuss your results in context.
Marking criteria:
- Correct model fitting
- Appropriate choice of regression bandwidth
- Clear justification of regression method used
- Appropriate analysis and discussion
- Presentation of results
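Both candidate methods are available in base R, and a minimal sketch of each is shown below on simulated stand-in data. The simulated relationship between `af` and `tmax`, the bandwidth of 2 and the span of 0.3 are all assumptions chosen for illustration, not recommendations. Here `ksmooth()` gives a Nadaraya-Watson estimate and `loess()` with `degree = 1` gives a local linear fit.

```r
# Simulated stand-in data (assumption: the real relationship between
# af and tmax will differ from this made-up one).
set.seed(3)
d <- expand.grid(month = 1:12, year = 1960:2020)
d$tmax <- 13 + 0.02 * (d$year - 1960) -
  7 * cos(2 * pi * (d$month - 1) / 12) + rnorm(nrow(d), sd = 1.5)
d$af <- rbinom(nrow(d), size = 30, prob = plogis(2 - 0.5 * d$tmax))

grid <- seq(min(d$tmax), max(d$tmax), length.out = 200)

# Nadaraya-Watson estimate with a Gaussian kernel
nw <- ksmooth(d$tmax, d$af, kernel = "normal", bandwidth = 2,
              x.points = grid)

# Local linear regression (local polynomial of degree 1)
lp <- loess(af ~ tmax, data = d, span = 0.3, degree = 1)
lp_fit <- predict(lp, newdata = data.frame(tmax = grid))

plot(d$tmax, d$af, pch = 16, cex = 0.4,
     xlab = "Mean daily maximum temperature", ylab = "Days of air frost")
lines(nw$x, nw$y, col = "red", lwd = 2)
lines(grid, lp_fit, col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Nadaraya-Watson", "Local linear"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```

Note that `ksmooth()` scales its `bandwidth` argument so that the kernel quartiles sit at plus or minus a quarter of the bandwidth, while `loess()` uses a nearest-neighbour `span`, so the two smoothing parameters are not directly comparable.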