Maths KS3-4 Statistics
KS3-4 National Curriculum

KS3 Statutory requirements: Pupils should be taught to:

Statistics
- describe, interpret and compare observed distributions of a single variable through: appropriate graphical representation involving discrete, continuous and grouped data; and appropriate measures of central tendency (mean, mode, median) and spread (range, consideration of outliers)
- construct and interpret appropriate tables, charts, and diagrams, including frequency tables, bar charts, pie charts, and pictograms for categorical data, and vertical line (or bar) charts for ungrouped and grouped numerical data
- describe simple mathematical relationships between two variables (bivariate data) in observational and experimental contexts and illustrate using scatter graphs
- draw and interpret box plots regarding quartiles and IQR
- create cumulative frequency diagrams and interpret them using measures of average and spread

KS4 Statutory requirements: In addition to consolidating subject content from key stage 3, pupils should be taught to:
- infer properties of populations or distributions from a sample, whilst knowing the limitations of sampling
- interpret and construct tables and line graphs for time series data
- {construct and interpret diagrams for grouped discrete data and continuous data, i.e. histograms with equal and unequal class intervals and cumulative frequency graphs, and know their appropriate use}
- interpret, analyse and compare the distributions of data sets from univariate empirical distributions through: appropriate graphical representation involving discrete, continuous and grouped data, {including box plots}; appropriate measures of central tendency (including modal class) and spread {including quartiles and inter-quartile range}
- apply statistics to describe a population
- use and interpret scatter graphs of bivariate data; recognise correlation and know that it does not indicate causation; draw estimated lines of best fit; make predictions; interpolate and extrapolate apparent trends whilst knowing the dangers of so doing
Introductory Week

The weights (kg) of 15 children: 37, 42, 31, 35, 48, 29, 50, 36, 44, 28, 63, 35, 41, 52, 43

Lightest child: 28 kg
Heaviest child: 63 kg
Median: 41 kg

Lower quartile: the ¼(n + 1)th piece of (ordered) data*
15 items of data means n = 15.
15 + 1 = 16
¼(16) = 4, so we want the 4th item.
28, 29, 31, 35, 35, 36, 37, 41, 42, 43, 44, 48, 50, 52, 63
35 is the 4th item.
*can also use ¼(n) and round up.

Upper quartile: the ¾(n + 1)th piece of (ordered) data*
15 items of data means n = 15.
15 + 1 = 16
¾(16) = 12, so we want the 12th item.
28, 29, 31, 35, 35, 36, 37, 41, 42, 43, 44, 48, 50, 52, 63
48 is the 12th item.
*can also use ¾(n) and round up.

Sometimes questions use Q0, Q1 etc:
Q0 = minimum, Q1 = lower quartile, Q2 = median, Q3 = upper quartile, Q4 = maximum

Interquartile range, IQR = Q3 - Q1
48 - 35 = 13 = IQR
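The worked example above can be checked with a short Python sketch. It is only an illustration of the (n + 1) position rule: the function name `quartiles_n_plus_1` is made up here, and the sketch assumes n + 1 is a multiple of 4 so that every position is a whole number, as it is for the 15 weights.

```python
def quartiles_n_plus_1(data):
    """Return (Q1, Q2, Q3) using the (n + 1) position rule.

    Assumption for this sketch: n + 1 is a multiple of 4, so the
    positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 are whole numbers.
    """
    ordered = sorted(data)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("sketch only handles n where n + 1 is a multiple of 4")
    # Positions are counted from 1, Python lists are indexed from 0.
    q1 = ordered[(n + 1) // 4 - 1]
    q2 = ordered[(n + 1) // 2 - 1]
    q3 = ordered[3 * (n + 1) // 4 - 1]
    return q1, q2, q3


weights = [37, 42, 31, 35, 48, 29, 50, 36, 44, 28, 63, 35, 41, 52, 43]
q1, q2, q3 = quartiles_n_plus_1(weights)
iqr = q3 - q1

print("Q0 (minimum):", min(weights))
print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("Q4 (maximum):", max(weights))
print("IQR:", iqr)
```

Other quartile conventions (including the ones built into calculators and spreadsheets) can give slightly different answers for the same data, so it is worth stating which rule was used.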
Outliers: Items of data in the set smaller than Q1 - (1.5 x IQR) or larger than Q3 + (1.5 x IQR) are outliers.

Task 1:
a) What is the median number of petals?
b) What is the interquartile range?
c) What percentage of the daisies in the sample has fewer than 30 petals?

Task 2: Work out the median and upper & lower quartiles for these data sets. Then draw box plots for all of them.
a) Percentage achieved in a Statistics test: 78 82 74 45 68 75 93 54 61 70 48 66 62 51 77
b) Minutes per day spent playing computer games: 40 26 60 64 33 39 28 46 47 51 55
c) Time taken (in minutes) to solve a crossword puzzle: 12 24 21 16 8 9 3 31 18 27 35 41 26 12 17 6 5 19 29
d) Weight in kg of year 10 boys: 47 51 63 39 42 57 36 37 49 32 60 54 56 45 52
e) Height in cm of a group of year 10 girls: 153 147 160 146 162 158 159 149 152 150 163
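The 1.5 x IQR outlier rule stated before Task 1 can be tried on the children's weights from the Introductory Week example. The sketch below is an illustration only: `find_outliers` is a made-up helper name, the quartiles Q1 = 35 and Q3 = 48 are taken from that worked example, and the extra weight of 70 kg is a hypothetical value added purely to show what an outlier looks like.

```python
def find_outliers(data, q1, q3):
    """Return the items below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


weights = [37, 42, 31, 35, 48, 29, 50, 36, 44, 28, 63, 35, 41, 52, 43]
q1, q3 = 35, 48  # quartiles from the worked example

lower_fence = q1 - 1.5 * (q3 - q1)  # 35 - 19.5 = 15.5
upper_fence = q3 + 1.5 * (q3 - q1)  # 48 + 19.5 = 67.5

# No weight is below 15.5 kg or above 67.5 kg, so there are no outliers.
outliers = find_outliers(weights, q1, q3)

# Hypothetical extra child of 70 kg: above the upper fence, so an outlier.
outliers_with_extra = find_outliers(weights + [70], q1, q3)

print(lower_fence, upper_fence, outliers, outliers_with_extra)
```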
Box Plots Terms & Comparisons
Task 1: 23 boys and 11 girls were given a maths test. Their scores are listed below:
Boys: 7, 13, 15, 19, 35, 35, 37, 43, 44, 44, 45, 46, 47, 47, 49, 51, 52, 55, 55, 56, 78, 82, 91
Girls: 7, 18, 23, 47, 58, 63, 68, 72, 72, 75, 87
Use box plots to compare the boys' and girls' scores and comment on the differences. Comment on skew and distribution.

Task 2: The data below shows the IQ of 11 Maths and 11 Geography university graduates.
MATHS: 98, 103, 105, 99, 110, 94, 98, 100, 120, 117, 113
GEOGRAPHY: 93, 99, 110, 111, 95, 97, 90, 99, 92, 102, 103
a) Calculate the quartiles, median and interquartile range for each subject.
b) Use your data to draw box plots for each subject.
c) Comment on at least two aspects of your box plots to compare the IQs of the graduates.

Task 3: The data below shows the price of petrol (to the nearest penny) at different locations for Shell and BP.
SHELL: 130, 129, 132, 133, 136, 130, 129, 130, 130, 131, 133, 134, 140
BP: 131, 133, 133, 133, 132, 134, 138, 130, 133, 134, 135, 136, 137, 132, 135
a) Calculate the quartiles, median and interquartile range for each company.
b) Use your data to draw box plots for each company.
c) Comment on at least two aspects of your box plots to compare the price of petrol at each company.
Analysing frequency data & Cumulative Frequency

Finding the mean from a frequency table:
Number of goals scored by a team over 10 games.

Number of goals (x)   Frequency (f)   fx
0                     2               0 x 2 = 0
1                     2               1 x 2 = 2
2                     5               2 x 5 = 10
3                     1               3 x 1 = 3
Total:                10              15

Total number of goals = 15. Total number of games = 10.
15 ÷ 10 = 1.5 = mean number of goals scored per game.

Finding mean, mode and median in a grouped frequency table:
The table below shows the number of minutes late some trains left a train station.

Number of minutes late (m)   Frequency (f)
0 < m ≤ 4                    11
4 < m ≤ 8                    13
8 < m ≤ 12                   7
12 < m ≤ 16                  9
16 < m ≤ 20                  4

The modal class is 4 < m ≤ 8 because it has the highest frequency (13 trains).

Median number of minutes late: add up f, (11 + 13 + 7 + 9 + 4) = 44. Median = the (n + 1)/2 th value. 45/2 = 22.5. Find the 22.5th value. Add up the frequencies until the 22.5th value is reached:

Number of minutes late (m)   Cumulative frequency
0 < m ≤ 4                    11
4 < m ≤ 8                    11 + 13 = 24

The 22.5th train is in the class 4 < m ≤ 8, so the median train was between 4 and 8 minutes late.

Estimate the mean number of minutes by which a train was late: we can't work out fx directly from grouped classes. We have to find a single figure for x in each class, so we use the midpoints. Midpoint of 0 < m ≤ 4 = 2. Midpoint of 4 < m ≤ 8 = 6, etc.

Number of minutes late (m)   Frequency (f)   Midpoint (x)   Total minutes late (fx)
0 < m ≤ 4                    11              2              11 x 2 = 22
4 < m ≤ 8                    13              6              13 x 6 = 78
8 < m ≤ 12                   7               10             7 x 10 = 70
12 < m ≤ 16                  9               14             9 x 14 = 126
16 < m ≤ 20                  4               18             4 x 18 = 72

Sum of f = 44. Sum of fx = 368.
Estimate of mean = 368 ÷ 44 = 8.36 (2 d.p.)
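The calculations on this page can be reproduced with a short Python sketch. It is only a check of the arithmetic, not a general statistics routine; the variable names are chosen for illustration, and each class a < m ≤ b is stored as the pair (a, b).

```python
# Ungrouped frequency table: number of goals (x) -> frequency (f).
goals = {0: 2, 1: 2, 2: 5, 3: 1}
total_games = sum(goals.values())                      # sum of f
total_goals = sum(x * f for x, f in goals.items())     # sum of fx
mean_goals = total_goals / total_games

# Grouped frequency table: ((lower, upper), frequency) for minutes late.
late = [((0, 4), 11), ((4, 8), 13), ((8, 12), 7), ((12, 16), 9), ((16, 20), 4)]
n = sum(f for _, f in late)

# Modal class: the class with the highest frequency.
modal_class = max(late, key=lambda row: row[1])[0]

# Median class: the first class whose cumulative frequency reaches (n + 1)/2.
median_position = (n + 1) / 2
running_total = 0
median_class = None
for bounds, f in late:
    running_total += f
    if running_total >= median_position:
        median_class = bounds
        break

# Estimated mean: use the midpoint of each class as x.
sum_fx = sum(f * (lower + upper) / 2 for (lower, upper), f in late)
estimated_mean = sum_fx / n

print(mean_goals, modal_class, median_position, median_class)
print(sum_fx, round(estimated_mean, 2))
```

The mean from the grouped table is only an estimate because the midpoints stand in for the unknown individual values in each class.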
Frequency diagrams from grouped data are called frequency polygons and they're plotted at the midpoints of the groups:

[Frequency polygon: Frequency plotted against Minutes late]

Unlike frequency polygons, cumulative frequency diagrams are plotted against the upper bounds of the groups. This table shows the lengths of 40 babies at birth:

Length         Frequency   Cumulative frequency
30 < l ≤ 35    4           4
35 < l ≤ 40    10          14
40 < l ≤ 45    11          25
45 < l ≤ 50    12          37
50 < l ≤ 55    3           40

Upper class boundaries are 35, 40, 45, 50, and 55.

Cumulative frequency diagrams can be used to read off the quartiles and the median. Here, Q1 is the 10th value, the median is the 20th value, and Q3 is the 30th value.
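Reading values off a cumulative frequency diagram can be imitated in Python. The sketch below rests on assumptions that are not stated above: the plotted points are joined with straight lines, and the graph starts at (30, 0). A hand-drawn smooth curve would give slightly different readings. The function name `read_off` is made up for illustration.

```python
# (lower bound, upper bound, frequency) for the lengths of 40 babies.
classes = [(30, 35, 4), (35, 40, 10), (40, 45, 11), (45, 50, 12), (50, 55, 3)]

# Cumulative frequencies, plotted against the upper class boundaries.
cumulative = []
running_total = 0
for lower, upper, f in classes:
    running_total += f
    cumulative.append((upper, running_total))


def read_off(position):
    """Estimate the length at a given cumulative frequency position,
    assuming straight lines between the plotted points."""
    previous_total = 0
    for lower, upper, f in classes:
        if position <= previous_total + f:
            # Move across the class in proportion to how far into it we are.
            return lower + (position - previous_total) * (upper - lower) / f
        previous_total += f
    raise ValueError("position is larger than the total frequency")


n = running_total
q1 = read_off(n / 4)          # 10th value
median = read_off(n / 2)      # 20th value
q3 = read_off(3 * n / 4)      # 30th value

print(cumulative)
print(q1, round(median, 1), round(q3, 1), round(q3 - q1, 1))
```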
Task 1: Find the midpoint of each of these groups.
a) b) c)

Task 2: This table shows information about the ages of 60 people.
a) Draw a frequency polygon and a cumulative frequency diagram from the data.
b) Find the modal class and median value and estimate the mean.
c) Use the cumulative frequency diagram to estimate the quartiles and IQR.
d) Use your answers to c) to draw a box plot for this data.
Task 3:
a) Draw a frequency polygon and a cumulative frequency diagram from the data.
b) Find the modal class and median value and estimate the mean.
c) Use the cumulative frequency diagram to estimate the quartiles and IQR.
d) Use your answers to c) to draw a box plot for this data.

Task 4: 30 students ran a cross-country race. Each student's time was recorded. The table shows information about these times.

Time (t minutes)   Frequency
10 ≤ t < 14        2
14 ≤ t < 18        5
18 ≤ t < 22        12
22 ≤ t < 26        8
26 ≤ t < 30        3

a) Draw a frequency polygon and a cumulative frequency diagram from the data.
b) Find the modal class and median value and estimate the mean.
c) Use the cumulative frequency diagram to estimate the quartiles and IQR.
d) Use your answers to c) to draw a box plot for this data.
Time Series and Moving Averages

This table shows the number of visitors to a seaside town:

Quarter            1     2     3     4     1     2     3     4     1     2
Year               2005  2005  2005  2005  2006  2006  2006  2006  2007  2007
Visitors (1000s)   14    24    9     8     12    22    11    7     11    20

If this information is plotted on a graph, it looks like this:

This shows that there is a wide variation in the number of visitors depending on the season. There are far fewer in the autumn and winter than in spring and summer. If we want to see a trend in the number of visitors, we calculate a 4-point moving average.

Find the average number of visitors in the four quarters of 2005: (14 + 24 + 9 + 8) ÷ 4 = 13.75
Find the average number of visitors in the last three quarters of 2005 and the first quarter of 2006: (24 + 9 + 8 + 12) ÷ 4 = 13.25
Find the average in the last two quarters of 2005 and the first two quarters of 2006: (9 + 8 + 12 + 22) ÷ 4 = 12.75, etc.

The last average we can find is for the last two quarters of 2006 and the first two quarters of 2007. We plot the moving averages on a graph, making sure that each average is plotted at the centre of the four quarters it covers:

The moving averages show a slight downward trend in visitors.
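The full set of 4-point moving averages for the visitor data can be worked out with a short Python sketch. The function name `moving_averages` is made up for illustration; the plotting positions assume the quarters are numbered 1 to 10 in order, so the first average (covering quarters 1 to 4) sits at 2.5.

```python
def moving_averages(values, window):
    """Return the mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


visitors = [14, 24, 9, 8, 12, 22, 11, 7, 11, 20]  # thousands, 2005 Q1 to 2007 Q2

averages = moving_averages(visitors, 4)

# Each average is plotted at the centre of the four quarters it covers.
centres = [i + 2.5 for i in range(len(averages))]

for centre, average in zip(centres, averages):
    print(centre, average)
```

Ten quarters give only seven 4-point averages, because each average needs four consecutive quarters. They fall from 13.75 to 12.25, which matches the slight downward trend described above.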
1. Time always goes along the x-axis.
2. If the times are given as a period (e.g. 3 months) plot the point in the middle.
3. Points should be joined in order using a ruler.

Task 1: This table shows the number of computer games sold in a supermarket each month from January to June.

Jan   Feb   Mar   Apr   May   Jun
147   161   238   135   167   250

a) Work out the three-month moving averages for this information.
b) Plot the data on a time series graph.

Task 2: A shop sells DVD players. The table shows the number of DVD players sold in every three-month period from January 2003 to June 2004.

Year   Months    Number of DVD players sold
2003   Jan-Mar   58
       Apr-Jun   64
       Jul-Sep   86
       Oct-Dec   104
2004   Jan-Mar   65
       Apr-Jun   70

a) Calculate the four-point moving averages for this data.
b) Plot it on a time series graph.
c) Comment on the trend you observe in the moving averages.
Task 3: This table records how many detentions are given out per day by a teacher. Draw a time series graph for this data with a line for the moving average. Comment on the trend you observe.

Detentions   Mon   Tue   Wed   Thu   Fri
Week 1       4     8     12    7     18
Week 2       3     6     10    7     16
Week 3       3     6     7     5     13

Task 4: The data below gives information about the average number of children born per woman.
a) Plot the data on a time-series graph with moving averages.
b) Write down three conclusions about the information shown on your graph.

          1950  1955  1960  1965  1970  1975  1980  1985  1990  1995  2000  2005  2010
China     5.81  5.68  5.47  5.87  5.51  3.78  2.63  2.64  2.34  1.87  1.74  1.67  1.6
Ireland   3.48  3.44  3.86  3.95  3.81  3.57  3     2.45  2     1.9   1.95  2.03  2.11
UK        2.08  2.33  2.69  2.76  2.29  1.83  1.73  1.81  1.81  1.76  1.68  1.73  1.86

Task 5: The Second World War was between 1939 and 1945. The table below shows information about the life expectancy in Germany, UK and USA between 1935 and 1946.

          1935  1936  1937  1938  1939  1940  1941  1942  1943  1944  1945  1946
Germany   61.5  61.0  62.1  62.2  62.6  62.2  62.2  59.6  57.8  52.7  46.1  60.5
UK        62.1  61.9  61.9  63.3  63.7  60.9  61.4  63.9  64.0  64.9  65.9  66.5
USA       60.9  60.4  61.1  62.4  63.1  63.2  63.8  64.6  64.3  65.1  65.6  66.3

a) Plot the data on a time-series graph with moving averages.
b) Write down three conclusions about the information shown on your graph.
Histograms

This table shows the ages of 25 children on a school bus:

Age     Frequency
5-10    6
11-15   15
16-17   4
> 17    0

To draw a histogram you need the class boundaries. They are 5, 11, 16, and 18. The class widths are therefore 11 - 5 = 6, 16 - 11 = 5, and 18 - 16 = 2.

The area of each bar in a histogram represents the frequency. So the areas of the bars have to be 6, 15 and 4. Since it's the area and not the height that represents the frequency, the y-axis has to say frequency density instead of frequency.

Area = frequency = frequency density x class width.
So frequency density = frequency ÷ class width.

Task 1:
Task 1 continued:
a) Finish the frequency table using the data.
b) Plot a histogram using the table.

Task 2:

Task 3: Draw a histogram from this data.

Height (x cm)    Frequency
100 < x ≤ 120    20
120 < x ≤ 140    25
140 < x ≤ 150    25
150 < x ≤ 160    20
160 < x ≤ 200    10
Task 4: Draw a table with hours watched, frequency, and frequency density by reading this histogram.

Task 5: Draw a table with height, frequency, and frequency density from this.
[Histogram axis label: Height (h) in cm]
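The frequency density calculation from the school bus example at the start of the Histograms section can be sketched in Python. This is an illustration only: it uses the frequencies from the age table (6, 15 and 4) with the class boundaries 5, 11, 16 and 18, and the variable names are chosen for this sketch.

```python
boundaries = [5, 11, 16, 18]   # class boundaries for ages 5-10, 11-15, 16-17
frequencies = [6, 15, 4]       # children in each class (25 in total)

# Class width = upper boundary - lower boundary.
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

# Frequency density = frequency / class width (the height of each bar).
densities = [f / w for f, w in zip(frequencies, widths)]

# Check: bar area = frequency density x class width gives back the frequency.
areas = [d * w for d, w in zip(densities, widths)]

for lower, upper, f, w, d in zip(boundaries, boundaries[1:], frequencies, widths, densities):
    print(f"{lower} to {upper}: frequency {f}, width {w}, frequency density {d}")
```

The tallest bar belongs to the 11-15 class (frequency density 3), and the 16-17 bar is taller than the 5-10 bar even though it contains fewer children, because its class is much narrower.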
Types of Sampling

Random sampling means every member of a population has an equal chance of being selected. You can do this by numbering the members of the population and then using a table of random numbers or a random number generator.

Example: Of 1000 pupils in a school, 50 are to be questioned about their favourite pop group. How should the pupils be chosen?
Solution: The pupils should be numbered 000, 001, 002, 003, ..., 999. You can use a calculator to generate random numbers. Three-digit random numbers can then be used to choose 50 pupils.

Stratified sampling: the sample is made up of separate fractions (groups) of the population. The sample size for each fraction is proportional to the size of that fraction in the population.

Example: Go back to the survey in which 50 pupils in a school of 1000 pupils were asked what music they liked. To make sure the survey is representative you will need a range of pupils across the year groups, which are the different fractions. Pupils in year 7 may like different music to those in year 11.
Solution: To work out the sample size for year 7:
There are 180 students in year 7 - this is the size of the fraction.
There are 1000 pupils in the school - this is the size of the whole population.
You want answers from 50 people in total - this is the size of the whole sample.
So we want 50/1000 of the population, so for year 7 we want 50/1000 of 180. (50/1000) x 180 = 9.

Year   Number of pupils   Number of pupils in sample
7      180                (50/1000) x 180 = 9
8      200                (50/1000) x 200 = 10
9      240                (50/1000) x 240 = 12
10     220                (50/1000) x 220 = 11
11     160                (50/1000) x 160 = 8

Systematic sampling: A regular pattern is used to choose the sample. Every item in the population is listed, a starting point is randomly chosen and then every nth item is selected. For example, a mixed (male and female) class could be listed in alphabetical order and every sixth student selected, starting with the 3rd student. This is a simpler and quicker method to select a (random) sample, but may be unrepresentative if a pattern exists in the list. For example, every sixth student in the above sample may be a girl.
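The three sampling methods above can be sketched in Python. This is a minimal illustration under stated assumptions, not a recommended survey procedure: the function names are made up, the fixed seed is there only so the random choice can be repeated, and the class size of 30 in the systematic example is hypothetical (the text does not give one).

```python
import random

# Random sampling: number the 1000 pupils 000 to 999 and pick 50 at random.
pupils = [f"{i:03d}" for i in range(1000)]
rng = random.Random(0)              # fixed seed (an assumption) for repeatability
random_sample = rng.sample(pupils, 50)


def stratified_sizes(strata, sample_size):
    """Share the sample between the fractions in proportion to their size,
    rounding each share to the nearest whole number (halves round up)."""
    total = sum(strata.values())
    return {
        name: (2 * sample_size * size + total) // (2 * total)
        for name, size in strata.items()
    }


# Stratified sampling: year group -> number of pupils.
year_sizes = {7: 180, 8: 200, 9: 240, 10: 220, 11: 160}
sizes = stratified_sizes(year_sizes, 50)


def systematic_sample(items, start, step):
    """Take every `step`-th item, beginning with the `start`-th (counting from 1)."""
    return items[start - 1::step]


# Systematic sampling: every sixth student, starting with the 3rd,
# from a hypothetical alphabetical class list of 30 students.
class_list = list(range(1, 31))     # positions 1 to 30 stand in for names
chosen_positions = systematic_sample(class_list, start=3, step=6)

print(sizes)
print(chosen_positions)
```

When the shares do not come out as whole numbers, the rounded values may not add up to exactly the intended sample size, so it is worth checking the total.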
Task 1: The table below shows the number of pupils in years 11-13.

Year               11    12    13
Number of Pupils   198   120   101

How many pupils from each year should be selected to give a stratified sample of 10%?

Task 2: Juliet lives on a housing estate. The table below gives the number of people in each age group who live on her estate.

Age in years    0-19   20-39   40-59   60-79   80+
No. of people   182    88      110     72      15

For her Geography project she chooses a stratified sample of 10% which reflects these age groups. Calculate the number of people she should include from each group. Give each of your answers to the nearest whole number.

Task 3: There are 180 pupils on the register at a particular primary school. The table below shows the number of pupils in each year.

Year     1    2    3    4    5    6
Number   34   33   29   28   35   21

How many students need to be selected from each year to give a stratified sample of 36 pupils?

Task 4: Twelve boxes of books are delivered to a school. Altogether there are 300 books delivered. Of these books, 192 are paperback, whilst the rest are hardback. How many books would I need to select from each category for a 5% stratified sample?

Task 5: The table below shows the number of drink cartons that are filled in one hour by a factory:

Flavour          Apple   Orange   Pineapple   Tropical   Strawberry
No. of Cartons   135     300      175         190        200

How many cartons need to be selected from each flavour to give:
a) A 10% stratified sample?
b) A stratified sample of 150 cartons?
Questionnaires

Questionnaires can include:
- yes/no answers
- tick boxes
- numbered responses
- word responses
- questions which require a sentence to be written

Whichever style of questions you use, it is important that they:
- are easy to understand
- cover every possible answer
- are unbiased (they do not lead respondents to give a particular answer)
- are unambiguous (they have a clear meaning).

Example:

Problems:
1. No units are given for distance: 0 to 2 miles? Kilometres?
2. Boxes overlap. Exactly 2 could go in 0 to 2 and 2 to 3.
3. Not all possible answers are covered: there is no box for more than 6.
Task 1: Look at these questionnaire questions. Explain what is wrong with each of them and rewrite them.

a) How old are you (in years)?
   20 or younger | 20 to 30 | 30 to 40 | 40 to 50 | 50 or older
b) How much do you usually spend on biscuits each week?
   a lot | a little | nothing | don't know
c) How old are you?
   less than 18 years old | more than 18 years old
d) How much money do you spend on magazines?
   1 | 2 | 3
e) Do you agree that pizza is better than pasta?
   Yes/No
f) What do you think of the changes in the canteen?
   Excellent | Very good | Good
Exam Questions

1.
(a) 3 adults can help with walking on Saturday. Is this enough? Show your working.
(b) A group of people go sailing in the ratio
    number of adults : number of children = 1 : 2
    What fraction of the group are adults?
(c) On Sunday all the children do the activity they choose. The ratios for each activity are shown in the table.

Activity   Adult : children ratio
Archery    1 : 3
Walking    1 : 5
Sailing    1 : 2

Work out the total number of adults needed for Sunday.
2. 3. 4. Complete the pie chart.
5.
(a) One of the eggs has a length of 52 mm. What is its width?
(b) All the points except one show strong correlation. Which point doesn't?
(c) Pick the correct descriptions of the correlation of the 3 scatter graphs.
    i. Strong positive correlation
    ii. Weak positive correlation
    iii. Little or no correlation
    iv. Weak negative correlation
    v. Strong negative correlation
6.
a) Can you use this table to calculate the exact median? Yes/No
b) Can you use this table to work out the weight of the heaviest rabbit? Yes/No
c) Calculate an estimate of the mean weight of the 200 rabbits.
d) Here are the weights in grams of 10 more rabbits: 76.2, 89.4, 93.1, 99.7, 86.8, 79.2, 82.6, 91.9, 88.0, 95.4
   Complete this table with tallies for those 10 rabbits and frequencies for all 210 rabbits.
e) Which two of these four diagrams would be best to represent this data?
    i. stem-and-leaf
    ii. frequency polygon
    iii. scatter graph
    iv. histogram
7.
(a)
(b) Two of the sacks are chosen. The first sack has 17 more potatoes than the second sack. What is the greatest possible number of potatoes in the first sack?

8.
9. 80 men were also timed solving the puzzle.

Median       Interquartile range
16 minutes   17 minutes

a) Who was faster on average, men or women?
b) Who was more consistent, men or women?
10.

11.
a) Plot the scatter graph.
b) Draw a line of best fit.
c) Use your line to predict the fuel used to travel 110 km.
12. 30,000 magazines were sold in Wales. How many were sold in total?

13. Amina asks 50 people, "What is your favourite pet? Choose from cat, dog, rabbit or other."
a) Which two words from those below describe the type of data she collects from each person?
   qualitative, continuous, primary, secondary
b) Which two diagrams from those below could she use to represent the data?
   scatter graph, pie chart, bar chart, stem-and-leaf

14. In a survey people chose A, B, C, or D. 150 people chose B. Work out how many chose A.
15. The top 10% of the students are awarded a distinction. Estimate the mark needed for a distinction.
16.
17.
18. How much did the 800 tickets cost altogether?
19.
20. 21.