Sunday, 16 May 2021

Quibans 101: Criminals' names

 With thanks to Cat van Saarloos for the data file, here’s a Quibans from the Daily Mirror.

Police release names most commonly linked with crime - and it's bad news for Davids


Choosing a baby's name can be a huge decision and many parents spend time agonising over the choices in front of them.

What you are called can have a lasting impact on your life, but can it effect the chances of you ending up behind bars?

Recent research conducted by casino experts Goodluckmate has shown that some names are more common amongst troublemakers in the UK, with David and Sarah topping the list.

Top 10 lawbreaking male names

1.    David - 1,010 criminal charges

2.    Daniel - 1,001 criminal charges

3.    Michael - 895 criminal charges

4.    Paul - 874 criminal charges

5.    James - 796 criminal charges

6.    John - 742 criminal charges

7.    Mark - 742 criminal charges

8.    Lee - 701 criminal charges

9.    Christopher - 691 criminal charges

10. Andrew - 660 criminal charges

Top 10 lawbreaking female names

1.    Sarah - 117 criminal charges

2.    Amy - 111 criminal charges

3.    Claire - 104 criminal charges

4.    Lisa - 103 criminal charges

5.    Lauren - 101 criminal charges

6.    Kelly - 99 criminal charges

7.    Rachel - 98 criminal charges

8.    Nicole - 85 criminal charges

9.    Michelle - 80 criminal charges

10. Louise - 75 criminal charges

A spokesperson for Goodluckmate said: "Our names play a huge part in our identity, but can they influence who we turn out to be? Are there some names that are more likely to end up on a judge’s docket?

"We wanted to find out if a name can make someone more likely to become a criminal, so we made Freedom of Information requests to police forces around the country, asking for the names of people that were charged with crimes in the last two years, so we could discover the names most likely to commit crimes.

"In total, we received 42,671 names from various police forces around the country, allowing us to work out which names had the most criminal charges attached to them."

 

Here is a first set of questions:

Question 1) What errors can you see in the article?

Question 2) Comment on this sentence: “We wanted to find out if a name can make someone more likely to become a criminal”

Question 3) Is this a good use of public money?

Question 4) What is the problem with saying “people called David are more likely to break the law”?

Question 5) What other information would it be useful to have?

Question 6) How much more likely are men to commit crimes than women?

 

And possible answers:

Answer 1) Aside from the maths/stats errors that appear below (and which could also fit here), in this phrase “but can it effect the chances” the word ‘effect’ should be ‘affect’.

Answer 2) This is a correlation vs causation misunderstanding!

Answer 3) A casino company submitted a freedom of information (FOI) request to police forces, asking for this information.  The police forces legally have to respond to FOI requests, which costs time and money.  Is it worth it for a request like this?  (Well – it produced a Quibans, but I’m struggling to see any other use!)

Answer 4) If the name David is more common then there are likely to be more law-breakers called David. 

Answer 5) A list of how common each name is.

Answer 6) On the top ten lists there are 8112 males and 973 females, suggesting that men are 8.3 times as likely to commit a crime as women.
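The totals in Answer 6 can be checked with a short Python sketch.  The figures are the charge counts copied from the two top-ten lists above; the variable names are just illustrative.  Bear in mind that the ratio only compares charges for the ten most common names on each list, so it is a rough indication rather than a figure for all men and women.

```python
# Charges for the top ten male and female names, copied from the article.
male_charges = [1010, 1001, 895, 874, 796, 742, 742, 701, 691, 660]
female_charges = [117, 111, 104, 103, 101, 99, 98, 85, 80, 75]

male_total = sum(male_charges)
female_total = sum(female_charges)
ratio = male_total / female_total

print(male_total, female_total)  # 8112 973
print(round(ratio, 1))           # 8.3
```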

There is further information on the casino website:

The UK’s most popular names

When we look at the UK’s current most popular names, how do they measure up?


Question 7)  Any comments about this?

 

Answer 7) I was wrong!  The names David and Sarah don’t appear on the list of most popular names. 

But there is more.  Here’s the end of the casino article:

Methodology

We made Freedom of Information requests to police forces around the UK asking for data regarding the first names of those who had been charged with a crime in the 19/20 financial year within their area. Of the constabularies we made requests to, 17 were able to send data, amounting to a total of 42,671 names. We then took the total number of people charged with each name, giving us our results.

When looking at the UK’s most popular names, we took the top five names for baby boys and baby girls in the UK in 2019.

Question 8) What’s the problem with their methodology?

Question 9) What would be a more sensible thing to do instead?  What other info do you need?

 

Answer 8) They are comparing the number of crimes committed by people with each name in 2019-2020 with how many babies that name was given to in 2019.  If the frequency of names changes year to year, then this is a daft comparison to make, because there won’t be any babies in the crime figures!

Answer 9) It would be more sensible to use name-frequency data from perhaps 30 years ago.

 

Cat has provided data from the ONS (Office for National Statistics), showing the popularity of each name for each year ending in a ‘4’.  Cat’s spreadsheet is here.  The students could analyse this.

Here is the start of the spreadsheet for male names:



And here it is with the 10 names highlighted:

My analysis follows (but there are lots of other ways of doing it). 

Taking the average position for male names in the years 1964 (babies born then would be 55 in 2019), 1974 (age 45), 1984 (age 35) and 1994 (age 25), gives the following table.  David is second on the list:

Name          Average position
James         7.5
David         7.75
Christopher   8
Michael       8
Andrew        8.5
Mark          15.25
ROBERT        15.25
John          17.75
Paul          18
RICHARD       18
MATTHEW       18.25
Daniel        19.25
STEPHEN       21.5
THOMAS        21.5
JONATHAN      23.5
STEVEN        27
PETER         27.25
NICHOLAS      27.75
SIMON         29.25
ANTHONY       30.25
WILLIAM       30.75
ADAM          31.5
Lee           33

 

Aside from Lee (position 23), the other nine names (which are shown in lower case in the table) are all in the top 12.

If instead we focus only on 1974 and 1984 we get this list:

Name          Average position
David         3
Christopher   3.5
James         4.5
Paul          5
Andrew        5.5
Mark          6
RICHARD       6.5
Michael       7
MATTHEW       8
Daniel        10
John          13.5
Lee           13.5

 

David appears at the top, and all of the ten names are within the top 12.

 

Monday, 28 December 2020

Quibans 100: Something Fishy

This is not a Quibans about politics (and when I use this with my class I won’t be talking about the politics), but it is about common mathematical/statistical errors that people make.

It is unusual in that it starts with a tweet.  I don’t usually create a Quibans based on a ‘random person tweeting’, but because this came from the account of a former MEP who has over 70,000 followers, it seems reasonable to do so.

With my class I will start by showing the following images.  Here’s the first tweet:



This image was part of that tweet:


Here’s part of an image from one of the other tweets in the thread:



I will ask the class:  What questions do you have for me?

I am anticipating some of the following:

1)      Where do the numbers come from?

2)      What do they mean?

3)      What is he working out?

4)      Why is he doing it ‘by hand’?

And here are my responses:

1)      This is from the ‘Annex’ to the Brexit Trade Deal that was agreed by the EU and the UK government on 24 December and which is due to be voted on by MPs before the end of 2020.  This is a printout of pages 893 and 894.

2)      It shows the percentage of each type of fish that is allowed to be caught in different areas of sea by UK and by EU fishing boats.

3)      The two orange highlighted columns show the percentage for UK boats in 2021 and in 2026.  He is working out the average of these for each page.

4)      He has presumably used a calculator to add them up and then to divide by the number of types of fish on the page.  Why hasn’t he just copied it into a spreadsheet?  The document it is in is a pdf, and the annex is an image, so the text can’t just be copied and pasted!  I have retyped those two columns (I think I have done it accurately).  They are available on this spreadsheet.  (The spreadsheet hasn’t been optimised for printing.  Pages 1 to 4 include a screenshot of the original document and my typed version of the relevant columns.  The final sheet includes all of the typed data.)

Tasks for the students:

A)      For page 1, calculate the total and the mean to check the figures he has worked out here.  How many different ways can you work out the mean?

B)      Now do the same on page 2.  What is going on here?  (Find the error!)


C)      Go to page 4.  If he makes the same mistake on page 4 as he did on page 2, what will his calculations for 2021 be?  What is the correct mean?

D)      Go to the final sheet.  What is the mean for all of the fish for 2021?  What is it for 2026?

E)      Why is it not reasonable to say there is a 2.32% increase?  Using his figures and his methodology, what would be a more sensible value to give for the increase?

F)       If UK boats catch 20% of a type of fish in one year and 30% the next, why might they not be landing more fish in the second year?  What information is missing here?

G)     Why can’t we find the mean of the 2021 figures and the mean of the 2026 figures and then calculate the change?  (This is a major issue!)


Some answers:

A)      This is a nice opportunity to use Excel in different ways.  The calculations for page 1 are correct.  (He has sensibly rounded the answers for the mean.)  We can use =SUM( : ) to work out the total and can then divide that answer by 24 (there are 24 types of fish stocks on the first page) or can do =AVERAGE( : )

B)      On page 2 the total for 2026 is correct.  But he has then divided by 27.  This should be 28.  If you use the ‘average’ command on Excel you get the correct answer and this is the same as dividing by 28.  Why has he got this wrong?  The second page starts with number 25 and goes up to number 52.  He has done 52 – 25 = 27.  This seems an obvious thing to do for lots of people.  Why is it wrong?  How many ways can they explain it? 

[If you count them you find there are 28 of them.  Or you could consider that on page 1 it goes from 1 to 24, but that is clearly 24 different fish – you don’t do 24 – 1.  Or: if you subtract 25 you are getting rid of fish number 25 – we need to remove fish number 24 (and the earlier ones), so it should be 52 – 24.]

C)      Page 4 includes fish number 77 to 87.  If he subtracts then he will assume there are 10 fish on that page, whereas it should be 11.  The total is 383.69.  He will divide by 10 to get 38.37% whereas it should be divided by 11 to give 34.88%.  (In fact he rounded off to 1dp this time and got 38.4%)  Note that you can use SUM and AVERAGE in Excel across the blank cells and cells with text in them – it just ignores them and still gives the correct answers.


D)      For 2021 the mean is 33.47% and for 2026 it is 35.93%.  These are close to the values he gives (of 33.6% and 35.92%) but not identical.  In fact, in the tweet he writes about 2020 and not 2021.  I tweeted to ask about this, but didn’t receive a reply.

E)      He seems to be saying that because the UK share (according to his calculations) is going from 33.6% to 35.92% then that is an increase of 2.32%.  We ought to refer to this as an increase of 2.32 percentage points.  From the perspective of the UK fishing industry, the percentage is going from 33.6 up to 35.92.  This is an increase of 2.32 over the original value of 33.6.  That’s actually an increase for the UK boats of 6.9%.

F)       The document talks about the percentage of fish that can be caught by the two fishing fleets.  It doesn’t confirm that the total amount is the same each year.  If 10 tonnes can be caught one year and 1 tonne the next, then 20% of 10 tonnes is 2 tonnes (2000kg) whereas 30% of 1 tonne is 300kg.  (I couldn’t find any figures about the total quotas, so I have no idea whether the amounts are the same each year.)

G)     There is a massive issue here, which means all of the information in the original tweet (and everything that has happened so far in the earlier tasks) is just nonsense.  We cannot find the average of the percentages in the way the former-MEP did, because they are likely to be percentages of different amounts.  Here’s a simplified example.  If the UK fleet gets 10% of Fish A and 20% of Fish B, it doesn’t necessarily get 15% of all of the fish.  Suppose 100 tonnes of Fish A is landed in total and 1 tonne of Fish B is landed then 10% of A is 10 tonnes and 20% of B is 0.2 tonnes.  Altogether the UK boats get 10.2 tonnes out of 101 tonnes, which is 10.1%.  Or if the UK fleet has 100% of Fish A and 0% of Fish B then it gets 100/101 = 99% of the fish and not the average of 100% and 0%.
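The simplified example in G can be reproduced in a few lines of Python.  This is only a sketch using the made-up tonnages from the paragraph above (100 tonnes of Fish A and 1 tonne of Fish B); the function name `overall_share` is my own and nothing here comes from the real quota figures.

```python
def overall_share(stocks):
    """Overall percentage share, given (total_tonnes, percent_share) pairs."""
    total = sum(tonnes for tonnes, _ in stocks)
    caught = sum(tonnes * pct / 100 for tonnes, pct in stocks)
    return 100 * caught / total

# Made-up example: 10% of 100 tonnes of Fish A, 20% of 1 tonne of Fish B.
example = [(100, 10), (1, 20)]
naive_mean = sum(pct for _, pct in example) / len(example)

print(naive_mean)                         # 15.0 (mean of the percentages)
print(round(overall_share(example), 1))   # 10.1 (actual share of all the fish)

# Extreme case: 100% of Fish A and 0% of Fish B.
extreme = [(100, 100), (1, 0)]
print(round(overall_share(extreme)))      # 99
```

The two answers only agree when every stock has the same total tonnage, which is exactly the information the annex does not give.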

Any time you see averages of averages it is worth asking whether it is fair/reasonable to do this!
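Going back to the earlier answers, the counting slip in B and C and the percentage change in E can be checked with a small Python sketch.  It uses only the figures quoted above; the helper name `count_inclusive` is my own.

```python
def count_inclusive(first, last):
    """Number of items numbered first..last, counting both ends."""
    return last - first + 1

# Page 2 runs from fish 25 to fish 52; page 4 from fish 77 to fish 87.
print(count_inclusive(25, 52))  # 28 (not 52 - 25 = 27)
print(count_inclusive(77, 87))  # 11 (not 87 - 77 = 10)

# Page 4, 2021 column: the total quoted in answer C.
page4_total = 383.69
wrong_mean = round(page4_total / 10, 2)
right_mean = round(page4_total / count_inclusive(77, 87), 2)
print(wrong_mean, right_mean)   # 38.37 34.88

# Answer E: percentage points versus percentage increase, using his figures.
share_2021, share_2026 = 33.6, 35.92
points = round(share_2026 - share_2021, 2)
relative = round(100 * (share_2026 - share_2021) / share_2021, 1)
print(points, relative)         # 2.32 6.9
```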


Sources: https://twitter.com/MartinDaubney/status/1343147201112010753

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/948104/EU-UK_Trade_and_Cooperation_Agreement_24.12.2020.pdf

Sunday, 29 November 2020

Quibans 99: Bus Gate Fines

‘Bus Gates’ seem to be the new way to restrict access to certain roads.  In the past in Cambridge we had rising bollards that would disappear into the road to allow buses and emergency vehicles to pass through.  Bus Gates are ordinary roads, with lots of signs and cameras.  A new Bus Gate was installed on what had always previously been a clear section of road.  The story below is from the Cambridge News.

Thousands of fines issued to people driving through Mill Road bridge bus gate

Nearly 5,000 fines have been issued to people driving over Mill Road bridge.

From August 28 to October 16, 4,840 fines have been issued to drivers. The number of drivers themselves being issued with fines will be fewer, since one driver can receive several fines.

Since the summer, Cambridgeshire County Council has closed the bridge to all traffic except buses, cyclists, and pedestrians.

Prior to fines being issued, there was a grace period of more than two weeks when 1,630 notices were sent out to warn drivers that they were committing an offence and should they do so again, then they would face a fine.

The fine is £60, but is reduced to £30 if the motorist pays within two weeks.

More than 96 per cent of fines paid were just £30.

This means the council has made an estimated £151,008 from the partial closure of Mill Road bridge.

The council said last year money from fines was used for frontline highways maintenance.

 

You might like to present this to a class and to ask what is wrong with it.  Alternatively, the following questions could lead the students:

Q1)  Check every number.  Which ones are exact?  Which ones have been estimated by the journalist? 

Q2)  How did they do these estimations?

Q3)  Which number is definitely wrong?  Explain!

Q4)  Produce a better version of that number.

Q5)  Compare the number of fines issued to the number of notices sent out during the grace period.  What can we conclude?

 

Some possible answers follow.

A1)  “Nearly 5,000 fines” – has been rounded. 

“4,840  fines” and “1,630 notices”.  Is it suspicious that the number of fines (4,840) and notices (1,630) are both multiples of 10?  Maybe they have been rounded off?  (For both to end in a zero by chance the probability would be 1/100.)  It is possible that they are exact, though.

The amounts of the fines (£60 and £30) are exact.

“More than 96 per cent” is an estimate, as is “an estimated £151,008”.

A2)  4,840 rounded to the nearest thousand is 5,000.

96% of the fines were £30.  Work out 96% of 4,840 and multiply that by £30.  Work out 4% of 4,840 and multiply by £60.  Add them together.  We get £151,008.

A3)  £151,008.  You can’t possibly get a number that ends in an ‘8’ by adding integer multiples of 60 and 30. 

A4)  A better answer/method would be to say: there are about 5,000 fines and almost all were £30, so that makes a total of 5,000 x £30 = £150,000.  The total is therefore an estimated £150,000 in fines.  I think this is the best estimate to give.  If you wanted to go more deeply into this we need to decide what the language means.  Can we assume “more than 96%” means it’s up to 96.5%?  (I suspect that if it were over 96.5% then it would have been written as “nearly 97%”.)

If so, then a lower bound would be to take 96.5% of the fines as £30 and the rest as £60.

That gives the newspaper figure as an upper bound (but it would need to be a multiple of £30) and 150,282 as the lower bound (but again – needs to be a multiple of 30).  The bounds would therefore be 150,300 and 150,990.  If we assume the 4840 has been rounded to the nearest 10 then that broadens those bounds just a little.  (To be clear, I don’t think any of this is worth doing: £150,000 is a perfectly sensible estimate here!)

A5)  The ‘grace period of more than two weeks’ was presumably there to allow those who use the road regularly to realise that they would be fined and to warn them of that.  I would expect lots of warnings during the grace period (because it was a new thing) and fewer fines when it was actually rolled out.

We need to find the rate for both numbers.  August 28 to October 16: That’s 4 days in August, 30 days in Sept and 16 days in Oct: a total of 50 days.  4,840 fines divided by 50 days gives an average of 96.8 fines per day.  (5000 divided by 50 is perhaps more sensible – giving 100 per day.)

The grace period was “more than two weeks”.  If that means 15 days then it was 108.7 per day, if 16 days then 101.9 per day and if 17 days then 95.9 per day.

All of these values are very close to the 96.8 per day that were actually fined.  The number of fines issued each day is roughly the same as the number of warning notices sent per day during the grace period.  Does that mean the bus gate just isn’t working?  Or that the fines are not a deterrent?  Or that the grace period was in August, during the holiday times when the traffic was lower anyway, and that the massive increase in traffic on the roads in Sept means a smaller percentage are bursting through the bus gate?
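The arithmetic in A2 to A5 can be reproduced in Python.  This is a sketch using only the figures quoted in the article.  The helper name `estimate` is my own, the year 2020 is assumed from the date of the story, and the 15, 16 and 17 day grace periods are the guesses made above rather than known values.

```python
import math
from datetime import date

FINES = 4840     # fines issued, from the article
NOTICES = 1630   # warning notices sent during the grace period

def estimate(percent_at_30):
    """Income in pounds if this percentage of fines was paid at £30, the rest at £60."""
    return FINES * (percent_at_30 * 30 + (100 - percent_at_30) * 60) / 100

newspaper = estimate(96)     # 151008.0, the newspaper's figure
lower = estimate(96.5)       # 150282.0, the lower bound in A4

# A real total must be a multiple of £30, so move each bound inwards.
lower_30 = math.ceil(lower / 30) * 30        # 150300
upper_30 = math.floor(newspaper / 30) * 30   # 150990

# Days from 28 August to 16 October inclusive (year assumed to be 2020).
days = (date(2020, 10, 16) - date(2020, 8, 28)).days + 1   # 50
fines_per_day = FINES / days                                # 96.8

# Notices per day for each guessed length of the grace period.
notices_per_day = {n: round(NOTICES / n, 1) for n in (15, 16, 17)}

print(newspaper, lower, lower_30, upper_30)
print(days, fines_per_day, notices_per_day)
```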

 

Source: https://www.cambridge-news.co.uk/news/cambridge-news/thousands-fines-issued-people-driving-19173924

Sunday, 22 November 2020

Quibans 98: US Election night 2020

In this Quibans you may want to copy the images so you can project them one at a time.  The questions can then be posed verbally.

In the USA’s presidential election, because of the time differences across the country, the polls close on the east coast earlier than elsewhere and they start counting the votes and releasing the results while voting is still taking place in other states.

I expected the states in New England (the north-east of the US) to support Joe Biden, so I was very surprised in the early hours of the following morning to see, on the election webpage of the Wall Street Journal, this (as a ticker running across the screen):


Q1) What does this mean?  And what doesn’t it mean?

A1) It would be usual to think of this as “Trump has 23.1% more than Biden”.  But we need to be careful.  If Biden has 100,000 votes, does that mean Trump has 123,100 votes?  Or does it mean that Trump’s lead divided by the total number of votes cast is 23.1% ?  We would usually refer to the latter as a lead of “23.1 percentage points” to avoid confusion.

After my surprise at this huge lead, I clicked on the image of the state of New Hampshire to get further information:



Q2) What is going on?  What do you notice?  What is strange?  What can you work out?

A2) So far, 26 votes have been counted!  Just 26.  And 16 of those went to Trump, while only 10 were for Biden. 

The difference between 61.5% and 38.5% is 23.0% - so there must be some rounding involved.

16/26 = 0.61538…, and 10/26 = 0.384615…, so Trump’s figure has been rounded down and Biden’s rounded up, and when we subtract we get 0.230769…, which rounds to 23.1

An alternative way to get this value is to do 6/26 (where the numerator is the difference between 16 and 10).

It says that 1.0% of the expected total vote has been reported.  If it’s exactly 1% then we would expect 2600 votes to be cast in the state of New Hampshire altogether.  Using the upper and lower bounds of 1.0% we get:

26/0.0095 = 2736.8 – so the upper bound for the total number of votes is 2736 (mustn’t round up!)

26/0.0105 = 2476.1 – so the lower bound for the total number of votes is 2477 (must round up!)

That seems like a small number!
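The rounding and the bounds in A2 can be checked with a short Python sketch.  It assumes, as above, that a displayed “1.0%” means anything from 0.95% to 1.05% of the expected total; the variable names are my own.

```python
import math

trump, biden = 16, 10
counted = trump + biden

trump_pct = 100 * trump / counted
biden_pct = 100 * biden / counted
lead = 100 * (trump - biden) / counted

print(round(trump_pct, 1), round(biden_pct, 1))  # 61.5 38.5
print(round(trump_pct - biden_pct, 1))           # 23.1
print(round(lead, 1))                            # 23.1 (the 6/26 route)

# 26 votes are between 0.95% and 1.05% of the expected total.
upper = math.floor(counted / 0.0095)   # largest whole number of votes allowed
lower = math.ceil(counted / 0.0105)    # smallest whole number of votes allowed
print(lower, upper)                    # 2477 2736
```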

 

Here is the current state of play in New Hampshire:



Q3) What is surprising?

Q4) How many percentage points is Biden leading by?

Q5) How many votes are still uncounted?

Q6) Biden won all 4 of the electoral college votes from New Hampshire.  Why might that be considered unfair?

A3) They have counted 803,831 votes so far – which is a long way above our upper bound of 2736 ! (And Biden managed to overturn Trump’s 6-vote lead!)

A4) Biden is leading Trump by 59,275 votes.  Dividing this by the 803,831 votes counted gives us 0.07374…, which is a lead of 7.37 percentage points.

A5) I am deeply suspicious of the figure for the percentage of votes that have been counted, but if we take it as correct we get that 803,831 is between 98.95% and 99.05% of the expected total.

803,831 / 0.9895 = 812,360 (rounding down)

803,831 / 0.9905 = 811,541 (rounding up)

Hence, there are between 7710 and 8529 still to count.

A6) The way the US Presidential election system works, each state votes and then (for 48 of the 50 states) the electoral college votes for that state all go to the winner of the state.  In the election in 2000 the state of Florida was won by a margin of just over 500 votes.  That gave all of the electoral college votes for Florida (25 of them – because Florida has a greater population than New Hampshire) to George W. Bush and resulted in him winning the election.

In New Hampshire, Biden benefited (he got 53% of the votes but 100% of the electoral college votes), whereas in other states Trump benefited from the system.  In the 2016 election Hillary Clinton won more votes than Donald Trump, but ended up with fewer electoral college votes, so Trump was the winner.
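Returning to A4 and A5, the lead and the number of uncounted votes can be checked with a few lines of Python.  This sketch assumes the display showed 99.0% of the expected vote counted, which is what the bounds of 98.95% and 99.05% in A5 imply; the variable names are my own.

```python
import math

counted = 803_831     # votes counted so far
lead_votes = 59_275   # Biden's lead over Trump in votes

lead_points = round(100 * lead_votes / counted, 2)
print(lead_points)    # 7.37 percentage points

# 803,831 is assumed to be between 98.95% and 99.05% of the expected total.
max_total = math.floor(counted / 0.9895)
min_total = math.ceil(counted / 0.9905)
print(min_total, max_total)                      # 811541 812360
print(min_total - counted, max_total - counted)  # 7710 8529
```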

Here is the final map from 2020:


Q7) Is there more red or blue?

A7) I think there appears to be more red.  But because the very big red states in the north have small populations, the number of electoral votes for the blue states (Biden) significantly exceeds the red states (Trump).

The Wall Street Journal provides this as an alternative version of the map:


Q8) What is going on here?

A8) The map has been scaled to show one square for each electoral college vote.  (You may want to flick back and forth between the real map and this one.  Montana is a particular casualty!).  Now there is more blue. 


Source: https://www.wsj.com/election-results-2020/live-coverage.html

 

 

 


