Data Science in Schools

I’ve no doubt that good CS education involves finding motivating contexts for getting the ideas across and for helping pupils get to grips with programming. Lots of teachers have found their pupils highly engaged through creating games and animations, or through interacting with the real world through physical computing and robotics, or, perhaps more unusually, through algorithmic art or composing music. I think we could make a good case for adding some data science into this mix, getting pupils to do a little visualisation and exploratory data analysis, and through this starting to answer some genuinely interesting questions.

When we wrote the English computing curriculum, we included some explicit references to working with data: 7-11 year olds are taught “collecting, analysing, evaluating and presenting data”, and 11-14 year olds “undertake creative projects that involve selecting, using, and combining multiple applications, preferably across a range of devices, to achieve challenging goals, including collecting and analysing data.” Or at least they’re supposed to. CSTA’s standards go quite a bit further: a whole strand is given over to data and analysis, with a clear sense of progression and ambitious targets for high schoolers like “Create interactive data visualizations” and “use data analysis tools and techniques to identify patterns in data representing complex systems”. I worry that we’ve put so much emphasis on coding that these crucial skills, and the understanding that follows from them, get overlooked in too many schools. It needn’t be this way. Indeed, there’s plenty of scope for doing this data visualisation and analysis with code.

I’ve been thinking recently about how we can take the foundations / application / implications (that’s roughly computer science, IT and critical digital literacy) model that underpins the English computing curriculum and apply it to related (and some unrelated) subjects, to help promote a broader and more balanced approach to curriculum design. We can use this model for thinking about data science in schools. 

If we’re serious about pupils learning data science, then I think we need to lay the foundations with some old-school probability and statistics: these are typically already part of the math curriculum, but there’s so much more we can do here when we let our pupils use computers for this, from simulating dice rolls, through plotting graphs, to calculating summary statistics for some big datasets. All these things can be done by hand (‘unplugged’?), but once pupils have an idea of the techniques, using technology to automate the automatable parts of the process lets them concentrate on selecting and using the right tools, and on making sense of the results – it’s far more interesting and useful to be able to make sense of a scatterplot (for example) than to be able to draw one by hand.
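By way of illustration, here’s the sort of minimal Python sketch I have in mind at the dice-rolling end of that spectrum – just one way of doing it, using only the standard library, with the number of rolls and the text-only chart as arbitrary choices:

```python
import random
import statistics

# Simulate rolling a pair of dice many times and summarise the totals:
# a quick way for pupils to see why 7 comes up more often than 2 or 12.
rolls = [random.randint(1, 6) + random.randint(1, 6) for _ in range(10_000)]

print("mean:  ", statistics.mean(rolls))
print("median:", statistics.median(rolls))
print("mode:  ", statistics.mode(rolls))

# A text-only bar chart of the distribution, one * per 100 rolls.
for total in range(2, 13):
    print(f"{total:2d} {'*' * (rolls.count(total) // 100)}")
```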

I’d also want pupils to apply this knowledge to some interesting problems. In elementary school, I’d look at opinion polls or other surveys as a way into this, perhaps getting pupils to work collaboratively on coming up with good questions – agree/disagree Likert scales are a good starting point – and then exploring what they can learn by slicing the data they collect: is there any difference between boys’ and girls’ enjoyment of school subjects in elementary school (and is there any difference in high school…)? Later on, I’d start looking at time series: weather data is great for this. In the UK we have open-access, month-by-month meteorological data going back over 100 years, and a comparison of temperatures for the last 30 years with the previous 70+ makes a persuasive case. Later still, I’d get pupils looking for patterns and relationships in big (or biggish) datasets: sports fans might like to play with accelerometer or GPS data from micro:bits, wearables or phones: can they work out what sport someone was playing from the data files (or a visualisation of them)? Could a machine do this? Big, public, anonymised datasets could be linked very powerfully to some social studies topics: what are the links between gender, ethnicity, education and income? Or pupils could learn about text mining techniques and apply these to their study of English: are there quantifiable differences between the vocabulary and grammar of Hemingway and Morrison? Or between Obama and Trump?
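To give a flavour of the weather example, something like the following pandas sketch would do. The file name and column names here are hypothetical – you’d adapt them to whichever open dataset you download – but the shape of the analysis carries over:

```python
import pandas as pd

# Hypothetical file: one row per month, with year, month and temp
# (mean monthly temperature, °C) columns, in the style of open
# meteorological station data.
df = pd.read_csv("monthly_temperatures.csv")

annual = df.groupby("year")["temp"].mean()   # mean temperature per year

cutoff = annual.index.max() - 29             # start of the last 30 years
recent = annual[annual.index >= cutoff]
earlier = annual[annual.index < cutoff]

print(f"mean annual temperature, last 30 years: {recent.mean():.2f} °C")
print(f"mean annual temperature, before that:   {earlier.mean():.2f} °C")
```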

Even more importantly, I’d like pupils to think through some of the implications of collecting and using data as freely as we do. Coming back to my elementary school survey idea: what questions shouldn’t we ask one another? What questions shouldn’t we answer? Does it matter if your name is attached to the answers? In one day at school, how much data does a pupil generate (attendance, grades, cafeteria, accessing the internet, CCTV, online learning, behaviour management, etc…)? What happens to all this data? What could you discover about a pupil if this was all linked together? Does anyone mind? How much do internet service providers, search engines and email services know about a user? What do they use this for? Again, does anyone mind? If big tech firms provide the wonderful services they do for free, how have they got to be some of the most valuable companies in the world? The English computing curriculum includes teaching pupils ‘new ways to protect their online identity and privacy’ – what should we include here?

Some of this certainly should be part of what our pupils learn in their school computing lessons, but lots of it provides ample opportunity for cross-curricular links, with math, social studies, civics and even sports! I think we as CS teachers gain so much through showing how relevant coding can be to the other things our pupils study.

Miles Berry
International Representative

Mathematics and Computer Science

As attention in England (and elsewhere) turns to the World Cup, I’ve been reading a couple of books about the mathematical modelling of football: Anderson and Sally’s The Numbers Game and Sumpter’s Soccermatics. I’d recommend them both if you’re interested in learning more about some of the patterns in the data that football (i.e. soccer) generates. I suspect young (and not so young) people are already quite familiar with the computational modelling of football, not through books such as these, but through computer games such as FIFA and Football Manager. These games make extensive use of real data, and are excellent examples of what the English computing curriculum describes as ‘computational abstractions that model the state and behaviour of real world problems’.

The parallel between mathematical modelling and computer programming is no coincidence: there are deep historical connections between computer science and mathematics, and these remain strong to this day. Doing mathematics, at its heart, is a two-step process of thinking about a problem and then manipulating symbols according to rules: before Turing’s day the symbol manipulation (typically arithmetic) was done by people called computers; since his time (in the real world, if not always in school), this work has been done by machines called computers. In either case, it’s in the thinking about the problem and its solution that the real mathematics lies. Similarly, programming is a two-step process: thinking about the problem and how to solve it (the ‘computational thinking’), and then writing the instructions (the code) so that the solution can be carried out by a dumb machine.

In his classic text How to Solve It, first published in 1945, Pólya identifies four principles for problem solving in mathematics: understand the problem, plan a solution, carry out the plan, and review or extend the solution. I think all of these apply to problem solving in computing, with all but the third stage sitting comfortably within most approaches to computational thinking. There’s much common ground between computer science and mathematics: both domains demand logical thinking and a systematic approach, both result in computation, and both draw on the idea of abstraction. In her 2008 paper, Wing drew a distinction between abstraction in computer science and abstraction in mathematics, suggesting that in CS, abstraction is both more general and more practical than it is in mathematics.

For those teaching in elementary school, there are so many opportunities to exploit the connections between mathematics and computer science, as they’re likely to find themselves teaching both to their class. Papert’s turtle graphics have long had their place in the mathematics curriculum, as well as providing what remains a great way into coding. Scratch introduces pupils to four-quadrant coordinates. Away from programming, dynamic geometry software such as GeoGebra or graphics programs like Pixlr can introduce the ideas of transformations. Pupils can be introduced to probability through simulations in Scratch or Excel, and to statistics through online surveys and data logging with the micro:bit.
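In Python, the spirit of Papert’s turtle lives on in the standard library’s turtle module, so a first bridge between angle work in mathematics and coding can be as small as this sketch (the choice of a nine-sided polygon is arbitrary):

```python
import turtle

# Draw a regular polygon: at each vertex the turtle turns through the
# exterior angle (360 / sides) - a nice piece of angle reasoning in itself.
def polygon(t, sides, length):
    for _ in range(sides):
        t.forward(length)
        t.right(360 / sides)

t = turtle.Turtle()
polygon(t, sides=9, length=80)
turtle.done()   # keep the window open until it's closed
```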

Further up the education system, it becomes harder to bridge the artificial gap between CS and mathematics, but it’s well worth the attempt. Take any mathematics investigation or open-ended problem and, after trying a few ideas with pencil and paper, explore how you might program a computer to solve it: personal favourites are problems like ‘how many ways can you make 50 cents using coins?‘, or ‘how many perfect shuffles does it take to get a 52 card pack back in order?‘ Modelling works well here too, from showing how a ball bounces, through estimating pi from the proportion of random points in a square that fall inside an inscribed circle, to creating a class (with overloaded operators) to perform fraction arithmetic. All of these are great coding activities, but they’d also develop pupils’ mathematical understanding of these ideas.
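For the coins problem, a recursive count is well within reach of a high school class. Here’s a minimal sketch, assuming US denominations:

```python
# Count the ways to make a total from the given coin values. Working
# through the denominations in a fixed order avoids counting the same
# combination twice (e.g. 5+10 and 10+5).
def ways(total, coins):
    if total == 0:
        return 1            # one way: use no further coins
    if total < 0 or not coins:
        return 0            # overshot, or ran out of denominations
    # Either use at least one of the first coin, or none of it at all.
    return ways(total - coins[0], coins) + ways(total, coins[1:])

print(ways(50, [50, 25, 10, 5, 1]))  # 50 ways with US coins
```

Comparing pupils’ pencil-and-paper counts with the program’s answer, and then changing the target or the coin set, is where the mathematical discussion really starts.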

There are some great resources out there for folks interested in linking mathematics and computer science more closely. Top of my list would be the Scratch Maths project for 4th/5th grade math from University College London, and the Bootstrap World courses for algebra and data science. It’s notable that both of these have taken impact evaluation very seriously.

Miles Berry, International Representative

Assessing Computing

When assessing students’ learning in computing, I think we’ve a couple of approaches. One would be to look at the projects students do, whether these are open-ended design-and-make tasks or more constrained solutions to problems we pose, perhaps assessing these against agreed criteria or using a rubric. The other is to ask questions and use their answers to judge what they’ve learnt: these questions can be quite open, or perhaps as straightforward as multiple choice. I think a good assessment strategy ought to draw on both approaches: we want students to be able to work creatively on extended projects, and we also want to check, from time to time, whether they can remember the things they’ve been taught.

Responses to questions certainly have a place in summative assessment at the end of a course, but I think they’ve much to offer for formative assessment before, during and after lessons or units of work:

  • How can we tell that students have made progress? By their doing better on questions at the end of a topic than they did at the beginning.
  • How can we tell if they’ve understood the idea we’ve explained? By getting responses from a carefully designed, hinge-point question straight after our introduction.
  • How can we engage students in a meaningful discussion about CS ideas? By having them work together to answer good questions.

Lots of teachers are doing this sort of thing already – writing their own questions to ask their class, or just making them up on the spur of the moment. That’s fine, but coming up with good questions is surprisingly difficult, and it’s not particularly efficient to have lots of teachers all doing this independently of one another when a divide-and-conquer approach to question writing would work so much better – if only teachers could share their questions with one another.

For the last couple of years, CSTA’s UK little sister, Computing At School (CAS), has been working with assessment experts at Durham University, Cambridge Assessment and EEDI to crowd-source an ‘item bank’ of quick-fire questions that teachers can use with their classes. We’ve standardised on four-response multiple-choice questions (a format that US-based members of CSTA are likely to be quite familiar with already), and have adopted EEDI’s Diagnostic Questions (DQ) platform for hosting the questions, making it easy for teachers to compile questions into quizzes and assign these to their classes.

Access to the questions, and use of the DQ platform, is free for anyone. The questions are released under a Creative Commons licence, so teachers are able to embed these in their own virtual learning platform or presentation software if they wish, but our hope is that students attempt them on the DQ site, so we can use the data from hundreds of thousands of students attempting thousands of questions to work out how hard each question is, whether a question is good at discriminating between stronger and weaker students, and where the common misconceptions are in school-level computing.

As I write, we’ve 8,049 questions online: mostly covering middle/high school CS, but there’s some coverage of elementary school CS and of information technology and digital literacy. I’d really encourage you to register on the DQ site and have a browse of what we’ve got: you can filter down through different aspects of CS, and sort questions by most likes, most answered, most misconceptions, etc. It’s easy enough to add questions to a quiz of your own, and we’ve got 384 shared quizzes which are free to use too. Once you’ve registered, you can access the questions at bit.ly/quantumquestions.

We’re already getting some insights from students’ answers to the questions, highlighting the areas of CS that students seem to struggle with, such as understanding variable assignment, code tracing and data types. We’re also running Rasch analysis on students’ responses, and plan to use this to identify lower-quality questions, as well as to make it easier for teachers to find questions suited to their students’ current level of achievement.
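For readers unfamiliar with it, the Rasch model itself is pleasingly simple: a student’s chance of answering an item correctly depends only on the gap between their ability and the item’s difficulty. This isn’t the project’s analysis code, but a toy sketch of the idea – with the difficulty estimate assuming the attempting students’ abilities are centred on zero – looks like this:

```python
import math

# Rasch model: probability that a student of ability theta answers an
# item of difficulty b correctly, on a shared logit scale.
def p_correct(theta, b):
    return 1 / (1 + math.exp(-(theta - b)))

# Crude difficulty estimate from an item's facility (proportion correct),
# assuming the students attempting it have abilities centred on zero.
def difficulty(facility):
    return -math.log(facility / (1 - facility))

print(p_correct(theta=0.0, b=1.0))  # ~0.27: a hard item for an average student
print(difficulty(0.75))             # ~-1.10: an easy item sits below zero
```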

It’s a crowd-sourced project, and so we’d be very glad to have more questions: I’d be happy to support anyone interested in getting their questions onto the site, or who’d be interested in learning more about writing good questions. If you’d like to learn more about the project, check out bit.ly/projectquantum, or watch the seminar Simon Peyton Jones and I gave at Cambridge Assessment last month.

AI and CS Teaching

Last week, I had the interesting experience of giving evidence at a hearing of our House of Lords Artificial Intelligence select committee. The House of Lords is the (entirely unelected) upper house of the UK’s legislature, so for me, this was quite a big deal.

Their lordships were interested in the applications of AI to education in general, but they seemed much more interested in the opportunities that England’s computing curriculum would provide for our students to learn about AI.

In terms of the uses of AI in schools, we’re already seeing a fair few applications of machine learning and other aspects of AI, and I think these look set to continue in the short to medium term. I certainly don’t see AIs replacing teachers any time soon, but I think there are plenty of aspects of the teacher’s role where some support from smart machines might be quite welcome: for example, in assessment, marking essays and judging the quality, rather than merely the correctness, of a student’s code; in recommending appropriately challenging activities, resources and exercises for students; in carefully monitoring student activity, privacy concerns notwithstanding; and in responding quickly to students’ questions or requests for help.

If teaching can be reduced merely to setting work and marking work, then I would fear for the long-term future of the profession: ‘Any teacher that can be replaced by a machine, should be’, as Arthur C Clarke famously put it. My Roehampton students think there’s much more to teaching than this, though: teaching students how to be a person, how to get on with other people, and inspiring them to learn things that they’re not already interested in, to give just three examples. I don’t see the machines taking over these responsibilities any time soon.

More interesting are the opportunities to teach students about AI as part of CS education, or of the broader school curriculum. The English programmes of study for computing are phrased broadly enough to allow, or perhaps even encourage, students to develop a grasp of how AI, and particularly machine learning, works, in age-appropriate ways from age five to eighteen. CSTA’s new standards allow scope for pupils to learn about machine learning too: between 3rd and 10th grade, students should be able to use data to highlight or propose cause-and-effect relationships and predict outcomes; refine computational models based on the data they have generated; and create computational models that represent the relationships among different elements of data collected.

There are some great tools out there to make this accessible to students, from Google’s Teachable Machine, through Dale Lane’s fabulous, IBM Watson-powered machinelearningforkids.co.uk, to building machine learning classifiers in Mathematica (easy!) and Python (more tricky, but really not out of the question), as well as the fun that can be had building simple chatbots in Scratch or Python, and hacking Google Assistant using the AIY voice kit for the Raspberry Pi.
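On the Python route, a first classifier needn’t be more than a dozen lines with scikit-learn. A minimal sketch, using the library’s bundled iris dataset and a decision tree (both arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small, well-known dataset: 150 iris flowers, four measurements each,
# labelled with one of three species.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the flowers to test on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train a decision tree, then score it on the held-back data.
model = DecisionTreeClassifier().fit(X_train, y_train)
print(f"accuracy on unseen flowers: {model.score(X_test, y_test):.0%}")
```

Much of the classroom value lies in the questions that follow: what happens to the accuracy if we give the tree fewer measurements, or less training data?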

Great as these opportunities are, I am concerned that we’re not doing enough in schools to get students thinking through the ethical implications of AI for individuals, society and civilisation. Worryingly, England’s education ministers removed the wider ethical references from the computing curriculum we’d developed. Machine learning algorithms already make life-changing decisions for many of us, and the impact of these technologies on our lives seems likely only to increase over our students’ lifetimes. Education is, at least in part, about preparing our students for the opportunities, experiences and responsibilities of their later lives, and I’m not sure we can do justice to this if we’re not teaching them how AI works, and getting them to think through some of the big questions about how AI should be used.

Miles Berry, International Representative