Professor Chris Bishop...what I’d like you all to do please is to hold out your left thumb at...

Part 1 Professor Chris Bishop Have you ever wondered why computers are brilliant at playing chess, but can’t hold a conversation? Or why a toddler is better at recognising everyday objects than the world’s most powerful supercomputer? Join me in the quest for Digital intelligence.

Hello and welcome to the final lecture in our High-tech Trek. We’ve seen that computers can do amazing things and that they’ve touched almost every aspect of our lives, but can a machine be intelligent? Well to answer that, we first need to know what we mean by intelligence. What about understanding speech, or composing music, or discussing philosophy, or even just playing football? Well these are all aspects of intelligence and in all cases, it turns out to be difficult to get a computer to do these things well. So why is it hard to build intelligent machines?

Well let’s look at one aspect of intelligence which is recognition of everyday objects in the world around us. Now clearly this is an important problem, because around a third of the human brain is devoted to visual processing. It’s also a very exciting frontier in computer science right now, because for the first time we’re starting to make significant progress. Now to help me explain this, I have two special guests.

And here they are. So this is Jack and this one is Shadow and they, aren’t they gorgeous? Now, I’m sure you can all tell right away that Shadow is a dog and Jack is a cat, but how difficult would it be for a computer to tell the difference between cats and dogs? Well we’ll find out at the end of the lecture. Thank you very much.

Woman Thank you.

Chris But first, let’s try an experiment to see how well people do at this task. And I’d like you all to join in with this. What I’m going to do is to show you some images, very quickly, one after the other. Now each image will contain either a cat or a dog. If it’s a cat, I want you all to shout as loudly as you can “Cat”. And if it’s a dog, I want you to shout as loudly as you can, “Dog”. Okay? And we’ll see how you get on. Are you ready?

Audience Cat.

Dog.

Dog.

Cat.

Cat.

Dog.

Cat.

Cat.

Cat.

Dog.

Dog.

Cat.

Dog.

Dog.

Cat.

Dog.

Dog.

Chris Okay, excellent. Give yourselves a big round of applause for that.

Now the task of recognising structure or objects in data is called pattern recognition and you’ve all just solved a pattern recognition problem. And today we’re going to explore ways to get computers to do pattern recognition.

Now while you found that quite easy, it’s something that computers find incredibly difficult. The problem is that there is huge variation in the images, due to different sizes and different colours of cats and dogs and different shapes and there are changes in the lighting and changes in the background and these cause changes in the images. Even just working out which part of the image is the animal and which is the background is hard for a computer.

Now our brains are very good at handling this variability, but computers easily get confused. So how can we get computers to do pattern recognition? Well let’s start by looking at something that computers are good at. Here we have a screen full of words and they’re all jumbled up. Now computers would be very good at sorting all these words into alphabetical order and the reason that they’re good at this is because sorting can be done through a series of logical steps, or rules, and computers are very good at following rules very quickly.

So let’s see how we can use this ability to solve a problem in computer vision. Now one reason that humans have two eyes is so that we can judge distance. We can see how this works with a little experiment. So what I’d like you all to do please is to hold out your left thumb at arm’s length. And then take your right thumb and hold it just in front of your left thumb. Okay, now close your left eye and using your right eye, line up your thumbs. Now keeping very still, close your right eye and open your left eye and you’ll see that your right thumb seems to have moved to the right just a little bit. Okay?

Okay, try that again. Hold out your left thumb at arm’s length. This time take your right thumb and hold it much closer to your face. Okay, again close your left eye and using your right eye, line up your thumbs. And now, keeping very still, close your right eye and open your left eye. And this time you’ll see that your right thumb seems to have shifted much further to the right, so the amount of shift is related to the distance from your eyes. Now this is called parallax and it helps us to judge distance.

Now computers can use this principle in order to see the world in three dimensions and for this I’ll need a volunteer please. Who would like to come on down? Would you like to come down? Yeah.

If you’d like to stand just there. What’s your name?

Connor Connor.

Chris Connor? Alright Connor, what we have here is a special camera – if you could bring the hand-held camera in, just to have a look at this – this camera’s unusual because it has two lenses and they’re spaced a short distance apart, so it’s a bit like our eyes. And on the screen here we can see what each of these cameras is seeing.

Now if we lay these two images on top of each other, we’ll see that Connor’s face here is shifted in one image, compared to the other. So that displacement, that’s just like your thumbs in the experiment we did just a moment ago and the computer can measure that shift and can use it to work out how far away each part of this image is.
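The link between the measured shift and distance can be sketched in a few lines of Python. This is only an illustration, assuming an idealised camera with two parallel lenses; the function name `depth_from_disparity`, the parameters `focal_length_px` and `baseline_m`, and all the numbers are hypothetical and are not taken from the camera used in the lecture.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Estimate distance in metres for an idealised parallel two-lens camera.

    A larger shift between the two images (the disparity) means a closer object.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: lenses 6 cm apart, focal length of 700 pixels.
near = depth_from_disparity(disparity_px=84, focal_length_px=700, baseline_m=0.06)
far = depth_from_disparity(disparity_px=21, focal_length_px=700, baseline_m=0.06)
print(near, far)  # the larger shift gives the smaller distance
```

As with the thumbs, the point that shifts most between the two views is the one closest to the camera.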

And if we bring up the next screen, we can see this happening in real time. So Connor, if you just keep very still there and just look towards that camera, and we can even see Connor’s eyes and nose and mouth here, the colours, or the shades of grey, represent distance. So the dark regions around here are regions which are far away and the lighter regions are closer to the camera. And if you just hold up your hand for me like that, that’s good, and now that appears nearly white, because that’s closest to the camera.

Try just moving your hand forwards and backwards slowly like that, that’s good, and as it comes forwards it gets whiter and as it goes back, it becomes darker. So this gives – thanks, that’s great – so this gives computers a way to distinguish between the object in the foreground and the background. And of course, we can also use this as a way of controlling computers through using gestures. Okay, thanks very much.

So a computer can work out the distance to objects, in real time, because it’s programmed with the rules of parallax. But of course this computer has no idea what it’s actually looking at. So can we solve our pattern recognition problem of telling cats from dogs by programming the computer with logical rules?

Well back in the 1970s scientists tried creating artificial intelligence using an idea called Expert Systems, which were based on handcrafted rules. So for example, here are two images of, ah, a cat and a dog and looking at these, you might think well, let’s have a rule which says if the fur is long it’s a dog, otherwise it’s a cat. Now the problem is, when we have a rule like this, we often find there’s an exception to the rule. So here’s another pair of images – this is a cat with long fur and a dog with short fur. And the problem is that every time we think of a rule, we can often find an exception.

In fact, there are always exceptions to these rules. This task is simply far too complex to be expressed in terms of simple rules. So while rule based systems have been found useful for some applications, we’ve pretty much given up using them to do pattern recognition.
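A handcrafted rule of this kind, and the way it breaks, might look like the following sketch. The fur lengths, the threshold and the function name `classify_by_fur_rule` are invented for illustration only.

```python
def classify_by_fur_rule(fur_length_cm, threshold_cm=4.0):
    """Handcrafted rule: long fur means dog, otherwise cat."""
    return "dog" if fur_length_cm > threshold_cm else "cat"


# Hypothetical animals: (description, fur length in cm, true label)
animals = [
    ("short-haired cat", 2.0, "cat"),
    ("long-haired dog", 8.0, "dog"),
    ("long-haired cat", 9.0, "cat"),   # exception to the rule
    ("short-haired dog", 1.0, "dog"),  # exception to the rule
]

mistakes = [name for name, fur, label in animals
            if classify_by_fur_rule(fur) != label]
print(mistakes)
```

Moving the threshold does not help here: any single cut-off on fur length gets at least one of these four animals wrong.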

So what we need is a completely new approach and this is based on two very important concepts. Now the first concept is called machine learning. The idea is this – instead of programming the computer to solve the pattern recognition problem directly, we do something completely different, we programme the computer to learn from data and then we train the computer to solve the problem, so it’s a bit like the way you and I learn things from experience. Well we’ll see how computers could learn for themselves after the break.

Part 2 Chris Let’s look at something simple that we can do using machine learning.

Now in a moment you’re all going to play a computer game and I’m not going to tell you what it is, because I think it should be obvious how you play it. But what’s special is the way in which you are going to control the computer. You won’t have a mouse or a gamepad, instead you’re going to control the game by moving your bodies.

To do this we have four cameras – these are video cameras and they’re all looking out at the lecture theatre, so they’re looking at all of you. And the output from these cameras is being fed to this computer.

Now the first thing we need to do is to train the system to recognise your movements. So let’s have everybody leaning to your left first of all, as far as you can. Okay, that’s good. Now let’s have everybody leaning to your right, as far as you can. Good, okay. Now the system is trained and we’re now ready to play. Now Marcus here is just going to be controlling the speed, but it’s up to you to control the direction, so you all need to lean together to make this work. Okay, off we go.

And you can shout out if you think everybody should be going one way or the other.

Audience (Shouting)

Chris Oh, okay, we’ll stop there. Want to give yourselves a big round of applause?

So how does this work? Well the video from each of those cameras consists of 25 images a second and the computer compares those images with the ones that it captured during training. So here’s an image of all of you leaning to your left. And we should have another image, and this is you all leaning to your right. So the computer takes each of those images and it compares to the two training images. If it matches better to the one where you’re leaning to the left, then it steers the car to the left and if it matches better with the one when you’re leaning to the right, then it steers the car to the right. Now that’s an easy pattern recognition problem, because there are only two possibilities – left and right. And also the background and the lighting are the same in the training images as they are in the live feeds from the cameras. The key point is that we didn’t invent any rules about what the images look like when you’re leaning, we simply showed examples to the computer.
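The comparison against the two training images can be sketched as below. The lecture does not say which measure of similarity the system used; the sum of squared pixel differences here is an assumption, and the tiny six-pixel "images" and the function names are hypothetical.

```python
def squared_difference(image_a, image_b):
    """Sum of squared pixel differences between two equal-sized images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b))


def steer(frame, left_example, right_example):
    """Return 'left' or 'right', whichever training image the frame matches better."""
    left_score = squared_difference(frame, left_example)
    right_score = squared_difference(frame, right_example)
    return "left" if left_score <= right_score else "right"


# Toy 1x6 "images": bright pixels mark where the audience is.
left_example = [9, 9, 9, 1, 1, 1]
right_example = [1, 1, 1, 9, 9, 9]
live_frame = [8, 9, 7, 2, 1, 3]

print(steer(live_frame, left_example, right_example))
```

No rule about what leaning looks like appears anywhere in the code; the two stored examples carry all of that information.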

Now let’s look at a harder problem, where there’s a lot more variability in the images, which therefore takes us a step closer to our goal of distinguishing cats from dogs. Now I think you all recognise these creatures, but the question is – could you tell which penguin is which? Quite hard. Well a computer has actually been trained to solve just that problem. Ecologists are interested in monitoring the movements of penguins on Robben Island in South Africa and they need to tell which penguin is which, in order to monitor their movements. And so a group from Bristol has built a system which can help to automate this.

Now we weren’t able to bring penguins all the way from South Africa this evening, so instead please welcome three volunteers.

Now you’ll notice that our three penguins here have patterns of spots on their front and each pattern of spots is unique to that particular penguin, it’s like a sort of fingerprint. Now before the lecture, we trained this system to recognise each of these three penguins. So, what I’d like now, is for our first penguin, please, to come and waddle round in front of this camera. And on the screen here we can see a rectangle appear. That rectangle means that the computer has recognised that a penguin has walked in front of the camera, so it’s detected the presence of a penguin.

Now it then looks at the pattern of spots and it compares that pattern of spots with the patterns in its database. And it’s telling us that penguin is Pandora. So if you’d like to just turn around, and this is indeed Pandora. Excellent. Would you like to go and stand over there?

Right, would our next penguin like to waddle in front of the camera please? And again we see a rectangle, so it’s detected the presence of a penguin and it’s analysing that pattern of spots and this one, it thinks, is Pamela. So could you turn around? Indeed, we have Pamela. Excellent, thank you very much.

And our final penguin – this is the baby of the family. You might need to stand on tiptoes I think, oh yeah, excellent, okay it’s got it. So the rectangle has detected a penguin and, again, analysing the pattern of spots and this one is Pedro. Would you like to turn round? And it is, indeed, Pedro. Thank you very much.

Now real penguins also have this unique pattern of spots. And here we can see penguins on Robben Island in South Africa, where this system is being used. So every time you see a rectangle, that means the system has detected a penguin. And it then analyses the pattern of spots on the front of the penguin and compares it with the database to determine which penguin is which.

Now the reason the system can detect penguins is because it’s been trained on a thousand images of real penguins and it uses those patterns of spots to tell the individual penguins apart, so that their movements can be monitored. Okay, thank you very much.

Now that’s an impressive system and the reason the computer can do so well at detecting penguins is that all the penguins essentially have that same shape of white region with the black border.

So we’ve seen how learning from data allows us to solve problems such as video tracking and recognising penguins. But distinguishing cats from dogs is much harder, because they’re much more variable in their appearance and also, they’re quite similar to each other. So as well as learning from data, we need to introduce our second key idea.

Let’s suppose that the computer detects patches of long fur. Now that means it’s more likely to be a dog than a cat, but it’s not certain. So the computer has to make use of uncertain and ambiguous information. Now just because something’s uncertain, it doesn’t mean that we can’t be precise or that we can’t make predictions.

In fact my own research involves a branch of mathematics called probability theory which is devoted to making precise statements about random or uncertain events. Now I can show you how this works with the help of a volunteer, so shall we have someone from this side? Um, would you like to come on down?

If you’d like to stand just there. Yes, if you’d like to come and stand just there. And what’s your name?

Emma Emma.

Chris Emma? Alright Emma, what I’d like you to do, there are some yellow beads in this pot. If you hold your hand out, I’ll just give you one of those. What I’d like you to do is just place it in the top of this demonstration.

Emma Drop it?

Chris Yes, just let go, that’s fine. We see it bounces around off these pegs and it lands in one of the slots. Okay, here’s another one, try that. So again, it bounces around off the pegs in a random looking way and it ends up in a different slot. So the behaviour of each of these beads is random, we can’t predict which slot a particular bead will fall into.

But what happens if we tip a whole pot of beads in here? Well, in this envelope I have a prediction for what’s going to happen. Now let’s see if this prediction comes true. So what I’d like you to do is just take that and, just very carefully from the end, that’s it, tip the whole lot in, nice and slowly, that’s good. Okay.

Okay, so each bead is bouncing around very randomly, but all the beads together made a nice smooth pattern. So let’s have a look at my prediction and we’ll see how well it fits. So here we go, and that’s what I predicted should have happened, so I think that’s, that’s not bad actually, is it?

So the height of this red curve tells us the probability that a bead will land in that particular slot. So although each bead behaves randomly, we can still make precise predictions about the behaviour of the set of beads. Okay, thank you very much.
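The bead demonstration can be imitated with a short simulation. This sketch assumes each bead bounces left or right with equal chance at every row of pegs, independently of the other rows, which is a simplification of the real apparatus; the number of rows, the number of beads and the function names are hypothetical.

```python
import random
from math import comb


def slot_probabilities(rows):
    """Probability of each slot if a bead goes left or right with equal chance at every row."""
    return [comb(rows, k) / 2 ** rows for k in range(rows + 1)]


def drop_beads(num_beads, rows, rng):
    """Simulate beads; each slot index is the number of rightward bounces."""
    counts = [0] * (rows + 1)
    for _ in range(num_beads):
        slot = sum(rng.random() < 0.5 for _ in range(rows))
        counts[slot] += 1
    return counts


rows = 10
rng = random.Random(0)  # fixed seed so the run is repeatable
predicted = slot_probabilities(rows)
observed = drop_beads(5000, rows, rng)
for slot, (p, count) in enumerate(zip(predicted, observed)):
    print(slot, round(p, 3), round(count / 5000, 3))
```

The path of any one simulated bead is unpredictable, yet the fraction of beads in each slot stays close to the predicted probability.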

So this means that we can teach computers to make predictions, even though they have to deal with uncertain data. For example, an individual cat or dog might have long or short fur, but typically, dogs have longer fur than cats. So the length of the fur is useful information for the computer, even though it’s not certain. That information can be combined with lots of other sources of information, such as colour and shape, in order to allow the computer to make a decision on whether it’s a dog or a cat.
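One standard way to combine several uncertain clues is Bayes' rule. The sketch below treats the clues as independent of each other given the animal, which is a simplifying assumption, and every probability in it is invented for illustration; the lecture does not say how the real system combines its information.

```python
def posterior_dog(prior_dog, likelihoods):
    """Combine independent clues with Bayes' rule.

    likelihoods is a list of (P(clue | dog), P(clue | cat)) pairs.
    """
    dog_score = prior_dog
    cat_score = 1.0 - prior_dog
    for p_given_dog, p_given_cat in likelihoods:
        dog_score *= p_given_dog
        cat_score *= p_given_cat
    return dog_score / (dog_score + cat_score)


# Hypothetical numbers, for illustration only.
long_fur = (0.6, 0.3)       # long fur is seen more often on dogs than on cats
pointed_ears = (0.4, 0.8)   # a second, equally made-up clue

print(posterior_dog(0.5, [long_fur]))
print(posterior_dog(0.5, [long_fur, pointed_ears]))
```

With long fur alone the estimate leans towards dog; adding a clue that points the other way pulls it back, so no single clue has to be certain.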

So we’ve introduced probabilities as a way of allowing computers to deal with uncertainty when solving a pattern recognition problem. Now we can see how to make use of probabilities by playing a game and I need you all to help me with this.

So what I’ve done is, I’ve thought up a secret message and what I want you to do is to guess this message. Now each of these boxes represents one of the letters of this message and you’re going to guess one letter at a time. So we’ll start by guessing the first letter and once we’ve got that, we’ll move onto the second letter and so on. When you get the correct letter, it will appear in this box and any incorrect guesses will appear in the column underneath. So your job is to make as few incorrect guesses as possible. Okay. And, ah, Andy here is going to type the letters in as we go.

So I’m just going to go around and point to people and I just want you to give me a guess for a letter. So, do you want to start?

Boy A.

Chris A.

Girl E.

Chris E.

Child F.

Chris F.

Boy O.

Chris O.

Boy J.

Chris J.

Child I.

Chris I.

Child L.

Chris L.

Boy P.

Chris P.

Girl T.

Chris T.

Boy O.

Chris O? We’ve had O. Yes.

Boy C.

Chris C. Good, excellent. Okay, well C is the first letter. Okay, start again then. Next letter.

Boy M.

Chris M.

Girl E.

Chris E.

Child D.

Chris D.

Child Um, F.

Chris F.

Boy T.

Chris Okay, let’s have some people from this side then.

Girl G.

Chris G.

Girl A.

Chris A.

Child H.

Chris H.

Girl Ah, C.

Chris C, we’ve, yes C.

Child P.

Chris P.

Boy I.

Chris I. Look at the letter and think what might come next.

Boy R.

Chris R.

Boy K.

Chris K.

Boy L.

Chris L.

Boy U.

Chris U. What might come after C?

Boy I.

Chris I.

Child O.

Chris O. Excellent, right. CO. What might be the next letter?

Boy P.

Chris P.

Child M.

Chris M. Good. Next?

Boy F.

Chris F? COM, what comes next?

Boy W.

Chris W.

Boy P.

Chris P? Okay, good, P, what’s next?

Boy U.

Chris U. All together?

Audience T–E–R–S.

Chris S. Yes, space is a letter.

Chris Okay, what’s next? Let’s have some guesses here.

Girl A.

Chris A? Good, excellent.

Audience / Chris R–E–SPACE–C.

Chris Let’s go, let’s go back over here

Child M.

Chris M.

Child K.

Chris K.

Girl T.

Chris Okay, are computers slow?

Boy A.

Chris A. That was a hint, by the way. Anybody?

Boy Q.

Chris Q, good. Okay, what comes after Q?

Audience / Chris U–I–C–K.

Chris Excellent. Okay, give yourselves a big round of applause.

So you’ll see that it was quite hard to get started, you had lots of wrong guesses here. But once we got a little bit into the word, you could see what the next letters were going to be and it became quite quick. And then here it wasn’t quite so obvious, but once we’d got Q, we know that the next letter is nearly always U and at that point, it’s really pretty clear what the rest of the message is.

So this shows that language has structure, but the structure involves uncertainty. So some letters are more probable than other letters, but the probabilities change while we’re actually playing the game. So our second key idea is that we have to deal with uncertainty, and that we can express uncertainty in a very precise way, using probabilities.

Now computers can learn these probabilities by analysis of thousands of pages of text. For example, the computer could look for all the places where COMP occurs and then see how often it’s followed by U, how often it’s followed by an L, and so on. Now to find out what we can do once the computer has learned these probabilities, join me after the break.
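The counting described here can be sketched as follows. The sample sentence stands in for the thousands of pages of text and was made up for this example; the function name `next_letter_probabilities` is likewise hypothetical.

```python
from collections import Counter


def next_letter_probabilities(text, context):
    """Estimate P(next letter | context) by counting occurrences in text."""
    text = text.lower()
    context = context.lower()
    counts = Counter()
    start = text.find(context)
    while start != -1:
        following = start + len(context)
        if following < len(text):
            counts[text[following]] += 1
        start = text.find(context, start + 1)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()} if total else {}


# A tiny stand-in for "thousands of pages of text".
sample = "Computers compute. A compiler compiles. We compare computers and complain."
probabilities = next_letter_probabilities(sample, "comp")
print(probabilities)
```

In this small sample U is the most frequent letter after COMP, followed by I; a larger body of text would give steadier estimates.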

Part 3 Chris So language has structure and we can describe this structure using probabilities. Let’s see what we can do once a computer has learned those probabilities.

This is a system called Dasher and it allows us to write text without using a keyboard. This is useful for disabled people and it’s also useful when we have a small device like a phone, because using a keyboard can be quite fiddly. So what I’m going to do is to write a message and the message will appear up at the top here. And I’m just going to write the message and then afterwards, I will stop and I’ll explain how it works.

So I’m just going to type the message that we had before – computers are quick. Here we go. Okay. So how does that work? Let me just go back and start again and then we can see.

So imagine that we took every sentence that you could ever write and we listed them all in alphabetical order, with A at the top and Z at the bottom. Then in order to write a sentence, all we have to do is to find it within that list, so writing has become a task of navigation. And we can use the probabilities to help us do that navigation. So let me just write that message again, a little more slowly this time. So we want to start with a capital C. So here we see the letters of the alphabet and they’re represented by boxes. And the size of the box corresponds to the probability that that will be the next letter.

So here’s the letter C, and it’s reasonably probable that we’ll start with a C, so it’s got a reasonably big box. So my first task is to navigate inside that box. Now to help me do this navigation, the mouse is controlling this little cross. So if that cross moves up, I can navigate towards the top of the list and if it moves down, I can navigate towards the bottom of the list. If I move to the right it speeds up, to the left it slows down. And if I go even further to the left, I can actually go backwards and undo any mistakes.

So let’s go inside that C box then. So the letter C has now appeared up here and within the C box, again I have the complete alphabet, from A down to Z. And the size of these boxes represents the probability that the corresponding letter will be next. And these boxes keep changing. So right now O is actually a very probable letter and that’s, in fact, the letter I want, so I have to navigate inside the O box.

And so I’m now inside the box corresponding to O, so I’ve typed my second letter, which is O. And again, we have the complete alphabet inside this box, from A to Z. But the size of these boxes has now changed because the probabilities have changed. I’m after M, which is a reasonably probable letter, so I now navigate inside the M box and we’ve typed M.

And so by making the boxes bigger when the probability is bigger, it makes it easier for me to write common words. If I want to write a letter that’s very unusual I can still do that, it just means the box is much smaller.

Okay, so the next letter is P, so I now navigate inside the box corresponding to P. And what you’ll see now is that, ah, just as on our message guessing game, once we’ve got COMP, there’s now a fairly likely sequence of letters, which is U–T–E–R, and those are all sort of sitting there, ready for us to just select. So we can very quickly type the rest of the word ‘computers’.

And so we’ve typed ‘computers’, this is the space character, so we’re ready to carry on with the rest of the message. There’s the A of ‘are’, there’s R–E and space. And down here is ‘quick’. If I just stop there for a moment. I’m just about to go into the Q box, which is here. Now Q is nearly always followed by U, so inside the Q box, you see that the box corresponding to U fills nearly all of the space. And there we are, ‘computers are quick’.
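The idea of sizing each box by its probability can be sketched like this. It is not Dasher's actual code; the function name `layout_boxes` and the probabilities for the letter after Q are hypothetical, chosen only to show U taking up nearly all of the space.

```python
def layout_boxes(probabilities, top=0.0, bottom=1.0):
    """Split the span [top, bottom] into one box per letter, sized by probability.

    probabilities maps each letter to its probability; letters are laid out
    in alphabetical order from top to bottom.
    """
    total = sum(probabilities.values())
    boxes = {}
    edge = top
    for letter in sorted(probabilities):
        height = (bottom - top) * probabilities[letter] / total
        boxes[letter] = (edge, edge + height)
        edge += height
    return boxes


# Hypothetical probabilities for the letter after "q".
after_q = {"u": 0.96, "a": 0.02, "i": 0.02}
boxes = layout_boxes(after_q)
for letter, (start, end) in boxes.items():
    print(letter, round(start, 2), round(end, 2))
```

Calling `layout_boxes` again with the `top` and `bottom` of the chosen box lays out the next letter inside it, which is what makes writing feel like zooming into a list.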

So modelling probabilities can also be used for predictive text on mobile phones as well. Now let’s see if we can apply probabilities to our problem of recognising objects in images and yes, I will need a volunteer for this. Shall we have, ah, let’s have somebody from up near the back. Um, you with the stripy top on – yes – would you like to come on down?

Now if you’d like to stand just there for me. And what’s your name?

Alice Alice.

Chris Alice? Alright. So what we have here is a standard cheap webcam and it’s looking at this table. And it’s connected to a laptop and on the laptop is running some software which has been trained to recognise two categories of objects – mugs and mobile phones. So let’s see how we get on with mugs first of all. Would you like to select one of those mugs for me – that’s good – and just place it there in front of the camera. Okay, that’s good.

So the system has recognised that it’s a mug and here, this bar, the size of this bar represents the confidence which the system has that this is, indeed, a mug, so the higher the bar, the higher the probability. Okay, do you want to try a different mug. Okay, so it can recognise mugs, what about mobile phones? Who has a mobile phone? Who’d like to come and, yes, why don’t you come and bring your phone out? Would you like to take the mug away, that’s it, and just pop your phone down? Okay, and it’s recognised that it’s a phone and it seems reasonably confident in that. Okay, thank you very much.

So this system knows about mugs and phones, but what happens if we give it a new category of object, something it’s never seen before? Well here we’ve got some little model cars, so would you like to take one of those cars for me and pop that down?

No it doesn’t, it doesn’t know about cars, it only knows about phones and, ah, and mugs. So it, for the moment this thinks this is a phone. So what I’m going to do now is to tell it that this is actually a new category called a car. And so it, now it knows that that’s a car. Alright, just turn it round a little bit for me, see if it can still recognise it.

Okay, not! So after, alright, so it’s seen, it’s seen one example so far, so let’s just train that, so it knows it’s a car. Let’s make it harder now, let’s try a different car. So it’s had just two training examples so far.

Okay, this one it thinks is a phone. So again I’m going to tell it that this is actually a car. That’s good. Let’s try one more car then, see how we get on.

Okay, and that time it, that time it got it. And here, it’s not, it’s still not very confident, it’s just seen, I think, three examples now of cars. Usually when we train real pattern recognition systems, we don’t show them three examples, we show tens of thousands of examples. Okay, thank you very much.

So this system looks at the shape and the colour of the object and it uses this information to work out the probability that it’s one of the objects that it’s been trained on. Now in that demonstration there were just three categories of objects, mugs, phones and cars, and they all look fairly different from each other. And also, the background was plain and the background doesn’t change.

So that helps us understand why the cats versus dogs problem is much harder. Here are some examples of images of cats and dogs and you can see there’s a huge variability in the background, in the colour, in the lighting and of course, the animals themselves are flexible, they’re constantly changing shape. So how well can current technology do on telling cats from dogs? Well let’s just welcome back Jack and Shadow.

So earlier we took some photographs of Jack and Shadow and we added those photographs to a data set of 10,000 images of cats and dogs. Now half of those images are cats and half are dogs. And we took all those images and we fed them through a state-of-the-art object recognition system and we asked it to classify each image as either cat or dog. Now because there are equal numbers of cat and dog, if the system just guessed at random, it would get 50% correct.

Now the computer, taking about half a second to process each image, managed to get 83% correct, so it’s doing a lot better than random, but it’s still well short of being perfect. Now you might think why don’t we get the world’s most powerful supercomputer and use that? Well of course, if we did it would run a lot faster, but it wouldn’t actually be more accurate. The problem is not lack of processing power, it’s that we don’t yet know how to use lots of processing power to achieve good recognition.

So how well do people do? Well on the website that’s associated with this year’s Christmas Lectures, we conducted an experiment. Humans had to classify images against the clock, and they took an average of one second per image and they managed to get 96% correct.

And how did our state-of-the-art recognition system do with Shadow and Jack? Well, here are four images of Shadow and Jack and what we found is that the bottom two it got correct, and the top two it got wrong.

So just telling the difference between cats and dogs is a challenge for present day computers. Thank you very much.

Woman Thank you.

Chris And that’s just two categories. An adult human is thought to be able to distinguish tens of thousands of categories of objects. Even a three-year-old toddler is much better at recognising everyday objects than a supercomputer. And yet the progress that we’ve made so far has already led to some practical applications, from things like allowing robots in factories to see what they’re assembling, to allowing tumours to be detected in medical images.

Now what if you had to solve the problem of pattern recognition and recognising objects in images? How would you go about doing it? Would you use stereo vision? Or would you look for shapes and colours? Or would you come up with a completely different approach?

And recognising objects is just one of many aspects of human intelligence that we can’t yet build into a machine. The challenge of digital intelligence is one of the most fascinating frontiers of computer science. It’s more than half a century since the first digital computers were built and yet, we’re still at the beginning of the digital revolution. Whatever advances the next 50 years bring, they will be at least as important and at least as exciting as those of the last 50 and it’s the scientists of your generation that will make it happen. Thank you.
