Lesson 1: Deep Learning 2019 – Image classification

Okay, so welcome to Practical Deep Learning for Coders, lesson one. It's kind of lesson two, because there's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? So if you haven't got a GPU running yet, go back and do that. Make sure that you can access a Jupyter Notebook, and then you're ready to start the real lesson one. So if you're ready, you will be able to see something like this, and in particular, hopefully you have gone to the notebook tutorial. It's at the top, the one with the 00 here; as this grows you'll see more and more files, but we'll keep the notebook tutorial at the top. And you will have used your Jupyter Notebook to add one and one together, getting the expected result, and hopefully you've learned these four keyboard shortcuts. So the basic idea is that your Jupyter Notebook has prose in it, it can have pictures, it can have charts in it, and, most importantly, it can have code in it. Okay, so the code is in Python. How many people have used Python before? So nearly all of you, that's great. If you haven't used Python, that's totally okay; it's a pretty easy language to pick up. But if you haven't used Python, this will feel a little bit more intimidating, because the code that you're seeing will be unfamiliar to you. Yes, Rachel? Oh, yeah. No, because I'm trying to keep them separate. Yeah. Okay. So, as I say, there are bits like this that are for the people in the room in person, and this is one of those bits; it's really for the remote audience, not for you. I think this will be the only time like this in the lesson where we've assumed you've got this set up.
All right, so whether you're in the room or watching remotely, you can go back after this and make sure that you can get this running using the information on course-v3.fast.ai. Okay. So a Jupyter Notebook is a really interesting device for a data scientist, because it kind of lets you run interactive experiments; it lets us give you not just a static piece of information, but something that you can actually interactively experiment with. So let me explain how we think it works well to use these notebooks and to use this material, and this is based on the last three years of experience we've had with the students who have gone through this course. First of all, it works pretty well just to watch a lesson end to end. Don't try to follow along, because it's not really designed to go at a speed where you can follow along; it's designed to be something where you just take in the information, you get a general sense of all of the pieces and how it all fits together. And then you can go back and go through it more slowly, pausing the video, trying things out, making sure that you can do the things that I'm doing, and that you can try to extend them to do things in your own way. Okay, so don't worry if things are zipping along faster than you can do them; that's normal. And also, don't try to stop and understand everything the first time. If you do understand everything the first time, good for you, but most people don't, particularly as the lessons go on; they get faster and they get more difficult. Okay, so at this point we've got our notebooks going, we're ready to start doing deep learning, and so the main thing that hopefully you're going to agree with at the end of this is that you can do deep learning, regardless of who you are. Now, I don't just mean "do": I mean do at a very
high level, I mean world-class-practitioner-level deep learning. Okay. So your main place to be looking for things is course-v3.fast.ai, where you can find out how to get a GPU and other information, and you can also access our forums. On our forums you'll find things like how to build a deep learning box yourself, and that's something you can do later on, once you've kind of got going. Who am I? So why should you listen to me? Well, maybe you shouldn't, but I'll try to justify why you should listen to me. I've been doing stuff with machine learning for over 25 years. I started out in management consulting, where actually, initially, I was, I think, McKinsey & Company's first analytical specialist, and went into general consulting, ran a number of startups for a long time, and eventually became the president of Kaggle. But actually the thing I'm probably most proud of in my life is that I got to be the number-one-ranked contestant in Kaggle competitions globally. So I think that's a good fact to show that I can actually train a predictive model that predicts things, a pretty important aspect of data science. I then founded a company called Enlitic, which was the first kind of medical deep learning company. Nowadays I'm on the faculty at the University of San Francisco, and also co-founder, with Rachel, of fast.ai. So I've used machine learning throughout that time, and I guess, although I am at USF, I'm not really an academic type. I'm much more interested in using this tool to do useful things. Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through creating software to make deep learning easier to use at a very high level; through education, such as the thing you are watching now; and through research, which is where we spend a very large amount of our time, researching to figure out how you can make deep learning
easier to use at a very high level, which ends up, as you'll see, in the software and the education; and by helping to build a community, mainly through the forums, so that practitioners can find each other and work together. So that's what we're doing. So this course, Practical Deep Learning for Coders, is kind of the starting point in this journey. It contains seven lessons, each one about two hours long. We're then expecting you to do about eight to ten hours of homework during the week, so it'll end up being something around 70 or 80 hours of work. I will say, there is a lot of variation in how much people put into this. I know a lot of people who work full-time on fast.ai; some folks who do the two parts can spend a whole year doing it really intensively. I know some folks who watch the videos at double speed and never do any homework, and come out at the end of it with, you know, a general sense of what's going on. So there are lots of different ways you can do this, but if you follow along with this kind of ten-hours-a-week approach for the seven weeks, by the end you will be able to build an image classification model, on pictures that you choose, that will work at a world-class level. You'll be able to classify text, again using whatever datasets you're interested in. You'll be able to make predictions for kind of commercial applications, like sales. You'll be able to build recommendation systems, such as the one used by Netflix. Not toy examples of any of these, but actually things that can come top ten in Kaggle competitions, that can beat everything that's in the academic community. Very, very high-level versions of these things.
So that might surprise you. The prerequisite here is literally one year of coding and high-school math, but we have thousands of students now who have done this and shown it to be true. You will probably hear a lot of naysayers (fewer now than a couple of years ago when we started, but still a lot of naysayers) telling you that you can't do it, or that you shouldn't be doing it, or that deep learning's got all these problems. It's not perfect, but these are all things that people claim about deep learning which are either pointless or untrue. It's not a black box: as you'll see, it's really great for interpreting what's going on. It does not need much data for most practical applications. You certainly don't need a PhD: Rachel has one, so it doesn't actually stop you from doing deep learning if you have a PhD, and I certainly don't, I have a philosophy degree and nothing else. It can be used very widely for lots of different applications, not just for vision, which is where it's most well known. You don't need lots of hardware; you know, that thirty-six-cents-an-hour server is more than enough to get world-class results for most problems. It's true that maybe this is not going to help you build a sentient brain, but that's not our focus. Okay, so for all the people who say deep learning is not interesting because it's not really AI: not really a conversation that I'm interested in. We're focused on solving interesting real-world problems. What are you going to be able to do by the end of lesson one?
Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. This is an example of something he did: he downloaded 30 images of people playing cricket and people playing baseball, and, with the code you'll see today, built a nearly perfect classifier of which is which. So this is the kind of stuff that you can build, with some fun hobby examples like this, or you can try stuff, as we'll see, in the workplace that could be of direct commercial value. So this is the idea we're going to get to by the end of lesson one. We're going to start by looking at code, which is very different from many academic courses. So for those of you who come from an engineering or math or computer science background, this is very different from the approach where you start with lots and lots of theory and eventually you get to a postgraduate degree and you're finally at the point where you can build something useful. We're going to learn to build the useful thing today. Okay. Now, that means that, at the end of today, you won't know all of the theory. There will be lots of aspects of what we do that you don't know why or how they work. That's okay; you will learn why and how they work over the next seven weeks. But for now, we've found that what works really well is to actually get your hands dirty coding, not focusing on theory, because there's still a lot of artisanship in deep learning. Unfortunately, it's still a situation where people who are good practitioners have a really good feel for how to work with the code and how to work with the data, and you can only get that through experience. And so the best way to get that feel for how to get good models is to create lots of models through lots of coding, and study them carefully. And a Jupyter Notebook provides a really great way to study them. So let's try that. Let's try getting started.
You will open your Jupyter Notebook, and you'll click on lesson1, and it will pop open looking something like this. You can run a cell in a Jupyter Notebook by clicking on it and pressing Run, but if you do so, everybody will know that you're not a real deep learning practitioner, because real deep learning practitioners know the keyboard shortcuts, and the keyboard shortcut is Shift+Enter. Given how often you have to run a cell, don't be going all the way up here, finding it, clicking it: just Shift+Enter. Okay, so up and down to move around, pick something to run, Shift+Enter to run. Okay. So we're going to go through this quickly, and then later on we're going to go back over it more carefully. So here's the quick version, to get a sense of what's going on. So here we are in lesson 1, and these three lines are what we start every notebook with. These things starting with a percent sign are special directives to Jupyter Notebook itself; they're not Python code. They're called "magics", which is kind of a cool name. The details of these three directives aren't very important, but basically they say: hey, if somebody changes the underlying library code while I'm running this, please reload it automatically; and if somebody asks to plot something, please plot it here in this Jupyter Notebook. So just put those three lines at the top of everything. The next two lines load up the fastai library. What is the fastai library?
So it's a little bit confusing: fastai, with no dot, is the name of our software, and fast.ai, with the dot, is the name of our organization. So if you go to docs.fast.ai, this is the fastai library. We'll learn more about it in a moment, but for now, just realize everything we are going to do is going to use basically either fastai, or the thing that fastai sits on top of, which is PyTorch. PyTorch is one of the most popular libraries for deep learning in the world. It's a bit newer than TensorFlow, so in a lot of ways it's more modern than TensorFlow. It's extremely fast-growing, extremely popular, and we use it because we used to use TensorFlow a couple of years ago, and we found we can just do a lot more, a lot more quickly, with PyTorch. And then we have this software that sits on top of PyTorch and lets you do far, far, far more things, far more easily, than you can with PyTorch alone. So it's a good combination. We'll be talking a lot about it, but for now, just know that you can use fastai by doing two things: importing star from fastai, and then importing star from fastai-dot-something, where "something" is the application you want. Currently, fastai supports four applications: computer vision, natural language text, tabular data, and collaborative filtering, and we're going to see lots of examples of all of those during the seven weeks. So we're going to be doing some computer vision. At this point, if you are a Python software engineer, you are probably feeling sick, because you see me do `import *`, which is something that you've all been told to never, ever do. Okay, and there are very good reasons not to use `import *` in standard production code with most libraries. But, for those of you who have used something like MATLAB, it's kind of the opposite: everything's there for you all the time; you don't even have to import things a lot of the time. It's kind of funny.
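As I recall, the lines in question look like this in the lesson notebook (this is a notebook fragment with IPython magics, not a standalone script):

```python
# Special Jupyter/IPython directives ("magics"), not Python code:
%reload_ext autoreload   # re-import changed library code automatically
%autoreload 2
%matplotlib inline       # render plots inside the notebook

# Then load the fastai application you want -- computer vision here:
from fastai.vision import *
from fastai.metrics import error_rate
```

The magic lines only work inside Jupyter/IPython; in a plain Python script you would drop them.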
We've got these two extremes of how to write code. You've got the scientific programming community that has one way, and then you've got the software engineering community that has the other. Both have really good reasons for doing things, and with the fastai library we actually support both approaches. In a Jupyter Notebook, where you want to be able to quickly, interactively try stuff out, you don't want to be constantly going back up to the top and importing more stuff and trying to figure out where things are; you want to be able to use lots of tab completion, be, you know, very experimental. So `import *` is great. Then, when you're building stuff in production, you can do the normal PEP 8 style, you know, proper software engineering practices. So don't worry when you see me doing stuff which, at your workplace, is frowned upon. Okay, this is a different style of coding. It's not that there are no rules in data science programming; it's that the rules are different. When you're training models, the most important thing is to be able to interactively experiment quickly. And so you'll see we use a lot of processes and styles that are very different from what you're used to, but they're there for a reason, and you'll learn about them over time. You can choose to use a similar approach or not; it's entirely up to you. The other thing to mention is that the fastai library is designed in a very interesting, modular way, and you'll find over time that when you do use `import *`, there's far less clobbering of things than you might expect. It's all explicitly designed to allow you to pull in things and use them quickly without having problems. Okay. So we're going to look at some data, and there are two main places that we tend to get data from for the course. One is academic datasets. Academic datasets are really important.
They're really interesting. They're things where academics spend a lot of time curating and gathering a dataset so that they can show how well different kinds of approaches work with that data. The idea is they try to design datasets that are challenging in some way and require some kind of breakthrough to do well on. So we're going to be starting with an academic dataset called the Pet dataset. The other kind of dataset we'll be using during the course is datasets from the Kaggle competitions platform. Both academic datasets and Kaggle datasets are interesting for us, particularly because they provide strong baselines. That is to say, you want to know if you're doing a good job. So with Kaggle datasets that have come from a competition, you can actually submit your results to Kaggle and see how well you would have done in that competition, and if you can get in about the top 10%, I'd say you're doing pretty well. For academic datasets, academics write down in papers what the state of the art is: how well they did using models on that dataset. So this is what we're going to do. We're going to try to create models that get right up towards the top of Kaggle competitions, preferably actually in the top ten, not just the top 10%, or that meet or exceed academic state-of-the-art published results. So, when you use an academic dataset, it's important to cite it. So you'll see here there's a link to the paper that it's from. You definitely don't need to read that paper right now, but if you're interested in learning more about it, and why it was created, and how it was created, all the details are there. So in this case, this is a pretty difficult challenge: the Pet dataset is going to ask us to distinguish between 37 different categories of dog breed and cat breed. So that's really hard. In fact, every course until this one, we've used a different dataset, which is one where you just have to decide: is something a dog, or is it a cat?
So you've got a 50-50 chance right away, right, and dogs and cats look really different. But there are lots of dog breeds and cat breeds that look pretty much the same. So why have we changed the dataset? We've got to the point now where deep learning is so fast and so easy that the dogs-versus-cats problem, which a few years ago was considered extremely difficult (80% accuracy was state of the art), is now too easy. Our models were basically getting everything right all the time without any tuning, and so there weren't, you know, really a lot of opportunities for me to show you how to do more sophisticated stuff. So we've picked a harder problem this year. So this is the first class where we're going to learn how to do this difficult problem, and this kind of thing, where you have to distinguish between similar categories, is called, in the academic context, fine-grained classification. So we're going to do a fine-grained classification task of figuring out which particular kind of pet. And so the first thing we have to do is download and extract the data that we want. We're going to use this function called `untar_data`, which will download it automatically and will untar it automatically. AWS has been kind enough to give us lots of space and bandwidth for these datasets, so they'll download super quickly for you. And so the first question then would be: how do I know what `untar_data` does? So you could just type `help`, and you will find out what module it came from (because, since we imported star, we don't necessarily know that), what it does, and, something you might not have seen before even if you're an experienced programmer, what exactly you pass to it. You're probably used to seeing the names: url, fname, dest. But you might not be used to seeing these bits: these bits are types.
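That annotation style is plain Python, so you can try it yourself. Here's a minimal sketch: `untar_data_sketch`, its body, and the example URL are all made up for illustration, and only the annotation style mirrors the real `untar_data` (which actually downloads and extracts an archive).

```python
from pathlib import Path
from typing import Optional, Union

def untar_data_sketch(url: str,
                      fname: Optional[Union[Path, str]] = None,
                      dest: Optional[Union[Path, str]] = None) -> Path:
    """Stand-in that only demonstrates the annotated signature."""
    # With no destination given, derive one from the URL's last
    # segment -- which is why you can call it with just the URL.
    if dest is None:
        dest = Path('data') / url.rstrip('/').split('/')[-1]
    return Path(dest)
```

`Union[Path, str]` reads as "either a Path or a str", and the `= None` defaults are what let the notebook call the function with nothing but the URL.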
And if you've used a typed programming language, you'll be used to seeing them, but Python programmers are less used to it. But if you think about it, you don't actually know how to use a function unless you know what type each thing is that you're providing it. So we make sure that we give you that type information directly here in the help. So in this case, the url is a string, and the fname is either a path or a string ("Union" means "either") and it defaults to nothing, and the dest is either a path or a string and defaults to nothing. We'll learn more shortly about how to get more documentation about the details of this, but for now, we can see we don't have to pass in a file name or a destination: it'll figure them out for us from the URL. And for all the datasets we'll be using in the course, we already have constants defined for all of them, in this `URLs` module, or class, actually. You can see that's where it's going to grab it from. Okay, so it's going to download that to some convenient path and untar it for us, and it will then return the value of the path. And then, in a Jupyter Notebook, it's kind of handy: you can just write a variable on its own (and a semicolon is just a statement separator in Python, so that's the same as writing it on its own line), and it prints it. You can also say `print(path)`, but again, we're trying to do everything fast and interactively, so just write it, and here is the path where it's given us that data. Next time you run this, since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. So everything's kind of designed to be pretty automatic, pretty easy. There are some things in Python that are less convenient for interactive use than they should be. For example, when you do have a path object, seeing what's in it actually takes a lot more typing than I would like. So sometimes we add functionality into existing Python stuff.
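The path behaviour I'm describing here is easy to try in stock Python (the directory names below are just for illustration; plain pathlib doesn't have fastai's `.ls()`, but `iterdir()` does the same job):

```python
from pathlib import Path

# The / operator composes sub-paths portably across
# Windows, Linux, and macOS.
path = Path('data') / 'oxford-iiit-pet'
path_img = path / 'images'

assert path_img.name == 'images'                 # last component
assert path_img.parts[-2:] == ('oxford-iiit-pet', 'images')

# fastai adds path.ls(); the stock-pathlib equivalent is:
#   sorted(path.iterdir())
```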
One of the things we do is add an `ls` method to paths. So if you go `path.ls()`, here is what's inside this path. So that's what we just downloaded. So when you try this yourself, you wait a couple of minutes for it to download and unzip, and then you can see what's in there. If you're an experienced Python programmer, you may not be familiar with this approach of using a slash like this. This is a really convenient feature that's part of Python 3; it's functionality from something called pathlib. These are path objects, and path objects are much better to use than strings: they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac, it's always going to work exactly the same way. So here's a path to the images in that dataset. All right, so if you're starting with a brand-new dataset, trying to do some deep learning on it, what do you do? Well, the first thing you would want to do is probably see what's in there. So we found that these are the directories in there. So what's in this images directory? There are a lot of functions in fastai for you. There's one called `get_image_files` that will grab an array of all of the image files, based on extension, in a path. And so here you can see we've got lots of different files. Okay, so this is a pretty common way for image computer-vision datasets to get passed around: just one folder with a whole bunch of files in it. So the interesting bit, then, is: how do we get the labels? In machine learning, the labels refer to the thing we're trying to predict, and if we just eyeball this, we can immediately see that the labels are actually part of the file name. You see that, right? It's kind of like path-slash-label-underscore-number-dot-extension. So we need to somehow get a list of these bits of each file name, and that will give us our labels. Because that's all you need to build a deep learning model.
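That filename-to-label idea can be sketched with Python's `re` module. The filenames below are made-up examples in the pet dataset's naming style, and the pattern is the kind of expression I'm talking about:

```python
import re

# Filenames look like <label>_<number>.jpg, so capture everything
# between the last slash and the final _<digits>.jpg as the label.
pat = r'/([^/]+)_\d+\.jpg$'

def label_of(fname):
    m = re.search(pat, fname)
    return m.group(1) if m else None

fnames = ['images/Abyssinian_1.jpg', 'images/Abyssinian_12.jpg',
          'images/great_pyrenees_173.jpg', 'images/pug_52.jpg']

labels = [label_of(f) for f in fnames]
assert labels == ['Abyssinian', 'Abyssinian', 'great_pyrenees', 'pug']

# The number of distinct labels is what fastai will later report
# as the number of classes.
assert len(set(labels)) == 3
```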
You need pictures, so files containing the images, and you need some labels. So in fastai, this is made really easy. There's an object called `ImageDataBunch`, and an ImageDataBunch represents all of the data you need to build a model, and there are basically some factory methods which try to make it really easy for you to create that data bunch (we'll talk more about this shortly) with a training set and a validation set, with images and labels, for you. Now, in this case, we can see we need to extract the labels from the names. Okay, so we're going to use `from_name_re`. So, for those of you that use Python, you know `re` is the module in Python that does regular expressions: things that are really useful for extracting text. I just went ahead and created the regular expression that would extract the label from this text. Okay. So, for those of you who are not familiar with regular expressions, they're super useful, and it would be very useful to spend some time figuring out how and why that particular regular expression is going to extract the label from this text. Okay. So with this factory method, we can basically say: okay, I've got this path containing images; this is a list of file names (remember, I got them back here); this is the regular expression pattern that is going to be used to extract the label from the file name; we'll talk about transforms later; and then you also have to say what size images you want to work with. So that might seem weird. Why do I need to say what size images I want to work with?
Because the images have a size: we can see what size the images are. And I guess, honestly, this is a shortcoming of current deep learning technology, which is that a GPU has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast. And so if the images are different shapes and sizes, you can't do that. Right, so we actually have to make all of the images the same shape and size. In part one of the course, we're always going to be making images square shapes; in part two, we'll learn how to use rectangles as well. It turns out to be surprisingly nuanced, but pretty much everybody, in pretty much all computer vision modeling, uses this approach of squares. And 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use. So if you just use size=224, you're probably going to get pretty good results most of the time, and this is kind of the little bit of artisanship that I want to teach you folks, which is: what generally just works. Okay, so if you just use size=224, that'll generally just work for most things, most of the time. So this is going to return a DataBunch object, and in fastai, everything you model with is going to be a DataBunch object. We're going to learn all about them, and what's in them, and how we look at them, and so forth. Basically, a DataBunch object contains two or three datasets. It contains your training data (we'll learn about this shortly); it'll contain your validation data; and, optionally, it contains your test data. And for each of those, it contains your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth. And that all sits there in this one place. Something we'll learn more about in a little bit is normalization. But generally, in nearly all machine learning tasks,
you have to make all of your data about the same "size": specifically, about the same mean and about the same standard deviation. So there's a `normalize` function that we can use to normalize our data bunch in that way. Okay, Rachel, do you want to come and ask the question? Thanks. Question: "What does the transform function do when the image size is not 224?" Great. So this is something we'll learn about shortly. Basically, this thing called transforms is used to do a number of things, and one of the things it does is to make something size 224. Let's take a look at a few pictures. Here are a few pictures of things from my data bunch: you can see `data.show_batch` can be used to show me some of the contents of my data bunch. So this is going to be three by three, and you can see roughly what's happened: they all seem to have been kind of zoomed and cropped in a reasonably nice way. So basically, what it'll do is, by default, something called center cropping, which means it'll kind of grab the middle bit, and it will also resize it. So we'll talk more about the details of this, because it turns out to actually be quite important, but basically a combination of cropping and resizing is used. Something else we'll learn about is that we also use this to do something called data augmentation: so there's actually some randomization in how much and where it crops, and stuff like that. Okay, but that's the basic idea: some cropping and some resizing. Often we also do some padding, so there are all kinds of different ways, and it depends on the data augmentation, which we're going to learn about shortly. Question: "And what does it mean to normalize the images?" So, normalizing the images: we're going to learn more about this later in the course, but in short, it means that the pixel values (we're going to learn more about pixel values) start out from 0 to 255, and some pixel values, I should say some channels, because there's red, green, and blue, so some channels
might tend to be really bright, and some might tend to be not bright at all; and some might vary a lot, and some might not vary much at all. It really helps train a deep learning model if each one of those red, green, and blue channels has a mean of 0 and a standard deviation of 1. Okay, we'll learn more about that. If you haven't studied, or don't remember, means and standard deviations, we'll get back to some of that later, but that's the basic idea. That's what normalization does. If your data (and again, we'll learn much more about the details) is not normalized, it can be quite difficult for your model to train well. So if you do have trouble training a model, one thing to check is that you've normalized it. Question: "As GPU memory will be a power of two, doesn't a size of 256 seem more practical, considering utilization?" So we're going to be getting into that shortly, but the brief answer is that the models are designed so that the final layer is of size seven by seven, so we actually want something where, if you multiply seven by two a bunch of times, you end up with something that's a good size; 224 is 7 times 2 to the 5th. Yeah, all of these details we are going to get to, but the key thing is, I wanted to get you training a model as quickly as possible. But, you know, one of the most important things to being a really good practitioner is to be able to look at your data. Okay, so it's really important to remember to go `show_batch` and take a look. It's surprising how often, when you actually look at the dataset you've been given, you realize it's got weird black borders on it, or some of the things have text covering up some of them, or some of it is rotated in odd ways. So make sure you take a look, okay? And then the other thing we want to do is not just look at the pictures, but also look at the labels. And all of the possible label names are called your classes; those are stored in the data bunch.
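Coming back to that normalization answer for a moment, the per-channel arithmetic is simple enough to sketch in plain Python, with made-up pixel values:

```python
# Normalize one channel: subtract its mean, divide by its standard
# deviation, so the result has mean 0 and standard deviation 1.
def normalize_channel(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

channel = [0, 64, 128, 192, 255]          # raw pixel values in 0..255
normed = normalize_channel(channel)

new_mean = sum(normed) / len(normed)
new_std = (sum(v * v for v in normed) / len(normed)) ** 0.5  # mean is ~0
assert abs(new_mean) < 1e-9
assert abs(new_std - 1) < 1e-9
```

Note that fastai's `normalize(imagenet_stats)` uses fixed per-channel statistics computed from ImageNet rather than computing them from your own data, but the arithmetic is the same subtract-mean, divide-by-std operation.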
You can print out `data.classes`, and so here they are: that's all the possible labels that we found by using that regular expression on the file names. And we learned earlier on, in that prose I wrote at the top, that there are 37 possible categories, and so, just checking `len(data.classes)`, it is indeed 37. A DataBunch will always have a property called `c`, and that property called `c`, the technical details we'll kind of get to later, but for now, you can kind of think of it as being the number of classes. For things like regression problems and multi-label classification and stuff, that's not exactly accurate, but it'll do for now. It's important to know that `data.c` is a really important piece of information: for classification problems, at least, it is the number of classes. Okay, believe it or not, we're now ready to train a model, and a model is trained in fastai using something called a Learner. Just like a DataBunch is a general fastai concept for your data, and from there there are subclasses for particular applications, like ImageDataBunch, a Learner is a general concept for things that can learn to fit a model, and from that there are various subclasses to make things easier. In particular, there's one that will create a convolutional neural network for you, and we'll be learning a lot about that over the next few lessons. But for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: what's your data? And, not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your model, or what's your architecture? So, as you'll learn, there are lots of different ways of constructing a convolutional neural network, but for now, the most important thing for you to know is that there's a particular kind of model called a ResNet, which works extremely well
It works well nearly all the time, so for a while at least, you really only need to choose one thing: what size ResNet do you want? Depth is basically how big it is, and we'll learn all about the details of what that means, but there's one called ResNet-34 and there's one called ResNet-50, and when we're getting started with something, I'd pick the smaller one, because it'll train faster. That's kind of it; that's as much as you need to know to be a pretty good practitioner about architectures for now: there are two variants of one architecture that work pretty well, ResNet-34 and ResNet-50. Start with the smaller one and see if it's good enough.

So that is all the information we need to create a convolutional neural network learner. There's one other thing I'm going to give it, though, which is a list of metrics. Metrics are literally just things that get printed out as it's training, so I'm saying I would like you to print out the error rate, please.

Now, you can see the first time I ran this on a newly installed box, it downloaded something. What's it downloading? It's downloading the ResNet-34 pre-trained weights. What this means is that this particular model has actually already been trained for a particular task, and that particular task is that it was trained by looking at about one and a half million pictures of all kinds of different things, a thousand different categories of things, using an image data set called ImageNet. So we can download those pre-trained weights so that we don't start with a model that knows nothing about anything; we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet, but there were certainly some kinds of dog and certainly some kinds of cat.
So this pre-trained model already knows quite a lot about what pets look like, and it certainly knows quite a lot about what animals look like and what photos look like. So the idea is that we don't start with a model that knows nothing at all; we start by downloading a model that knows something about recognizing images already. It downloads for us, automatically, the first time we use it, a pre-trained model, and then from then on it won't need to download it again; it'll just use the one we've got.

This is really important. We're going to learn a lot about this; it's kind of the focus of the whole course. This is called transfer learning: how to take a model that already knows how to do something pretty well and make it so that it can do your thing really well. We take a pre-trained model, and then we fit it so that instead of predicting the thousand categories of ImageNet with ImageNet data, it predicts the 37 categories of pets using your pet data. And it turns out that by doing this, you can train models in one hundredth or less of the time of regular model training, with one hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. Remember, I showed you the slide of Nikhil's lesson one project from last year: he used 30 images, and there are no cricket and baseball images in ImageNet, but it just turns out that an ImageNet model is already so good at recognizing things in the world that just 30 examples of people playing baseball and cricket was enough to build a nearly perfect classifier.

Okay, now, you would naturally be saying, well, wait a minute, how do you know that it can actually recognize pictures of people playing cricket versus baseball in general? Maybe it just learned to recognize those 30. Maybe it's just cheating, right? That's called overfitting, and we'll be talking a lot about that during this course, right?
But overfitting is where you don't learn to recognize pictures of, say, cricket versus baseball in general, but just these particular cricketers in these particular photos and these particular baseball players in these particular photos. We have to make sure that we don't overfit, and the way we do that is using something called a validation set. A validation set is a set of images that your model does not get to look at, and these metrics, like in this case error rate, get printed out automatically using the validation set: a set of images that our model never got to see. When we created our data bunch, it automatically created a validation set for us. We'll learn lots of ways of creating and using validation sets, but because we're trying to bake in all of the best practices, we actually make it nearly impossible for you not to use a validation set, because if you're not using a validation set, you don't know if you're overfitting. So we always print out the metrics on a validation set; we always hold it out; we always make sure that the model doesn't touch it. That's all done for you, and it's all built into this data bunch object.

So now that we have a learner, we can fit it. You can just use a method called fit, but in practice you should nearly always use a method called fit_one_cycle. We'll learn more about this during the course, but in short, one cycle learning comes from a paper that was released, I'm trying to think, a few months ago, less than a year ago, and it turned out to be dramatically better, both more accurate and faster, than any previous approach. So again, I don't want to teach you how to do 2017 deep learning, right? In 2018, the best way to fit models is to use something called one cycle. We'll learn all about it, but for now, just know you should probably type learn.fit_one_cycle. If you forget how to type it, you can start typing a few letters and hit tab, and you'll get a list of potential options.
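The validation-set idea described above can be sketched in plain Python: shuffle once, hold out a fraction, and never train on the held-out part. fastai builds this split for you when the data bunch is created; the file names below are made up:

```python
import random

# Sketch of holding out a validation set. The model trains on train_files only,
# and metrics like error rate are computed on valid_files, which it never sees.

def split_train_valid(files, valid_pct=0.2, seed=42):
    files = list(files)
    random.Random(seed).shuffle(files)       # deterministic shuffle
    n_valid = int(len(files) * valid_pct)
    return files[n_valid:], files[:n_valid]  # (train, valid)

files = [f'img_{i}.jpg' for i in range(100)]  # made-up file names
train_files, valid_files = split_train_valid(files)
```

Because the two lists never overlap, a good score on the validation images is evidence the model learned something general rather than memorizing its training pictures.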
And then if you forget what to pass it, you can press shift-tab, and it will show you exactly what to pass it, so you don't actually have to type help. And again, it's kind of nice that we have all the types here, because we can see that the cycle length (we'll learn more about what that is shortly) is an integer, and then the maximum learning rate could either be a float or a slice or whatever, and so forth, and you can see that the momentums default to this tuple, and so forth.

For now, just know that this number, four, basically decides how many times we go through the entire data set, how many times we show the data set to the model so that it can learn from it. Each time it sees a picture, it's going to get a little bit better, but it's going to take time, and it means it could overfit: if it sees the same picture too many times, it'll just learn to recognize that picture, not pets in general. We'll learn all about how to tune this number during the next couple of lessons, but starting out with four is a pretty good start, just to see how it goes, and you can actually see that after four epochs, or four cycles, we got an error rate of 6%.

So a natural question is how long that took. That took a minute and 56 seconds. So we're paying, you know, 60 cents an hour, and we just used about two minutes of compute time (I mean, we actually pay for the whole time that the machine is on and running, but this was two minutes of compute), and we've got an error rate of 6%. So 94% of the time, we correctly picked the exact right one of those 37 dog and cat breeds, which feels pretty good to me. But how good is it, really?
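The error rate metric printed during training is just the fraction of validation predictions that were wrong (and accuracy is one minus that). A toy sketch with made-up labels, not fastai's implementation:

```python
# Sketch of the error_rate metric: fraction of predictions that don't match
# the true labels on the validation set.

def error_rate(preds, targets):
    wrong = sum(1 for p, t in zip(preds, targets) if p != t)
    return wrong / len(targets)

# Made-up predictions over 50 validation images: 3 mistakes out of 50.
targets = ['beagle'] * 50
preds = ['beagle'] * 47 + ['basset_hound'] * 3
err = error_rate(preds, targets)   # 0.06, i.e. 6% error, 94% accuracy
```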
Maybe we should go back and look at the paper. Remember, I said the nice thing about using academic papers or Kaggle data sets is that we can compare our solution to whatever the best people on Kaggle did, or whatever the academics did. This particular data set of pet breeds is from 2012, and if I scroll through the paper, you'll generally find in any academic paper there'll be a section called experiments, about two thirds of the way through, and if you find the section on experiments, then you can find the section on accuracy, and they've got lots of different models. Their models, as you'll read about in the paper, are really kind of pet specific: they learn something about how pet heads look and how pet bodies look and how pet images in general look, and they combine them all together, and using all of this complex code and math, they got an accuracy of 59%. So in 2012, this highly pet-specific analysis got an accuracy of 59%, and these were the top researchers from Oxford University. Today, in 2018, with, if you go back and look at how much code we just wrote, about three lines of code (the other stuff is just printing out things to see what we're doing), we got 94%, so 6% error. So that gives you a sense of, you know, how far we've come with deep learning, and particularly with PyTorch and fastai, how easy things are.

So before we take a break, I just want to check to see if we've got any questions. And just remember, if you're in the audience and you see a question that you want asked, please click the little heart next to it so that Rachel knows that you want to hear about it. And if there is something with six likes and Rachel didn't notice it, which is quite possible, just quote it in a reply and say, hey Rachel, this one's got six likes.
So what we're going to do is take an eight-minute break, so we'll come back at five past eight.

So where we got to was: we just trained a model. We don't exactly know what that involved or how it happened, but we do know that with three or four lines of code, we've built something which smashed the accuracy of the state of the art of 2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds, but we don't really know why it works yet; we will, and that's okay.

In terms of getting the most out of this course, we very, very regularly hear, after the course is finished, the same basic feedback. This is literally copied and pasted from the forum: "I fell into the habit of watching the lectures too much and googling too much about concepts without running the code. At first I thought I should just read it and then research the theory." And we keep hearing people saying, my number one regret is I just spent 70 hours doing that, and at the very end I started running the code, and oh, it turned out I learned a lot more. So please run the code. Really, run the code. "I should have spent the majority of my time on the actual code in the notebooks, running it, seeing what goes in and seeing what comes out."

So the most important skills to practice, and we're going to show you how to do this in a lot more detail, are understanding what goes in and what goes out.
So we've already seen an example of looking at what goes in, which is data.show_batch, and that's going to show you examples of labels and images. Next, we're going to see how to look at what came out; that's the most important thing to study.

As I said, the reason we've been able to do this so quickly is heavily because of the fastai library. Now, the fastai library is pretty new, but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud providers either support it or are about to support it, and a lot of researchers are starting to use it. It's making a lot of things a lot easier, but it's also making new things possible. So really understanding the fastai software is something which is going to take you a long way, and the best way to really understand the fastai software is by using the fastai documentation, which we'll be learning more about shortly.

So how does it compare? I mean, there's really only one major other piece of software like fastai, that is, something that tries to make deep learning easy to use, and that's Keras. Keras is a really terrific piece of software; we actually used it for the previous courses until we switched to fastai. It runs on top of TensorFlow. It was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. So if you look, for example, at last year's course exercise, which is getting dogs vs.
cats, fastai lets you get much more accurate (less than half the error on a validation set), training time is less than half the time, and lines of code is about a sixth of the lines of code. And the lines of code are more important than you might realize, because those 31 lines of Keras code involve you making a lot of decisions, setting lots of parameters, doing lots of configuration. That's all stuff where you have to know how to set those things to get best-practice results, whereas in these five lines of code, any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you. So hopefully you'll find fastai a really useful library, not just for learning deep learning, but for taking it a very long way.

How far can you take it? Well, as you'll see, all of the research that we do at fast.ai uses the library, and an example of the research we did, which was recently featured in Wired, describes a new breakthrough in natural language processing, which people are calling the ImageNet moment of NLP: we set a new state-of-the-art result in text classification, which OpenAI then built on top of, with more compute and more data and different tasks, to take it even further. This is an example of something that we've done in the last six months, in conjunction with my colleague Sebastian Ruder, an example of something that's been built in the fastai library, and you're going to learn how to use this brand-new model in three lessons' time, and you're actually going to get this exact result from this exact paper yourself.

Another example: one of our alumni, Hamel Husain, who you'll come across on the forum plenty because he's a great guy and very active, built a new system for natural language semantic code search. You can find it on GitHub, where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, it's been built with the fastai
library, using the techniques you'll be learning in the next seven weeks. Is it in production? Yeah, well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess.

The best place to learn about these things and get involved in these things is on the forums, where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about research papers, applications, and so on and so forth. Even though today we've kind of got to focus on a small number of lines of code that do a particular thing, which is image classification, and we're not learning much math or theory or whatever, over these seven weeks, and then in part two another seven weeks, we're going to go deeper and deeper and deeper.

So where can that take you? I want to give you some examples. There is Sarah Hooker. She did our first course a couple of years ago. Her background was economics; she didn't have a background in coding, math or computer science. I think she started learning to code two years before she took our course. She started a nonprofit called Delta Analytics, and they helped build this amazing system where they attached old mobile phones to trees in the Kenyan rain forest and used them to listen for chainsaw noises, and then they used deep learning to figure out when there was a chainsaw being used, and then they had a system set up to alert rangers to go out and stop illegal deforestation in the rainforest. That was something she was doing while she was in the course, as part of her class projects.

What's she doing now? She is now a Google Brain researcher, which I guess is one of the top, if not the top, places to do deep learning research. She's been publishing some papers, and now she is going to Africa to set up Google Brain's first deep learning research center in Africa. Now, I'll say, she worked her ass off.
You know, she really, really invested in this course, not just doing all of the assignments but also going out and reading the Goodfellow book and doing lots of other things. But it really shows how somebody who has no computer science or math background at all can now be one of the world's top deep learning researchers, doing very valuable work.

Another example, from our most recent course: Christine Payne. She is now at OpenAI, and you can find her posts and actually listen to her music samples; she built something to automatically create chamber music compositions, which you can play and listen to online. So what's her background, math and computer science? Actually, that's her there: a classical pianist. Now, I will say she is not your average classical pianist. She's a classical pianist who also has a master's, was a medical researcher at Stanford, studied neuroscience, was a high-performance computing expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, you know, a very annoying person who is good at everything she does. But I think it's really cool to see how a domain expert, in this case the domain of playing piano, can go through the fastai course and come out the other end at OpenAI, which, I guess of the three top research institutes, Google Brain or OpenAI would be two of them, probably along with DeepMind.

And interestingly, one of our other students, an alumnus of the course, recently interviewed her for a blog post series he's doing on top AI researchers, and she said one of the most important pieces of advice she got was from me, and the piece of advice was: pick one project, do it really well, make it fantastic. So that was the piece of advice she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course.

Having said that, I don't really want you to go to OpenAI or Google Brain. What I really want you to do is go back to your
workplace or your passion project and apply these skills there, right? Let me give you an example. MIT released a deep learning course, and they highlighted in their announcement for this deep learning course this medical imaging example. And one of our students, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not what this would look like on a chest film; this is what it should look like; and as a deep learning practitioner, this is how I know that this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning to assess MIT's model, from just two images, very accurately. And this is actually what I want most of you to be doing: take your domain expertise and combine it with the practical deep learning you'll learn in this course, and bring them together like Alex is doing here. A lot of radiologists have actually gone through this course now and have built journal clubs and American College of Radiology practice groups; there's a Data Science Institute at the ACR now, and so forth, and Alex is one of the people providing a lot of leadership in this area. I would love you to do the same kind of thing that Alex is doing, which is to really bring deep-learning-informed leadership into your industry, or your social impact project, whatever it is that you're trying to do.

Another great example was Melissa Fabros, an English literature PhD who studied things like gendered language in English literature. A previous job taught her to code, I think, and then she came into the fastai course, and she helped Kiva, a microlending social impact organization, to build a system that can recognize faces. Why is that necessary? Well, we're going to be talking a lot about this.
Because most AI researchers are white men, most computer vision software can only recognize white male faces effectively. In fact, I think the IBM system was something like 99.8% accurate on white male faces versus about 60 or 65 percent accurate on dark-skinned women, so it's like, what is that, 30 or 40 times worse for black women versus white men? And this is really important, because for Kiva, black women are perhaps the most common user base for their microlending platform. So Melissa, after taking our course, and again working her ass off and being super intense in her study and her work, won this $1 million AI challenge for her work for Kiva.

Karthik did our course and realized that the thing he wanted to do wasn't at his company; it was something else, which is to help blind people to understand the world around them. So he started a new startup. You can find it now; it's called Envision. You can download the app, you can point your phone at things, and it will tell you what it sees. I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me this is a super useful thing for visually disabled users.

And the level that you can get to, with the content that you're going to cover over these seven weeks and with this software, can get you right to the cutting edge in areas you might find surprising. For example, I helped a team of some of our students and some collaborators on actually breaking the world record for training ImageNet. Remember I mentioned the ImageNet data set? Lots of people want to train on the ImageNet dataset.
We smashed the world record for how quickly you can train it, using standard AWS cloud infrastructure, at a cost of about $40 of compute to train this model, using, again, the fastai library and the techniques that we learn in this course. So it can really take you a long way. So don't be put off by what might seem pretty simple at first; we're going to get deeper and deeper.

You can also use it for other kinds of passion projects. So, Helena Sarin: you should definitely check out her Twitter account. This art is basically a new style of art that she's developed, which combines her painting and drawing with generative adversarial models to create these extraordinary results. I think this is super cool. She's not a professional artist; she is a professional software developer, but she just keeps on producing these beautiful results. When she started, you know, her art had not really been shown anywhere or discussed anywhere; now there have recently been some quite high-profile articles describing how she is creating a new form of art. Again, this has come out of the fastai course, where she developed these skills.

Or, equally important, Brad Kenstler, who figured out how to make a picture of Kanye out of pictures of Patrick Stewart's head. Also something you will learn to do, if you wish to. This particular type of what's called style transfer had a really interesting tweak that allowed him to do some things that hadn't quite been done before, and this particular picture helped him get a job as a deep learning specialist at AWS.

Another interesting example: another alumnus actually worked at Splunk as a software engineer, and he designed an algorithm after, like, lesson three, which turned out, at Splunk, to be fantastically good at identifying fraud, and we'll talk more about that shortly.

If you've seen Silicon Valley, the HBO series: the hot dog / not hot dog app.
That's actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project. So there's a lot of cool stuff that you can do. And like, yes, it was Emmy nominated; I think we only have one Emmy-nominated fast.ai alumnus at this stage, so please help change that.

The other thing, you know, is that the forum threads can kind of turn into these really cool things. So Francisco, who's actually here in the audience, he's a really boring McKinsey consultant like me (Francisco and I both have this shameful past, that we were McKinsey consultants, but we left and we're okay now), he started this thread saying, like, oh, this stuff we've just been learning about building NLP in different languages, let's try and do lots of different languages. They started this thing called the language model zoo, and out of that, there's now been an academic competition that was won in Polish that led to an academic paper, Thai state of the art, German state of the art; basically, students have been coming up with new state-of-the-art results across lots of different languages, and this is all being done entirely by students working together through the forum.

So please get on the forum, but don't be intimidated. Because remember, everybody you see on the forum, the vast majority who are posting, post all the damn time, right? They've been doing this a lot, and they do it a lot of the time, and so at first it can feel intimidating, because it can feel like you're the only new person there. But you're not, right? All of you people in the audience, everybody who's watching, everybody who's listening: you're all new people. And so when you just get out there and say, like, okay, I know you people are all getting these state-of-the-art results in German language modeling, but I can't start my server, I tried to click on the notebook and I got an error, what do I do? People will help you. Just make sure you provide all the information:
"I'm using Paperspace, this was the particular instance I tried to use, here's a screenshot of my error." People will help you. Or, if you've got something to add: if people were talking about crop yield analysis and you're a farmer and you think, oh, I've got something to add, then please mention it, even if you're not sure it's exactly relevant. It's fine, you know, just get involved. Because remember, everybody else on the forum started out also intimidated, right? We all start out not knowing things, and so just get out there and try it.

Okay, so let's get back and do some more coding. Yes, Rachel, do we have some questions?

[Question about why you're using ResNet, as opposed to Inception.]

So the question is about this architecture. There are lots of architectures to choose from, and it would be fair to say there isn't one best one, but if you look at things like the Stanford DAWNBench benchmark of ImageNet classification, you'll see that in first place, in second place, in third place and in fourth place, from fast.ai, students of this course, the Department of Defense innovation team and Google, it's ResNet, ResNet, ResNet, ResNet. ResNet is good enough.
There are other architectures. The main reason you might want a different architecture is if you want to do edge computing, so if you want to create a model that's going to sit on somebody's mobile phone. Having said that, even then, most of the time, I reckon the best way to get a model onto somebody's mobile phone is to run it on your server and then have your mobile phone app talk to it. It really makes life a lot easier, and you get a lot more flexibility. But if you really do need to run something on a low-powered device, then there are some special architectures for that.

The particular question was about Inception, which is another architecture, and it tends to be pretty memory intensive; it's also not terribly resilient. One of the things we try to show you is stuff which just tends to always work, even if you don't quite tune everything perfectly. ResNet tends to work pretty well across a wide range of different details around the choices that you might make, so I think it's a good choice.

So, we've got this trained model, and what's actually happened, as we'll learn, is that it's basically created a set of weights. If you've ever done anything like a linear regression or logistic regression, you'll be familiar with coefficients: we basically found some coefficients and parameters that work pretty well, and it took us a minute and 56 seconds. So if we want to start doing some more playing around and come back later, we probably should save those weights, so we can save that minute and 56 seconds.
So you can just go learn.save and give it a name. It's going to put it in a models subdirectory, in the same place the data came from, so if you save different models, or different data bunches from different data sets, they'll all be kept separate; don't worry about it.

All right. So we've talked about how the most important things are to learn what goes into your model and what comes out. We've seen one way of seeing what goes in; now let's see what comes out. This is the other thing you need to get really good at. To see what comes out, we can use this class, ClassificationInterpretation, and we're going to use this factory method from_learner. So we pass in a learn object. Remember, a learn object knows two things: what's your data, and what is your model? And it's now not just an architecture; it's actually a trained model in there, and that's all the information we need to interpret that model. So we pass in the learner, and we now have a ClassificationInterpretation object.

One of the things we can do, and perhaps the most useful thing to do, is called plot_top_losses. We're going to be learning a lot about this idea of loss functions shortly, but in short, a loss function is something that tells you how good your prediction was. Specifically, that means: if you predicted one class of cat with great confidence, you said, I am very, very sure that this is a Birman, but actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer. That's what it basically means to have a high loss. So by plotting the top losses, we are going to find the things that we were the most wrong on, or the most confident about but got wrong. You can see here it prints out four things, for example: german_shorthaired / beagle / 7.04 / 0.92. Well, what do they mean?
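As a toy illustration of "confident and wrong means high loss", here's a cross-entropy-style calculation with made-up probabilities. This is the idea behind the loss numbers that plot_top_losses shows, not fastai's actual code:

```python
import math

# Cross-entropy loss for one prediction is -log of the probability the model
# assigned to the TRUE class, so being very sure and wrong is punished hard.

def cross_entropy(probs, true_class):
    return -math.log(probs[true_class])

# Made-up probabilities over three breeds for an image whose true label is 'birman'.
confident_wrong = {'birman': 0.01, 'siamese': 0.98, 'ragdoll': 0.01}
confident_right = {'birman': 0.98, 'siamese': 0.01, 'ragdoll': 0.01}

loss_wrong = cross_entropy(confident_wrong, 'birman')  # large: ~4.6
loss_right = cross_entropy(confident_right, 'birman')  # small: ~0.02
```

So the images with the highest losses are exactly the ones the model was most confidently wrong about, which is why plotting them is so informative.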
Perhaps we should look at the documentation. We've already seen help, and help just prints out a quick little summary, but if you want to really see how to do something, use doc. doc tells you the same information as help, but it has this very important thing, which is "Show in docs". When you click on "Show in docs", it pops up the documentation for that method or class or function or whatever. It starts out by showing us the same information about what parameters it takes, along with the docstring, but then it tells you more. So in this case, it tells me that the title of each plot shows the prediction, the actual, the loss, and the probability that was predicted.

And you can see there's actually some code you can run; the documentation always has working code. In this case, it was trying things out with handwritten digits, and the first one was predicted to be a seven; it was actually a three; the loss is 5.44, and the probability of the actual class was 0.07. So we did not have a high probability associated with the actual class; I can see why it thought this was a seven; nevertheless, it was wrong. So this is the documentation, and this is your friend when you're trying to figure out how to use these things.

The other thing I'll mention is, if you're a somewhat experienced Python programmer, you'll find the source code of fastai really easy to read. We try to write everything in just a small number of lines, you know, much less than half a screen of code, generally four or five lines of code, and if you click "Source", you can jump straight to the source code, right?
So here is plot_top_losses, and this is also a great way to find out how to use the fastai library, because nearly every line of code here is calling stuff in the fastai library. So don't be afraid to look at the source code. I've got another really cool trick about the documentation that you're going to see a little bit later. Okay, so that's how we can look at these top losses, and this is perhaps the most important image classification interpretation tool that we have, because it lets us see what we're getting wrong. And quite often — like in this case — if you're a dog and cat expert, you'll realize that the things it's getting wrong are breeds that are actually very difficult to tell apart, and you'd be able to look at these and say, "oh, I can see why they got this one wrong". So this is a really useful tool. Another useful tool is something called a confusion matrix, which basically shows you, for every actual type of dog or cat, how many times it was predicted to be each type. But unfortunately, in this case, because it's so accurate, this diagonal basically says it's pretty much right all the time, and for the slightly darker ones — like a five here — it's really hard to read exactly which combination it is. So what I suggest is, if you've got lots of classes, don't use a confusion matrix. Instead — and this is my favorite named function in fastai; I'm very proud of this — you can call most_confused. most_confused will simply grab, out of the confusion matrix, the particular combinations of predicted and actual that it got wrong most often. So in this case, the Staffordshire Bull Terrier was what it should have predicted, and instead it predicted an American Pit Bull Terrier, and so forth.
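What most_confused does is simple enough to sketch in a few lines of plain Python. This is not fastai's implementation — just an illustration of the idea, with a made-up confusion matrix (rows are actual classes, columns are predicted):

```python
def most_confused(class_names, confusion, min_val=2):
    """Off-diagonal (actual, predicted, count) triples, biggest first —
    a pure-Python sketch of what fastai's interp.most_confused() returns."""
    pairs = [(class_names[i], class_names[j], confusion[i][j])
             for i in range(len(class_names))
             for j in range(len(class_names))
             if i != j and confusion[i][j] >= min_val]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

classes = ["staffordshire_bull_terrier", "american_pit_bull_terrier",
           "ragdoll", "birman"]
# Illustrative counts only — not real model output.
conf = [[10,  6, 0,  0],
        [ 3, 12, 0,  0],
        [ 0,  0, 9,  4],
        [ 0,  0, 2, 11]]

print(most_confused(classes, conf))
# The worst mix-up comes first:
# ('staffordshire_bull_terrier', 'american_pit_bull_terrier', 6), ...
```

The diagonal (the correct predictions) is skipped entirely, which is why this readout stays legible even when the model is right almost all the time.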
It should have said Ragdoll and instead actually predicted Birman — that happened four times; this particular combination happened six times. So this is again a very useful thing, because you can look and say, with my domain expertise, does it make sense that it would be confused about that? So these are some of the kinds of tools you can use to look at the output. Now, let's make our model better. How do we make the model better? We can make it better using fine-tuning. So far we fitted for four epochs and it ran pretty quickly, and the reason it ran pretty quickly is that there was a little trick we used. These deep learning models — these convolutional networks — have layers. We'll learn a lot about exactly what layers are, but for now, just know that it goes through lots of computations. What we did was we added a few extra layers to the end, and we only trained those. We basically left most of the model exactly as it was, so that's really fast. And if we're trying to build a model of something that's similar to what the original pre-trained model saw — in this case, similar to the ImageNet data — that works pretty well. But what we really want to do is actually go back and train the whole model. This is why we pretty much always use this two-stage process. So by default, when we call fit or fit_one_cycle on a convnet learner, it'll just fine-tune these few extra layers added to the end, and it'll run very fast.
It'll basically never overfit. But to really get it good, you have to call unfreeze. unfreeze is the thing that says "please train the whole model". Then I can call fit_one_cycle again, and — oh, the error got much worse. Why? In order to understand why, we're actually going to have to learn more about exactly what's going on behind the scenes. So let's start out by trying to get an intuitive understanding of what's going on behind the scenes, and again, we're going to do it by looking at pictures. We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler — who nowadays is CEO of Clarifai, a very successful computer vision startup — and his PhD supervisor, Rob Fergus. They created a paper showing how you can visualize the layers of a convolutional neural network. We'll learn mathematically about what the layers of a convolutional neural network are shortly, but the basic idea is that your red, green, and blue pixel values — numbers from nought to 255 — go into a simple computation, the first layer, and something comes out of that; then the result of that goes into a second layer, then a third layer, and so forth. There can be up to a thousand layers in a neural network. ResNet34 has 34 layers; ResNet50 has 50 layers. But let's look at layer one. There's a very simple computation — it's a convolution, if you know what they are; we'll learn more about them shortly. What comes out of this first layer? Well, we can actually visualize the specific coefficients — the specific parameters — by drawing them as a picture. There are actually a few dozen of them in the first layer, so we won't draw all of them; let's just look at nine at random.
So here are nine examples of the actual coefficients from the first layer, and these operate on groups of pixels that are next to each other. This first one basically finds groups of pixels that have a little diagonal line in this direction; this one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction; this one finds gradients that go from pink to green in this direction; and so forth. They're very, very simple little filters. That's layer one of an ImageNet pre-trained convolutional neural net. Layer two takes the results of those filters and does a second layer of computation, and that allows it to create — so here are nine examples of a way of visualizing one of the second-layer features — and you can see it's basically learned to create something that looks for corners, top-left corners. This one has learned to find right-hand curves; this one has learned to find little circles. So you can see how — and this is the easiest way to see it — in layer one we have things that can find just one line, and in layer two we can find things that have two lines joined up, or one line repeated. If you then look over here, these nine show you nine examples of actual bits of actual photos that activated this filter a lot. In other words, this little math function here was good at finding these kinds of window corners and stuff like that; this little curvy one was very good at finding bits of photos that had circles in them. So this is the kind of stuff you've got to get a really good intuitive understanding for: the start of my neural net is going to find very simple gradients and lines; the second layer can find very simple shapes; the third layer can find combinations of those. So now we can find repeating patterns of two-dimensional objects, or we can find kinds of things that join together, or we can find — well, what are these things?
Well, let's find out. What is this? Let's go and have a look at some bits of pictures that activated this one highly. Oh — mainly they're bits of text, although sometimes windows. So it seems to be able to find kind of repeated horizontal patterns. And this one here seems to find edges of fluffy or flowery things; this one here is finding geometric patterns. So layer three was able to take all the stuff in layer two and combine it together. Layer four can take all the stuff from layer three and combine it together — by layer four, we've got something that can find dog faces. And let's see what else we've got here — yeah, various kinds of — oh, here we have bird legs. So you kind of get the idea. By layer five, we've got something that can find the eyeballs of birds and lizards, or the faces of particular breeds of dogs, and so forth. So you can see how, by the time you get to layer 34, you can find specific dog breeds and cat breeds. This is kind of how it works. So when we first fine-tuned that pre-trained model, we kept all of these layers that you've seen so far, and we just trained a few more layers on top of all of those sophisticated features that are already being created. And so now we're fine-tuning, we're going back and saying, let's change all of these. We'll start with them where they are, but let's see if we can make them better. Now, it seems very unlikely that we can make these early-layer features better. Is it likely that the definition of a diagonal line is going to be different when we look at dog and cat breeds versus the ImageNet data this was originally trained on? So we don't really want to change layer one very much, if at all. Whereas the last layers — this thing of, like, types of dog face — it seems very likely that we do want to change that, right?
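The "layer one finds simple lines" idea can be made concrete with a toy filter. Below is a hand-made 3x3 kernel applied to a tiny made-up image — strictly speaking this computes a cross-correlation, which is what deep-learning libraries actually do when they say "convolution":

```python
# A layer-1 style filter: responds where a bright region sits on top
# of a dark one, i.e. a horizontal edge.
edge_filter = [[ 1,  1,  1],
               [ 0,  0,  0],
               [-1, -1, -1]]

# Toy 6x4 "image": bright (9s) on top, dark (0s) below.
image = [[9, 9, 9, 9],
         [9, 9, 9, 9],
         [9, 9, 9, 9],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

def conv2d(img, k):
    """Valid (no-padding) sliding-window application of kernel k to img."""
    h, w, n = len(img), len(img[0]), len(k)
    return [[sum(k[a][b] * img[i + a][j + b]
                 for a in range(n) for b in range(n))
             for j in range(w - n + 1)]
            for i in range(h - n + 1)]

print(conv2d(image, edge_filter))
# -> [[0, 0], [27, 27], [27, 27], [0, 0]]
```

The output is zero in the uniform regions and large only where bright meets dark — the filter "fires" at the edge, which is exactly what those layer-one visualizations are showing.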
So you want this intuition, this understanding, that the different layers of a neural network represent different levels of semantic complexity. This is why our attempt to fine-tune this model didn't work: by default, it trains all the layers at the same speed. Which is to say, it will update those things representing diagonal lines and gradients just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. So we have to change that. To change it, we first of all need to go back to where we were before — we just broke this model, right? It's much worse than it started out. So if we just go learn.load, this brings back the model that we saved earlier — remember, we saved it as 'stage-1'. So let's go ahead and load that back up; now our model is back to where it was before we killed it. And let's run the learning rate finder. We'll learn about what that is next week, but for now, just know this is the thing that figures out the fastest rate I can train this neural network at without making it zip off the rails and get blown apart. So we can call learn.lr_find, and then we can call learn.recorder.plot, and that will plot the result of our LR finder. What this basically shows you is this key parameter that we're going to learn all about, called the learning rate. The learning rate basically says: how quickly am I updating the parameters in my model? This plot here shows me what happens to the loss as I increase the learning rate, and you can see that once the learning rate gets past 10^-4, my loss gets worse. It so happens — in fact, I can check this if I press shift-tab here — that my learning rate defaults to 0.003. So my default learning rate is about here, and you can see why our loss got worse, right?
Because we've fine-tuned things now, we can't use such a high learning rate. So, based on the learning rate finder, I tried to pick something well before it started getting worse: I decided to pick 1e-6, and I'm going to train at that rate. But there's no point training all the layers at that rate, because we know that the later layers worked just fine before, when we were training much more quickly — again, at the default, which was 0.003. So what we can actually do is pass a range of learning rates to learn.fit, and we do it like this. You use this keyword — which in Python you may have come across before — called slice. It can take a start value and a stop value, and basically what this says is: train the very first layers at a learning rate of 1e-6, train the very last layers at a rate of 1e-4, and distribute the learning rates for all the other layers across that range, between those two values, equally. We're going to see that in a lot more detail, but for now, this is a good rule of thumb: after you unfreeze — that's the thing that's going to train the whole model — pass a max_lr parameter, and pass it a slice. Make the second part of that slice about ten times smaller than your first stage. Our first stage defaulted to about 1e-3, so let's use about 1e-4. And the first part should be a value from your learning rate finder which is well before things started getting worse — you can see things started getting worse maybe about here, so I picked something that's at least ten times smaller than that. If I do that, I get 0.0578. I don't quite remember what we got before, but that's now a bit better. So we've gone down from a 6.1 percent error to a 5.7 percent error.
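The two rules of thumb in this passage — pick a rate well below where the loss blows up, and spread rates across layer groups — can be sketched in plain Python. These helpers are illustrative, not fastai's actual implementation (fastai's even_mults does something similar for slice), and the loss numbers are made up:

```python
def suggest_lr(lrs, losses):
    """Rough stand-in for eyeballing the lr_find plot: take the learning
    rate where the loss was lowest, then back off an order of magnitude."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / 10

def spread_lrs(lo, hi, n_groups):
    """Sketch of slice(lo, hi): spread rates evenly on a log scale from
    the earliest layer group to the last."""
    if n_groups == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio ** i for i in range(n_groups)]

# Hypothetical lr_find sweep: loss bottoms out near 1e-3, then blows up.
lrs    = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [2.30, 2.10, 1.60, 1.20, 1.90, 5.00]

top_lr = suggest_lr(lrs, losses)            # 1e-4: "well before it gets worse"
print(spread_lrs(top_lr / 100, top_lr, 3))  # ~[1e-6, 1e-5, 1e-4]
```

With three layer groups, slice(1e-6, 1e-4) works out to roughly 1e-6 for the earliest layers, 1e-5 for the middle, and 1e-4 for the new head — the early "diagonal line" layers barely move while the task-specific layers move the most.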
So that's about a 10 percent relative improvement with another 58 seconds of training. I would say, for most people, most of the time, these two stages are enough to get a pretty much world-class model. You won't win a Kaggle competition — particularly because now a lot of fastai alumni are competing on Kaggle, and this is the first thing that they do — but in practice you'll get something that's about as good as what the vast majority of practitioners can do. We can improve it by using more layers, and we'll do this next week, by basically using a ResNet50 instead of a ResNet34. You can try running this during the week if you want to; you'll see it's exactly the same as before, but I'm using ResNet50 instead of ResNet34. What you'll find is that if you try this, you will very likely get an error, and the error will be that your GPU has run out of memory. The reason for that is that ResNet50 is bigger than ResNet34: it has more parameters, and therefore it uses more of your graphics card memory. This is totally separate from your normal computer RAM — this is GPU RAM. If you're using the default Salamander, AWS, and so forth suggestions, then you will have 16 gig of GPU memory. The card I use most of the time has 11 gig of GPU memory; the cheaper ones have 8 gig. That's the main range you tend to get — if yours has less than 8 gig of GPU memory, it's going to be frustrating for you. Anyway, you'll be somewhere around there, and it's very likely that when you try to run this, you'll get an out-of-memory error. That's because it's just trying to do too much — too many parameter updates for the amount of RAM you have — and that's easily fixed. The ImageDataBunch constructor has a parameter at the end: bs, for batch size. This basically says how many images you train at one time. If you run out of memory, just make it smaller. This worked for me on an 11 gig card.
It probably won't work for you if you've got an 8 gig card — if so, just make that 32. It's fine to use a smaller batch size; it just might take a little bit longer. And if you've got a big card, like 16 gig, you might be able to get away with 64. So that's just one number you'll need to try during the week. Again, we fitted for a while, and we got down to a 4.4 percent error rate, which is pretty extraordinary. I was pretty surprised, because when we first did cats versus dogs in the first course, we were getting somewhere around a 3 percent error — for something where you've got a fifty percent chance of being right and the two things look totally different. So to get a 4.4 percent error on such a fine-grained problem is quite extraordinary. In this case, I unfroze it and fitted a little bit more, and it went from 4.4 to 4.35 — a tiny improvement. Basically, ResNet50 is already a pretty good model. It's interesting, because again you can call most_confused here, and you can see the kinds of things that it's getting wrong. Depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things. Quite often I find that Ragdoll and Birman are things it gets confused about. I'd actually never heard of either of those breeds, so I looked them up on the internet, and I found a page on a cat site called "Is this a Birman or a Ragdoll?", with a long thread of cat experts arguing intensely about which it is. So I feel fine that my computer had problems. I found something similar — I think it was this Pit Bull versus Staffordshire Bull Terrier. Apparently the main difference is the particular kennel club guidelines as to how they are assessed, though some people think that one of them might have a slightly redder nose.
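The "halve the batch size until it fits" advice can be sketched as a back-of-the-envelope calculation. Everything here is illustrative — the fixed model memory and per-image activation cost below are made-up numbers, not measurements of ResNet50:

```python
def pick_batch_size(gpu_gib, fixed_gib, per_image_mib, bs=64):
    """Illustrative only: activation memory grows with batch size, so keep
    halving bs until (fixed model memory + activations) fits on the card.
    The numbers passed in below are invented, not measured."""
    while bs > 1 and fixed_gib + bs * per_image_mib / 1024 > gpu_gib:
        bs //= 2
    return bs

print(pick_batch_size(gpu_gib=8,  fixed_gib=3.5, per_image_mib=90))   # 32
print(pick_batch_size(gpu_gib=16, fixed_gib=3.5, per_image_mib=90))   # 64
```

With these invented numbers, an 8 gig card lands on bs=32 and a 16 gig card on bs=64 — matching the rule of thumb in the lecture. In practice you just try a value and halve it if you hit the out-of-memory error.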
So this is the kind of stuff where, even if you're not a domain expert, it helps you become one — I now know more about which kinds of pet breeds are hard to identify than I used to. So model interpretation works both ways. What I want you to do this week is to run this notebook and make sure you can get through it, but then what I really want you to do is to get your own image data set. Francisco, who I mentioned earlier — he started the language model thread, and he's now helping to TA the course — is actually putting together a guide that will show you how to download data from Google Images, so you can create your own data set to play with. But before that, I want to show you how to create labels in lots of different ways, because your data set, wherever you get it from, won't necessarily use that kind of regex-based approach — it could be in lots of different formats. To show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn digits, and I'm using it just because I want to show you different ways of creating these data sets. The MNIST sample basically looks like this: if I go path.ls(), you can see it's got a training set and a validation set already. So the people that put together this data set have already decided what they want you to use as a validation set. If you go (path/'train').ls(), you'll see there's a folder called '3' and a folder called '7'. Now, this is a really, really common way to give people labels. It basically says: everything that's a three, I'll put in a folder called '3'; everything that's a seven, I'll put in a folder called '7'. This is often called an ImageNet-style data set.
This is how ImageNet itself is distributed. So if you have something in this format, where the labels are just whatever the folders are called, you can say from_folder, and that will create an ImageDataBunch for you — and as you can see, it's created the labels '3' and '7' just by using the folder names. And as you can see, we can train on that to 99.5 percent accuracy, and so on. Another possibility — and for this MNIST sample, I've got both — is that it might come with a CSV file that looks something like this: for each file name, what's its label. Now, in this case, the labels aren't '3' or '7'; they're 0 or 1 — basically, "is it a 7 or not?". So that's another possibility. If this is how your labels are, you can use from_csv, and if the file is called labels.csv, you don't even have to pass in a file name; if it's called anything else, then you can pass the CSV file name in there. So that's how you can use a CSV — and there it is: this is now "is it a 7 or not?", and you can look at data.classes to see them. Another possibility, as we've seen, is that you've got paths that look like this. In this case, this is the same thing — these are the folder names — but I could actually grab the label by using a regular expression, and here's the regular expression. So we've already seen that approach, and again, you can see that our classes are there. But what if it's something that's in the file name or path, yet not simple enough for a regular expression — something more complex? You can create an arbitrary function that extracts a label from the file name or path, and in that case you would say from_name_func. Another possibility is that you need something even more flexible than that, and so you're going to write some code to create an array of labels. In that case, you can just use from_lists. So here I've created an array of labels, and my call is from_lists.
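The first two labelling styles described above — folder names and patterns in file names — come down to a couple of lines of plain Python. The paths and the regex below are hypothetical examples, not the exact ones from the notebook:

```python
import re
from pathlib import PurePosixPath

# Hypothetical file paths in the two layouts discussed above.
folder_style = PurePosixPath("mnist_sample/train/7/img_0042.png")
regex_style  = PurePosixPath("oxford-iiit-pet/images/great_pyrenees_173.jpg")

# 1) ImageNet/folder style: the label is just the parent directory's name
#    (what from_folder does for you).
label_from_folder = folder_style.parent.name

# 2) Filename style: pull the breed out with a regular expression,
#    the way from_name_re does with a pattern along these lines.
pat = r"([^/]+)_\d+\.jpg$"
label_from_re = re.search(pat, str(regex_style)).group(1)

print(label_from_folder, label_from_re)   # 7 great_pyrenees
```

from_name_func is the fully general version of the second case: instead of a regex, you hand over any function that maps a path to a label.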
Okay, and then I just pass in that array. So you can see there are lots of different ways of creating labels — so during the week, try this out. Now, you might be wondering: how would you know to do all these things? Where am I going to find this kind of information? So I'll show you something incredibly cool. Let's grab this function, and — do you remember, to get documentation we type doc? Here is the documentation for the function, and I can click "Show in docs", and it pops up the documentation. So here's the thing: every single line of code I just showed you, I took this morning and copied and pasted from the documentation. You can see here the exact code that I just used. So the documentation for fastai doesn't just tell you what to do, but step by step how to do it. And here is perhaps the coolest bit: if you go to fastai/fastai_docs and click on docs_src, it turns out that all of our documentation is actually just Jupyter notebooks. In this case, I was looking at vision.data, and here is the vision.data notebook. You can download this repo, you can git clone it, and if you run it, you can actually run every single line of the documentation yourself. So all of our docs are also code, and this is, to me, the ultimate example of experimenting. You can now experiment — and you'll see, on GitHub it doesn't quite render properly, because GitHub doesn't quite know how to render notebooks — but if you git clone this and open it up in Jupyter, you can see it. So nearly everything in the documentation has actual working examples in it, with actual data sets that are already sitting there in the repo for you, and so you can actually try every single function in your browser.
Try seeing what goes in and what comes out. There's a question: will the library use multiple GPUs in parallel by default? The library will use multiple CPUs by default, but just one GPU by default. We probably won't be looking at multi-GPU until part two — it's easy to do, and you'll find it on the forum, but most people won't be needing it. And the second question is whether the library can use 3D data such as MRI. Yes, it can, and there is actually a forum thread about that already, although that's not as developed as 2D yet — but maybe by the time the MOOC is out, it will be. So before I wrap up, I'll just show you an example of the kind of interesting stuff you can do with this kind of exercise. Remember, earlier I mentioned that one of our alums who works at Splunk — a NASDAQ-listed, big, successful company — created this new fraud detection software. This is actually how he created it, as part of a fast.ai part one class project. He took the telemetry of users who had Splunk analytics installed, watched their mouse movements, and created pictures of the mouse movements: he converted speed into color, and right and left clicks into splotches. He then took the exact code that we saw — with an earlier version of the software — trained a CNN in exactly the way we saw, and used that to train his fraud model. So he basically took something which is not obviously a picture and turned it into a picture, and got fantastically good results for a piece of fraud analysis software. So the message here is to think creatively. For example, a lot of people who study sounds do it by actually creating a spectrogram image and then sticking that into a convnet. So there's a lot of cool stuff you can do with this.
So during the week: get your GPU going, try and use your first notebook, make sure that you can use lesson one and work through it, and then see if you can repeat the process on your own data set. Get on the forum and tell us any little successes you had — like "oh, I spent three days trying to get my GPU running and I finally did it" — and any roadblocks you hit. You know, try it for an hour or two, but if you get stuck, please ask. And if you're able to successfully build a model with a new data set, let us know. I will see you next week.

41 thoughts on “Lesson 1: Deep Learning 2019 – Image classification”

  1. so the "!curl https://course-v3.fast.ai/setup/colab" | bash is giving an error on colab? Do I just wait and start the course later cause the link pointed by the link is also unavailable.

  2. Timestamps:

    00:00 – 12:25 Introduction to Jupyter, Jeremy's background, world class applications after 7 weeks, false statements.
    12:25 – 17:55 Jumping into the Jupyter notebook.
    17:55 – 48:30 Oxford-IIT Pet Dataset, untar_data, get_image_files & regex, ImageDataBunch, image size, normalising images, inspecting data, downloading weights and validation sets.
    48:30 – 1:05:50 Success stories about alumni & fast.ai.
    1:05:50 – 1:14:45 Why resnet, saving a model, analysing results with a) top_losses, b) confusion_matrix & c) most_confused.
    1:14:45 – 1:38:15 Finetuning with unfreezing, visualising CNN layers, learning rate, lr_find, max_lr, loading models, different kinds of datasets, fast.ai docs practical examples available as Jupyter notebooks.
    1:38:15 – 1:40:11 More examples of interesting stuff that can be done.

    Thanks again for sharing!

  3. To save you from finding the playlist: youtube-dl 'https://www.youtube.com/watch?list=PLfYUBJiXbdtSIJb-Qd3pw0cqCbkGeS0xn'

  4. Very useful video. One question. If we are building a world class image classifier should not we use a capsule network instead of a CNN??

  5. I have not watched the 2018 one. Just wondering if anyone has seen both can give some insight on how much this one is different than the other one

  6. 1:02:49 About Splunk, if a fraud detection problem is recast as an image recognition, could this suffer from adversarial attack? I.e. do whatever bad deeds bad guys do, then just do one little thing, that result in a few pixel or noise added, such that the network is fooled. Would be interested for security and adversarial researchers to find out.

  7. I found this video after spending a week reading through tensorflow. . I am someone who just wants to use machine learning as a tool like a hammer for some projects and fastai seems exactly that. I can't wait to see what this becomes

  8. Is there a version with better audio? The compression creates many high pitch artifacts, and is pretty hard/uncomfortable to listen to in general.

  9. If someone encounters error like Name ConvLearner() not defined, then use cnn_leaner(). Earlier u could have also used create_cnn().

  10. What do I need to know to build a deep learning framework? please tell me the courses and books. please answer

  11. I came here to complain about the sound, figuring I would be the only one as it was a small detail. Forgot this video's target market are developers 😀

  12. Why is the audio so poor? Next time try to get a better microphone, this really degrades the learning experience.

  13. I am currently working on the construction and study of neural networks. So I was thinking if an algorithm like this could be able to generate music. For this reason I have created 3 different Ai models (a Feed Forward LSTM, an autoencoder and an autoencoder with attention) with different parameters (cell size) and with different training datasets (one form classical piano, one from guitar and one with both instruments) and I have created a game(on this link: http://geofila.pythonanywhere.com/vote ) where you have to listen to pieces and try to figure out which one is made from a computer and which one from a human. All of the code is free on github. Separation in my opinion is not so simple. I look forward to see your results.

  14. I am wondering why the Convlayer doesn't even ask for the number of hidden layers. If it's taking some default values, what are those?
