I moved this discussion here, so as not to confuse the people using a different technology.
I'm sending this to you since the comments box seems to have a character limit below what I wanted to post. Hopefully that's not too much hassle!
In our CS class, we did our handwritten digit processing using about 1800 images, but I get the point: a lot of data is needed to train neural nets. The thing about the data we used is that we didn't collect it ourselves: it was preprocessed data that was freely available on the web. I'm curious if any such data might be available for our project -- if not online, perhaps from a company with experience in this. Maybe Polysync? Even outside of that, it might be useful to talk to them to see what their software stack is like, roughly.
Another cool thing about the data we used was that it was preprocessed for us. Part II discussed some of the processing steps we'd need to go through -- neural net or no -- before digesting the data. It seems that writing that processing ourselves would likely be time consuming and not that much fun. You linked to a video by OpenCV (a library I'm somewhat familiar with from robotics); the tools it provides are, in my experience, quite versatile and useful with minimal overhead (including the preprocessing and grabbing webcams with very little hassle). They've also got OpenCL/CUDA support for GPU processing. AMD and NVIDIA have both been pushing their GPUs for neural nets; I can't say I'm experienced enough to know if there'd be a performance bottleneck there though.
I think in general, using existing tools will allow us to do cooler things faster. Of course, that isn't to say we shouldn't understand how those tools work: how else would we utilize them properly?
Tom Pittman replied [2017 June 16]:
Jonathan has raised for us the question of neural nets (NNs) and using existing code. It's a good question. Jonathan is absolutely correct in observing that the same amount of effort can assemble from a kit of pre-made parts a much cooler product than anybody could build from scratch in the same time. It helps if all the parts exist, as they did for his handwritten numbers CS project. Me, I think it's more fun to "boldly go where no man has gone before." But YMMV, and you need to decide for yourself.
This summer program is about what you want to do. We have a large enough team so we can attack the problem with both a NN implementation and an ad-hoc code approach. They have different risks and rewards and while I've given more thought to the challenges of ad-hoc code, the same for using NN are not entirely opaque. So let's look at what they are to determine if they are manageable.
Let's start with the math. What I found on NNs that do images, they typically have one perceptron neuron for each pixel of image. 640x480 video would be about 300K neurons in the first layer. I've never actually done this but Jonathan probably can confirm it, all the websites describing NNs show a hidden layer with more neurons than the input layer. Let's assume, for simplicity, that both layers are the same size, and that there is only one hidden layer (deep learning, which you probably need for pedestrian identification, would require several hidden layers). So between the first and second layer you have N^2 synapses, about 10^11. If you look at the published NN code or explanations, each synapse is a multiply+add, times 10^11 synapses times 10 frames per second is about 2 peta-flops. That's slightly more than the fastest GPU I could find -- maybe a half-dozen game boxes, but nothing mobile came even close. There might be some optimizations you could do, and I have a lot of experience cramming more computations into smaller and slower processors than anybody thought could do that much. It's an N-squared problem, not NP-complete (mathematically impossible in real time), but usually one tries for N=linear (like the ad-hoc solution I would prefer) or N-log-N (like a sort).
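As a rough check on the estimate above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch under the post's own assumptions (one input neuron per pixel, one hidden layer of the same size, one multiply plus one add per synapse, 10 frames per second); the variable names are illustrative.

```python
# Back-of-envelope cost of one fully connected layer, using the post's assumptions.
width, height = 640, 480
frames_per_second = 10
ops_per_synapse = 2            # one multiply + one add

neurons = width * height       # one input neuron per pixel
synapses = neurons ** 2        # every input neuron feeds every hidden neuron
flops = synapses * ops_per_synapse * frames_per_second

print(f"neurons  : {neurons:,}")
print(f"synapses : {synapses:.2e}")
print(f"FLOP/s   : {flops:.2e}")
```

Run as written, this gives about 3.1 x 10^5 neurons, 9.4 x 10^10 synapses, and 1.9 x 10^12 operations per second. Under these assumptions the total appears to land nearer 2 teraflops than 2 petaflops, so the figure is worth rechecking before drawing hardware conclusions from it.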
Coding the NN itself seems to be trivial -- I found tutorials showing 11 lines of Python or 30 lines of C (=Java) -- the hard part of programming a NN is programming the training data, which is assembling and tagging the images. The idea is that a virgin NN knows nothing until it is programmed by feeding it training data -- in our case images -- for each of which the NN is told whether that image meets the criteria or not. Or rather, it makes a guess, and then you tell it right or wrong, and it adjusts its internal weights to make the next guess better. That means you need lots of images that have been tagged with the correct answer.
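The "guess, get corrected, adjust the weights" loop described above can be sketched with a single perceptron. This is a minimal illustration, not the tutorial code mentioned in the post: the task (logical AND), the learning rate, and all function names are assumptions chosen to keep the example tiny.

```python
# A single perceptron trained by "guess, then correct".
# The task (logical AND) and all names here are illustrative only.

def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=0.1):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            guess = predict(weights, bias, inputs)
            error = target - guess          # 0 if right, +1 or -1 if wrong
            # Nudge each weight in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Tagged training data: (inputs, correct answer).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
results = [predict(weights, bias, inputs) for inputs, _ in samples]
print(results)
```

The code is short; the `samples` list is where the real work goes. For images, each entry would be a tagged picture rather than a pair of bits, which is why assembling the tagged data dominates the effort.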
It may not feel like "programming" at all, but that's what it is (in a very high-level language). The government paid to assemble and normalize and tag the handwritten digits Jonathan used in his CS project, but AFAIK nobody has done that for pedestrians. If we are going to use NNs to find pedestrians, somebody needs to create thousands or hundreds of thousands of tagged and normalized pedestrian images. So either we need to do this ourselves or find someone -- perhaps one of the corporations with an autonomous vehicle project -- to give it to us.
Polysync and Intel both have autonomous projects. Both are also in Portland. So the first order of business should be that if any of you have contacts at either of these we should use them right away to make contact and see if they have and would be willing to give us the images we need to train the NN.
Another alternative is to create our own training set. Maybe we can construct in the time available, a couple dozen properly tagged and normalized images of pedestrians (and non-pedestrians). The problem is that won't be enough because unlike stop signs and handwritten numerals, pedestrians vary widely in appearance. Unfortunately, the limited amount of training the NN would get from such a small data set is not going to result in very good accuracy.
Alternately we might decide that there is a different, but just as important problem that we also need to solve in order to achieve our goal of an autonomous car. It's a problem that we had thought would have to be put off until next summer but that we could jump-start now. That's controlling an actual vehicle around a track. A track is nothing more than an idealized road -- with straight-aways, curves and curbs, but no cross streets, traffic lights, stop signs etc. Driving a track is a stepping stone toward driving on a road. It's also a problem for which NN are better suited. Getting that problem started this summer would accelerate our progress towards the ultimate goal.
The training data for it is far simpler than picking out pedestrians: just simultaneously record the track ahead and the steering wheel as you drive through the track. The position of the (center-marked) steering wheel shortly after being presented with the track scene is the "correct" answer the NN is being trained for. I'm told Pat's Acres is an available track where this could be done. It's still a tough challenge, but a dedicated team of say 3-4 working the summer could likely pull it off, and it's a viable and worthy goal because it not only leverages what NNs do well but also gets us a step further down the road a year ahead of what we were planning.
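The labeling idea above (the steering position shortly after a scene is the "correct" answer for that scene) might be sketched as follows. The frame names, the angle values, the sign convention, and the `delay` parameter are all made-up placeholders, not anything measured.

```python
# Pair each recorded frame with the steering angle observed shortly afterwards.
# Frame names, angles, and the delay are made-up placeholders.

def make_training_pairs(frames, angles, delay=2):
    """Label frame i with the steering angle recorded `delay` frames later."""
    pairs = []
    for i in range(len(frames) - delay):
        pairs.append((frames[i], angles[i + delay]))
    return pairs

frames = ["frame0", "frame1", "frame2", "frame3", "frame4"]
angles = [0.0, 0.0, -5.0, -10.0, -5.0]   # degrees; negative = left (assumed convention)

for frame, angle in make_training_pairs(frames, angles):
    print(frame, angle)
```

The last `delay` frames are dropped because no later steering reading exists for them. How large the delay should be is a tuning question this sketch does not try to answer.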
In large projects the only way to get where you are going in a reasonable time is to divide the work into parallel tasks that progress independently. We didn't suppose we'd have the manpower to approach our project the way real project teams do, but Jonathan's post opens up new opportunities. However, using NN is going to require more advance preparation than the ad-hoc pedestrian identification effort. The team working with NN will have to decide what's going to be necessary to implement the driving function between now and when the workshop starts and either get it from someone else or create it.
My purpose in this program is to provide you with whatever support you need to accomplish your goal(s). If dividing into two teams is what you want to do, I'm all for it, and I will do my best to make sure you have whatever technical support you need to do that. That will work better if we can get a couple more students to sign on to the NN approach.
Assuming the NN team starts immediately, when we gather on July 17 the two teams will work in parallel, the NN team finishing the second half of their project, and the other team working from start to finish. We'll gain experience with two different approaches to image recognition and with luck have even more to talk about and show off during our final presentation.
We've heard from Jonathan; who else is on the NN team? Post your replies here.
Jonathan wrote [2017 Jun 16]:
Comments: There are also other topologies for NNs. We used input/hidden/output, but that's not the only approach.
Regarding processing time: the number of hidden nodes required varied vastly from dataset to dataset, and many problems could be "solved" using various numbers of nodes. It's unclear exactly what the compute power required would be, though I certainly hope it wouldn't be 2 PFLOPS, as the most powerful GPU I know of (Titan Xp) is only 12 TFLOPS. Given that Tesla themselves use a neural net for their existing autopilot (how and what for is unclear), my guess is that it wouldn't be necessary to build a new datacenter to process the information.
With that said, neural nets IMO are a tool, not a solution. While fragmenting the team might be useful in a certain regard, we should still break down the problem into its component pieces and decide what tools are best to solve what problems.
I agree that gathering data is important. Do we have contacts at Intel or Polysync we can talk to?
Steve Edelman wrote [2017 June 17]:
This is a very exciting development because it's not only going to allow us to explore complementary technologies leading to the goal of fielding an actual car but do so substantially sooner than we had originally planned. The problem, if you can call it that, will be for each of you to decide which team to join.
While I think reaching out to Polysync and Intel is a worthy goal, I doubt their organizations are agile enough to respond in time to get us the images needed to get the NN operating this summer. What's required is to assemble the teams so that the groups can get going and be prepared when the workshop begins.
Students who want to work on the NN project should email me by next Weds. I'll circulate this information to the group so that we can arrange our first meeting to discuss how to proceed.
Tom Pittman replied [2017 June 17]:
Steve asked that you decide which of two technologies to work on, and to decide in the next four days. You might feel like you don't have enough information to choose one or the other, so I'll try to fill in some of the gaps.
The two technologies are conventional (designed) code vs neural nets (NN).
Code means thinking about the problem of identifying pedestrians, then writing a program or programs in Java to do it.
NN uses a very tiny code engine, and most of your effort will be preparing the data images for training it, and tweaking the formulas and perhaps the configuration so it learns better, but the code is essentially unchanged. You would still be "programming" but it's at a very high level and doesn't feel like coding at all.
The first thing the Code group needs to do is decide how many pixels and what frame rate to run at, and agree on the algorithm to pick out of the images what might be pedestrians, then start coding. It's all software and the tools are available now and known to work together. The work will be similar to other programs you've written but with more attention given to how your choice of methodology affects performance, then dividing the project into sub tasks so groups of two or three can code and test their work in parallel before you put it all together into a system.
The first thing the NN group needs to do is decide whether you want to compete with the Code group, or take up Steve's suggestion to use the NN for track following (steering the car).
Jonathan seems to think -- correct me if I'm wrong -- that the pedestrian data is available. If that's true, then the option to compete and see which technology is best at identifying pedestrians may be viable. If not, the time and difficulty to create the required training image set makes that choice problematic.
Track-following image data you can pull off of a video camera while driving around the track, and you can tag it for training purposes automatically by recording the steering wheel in the same video. In at least one other autonomous vehicle project, the author admits to training his NN this way.
The NN group needs to decide before the end of June whether you are doing pedestrians or track-following, or you probably won't have much at the end of summer but good vibes to show for your efforts. That effort is not going to be a four-week frolic in the park, you will need all the time you can put into it, mostly doing non-coding things, but which done well can be very rewarding.
I'm here to support you, whichever group you choose. I understand the code issues along both paths, and Steve has the management experience to help point you in the right directions for the non-technical issues. Whichever way you decide, it's going to be challenging and fun.
If you still have questions, ask! Post them here (other people might have the same questions), or if you prefer, send me a private email (tell me it's private so I don't post it), and I'll answer in private. I'm here to support you.
Sofie wrote [2017 June 20]:
Are we going to be using any particular libraries for this project? Also, I want to know more about which programming concepts will be useful for this, I'd like to do a little research. I'm still a little bit unsure what is the best direction to take right now. Thank you,
Sofie wrote [one hour later]:
Some questions I have about some of the information you had:
Sofie wrote [one hour later]:
I'd like to know more about how the human eye uses parallel processing, I found that note a little confusing. What allows us to spot patterns better than computers?
Could you explain data in the frequency domain more? And why does luminance need to be deleted from data analysis?
You relate color to sound when explaining Fourier Transformations, and I'm still a little bit confused on that relationship.
Could you explain how the design of DSPs lead to algorithms that lead to a lot of multiplications?
There are still some other things that I don't understand the language of completely. Is there a good place for understanding some of the terminology you use better?
I'm interested in rolling my own code, but I think it will be useful to look at other examples so it can be analyzed. This would be for identifying the pedestrians. I do want to get more familiar with the language though, so I'd like to know more about what will be important in the code. I think I'll follow your advice about surveillance cameras, if we can access one.
I am a little bit nervous about the idea of NN, but it seems to have many benefits. I'm not familiar with this idea, but would you be teaching us how to use it? What skills would be useful to have if we were to take on this project?
Tom Pittman replied [2017 June 21]:
Sofie asked some good questions, let's see if I can answer some of them.
"Are we going to be using any particular libraries for this project?"
The short answer: Only if you want to.
Except for complicated or esoteric stuff like math and user interface, it often takes almost as much effort to use somebody else's library as to write your own, but your mileage may vary.
"Also, I want to know more about which programming concepts will be useful for this, I'd like to do a little research. I'm still a little bit unsure what is the best direction to take right now."
That depends mostly on what you and the rest of the team decide. Mostly I'm thinking the coding skills you already have -- arrays, for-loops, method calls -- will do you well. If you decide to go with NNs, I probably can't be much help. They look simple, but if you just do what the NN proponents tell you, it doesn't work, there's some secret sauce they aren't telling us about. I would blame it on entropy, but only because I couldn't make my attempt work. YMMV.
"I'd like to know more about how the human eye uses parallel processing"
Me too. And so also half of the biologists out there. I think I was taking psychology in college, and they told about opening up a frog and measuring the nerve impulses in the optic nerve, from the frog's eye to its brain. In the quiescent state there was a small amount of activity, perhaps just enough to let the brain know that there's a scene out there, nothing to worry about. The activity increased when something entered the scene, a lot more if a large dark object entered from above (somebody was about to step on the frog, time to leave), and a huge amount of nerve activity if a small dark object appeared directly in front (lunch!). All this was happening in the eye, before any signals got to the brain. AFAIK, nobody knows how that works.
"Could you explain data in the frequency domain more?"
Not really, but there is a lot of stuff on the internet. Follow the links I put out, then pick out terms you don't understand and Google them. You have to wade through a lot of irrelevant stuff, but the good stuff is there. Google puts a lot of research into making the good stuff filter up to the top of their hit list (right under the paid ads ;-)
"You relate color to sound when explaining Fourier Transformations"
That's frequency domain. Sound is a repetitive pressure wave, up and down, up and down. Some images are repetitive (either horizontally or vertically, or both), light and dark, light and dark, so you can use the same algorithms for analyzing them that the audio engineers use for sound. I think it's just an academic exercise, because nobody seems to find any use for it.
"Could you explain how the design of DSPs lead to algorithms that lead to a lot of multiplications?"
I think it's the other way around. There are a lot of multiplications in sound and image processing, so the people who design digital signal processors need to do the multiplications very fast. Then because the DSPs do fast multiplications, the software people feel comfortable doing a lot of multiplication. Parkinson's Law ("expenses tend to rise to meet or exceed income") applies: the more multiplications your DSP offers you, the more you will use them.
"Is there a good place for understanding some of the terminology you use better?"
You are doing it: Ask questions. The worst that can happen is I might say "I don't know." Then you might have to go ask Google. Google knows everything, but it's sometimes hard to think up the right way to ask. Half of all Google searches fail, even when experts are asking, even when the data is there. You just gotta keep trying. Sometimes what you did find includes some search terms you didn't think of. But I try not to use a term I cannot explain, except maybe to say so.
"I'm interested in rolling my own code, but I think it will be useful to look at other examples so it can be analyzed."
I agree. I will try to make sure you have appropriate examples to look at when we get there. First we -- you and the rest of the team -- need to decide what you want to be doing. We talked a little about that at the meeting in March. We can do more here, in this forum.
"I am a little bit nervous about the idea of NN, ... would you be teaching us how to use it?"
I'm a lot nervous about NNs. I found on the internet how to code it, and I did that, but I did not find how to make it actually learn, just a lot of hand-waving (that's mathematician-speak for "I can't explain it, but I don't want to admit it").
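To make the "light and dark, light and dark" idea from the frequency-domain answer concrete, here is a small sketch that treats one row of pixels the way an audio engineer would treat a sound sample. It uses a naive discrete Fourier transform written out by hand; the pixel values and the function name `dft_magnitudes` are invented for illustration, and real code would more likely call an FFT library.

```python
import cmath

def dft_magnitudes(samples):
    """Magnitude of each frequency component of a 1-D signal (naive DFT)."""
    n = len(samples)
    mags = []
    for k in range(n):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) / n)
    return mags

# One row of pixels: light, light, dark, dark, repeated 4 times.
row = [255, 255, 0, 0] * 4
mags = dft_magnitudes(row)

# Skip k=0 (the average brightness) and look at the lower half of the spectrum.
half = len(row) // 2
strongest = max(range(1, half + 1), key=lambda k: mags[k])
print("strongest repeating component:", strongest, "cycles per row")
```

The pattern repeats four times across the row, and the strongest non-constant component comes out at 4 cycles per row. The same function would work unchanged on a list of audio samples, which is the sense in which the sound and image cases share algorithms.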
Alec wrote [2017 July 4]:
While I like the idea of using neural networks, I don't think it's really a viable means of attacking this problem, since it doesn't look like we have the data. The other thing is that the metaparameter tuning for any machine learning algorithm is very difficult, and requires a fair amount of skill to get right. I don't think anyone here has done more than 0-2 such projects, and thus nobody would have the requisite knowledge to set up a good network even if we had the data available. I'm trying to look into other possible routes of attack right now, but I really don't think neural networks are a viable route of solving the problem. On the other hand, convolutional neural networks are probably a good tool (if anyone knows of a better network structure, please tell me, but from my knowledge, a convolutional net is probably the optimal network for the problem of recognizing pedestrians). Even if it doesn't lead to a working solution to the problem, practice building and training a conv net will definitely give some good experience.
Super important! If people want to use neural networks, we have data!!!
I'm reading through literature right now, and will post again in a few days with some good articles to read. While I don't think we, specifically, will be able to create a good working network, it looks like there's enough previous work so that we can probably get reasonable results by building off of literature. Hence, I would be interested in attempting to write a neural network for pedestrian recognition. I think we should figure out who would be interested in doing a neural network soon though, because a neural network approach definitely requires reading a fair amount of literature and a decent amount of discussion thereafter on how to design the network.
As for people that are nervous about neural networks---it's not difficult to explain the basics of them, and I'm sure I (or someone else that's worked with the basics of them) could teach about them pretty easily to someone who's interested. If you want to start now, a good place to go is the course notes here:
They're from Andrew Ng's course, which is a well known (and commonly used) starting point for learning about machine learning. Don't try to skip the early stuff-- the principles of gradient descent and such are precisely the principles used in basic neural networks.
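For anyone who wants a feel for gradient descent before opening the course notes, here is a minimal sketch on the simplest model imaginable, a single weight in y = w * x. The data, learning rate, and step count are invented for illustration and are not taken from the course.

```python
# Gradient descent on the simplest possible model: y = w * x.
# Data and learning rate are invented for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]      # generated by y = 2x, so the ideal w is 2

w = 0.0
rate = 0.01
for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= rate * grad            # step downhill

print(round(w, 3))
```

Each pass measures how the error changes as `w` changes and moves `w` a small step in the direction that lowers it. A basic neural network applies the same idea to many weights at once.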
A lively email discussion erupted on the topic of NNs, and it was suggested I should post the relevant parts here for everybody to read, starting with our APW-2 Teaching Assistant, Merlin Carson...
Merlin wrote [2017 July 7]:
My intuition tells me that this [Udacity] steering challenge is less complicated than the pedestrian detection. It seems like more of a linear problem, where the neural net just has to match the angle of the lines on the road to a steering angle. Also the scaling of where the camera focuses should be directly proportionate to the speed the car is traveling. With pedestrian detection there is a wide variety of sizes, angles and colors to worry about. There is also the issue of occlusion, where the pedestrians are partially covered by other objects...
Steve Edelman replied [2017 July 7]:
I think Tom felt the same thing; however, we don't have training images for the driving problem, or at least we haven't found any. There is the option of synthetic images using the "game" that someone wrote to train NNs for this exact purpose, or creating our own video, so it comes down to a choice between the lesser of two evils.
The choice is between pedestrian identification, perhaps using cherry-picked images with people who are easy to identify, and teaching the car to steer, in which case we have to create our own training images unless we find some.
As I understand it, and anyone correct me if I'm wrong, that is the decision we are going to need to make. Anyone want to weigh in on what they think and why?
Alec Leng replied [2017 July 7]:
Steve's summarized the situation about steering pretty well, so I'll just add a few details. From what I understand, training using Udacity's game data is still pretty decent, and if a group decides to do that, they get the benefit of being able to see where they would've ranked on Udacity's leaderboard. However, the only other dataset I've seen used for steering with NN is by Nvidia (used for this paper:
so unless someone has contacts there or feels like cold-emailing the authors, we'd have to get our own data.
The latter is significantly more challenging though, and since I'm unsure how much experience people have with pre-processing data, even if we can get the videos, it'd be quite a pain for someone to turn them into something you can train on. Since I anticipate myself as one of the people who'd be doing the pre-processing, I thus would prefer to use Udacity data if we go for an end-to-end neural network solution (we can always do steering with more conventional algorithms, and only use CNN for pedestrians).
On the other hand, the pedestrian detection problem isn't really *that* difficult, especially since in reality we wouldn't code the algorithm from scratch. It's quite common practice (when applying neural nets) to take structures that have already been coded and pretrained (usually on the ImageNet data), and then train them for one's specific task and optimize the metaparameters for best performance. Especially since all papers on the topic explain both their network architectures, and the code they used (often they start with openly available code from github), the main difficulty then lies in modifying the existing work to our purposes. Furthermore, data is definitely much easier to find, given that there are multiple standard databases used for pedestrian detection (with the Caltech database being one of the most common).
And Merlin, while your intuition about pedestrian recognition being difficult makes sense, in reality convolutional networks (and other fancy algorithms like Histogram of Oriented Gradients) allow us to process images quite quickly and accurately (if you want to read about something quite cutting edge, go here:
Note that the prerequisite knowledge to understand the algorithms is quite high, but you can probably understand the sections about results---in particular, how their method is extremely fast and accurate). In contrast, steering isn't so simple, because you have real-life mechanical problems to deal with as well.
I think it might be worthwhile sharing all this with everyone, but I'm also wary of scaring people off (for example, I haven't been trying to simplify anything in my email, so to someone already wary of the difficulty of neural networks, it might turn them off from it even if they'd be able to get the hang of them). Furthermore, I anticipate the neural network group would have to have in-person discussion about these types of questions anyway (since even explaining the different types of NN approaches takes some time), so there might be little difference between posting these emails now and talking about it with people interested in NN at the start of the workshop.
Finally, Merlin, what network architecture were you using for the Udacity game?
Tom Pittman replied [2017 July 7]:
I don't think you need to worry about the physical mechanics of actually steering the vehicle. If your training data is video of a vehicle actually being steered through a track with the steering wheel visible (a piece of white tape on the top to make its position more easily discerned), then being able to predict the steering angle (the position of the white tape) is all the NN needs to learn. The only data preparation needed is to extract the video (and the position of the white tape, for training purposes). Admittedly it's not as trivial as using existing data sets, but I came up with the idea before Alec told us about the CalTech pedestrian data.
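One way the "position of the white tape" might be pulled out of a frame is sketched below. It assumes a single row of grayscale pixel values taken across the top of the steering wheel and a brightness threshold of 200; both are hypothetical choices, as is the function name `tape_position`.

```python
def tape_position(row, threshold=200):
    """Return the average column index of pixels brighter than `threshold`,
    or None if no pixel qualifies."""
    bright = [i for i, value in enumerate(row) if value > threshold]
    if not bright:
        return None
    return sum(bright) / len(bright)

# A made-up row of grayscale values; the tape covers columns 3 to 5.
row = [30, 40, 35, 250, 255, 245, 50, 45]
print(tape_position(row))
```

Converting that column position into a steering angle would need the wheel's geometry in the image, which this sketch leaves out.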
Steve Edelman offered this link [2017 July 8]:
Alec wrote [2017 July 8]:
First, in light of the fact that the Udacity game data seems to generalize well (Merlin sent me this video), I don't think the choice of steering vs pedestrian recognition matters too much. The main thing I have in favor of pedestrian recognition is that I think it'd be easier to display a product (e.g. having it detect pedestrians in photos we take) when compared to any kind of actual steering. Furthermore, the literature on pedestrian recognition seems to be larger. However, I'm open to switching (we'd use the Udacity data) if anyone has a strong preference. In that case, most of the rest of this email is still applicable. That is, in either case, I strongly think we should use a premade network, but if we do steering, then substitute one of the easier "Udacity challenge tutorial blog posts" (e.g. this post) for the Stanford CS231n report I'm about to discuss.
For pedestrian recognition, there is a student project from the 2016 offering of Stanford's CS231n: Convolutional Neural Networks for Visual Recognition which I think would be a good model. The report can be found here, and it shows that the project is realistic (it was done by a single person in a neural networks class, whose only "difficult" prerequisite was introductory machine learning). Now, that may seem like a lot, except for two things. First, not everyone needs to understand all the ins and outs of the network, unlike in the class where they'd be expected to, and secondly, while they use both AlexNet and a more complex residual net, I think we'd only use the simpler AlexNet. Then the results in the paper should be relatively easy to replicate (for example, they trained the network for a very small number of epochs but were still able to get decent results), especially given how they've outlined the process for us already. Indeed, it's my understanding that the entire project was completed by a single person over a period of 8 weeks---we'd have multiple people, trying to effectively do under half the work in four weeks (since we're not doing all the research, writeups and presentation that were in the course, and we'd only look at the simpler of the two networks they examined). Thus, I think it's realistic.
In particular, using AlexNet, there exist benchmarks already (for example, the Stanford student used one of the networks provided here: https://github.com/soumith/convnet-benchmarks). This is easier than in general, since not only do we not have to deal with coding a network, but also students will get exposure to one of the big machine learning libraries-- I'd default to TensorFlow, unless anyone has opinions one way or the other. Furthermore, the practical absence of "coding" reinforces the idea that designing a good algorithm is basically everything.
Furthermore, given how commonly used these benchmarks and libraries are, they should work well (the repository was last updated a month ago, and the libraries are all well maintained). Even in the worst case, it's not like we'd need to dig through thousands of lines of code to find a bug. And as a final bit of support for AlexNet, I've looked in the literature, and it's a quite standard benchmark, while giving one of the best tradeoffs between accuracy and complexity. I could reasonably explain how AlexNet works in a week, but some of the more cutting edge algorithms (like this one by Google, which is really cool but quite complex) would be far more difficult to even give a basic sense of.
Finally, given all this, I get the feeling that it might even be too easy, which is always a good sign--and if it is too easy, we can either have them mess around with a basic XOR model or even a MNIST classifier so that they can get a better understanding of the algorithm. Or, of course, they can spend time tweaking and adding on to the basic network.
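As an idea of how small a "basic XOR model" can be, here is a sketch of a two-layer network whose weights are set by hand rather than learned. The weights, thresholds, and step activation are assumptions chosen for clarity; a trained version would start from random weights and adjust them with gradient descent instead.

```python
def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: one unit acts like OR, the other like AND.
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    # Output fires when OR is on but AND is off.
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

XOR is a common first exercise because no single neuron can compute it; the hidden layer is what makes it possible.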
Separately, I do have a concern due to how networks typically take at least hours if not days to train (in practice weeks, but not in this case since we can always cut down how much training we're doing), which potentially leads to a lot of idle time during training (since you're waiting for the trained model to finish before you can see any results). I have a few ideas on how to deal with this. One is to have people also work on another project (either the conventional approach or on neural networks for steering), another is to teach them more about the ins and outs of the network (it'd be a little bit of either lecturing or having people read certain articles, plus having them code some basic networks from scratch, like the XOR or MNIST examples I mentioned before), and a final option is for them to discuss their network and possible changes to it/about where to go from there. I think if we allow them to choose from those examples, they'll have plenty to do.
Given those options though, I still have some managerial concerns about what to do if we don't have enough manpower to provide all the options. For example, without someone to guide them, several of the options mentioned, like doing a neural net for steering or learning more about neural networks algorithms, have the potential to turn into pits of despair given how much stuff there is out there, much of which can be intimidating to a newbie. However, this is a problem that can likely be dealt with later, once we get a better sense for what people are actually interested in. But I digress.
In essence, I think we should use AlexNet to classify pedestrians, a basic setup of which I will try to have running by the beginning of the workshop. However, if anyone has a strong feeling that we should do steering, I could also do research and provide a network architecture that I think would make sense to use for steering--indeed, given the potential for free time while the network is training, it might (much emphasis on might) be possible to do both.
Merlin wrote [2017 July 9]:
In general I agree with what Alec is saying. I think ultimately we need to decide whether the goal for these kids is to learn fundamentally how a neural network functions, because they want to code, or to learn how to train a neural net and have a feasible tool for eventually putting into a real car. Both are conceptually complicated and could take weeks to accomplish.
If they are more interested in coding a neural network, my suggestion would be to start very simple. Have them build a vanilla neural net and program a simple 2D driving game. Then train the network to avoid pedestrians that are randomly crossing the road. I think this is a doable project, but would not generalize to a real world driving car.
If the goal is to hopefully use this in a real car, using a pre-built neural network is a much better idea. I think Alec's idea would allow the kids to focus more on a high level understanding of how neural networks function and how to train them, rather than getting stuck on the nitty-gritty of the low level implementation.
As a programmer I like the idea of coding something from scratch and the math isn't too complicated. But there is a lot of mathematical analysis related to fine-tuning the training which I haven't been able to get past yet and could be exceptionally challenging for the kids. Alec seems to have a good plan of attack for the other option and I defer to his expertise on this topic.
Alec replied [2017 July 9]:
Personally, I think it would make more sense to have them do a commonly used, non-driving related example like MNIST classification to learn about neural networks, and then separately see the high-level implementation in practice. I think the "simple 2d driving game" will be more work than that, and I think the extra time spent on it wouldn't be worthwhile.