Last weekend I finally completed the University of Toronto’s specialization on self-driving cars. It has been a rewarding journey, so let me take a moment to review and reflect.

Generally I would recommend it to people with an engineering background and to anyone who’s interested in this field. The specialization has four courses, which cover an introduction and a kinodynamic model, sensing and localization (LIDAR and IMU), visual perception and 3D modeling, and lastly motion planning and actuation. Most of the topics are useful and necessary for getting into this field, and combined they give a good overall feel for what it takes to build a self-driving car.

The courses

In the 1st course, the bicycle model of a car turns out to be amazingly simple yet powerful when it comes to modeling the trajectory of a car. I liked how it shows that one can approach an engineering problem through layering and abstraction, e.g. by simplifying four wheels into two without losing generality. Also, as an introductory course, it features the opinions of many real-world engineers and entrepreneurs on the industry, the project, the problem domain, and future expectations. I especially like Paul Newman’s take on the industry and on why it is necessary to build a self-driving car that can adapt to all varieties of infrastructure, instead of building infrastructure to some specific spec.
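To give a feel for how little the bicycle model needs, here is a minimal sketch of one common formulation: the kinematic bicycle model referenced at the rear axle, integrated with a simple Euler step. The function name, parameter names, and the choice of reference point are my own assumptions for illustration, not the course’s code.

```python
import math


def bicycle_step(x, y, theta, v, delta, wheelbase, dt):
    """Advance a kinematic bicycle model by one Euler step.

    Assumptions (illustrative, not from the course material):
    the reference point is the rear axle, v is the speed in m/s,
    delta is the front steering angle in radians, and wheelbase is
    the distance between the two axles in meters.
    """
    x_next = x + v * math.cos(theta) * dt
    y_next = y + v * math.sin(theta) * dt
    # The heading changes faster with more speed and more steering.
    theta_next = theta + (v / wheelbase) * math.tan(delta) * dt
    return x_next, y_next, theta_next


def simulate(steps, v, delta, wheelbase=2.5, dt=0.1):
    """Roll the model forward from the origin with constant inputs."""
    state = (0.0, 0.0, 0.0)
    for _ in range(steps):
        state = bicycle_step(*state, v, delta, wheelbase, dt)
    return state
```

With zero steering the car simply drives straight along the x axis; with a constant positive steering angle the heading keeps increasing and the trajectory bends to the left.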

For the 2nd course, coming from a CS background, I found it challenging yet very rewarding to understand and implement Kalman filtering and its variations. I even went on to compare the slides on the Extended Kalman Filter against the real open-source code in Baidu’s Apollo project, and you might be surprised that the formulas are translated almost line by line into C++ code.
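To illustrate how directly the filter equations map onto code, here is a minimal sketch of a scalar (one-dimensional) linear Kalman filter. Note that this is the plain linear filter rather than the Extended Kalman Filter, and it is neither the course’s assignment code nor Apollo’s implementation; all names are my own.

```python
def kalman_predict(mean, var, motion, motion_var):
    """Prediction step: shift the estimate by the motion and grow the uncertainty."""
    return mean + motion, var + motion_var


def kalman_update(mean, var, measurement, meas_var):
    """Correction step: blend the prediction with a noisy measurement."""
    # The Kalman gain weighs the measurement by how uncertain the prediction is.
    gain = var / (var + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def run_filter(mean, var, motions, measurements, motion_var, meas_var):
    """Alternate predict and update over paired motion/measurement sequences."""
    for motion, measurement in zip(motions, measurements):
        mean, var = kalman_predict(mean, var, motion, motion_var)
        mean, var = kalman_update(mean, var, measurement, meas_var)
    return mean, var
```

Each line corresponds to one formula: the prediction adds variances, the gain is a ratio of variances, and the update always shrinks the variance. The EKF keeps the same structure but linearizes the motion and measurement models with Jacobians.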

The 3rd course was the easiest one for me because most of the deep learning part was already covered in deeplearning.ai’s specialization. What was new to me was mainly the OpenCV part, where some legendary algorithms like SIFT still play a shining role. I also found this course the most comprehensive and thorough: it starts with the classical pinhole camera model and works all the way up to a recent solution to the image semantic segmentation problem using VGG-net.
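The pinhole camera model mentioned above boils down to a division by depth followed by a scale and a shift. Here is a minimal sketch that projects a 3D point, already expressed in the camera frame, onto the image plane. The function name and the sample intrinsics are assumptions for illustration; lens distortion and the world-to-camera transform are left out.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates.

    fx, fy are the focal lengths in pixels and (cx, cy) is the
    principal point. The point must be in front of the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must have positive depth")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


# Hypothetical intrinsics for a 640x480 image.
pixel = project_point((1.0, 2.0, 4.0), fx=100.0, fy=100.0, cx=320.0, cy=240.0)
```

Doubling the depth of a point halves its offset from the principal point, which is why distant objects appear smaller.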

The 4th course takes a top-down approach: the high-level problem of route planning is solved using Dijkstra’s algorithm, the mid-level behavior planning problem is tackled using finite state machines, and finally the local-level maneuver planning problem is solved using parametric curves. The highlight is the final project, where all the pieces are put together to drive the vehicle through obstacle avoidance, lead vehicle tracking, and stop sign handling. This almost gives you a feeling of how a real self-driving car performs in action.
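As an illustration of the route planning layer, here is a minimal sketch of Dijkstra’s algorithm on a small road graph. The graph representation (a dict mapping each node to a list of neighbor and cost pairs) and the intersection names are made up for this example and are not taken from the course.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (cost, path) of the cheapest route from start to goal.

    graph maps each node to a list of (neighbor, cost) pairs with
    non-negative costs. Returns (inf, []) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry, a shorter route was already found
        visited.add(node)
        if node == goal:
            break
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if goal not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


# Hypothetical road network: nodes are intersections, costs are travel times.
roads = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 2.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
cost, route = dijkstra(roads, "A", "D")
```

The behavior planner and the local planner then only have to follow this route one segment at a time, which is what makes the layered design manageable.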

Room for improvement

I feel like the programming assignments could’ve been more polished: there were sometimes bugs in the provided utility functions, a Python version mismatch broke the JupyterHub, and the feedback on wrong submissions was very minimal. A better-prepared assignment could’ve included more intermediate submission steps so that learners could sanity-check their progress along the way.

Also, the teaching staff were not responsive enough when it came to answering questions in the forum. I feel like the forums were only used by students to help each other (although that was very useful as well).

Outside the scope of this specialization, I also find that OpenCV has relatively poor documentation. For many functions, the Python version of the docs lists the wrong return type and/or gives little explanation of the algorithm’s background.

Afterthoughts

This area has been almost white-hot in recent years. I can count quite a few high-profile startups as well as big names (Pony.ai, Tesla, Zoox, drive.ai, Waymo, Momenta, TuSimple, Baidu’s Apollo, Uber’s ATG, Cruise, comma.ai, Mobileye, etc., just to name a few in no particular order), each with a different approach and focus area. They are also taking in huge amounts of investment money and resources and racing to build a larger fleet of autonomous cars by the day.

I think this specialization gives a glimpse of what the autonomous driving future will be like. Indeed, “any sufficiently advanced technology is indistinguishable from magic”. But I think that magical future is not yet near. As Kai-Fu Lee and Rodney Brooks have argued, it won’t arrive anywhere near 2020. My (unqualified) opinion is that it’s not going to come within the next five years, but in the mid-to-long term it’ll be possible in our lifetime (and, with luck, much sooner).

My short-term pessimism comes from an understanding of the problem and design domain we need to tackle, as learned from the course material. The reasons are (like anything else) threefold: technology, talent, and capital.

Technology-wise, I believe the current boom in the industry is largely driven by the software and hardware upgrades that came with the spread of deep learning (near-realtime object detection algorithms like YOLO, and cheaper GPUs/TPUs). This certainly gets us closer to the goal of L4/L5 autonomy, but it’s not enough to really get there. Tesla hasn’t fully convinced everyone that LIDAR is optional, nor has the accumulated number of vehicles across all the fleets driven down the price significantly enough (I think). In behavior planning, the use of reinforcement learning is still at an early stage, and it has to deal with the explainability of the agent’s policy (to pass regulatory and media scrutiny). As for the access to and sharing of data, I haven’t yet seen an “ImageNet” moment (like what BERT is arguably perceived to be in the NLP community) for high-precision maps and many other areas.

Predictions like these could easily be wrong, but the point stands that deep learning by itself hasn’t moved everything in the industry just yet, and there are plenty of such technicalities that need to be solved, which will be hard to do within the next five years.

Talent, both in engineering and in management, is lacking. I think some of the startup founders can raise tons of money because of their track records in the Internet and software industries. But tackling this problem takes more than that. Managing hardware supplies, OEMs, risk assessment, quality control, etc. is (probably) harder than pushing an accuracy rate up by 1% or fixing a continuous delivery pipeline. Good engineers are also hard to find. The earlier definition of a full-stack engineer (in the Internet industry) might span CSS all the way down to databases and dev-ops, but building a self-driving car takes it to a whole new level (you’ll need to know how to calibrate cameras, analyze point cloud data, understand gears and transmissions, and also program RL agents). Even if we don’t need that type of full-stack-ness, we still need engineering leaders who can cover the whole lot. Training such people takes time and a lot of failures. The progress will probably take the deaths of many companies, which will train talent in the meantime. Hopefully the progress is slow but steady. I don’t have much authority in this area, and I will happily be proven wrong.

Given that the timespan will likely be more than five years, it’s also a challenge for venture capital. Large corporations like Google can be supported by their boards in investing in this type of moonshot project, but for financial VCs, their LPs might not be that patient. Maybe some of the startups will have to pivot and refocus on something more achievable than L4/L5 autonomy, or else persuade their investors or acquirers. For high-profile startups, both ways are exponentially more difficult once you have already raised hundreds of millions of USD.

Having written all that, I still think the problem is hard but solvable. It might take another “PayPal mafia” story, where companies appear, grow, burst, and then disseminate seeds of talent, industry know-how, etc., and then just beyond the horizon a new level of progress can be made. The whole of society is excited about this area, and both the US and China (and they are not the only ones) are pouring in financial and policy resources. Maybe a new form of venture capitalism can emerge to adapt to this industry, one that fosters better cross-industry cooperation, etc. I don’t know. As someone with an engineering background, I think this might just be the Apollo project of our time, and it’s exciting.

Jiayu Liu

Rumination on tech, start-ups, and life.

Review of University of Toronto’s Coursera Specialization on Self-Driving Cars
