How I passed Google’s TensorFlow certificate

I passed the test for Google’s TensorFlow developer certificate back in April and received the certificate a few weeks later. This is a relatively new addition to Google’s developer certificates program, and I’d love to share a bit more context on it without too many spoilers, hoping it’ll be useful to anyone interested in taking it.

Motivation 

As an engineering manager, neither my own daily job nor my team’s involves machine learning per se, so learning TensorFlow and deep learning in general is purely driven by personal interest. Throughout my career so far I have never been in a position where I had to personally train a model to tackle a problem, but I’ve done some work with Kubernetes powering TensorFlow workloads, as well as Coursera courses that involve using deep learning for self-driving cars, which explains where my interest comes from.

I’ve also been reading textbooks, taking courses, running open-source Colab notebooks, etc. for the last few years, so I felt I could challenge myself and get my skills tested. Reading the test FAQ, I can see this is by no means a test of SOTA knowledge and skills, but fundamentals are important, right?

Preparation

Since this is a test, like every other test in life, preparation matters. The most relevant information is on the official website, specifically the public FAQ document, and you should read it at least twice: it covers what to expect in the exam and gives a major hint about its shape and form (e.g. the number of questions).

If, like me, you have taken the TensorFlow in Practice specialization by deeplearning.ai before, the test coverage should feel familiar. If not, I strongly suggest taking that specialization as preparation, or simply as an introduction to getting on board with the new TensorFlow 2.0 API. I spent about 8 hours in total going through the Colab notebooks in the course materials, and that turned out to be the most helpful thing I did.

Taking the test

The test is taken on your own computer (i.e. remotely) with PyCharm and a plugin. I can’t go into much detail, but one important tip: basically all the usual rules for tests and interviews apply here. Find a quiet and comfortable place with AC power and easy access to water and a toilet, etc.

My initial estimate was that I wouldn’t need the full 5-hour window, because I was pretty confident that training a few toy neural nets wouldn’t take that long. I say toy neural nets because, wearing the test designer’s hat, it would be unnecessarily difficult to evaluate people’s work on very complex models and small improvements made to basic models; i.e. getting everything working matters more than fine-tuning one model. (This applies to real production work too, since establishing a baseline, even with just an SVM or logistic regression, is also the first thing to do.)

But I was wrong. It turns out I had underestimated all the places where my code could go wrong. I believe I wasted more than 3 hours on things like:

  1. Fixing a cryptic exception thrown from TensorFlow internals because I didn’t correctly specify the input dtype (or do proper image reshaping), and the error message wasn’t helpful at all
  2. Failing to supply the right combination of params to model.fit with an image data generator, so each epoch took too much time (I was using my MacBook Pro, so no GPU)
  3. Trying to upgrade my model from a single LSTM layer to two, after which my score went down from 3/5 to 2/5 (there’s no other feedback during the test, so you can’t do error analysis), and I could never get back to the original score for some reason

Luckily, after using up all the time, I was able to pass the test! I really enjoyed the process, since everything in the test was within my expectations and there were no surprises, meaning all my past learning and preparation really meant something. The problems I ran into taught me a good lesson on how things can actually go wrong in real ML work.

What’s next

After you finish the test, the result comes within an hour or two, and you are then prompted to list your information in the developer directory. I imagine this would be useful for people hunting for ML jobs; for me it serves more of a social purpose. The certificate itself came more than two weeks later, but it carries no meaning beyond being a reminder to yourself.

This is by no means the end of the journey, as the criteria and content covered in this test are really just fundamentals. It serves as a checkpoint, and a starting point for learning something more SOTA. For me the single biggest takeaway (from my preparation) is discovering how useful Colab is; I now work in it by default! In fact, I believe Colab should suffice for most people without buying any physical GPUs: you get data-center-grade network speed and zero-setup GPU/TPU support basically for free. Unless you want to train your models for days, e.g. retraining BERT from scratch (which you most definitely shouldn’t), it is definitely the more efficient choice.

So, is it really worth it?

To wrap up, I want to explore the question of whether it is really worth it; after all, there’s a US$100 fee.

For people looking for ML jobs: you can think of it like buying a LinkedIn premium account. You’ll get more exposure and save some market discovery time and cost, but in the end it’s you they are looking at, the whole package. The certificate won’t buy you a job, merely an entrance, maybe.

Other than that, I find it a good excuse to push yourself to study the materials, which is the truly meaningful part. If you have already done that, I wouldn’t worry about skipping the test itself.

Coder at work (from home)

In the past few months, life has turned upside down for more than half of the people around the globe, yet we here in Beijing are close to the end of it (hopefully).

It’s a Sunday afternoon, so let me try to recap some of the moments that went by…

Staying at home, watching and observing

After the (abruptly interrupted) Spring Festival holiday, we returned home to Beijing in early February, and for the next four months we were basically locked up at home, working or not.

I can still remember how eagerly we followed the situation early on as it unfolded, first in China and then across the world (Italy, Japan, the U.S., etc.). Then the video from 和之梦 (【感染者为0的城市—南京】日本导演镜头下最真实的南京防疫现场) blew up, marking the transitional moment (or transitional week) when things were stabilizing in China while the world outside started turning upside down.

From there, we also saw a stream of blame aimed at China from outside, along with some influential voices criticizing the blame itself, among them Daniel Dumbrill and Stratechery. I came across Daniel’s talks via a strange path: 和之梦 interviewed Ben, a bar owner who speaks only Chongqing dialect, and YouTube then recommended Ben’s interview with Daniel Dumbrill. I especially enjoyed Daniel’s sit-down interview American PhD Student in China - A Discussion about China & More, where I was amazed at how well-read that PhD student was. Ben Thompson has always been an inspiration (and I was recently sold on his Dithering.fm with John Gruber), and in that very episode he and James did a great job giving an independent voice on what Beijing, Taiwan, and the US each did right and wrong.

Sometimes it gets a bit overwhelming to follow the virus 24x7, so while jogging or walking my dog I mostly listen to other podcasts. I continue to find Lex’s Artificial Intelligence Podcast a gold mine, where he interviews in person (among many others) Jack Dorsey, Andrew Ng, Ilya Sutskever, David Silver, Donald Knuth, Michael Jordan, etc. The topics are all very interesting, but what I find most unique are the things you don’t usually get to hear on other occasions, anecdotes that only surface in fireside-chat style conversations: Andrew Ng mentioning that some of the early videos on Coursera (the Machine Learning course) were recorded after he finished weekend dinners with friends and returned to the lab after 9pm; or Michael I. Jordan recounting that he did meet the other famous Jordan, etc. It was also a pleasant surprise when Lex interviewed Melanie Mitchell only a week after I finished her book on AI: perfect timing, and you feel lucky to have had the chance to shadow-talk with the author, questions in mind, while reading her book.

Sebastian Thrun is one badass entrepreneur and researcher

I especially enjoyed Karpathy's talk on how Tesla handles the long tail of different stop signs

I’ve also done a ton of binge-watching on Wang Gang’s channel, Liziqi’s channel, and even the funny TechLead. Many people watch these videos for their soothing music and visuals, but what I enjoy most is how people outside China talk about seeing so many things for the first time while fully understanding and relating to the culture and food. Well, not the TechLead videos; those I watch purely for the many “as a millionaire” jokes.

Layoff, and how that affects work

Airbnb did a large layoff, among many other companies (Uber has done two so far, before and after our round, and there were also Lyft, Cruise, etc.), and many people were affected. Airbnb China was equally affected. Luckily I’m not one of them, but I did see many talented people let go.

Saying goodbye is hard, and being put in that situation is harder, but as someone who once ran a startup and went through a layoff (one we carried out ourselves), the experience refreshed that memory, and this time I could focus more on the non-emotional parts, i.e. the lessons.

I believe Airbnb’s and Brian’s actions during the layoff were well executed, if not impeccable. Sure, there were some miscommunications that could have been handled better, but overall I would rate the execution humane, considerate, and meaningful. Being inside the organization I surely don’t see the full picture, but people around me did say the severance package, the extra miles our recruiting team went to set up a talent directory, and all the smaller things like letting people keep their computers were good choices. What’s more, you can’t erase all the negativity (people who were impacted were impacted), but the co-founders put a lot of effort into cheering people up and adjusting the focus of the company, and that matters even more after a layoff. One of my colleagues was right on point: it takes a layoff to see the difference between companies run by founders and companies run by hired executives. It really does.

Things are still unfolding, and there will be more repercussions as the world gradually gets back on its feet, but I think it was a good lesson and a hard-to-get experience.

Work life balance is shifted, tilted, and redefined

Throughout the lockdown I’ve probably gulped down 3 kilograms of ground black coffee, bought in 4-5 batches, each bagged in 150g portions, some dark roast and some lighter. I read somewhere that coffee is better filtered, so I’ve been drinking pour-over (drip coffee) instead of using, e.g., my old French press. I think my palate has at least improved to the point where I can taste the difference between a good day and a bad one.

Book reading as a habit is back and thriving!

There have been many discussions on, e.g., the a16z and Stratechery podcasts about remote work, new startup ideas, and the new paradigm of work. Now that companies like Facebook, Google, Shopify, Box, etc. are releasing policies that let employees work from home permanently or for the next few years, I’m seeing more of those “I told you so” tweets from the likes of Basecamp’s DHH and Jason, and from digital-nomad lifestyle promoters. I personally think this is a wonderful thing to happen (granting, of course, the sadness that many lives were lost to the virus): the world can now embrace this new paradigm with at least less doubt. Of course WFH isn’t for everyone, as some kinds of work simply can’t be done that way, but if we restrict the discussion to tech jobs, it is much more approachable.

I personally think people need to approach this trend more carefully: you have to test the waters first and decide whether it works for you. Our Airbnb Beijing office has been partially open since last week, and as a precaution we are split into A/B groups, each allowed in the office every other week. I went to the office every single day last week, partly because I had been away from it for too long (4 months! and magically my rubber tree is still alive!) and, more importantly, because Zoom’s bandwidth just isn’t comparable to meeting people in person. People should really be taught that long-term working from home demands strong communication skills as well as regular breaks where you do meet in person; after all, people are social animals, and no technology is immersive enough to replace that. On that front, I think Facebook got it right in giving more freedom only to senior levels and above, while people early in their careers are still required to come to the office.

Having said all that, I’m glad I get to work from home this coming week and go back to the office the week after: every other week is a good balance. Being a manager, though, I’m still a bit worried about how my team members will adjust to this new pattern. Work-life balance was never well defined (it was even ridiculed in the context of China’s 996), and now it isn’t so much being redefined as tilted and reshaped; the whole concept is no longer the same.

Reading Piketty’s “Capital in the Twenty-First Century”

Over the past few weeks I read Piketty’s tome, “Capital in the Twenty-First Century”.

This is a book that sat on the bestseller shelves at bookstore entrances for a long while a couple of years ago, even though it isn’t exactly a “popular” read, and the author is somewhat famous. The book is packed with charts and looks intimidating, but reading it carefully, it turns out not to be that obscure or hard to follow.

Growth rate numbers


The first half of the book demystifies a common assumption: that economic and wealth growth of 5-6% a year or more (like China’s over the past few decades) is the norm. Over the past several centuries, humanity’s long-run growth in output and wealth has actually been below 3%; more precisely, before the Industrial Revolution it hovered around 0.1% for long stretches.

The author cites many works by Balzac and Jane Austen as indirect evidence that growth stayed stable and low over long periods, but the core of the argument is still a mass of data and charts: mostly from Britain and France, more postwar data for the US, and data for emerging Asian economies only in recent times, since the analysis relies mainly on tax records. Only in the modern era, especially after the wars, did growth exceed 3%. But the difference should not be underestimated: an annual growth rate of 3% means the level of wealth doubles roughly every 30 years (one generation), which was unimaginable before.
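
As a quick sanity check on that doubling claim (my own arithmetic, not a figure from the book): compounding 3% over a 30-year generation gives

$$(1.03)^{30} \approx 2.43,$$

so the level actually more than doubles; the exact doubling time at 3% is $\ln 2 / \ln 1.03 \approx 23.4$ years.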

A core indicator here is the ratio of total wealth to annual national income, usually around 5-6. A smaller number means less historically accumulated capital, and such a society has relatively fewer “rentiers”; it shows up more in postwar periods and emerging markets.
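
In symbols (the capital/income ratio is Piketty’s; writing it out this way is mine):

$$\beta = \frac{\text{national wealth}}{\text{annual national income}} \approx 5\text{--}6.$$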

A related topic is the definition and share of public wealth. In quite a few Western countries public wealth is very small, even negative: I suppose that is one definition of “wealth kept with the people”. In such societies, if the state wants to build on a large scale it has to borrow, and government debt levels are a ticking time bomb that really tests the central bank’s fiscal competence.

The source of inequality: r > g


The author gives examples of how unrepresentative the Gini coefficient is; the number he prefers is the ratio of the top 10% of incomes to the bottom 10%.

When discussing income and income inequality, the book separates labor income from capital income (and their respective inequalities); the latter refers to income that doesn’t require your own labor (loans, machines, equity, inheritance, etc.). A series of analyses shows that labor income inequality is generally below 10, often just 2-3 (here meaning the top 10% of labor incomes versus the bottom), while for capital income the inequality is routinely in the tens or even hundreds. In short, everyone can work, and a CEO’s salary is not that many multiples of a poor person’s wage, but the capital returns of those living off rent are beyond imagining, because the lower and middle classes own almost nothing; living paycheck to paycheck leaves no surplus at all.

Although this pattern holds across places and eras, there are variations. The ratio was relatively small in Scandinavia in the 1970s-80s, yet very large in Western countries in Balzac’s day (the so-called Belle Époque); and although the income gap narrowed after the two world wars and then rebounded, the rebound never returned to the earlier peak. The author’s explanation is the rise of a class of top managers who got rich through labor income (high salaries) rather than pure rentier capital income, producing a “patrimonial middle class”.

Overall, the author expects this trend to rise again. The underlying reason is that the return on capital r has rarely stayed below 5-6% over the long run, while society’s natural growth rate g rarely exceeds 2-3% and may even fall back toward 1% (as Asian economies join the developed world and income growth slows). The persistently high r comes from people’s aversion to future uncertainty and their natural “impatience”. Given enough time, the capital income gap will naturally drift back to a high level.
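
A toy way to see why r > g matters so much (my illustration, not the book’s): if a rentier’s capital compounds at r while national income grows at g, their relative size is

$$\frac{(1+r)^t}{(1+g)^t} = \left(\frac{1+r}{1+g}\right)^t,$$

which grows without bound whenever r > g. With, say, r = 5% and g = 1.5%, capital pulls ahead of income by a factor of about 2.8 within a single 30-year generation.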

The most thorough remedy mentioned in the book is progressive taxation, together with a set of legal and policy constraints involving inheritance tax, fiscal policy, and so on. He calls on policymakers everywhere to take part in building this framework, which is the only way to bring the soaring income gap back on track. Of course, reality may be worse, since the richest people tend to route around tax constraints, and his research data is based precisely on tax records, so the real situation is probably severely underestimated.

Taleb’s criticism


Interestingly, I went on to read Nassim N. Taleb’s Skin in the Game, where he criticizes Piketty and Paul Krugman mercilessly (verging on ad hominem, and proudly so; so typical NNT): he thinks these economists argue from where they sit, gazing enviously up at the people one rung above them in the pecking order and resenting their incomes, rather than genuinely thinking about the lower and middle classes; I’ll leave that aside. Another point I basically agree with: Piketty barely discusses the emergence and impact of the “knowledge economy”, which may be the real reason income inequality has eased to some extent. In general, historians are accurate about the past, but facing the present and the future calls for a more dynamic view. (And I do agree with Taleb’s complaint about the excessive charts… it really is a bit much.)

All in all, interesting and well worth a read.

A Sunday afternoon at home during COVID-19

Because of COVID-19, I spent all of February working from home. The travel industry took a big hit, which generated a lot of extra work for us.

Having finally caught a breather after the all-hands-on-deck period, the next focus at work is the same problem governments everywhere are facing: how to recover.

Fortunately, this Sunday afternoon I get to rest and put that question aside for a while.


My wife bought me a LaMetric, which now sits on the dining table as a clock; it made me realize that having a time reference at home matters. The weather is lovely today, and Mozart suits the mood.


The newly bought coffee beans don’t measure up to the ones I got from Proud Mary in Melbourne; the beans are much smaller, though the aroma is still decent overall. I did the math: at 15g per brew, each cup costs six or seven RMB, and these are the cheap beans. It makes me wonder how McDonald’s sells each cup of coffee so cheaply. (Granted, the quality isn’t comparable at all.)


Roasted a sweet potato 🍠; when the potato itself is good, it needs no seasoning at all.


Reading Zweig on Brazil reminds me of how I felt reading Lin Da’s “Travel Notes on Spain” back in the day: besides longing for the culture and scenery, I’m also a little hungry for the local food.


On Saturday I finished the latest deeplearning.ai course on Coursera and picked up some genuinely useful things. What impressed me most is how seriously they are thinking about federated learning and the protection of personal data; looking back at China’s tech unicorns, how many have that level of awareness? Put another way, awareness is one thing, but how much the political and legal soil in which private companies and capital survive values individual rights will also show up, faintly, in product direction.

Grateful for these small things that made for a pleasant Sunday afternoon.

How I Discovered the Cause of Slowness in My Express App

Please note that this is a back-ported post from 2016.

Recently I discovered some slowness in my Express app.

A bit of background here: for the platform we are building at Madadata, we use an external service for user authentication and registration. But in order to test against its API without incurring the pain of requesting from California (where our CI servers are) to Shanghai (where our service provider's servers are), I wrote a simple fake version of their API service using Express and Mongoose.

We didn't realize the latency of my service until our recently started load testing, which showed that more than half of the requests didn't return within 1 second, thus failing the load test. For a simple Express app using Mongoose there is hardly any way to get it wrong badly enough to produce anywhere near 1 second of latency.

[Screenshot: local mocha run showing response times]

The screenshot above, from running the mocha tests locally, revealed that there is indeed a problem with the API service!

What went wrong?

From the screenshot I can tell that not all the APIs are slow: the one where users log out and the one showing the current profile are reasonably fast. Also, judging from the dev logs I printed with morgan, the response times that Express collected for the slow APIs show a consistent level of slowness (i.e. for the red-flagged ones, you are seeing roughly the sum of the latencies of the two requests above them, respectively).

This actually rules out the possibility that the slowness comes from the connection rather than from within Express. So my next step was to look at my Express app. (N.B. this is something worth ruling out first, and I personally suggest trying one or two tools other than mocha, e.g. curl or even nc, before moving on, because they almost always prove more reliable than the test code you wrote.)
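
For example, to time a single request without any test harness, something like this works (a sketch; the port and route here are placeholders for your own app):

# -s silences the progress bar, -o discards the body,
# -w prints only the total time taken by the request
curl -s -o /dev/null -w '%{time_total}\n' http://localhost:3000/api/profile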

Inside Express

Express is a great framework for building web servers in Node, and it has come a long way in terms of speed and reliability. So I thought the problem was more likely in the plugins and middleware I used with Express.

In order to use MongoDB as the session store I used connect-mongo to back my express-session. I also used the same MongoDB instance as my primary credential and profile store (because why not? It is a service for CI testing after all). For that I used Mongoose as the ODM.

At first I suspected the built-in promise library shipped by default with Mongoose, but after swapping in the ES6 built-in one, the problem wasn't solved.

Then I figured it was worth checking the schema serialization and validation. There is only one model, and it is fairly simple and straightforward:

const mongoose = require('mongoose')
const Schema = mongoose.Schema
const isEmail = require('validator/lib/isEmail')
const isNumeric = require('validator/lib/isNumeric')
const passportLocalMongoose = require('passport-local-mongoose')

mongoose.Promise = Promise

const User = new Schema({
  email: {
    type: String,
    required: true,
    // custom validator with its (Chinese) error message
    validate: { validator: isEmail, message: '{VALUE} 不是一个合法的 email 地址' }
  },
  phone: {
    type: String,
    required: true,
    validate: { validator: isNumeric }
  },
  emailVerified: { type: Boolean, default: false },
  mobilePhoneVerified: { type: Boolean, default: false },
  turbineUserId: { type: String }
}, { timestamps: true })

User.virtual('objectId').get(function () {
  return this._id
})

// fields selected when passport-local-mongoose queries users
const fields = { objectId: 1, username: 1, email: 1, phone: 1, turbineUserId: 1 }

User.plugin(passportLocalMongoose, {
  usernameField: 'username',
  usernameUnique: true,
  usernameQueryFields: ['objectId', 'email'],
  selectFields: fields
})

module.exports = mongoose.model('User', User)

Mongoose Hooks

Mongoose has a nice feature where you can use pre and post hooks to interact with and investigate the document validation and saving process.

Using console.time and console.timeEnd we can measure the time spent in each of these stages.

User.pre('init', function (next) {
  console.time('init')
  next()
})
User.pre('validate', function (next) {
  console.time('validate')
  next()
})
User.pre('save', function (next) {
  console.time('save')
  next()
})
User.pre('remove', function (next) {
  console.time('remove')
  next()
})

User.post('init', function () { console.timeEnd('init') })
User.post('validate', function () { console.timeEnd('validate') })
User.post('save', function () { console.timeEnd('save') })
User.post('remove', function () { console.timeEnd('remove') })

and then we get this more detailed information in the mocha run:

[Screenshot: mocha run with pre/post hook timings]

Apparently document validation and saving don't take up big chunks of the latency at all. This also rules out the possibilities a) that the slowness comes from a connection problem between our Express app and the MongoDB server, or b) that the MongoDB server itself is running slow.

Passport + Mongoose

Turning my focus away from Mongoose itself, I started to look at the passport plugin that I used: passport-local-mongoose.

The name is a bit long, but it basically tells you what it does: it adapts Mongoose as a local strategy for passport, which handles session management and the registration and login boilerplate.

The library is fairly small and simple, so I started directly editing the index.js file inside my node_modules/ folder. Since #register(user, password, cb) calls #setPassword(password, cb), i.e. specifically this line, I focused on the latter. After adding some more console.time and console.timeEnd calls, I confirmed that the latency mostly comes from this function call:

pbkdf2(password, salt, function (pbkdf2Err, hashRaw) {
  // omit
})

PBKDF2

The name itself suggests that it is a call into a cryptography library, and a second look at the README shows that the library uses 25,000 iterations.

Like bcrypt, PBKDF2 is a slow hashing algorithm, meaning it is intended to be slow, and that slowness is adjustable via the number of iterations, in order to keep up with ever-increasing computing power. This concept is called key stretching.
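
To get a feel for how the iteration count maps to wall-clock time, here is a minimal sketch using Node's built-in crypto module (timings vary by machine, and this is not the exact code path passport-local-mongoose takes; the password, salt, key length, and digest below are made up for illustration):

const crypto = require('crypto')

// Time one synchronous PBKDF2 derivation at a given iteration count
function timePbkdf2 (iterations) {
  const label = `pbkdf2 x ${iterations}`
  console.time(label)
  crypto.pbkdf2Sync('correct horse battery staple', 'some-salt', iterations, 64, 'sha256')
  console.timeEnd(label)
}

for (const iterations of [1000, 25000, 100000]) {
  timePbkdf2(iterations)
}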

As written in the Wikipedia article, the initially proposed iteration count was 1,000 when the standard first came out, and some recent updates push the number as high as 100,000. So the default of 25,000 was in fact reasonable.

After reducing the iterations to 1,000, my mocha test output now looks like:

[Screenshot: mocha run after lowering PBKDF2 to 1,000 iterations]

and finally the latency is acceptable while the security stays reasonable, for a test application after all! N.B. I made this change for my testing app only; it does not mean your production app should lower its iterations. Also, setting the count too high will make the app vulnerable to DoS attacks.
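
Also, instead of editing index.js under node_modules/ like I did, passport-local-mongoose exposes the iteration count as a plugin option (if my reading of its README is right), so the tweak for a test-only app could look like:

// Same plugin call as in the schema above, with a lower PBKDF2 cost
// for this CI-only fake service; keep the default in production
User.plugin(passportLocalMongoose, {
  usernameField: 'username',
  usernameUnique: true,
  usernameQueryFields: ['objectId', 'email'],
  selectFields: fields,
  iterations: 1000
})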

Final thoughts

I thought it would be meaningful to share some of my debugging experience on this, and I'm glad it wasn't due to an actual bug (right, a feature in disguise).

Another point worth mentioning: for developers who are not experts in computer security or cryptography, it is usually a good idea not to hand-roll code for session/key/token management. Starting with a good open-source library like passport is a better idea.

And as always, you never know what kind of rabbit hole you'll run into while debugging a web server - that is really the fun part of it!