Maybe you’re a junior data scientist, maybe you’re a software developer who wants to go into data science, or perhaps you’ve dabbled in data for years in Excel but are ready to take the next step.
If so, this post is all about you, and an opportunity I offer every year.
You see, life has been very good to me in terms of training as a data scientist. I have been spoiled, really – I had the chance to learn from some of the best data scientists, work with some exceptional epidemiologists, experience some unusual challenges and face many of the day-to-day hurdles of working in data analytics. I’ve had the fortune to see this profession in all its contexts, from small enterprises to multi-million dollar FTSE100 companies, from well-run agile start-ups to large and sometimes pretty slow dinosaurs, from government through the private sector to NGOs: I’ve seen it all. I’ve done some great things. And I’ve made some superbly dumb mistakes.
And so, at the start of every year, I have opened applications for young, start-of-career data scientists looking for their Mr. Miyagi. Don’t worry: no car waxing involved. I will be choosing a single promising young data scientist and pass on as much as I can of my so-called wisdom. At the end, your skills will shine like Mr. Miyagi’s 1947 Ford Deluxe Convertible. There’s no catch, no hidden trap, no fees or charges involved (except the one mentioned below).
To be eligible, you must be:
- 18 or above if you are taking a gap year or not attending a university/college.
- You do not have to have a formal degree in data science or a relevant subject, but you must have completed it if you do. In other words: if you’re in your 3rd year of an English Lit degree, you’re welcome to apply, but if you’re in the middle of your CS degree, you have to wait until you’re finished – sorry. The same goes if you intend to go straight on to a data science-related postgrad within the year.
- Have a solid basis in mathematics: decent statistics, combinatorics, linear algebra and some high school calculus are the very minimum.
- You must be familiar with Python (3.5 and above), and either familiar with the scientific Python stack (SciPy, NumPy, Pandas, matplotlib) or willing to pick up a lot on the go.
- Be willing to put in the work: we’ll be convening about once every week to ten days by Skype for an hour, and you’ll probably be doing 6-10 hours’ worth of reading and work for the rest of the week. Please be realistic if you can sustain this.
- If, as recommended, you are working on an AWS EC2 instance, be aware this might cost money and make sure you can cover the costs. In practice, these are negligible.
- You must understand that this is a physically and intellectually strenuous endeavor, and it is your responsibility to know whether you’re physically and mentally up for the job. However, no physical or mental disabilities are regarded as automatically excluding you of consideration.
- You must not live in, reside in or be a citizen of any of the countries listed in CFR Title 22 Part 126, §126.1(d)(1) and (2).
- You must not have been convicted of a felony anywhere. This includes ‘spent’ UK criminal convictions.
When assessing applications, the following groups are given preference:
- Persons with mental or physical disabilities whose disability precludes them from finding conventional employment – please outline this situation on the application form.
- Honourably discharged (or equivalent) veterans of NATO forces and the IDF – please include member 4 copy of DD-214, Wehrdienstzeitbescheinigung or equivalent document that lists type of discharge.
What we’ll be up to
Over the 42 weeks to follow, you will be undergoing a rigorous and structured semi-self-directed training process. This will take your background, interests and future ambitions into account, but at the core, you will:
- master Python’s data processing stack,
- learn how to visualize data in Python,
- work with networks and graph databases, including Neo4j,
- acquire the correct way of presenting results in data science to stakeholders,
- delve into cutting-edge methods of machine learning, such as deep learning using keras,
- work on problems in computer vision and get familiar with the Python bindings of OpenCV,
- scrape data from social networks, and
- learn convenient ways of representing, summarizing and distributing our results.
The programme is divided into three ‘terms’ of 14 weeks each, which each consist of 9 weeks of directed study, 4 weeks of self-directed project work and one week of R&R.
What you’ll be getting out of this
In the past years, mentees have noted the unusual breadth of knowledge they have acquired about data science, as well as the diversity of practical topics and the realistic question settings, with an emphasis on practical applications of data science such as presenting data products. I hope that this year, too, I’ll be able to convey the same important topics. Every year is a little different as I try to adjust the course to meet the individual participant’s needs.
The programme is not, of course, accredited by any accreditation body, but a certificate of completion will be issued to any participant who wishes so.
Simply fill in the form below and send it off by 14 January 2018. The top contenders will be contacted by e-mail or telephone for a brief conversation thereafter. Finally, a lucky winner will be picked by the 21st January 2018. Easy peasy!
Q: What does ‘semi-self-directed’ mean? Is there a fixed curriculum?
A: No. There are some basic topics (see list above) that I think are quite likely to come up, but ultimately, this is about making you the data scientist you want to be. For this reason, we’ll begin by planning out where you want to improve – kinda like a PT gives you a training plan before you start out at their gym. We will then adjust as needed. This is not an exam prep, it’s a learning experience, and for that reason, we can focus on delving deeper and getting the fundaments right over other cramming in a particular curriculum.
Q: Can I bring your own data?
A: Sure. In general, we’ll be using standard data sets, because they’re well-known and high-quality data. But if you have a dataset you collected or are otherwise entitled to use that would do equally well, there’s no reason why we couldn’t use it! Note that you must have the right to use and share the data set, meaning it’s unlikely you’re able to use data sets from your day job.
Q: Will this give me an employment advantage?
A: I don’t quite know – it’s impossible to predict. The field of data science degrees is something of a Wild West still, and while some reputable degrees have emerged, others are dubious. Employers still don’t know what to go by. However, you will most definitely be better prepared for an employment interview in data science!
Q: Why are you so keen on presenting data the right way?
A: Because as data scientists, we’re expected to not merely understand the data and draw the right conclusions, but also to convey them to stakeholders at various levels, from plant management to C-suite, in a way that gets the right message across at the first go.
Q: You’re a computational epidemiologist. Can I apply even if my work doesn’t really involve healthcare?
A: Sure. The principles are the same, and we’re largely focusing on generic topics. You might be exposed to bits and pieces of epidemiology, but I can guarantee it won’t hurt.
Q: Why do you only take on one mentee?
A: To begin with, my life is pretty busy – I have a demanding job, a family and – shock horror! – I even need to sleep every once in a while. More importantly, I want to devote my undivided attention to a worthy candidate.
Q: How come I’ve never heard of this before?
A: Until now, I’ve largely gotten mentees by word of mouth. I am concerned that this is keeping some talented people out and limiting the pool of people we should have in. That’s why this year, I have tried to make this process much more transparent.
Q: You’re rather fond of General ‘Mad Dog’ Mattis. Will there be yelling?
Q: There seems to be no upper age limit. Is that a mistake?
Q: I have more questions.
A: You can ask them here.