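August 28, 2018: Open Source Daily, Issue 173

August 28, 2018
Every day we recommend one high-quality open-source project on GitHub and one selected English-language article on technology or programming. Follow Open Source Daily. QQ group: 202790710; Weibo: https://weibo.com/openingsource; Telegram group: https://t.me/OpeningSourceOrg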

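Today's recommended open-source project: PicGo, an image upload tool (GitHub link)

Why we recommend it: As the name suggests, this is an image upload tool, and it really is just that. It uploads images to the image hosting services it currently supports. The manual already covers how to work with each of them, and if you still have questions, the FAQ may help. Incidentally, once version 2.0 is released it will support third-party image-host plugins for uploading to whichever host you need, so for now it may be worth waiting to see how things develop.

Today's recommended English article: "Wanna be a developer? Here is what you need to take into account." by Vinh Le

Original article: https://medium.freecodecamp.org/wanna-be-a-developer-here-is-what-you-need-to-take-into-account-7f59a059f39

Why we recommend it: Things a developer needs to know, drawn from the author's own experience.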

Wanna be a developer? Here is what you need to take into account.
A common myth about software developers is that they’re boring, dry geeks who were math geniuses at school and now spend hours in front of computer screens writing code.

Yes, developers may spend their life in front of computer screens writing code. However, there is much more to it than just coding like a machine the whole day. For me personally, being a developer means you have the chance to build cool stuff by yourself.

I started self-learning front-end development my sophomore year. My journey has been filled with self-doubt and hurdles, along with joy and extreme excitement. I never imagined I could experience all that while learning something.

Sometimes, even now, I still wonder whether I am following the right path. However, by telling myself that there is nothing more pleasurable than being able to do what I love, I keep putting my head down and I continue working.

It has been around two years since I wrote my first lines of code. After many hours of practice, and after moments when I felt like giving up and took temporary breaks, I would like to share with you a few things that I wish I had known from the beginning.

Don’t treat coding as a leisure interest

First and foremost, if you think that you need to be serious while coding, you’re 100 percent right. It is definitely true that you will probably not spend time doing something you don’t like.

However, doing it just on a hobby basis — that is, you only code when you feel like doing it without any specific commitment or schedule — will probably not lead you to the career that you have always wanted.

In addition, when you encounter obstacles and difficulties, are you sure that you will have enough patience to keep your little interest alive? Rather, you might end up giving up or potentially going through a long stagnation in the middle.

Therefore, you should be incredibly dedicated to your passion, my friends. Yes, I am sure that all of us developers have a great interest in coding and technology in general.

However, passion is nothing without the right execution. By committing to a specific goal along with an appropriate schedule, you are building milestones along your journey. Put in a huge commitment to your given timeframe. Specify which skills and technologies you want to learn over a certain period of time. Then you’ll be closer than ever to making learning how to code an imperative part of your life.

Figure out which technologies you need to focus on

Once you’ve started taking coding seriously, the next step is to be honest with yourself. What kind of developer do you want to be?

Start by asking yourself what interests you the most. Are you passionate about building user interfaces which control the way users interact with your product? If yes, then front-end technologies should be your main focus. Or maybe designing is not in your DNA and you’re interested in how the server side works — then back-end stuff should be your focus.

Giving yourself a clear idea of what you need to learn, based on your interests, is a key element. If you’re still not sure which side is for you, Google them to figure it out or try out a bit of each. Each of us has our own preferences and skills, the things we do best. So answering this question might be simpler than you thought.

Start with the easy things

In the beginning, you might be confused by almost every single task, regardless of difficulty level. From choosing a proper text editor to setting up an environment for a project, these tasks will surely cause you more trouble than you ever expected.

Therefore, if you are a complete beginner who is trying out their first language, I highly recommend starting with the easy things. Focus on platforms that provide interactive coding playgrounds, such as Codecademy.

That’s where I began, too. These platforms help you focus solely on being familiar with the programming languages without worrying about initialization. You will need to learn these things later on, of course. However, I believe that beginning with writing code will not only excite you, but also help you avoid being overwhelmed.

What learning resources are out there?

There are different paths that you can choose in order to be a software developer. You could enroll in a computer science degree, participate in coding bootcamps, or even teach yourself. Whichever path you take, you’ll need to keep your learning materials up to date. As I belong to the last category, I would like to share how I filter my learning resources.

Begin with coding playgrounds

As a very first step, start with easy-to-understand platforms such as Codecademy. It offers a place where you can read the instructions and then practice right away in the built-in web-based text editor, with the result shown on the screen as well. Just sign up for free, pick the technologies you are interested in, enter the designated learning track, and you’re good to go.

Another very useful resource, especially for newbies, is freeCodeCamp. Unlike Codecademy, where you have to pay for the premium courses (which are, however, very useful), freeCodeCamp offers totally FREE courses and learning tracks. They even give you certificates when you complete each major section.

Their tutorials also include detailed instructions, a built-in text editor, and clear explanations too. Additionally, there are projects available where you can use the skills you’ve learned to solve various problems.

Choosing the right learning resources

This process is actually quite challenging. It’s not because there are too few reliable and well-documented sources. There are actually too many tutorials, which potentially overwhelm you at first. Deciding on which way to go can be tough, as you will probably spend a certain period of time following along each path you try. Therefore, a bad tutorial might not only cost you time but also demotivate you from moving forward.

Before asking anyone else or Googling where you should learn, please do me a favor, my friends: ask yourself first! Why? Because there are various types of tutorials out there: videos, e-books, textbooks, and online or in-person bootcamps. Only you can tell which type of resource you learn from most effectively.

For me personally, I enjoy watching video tutorials and coding along while watching them. That’s why I treat it as my primary learning method. But you might like reading instead so you can entirely control the pace of learning. In that case, you’d be better off going for well-known books.

Ultimately, you may realize that it is necessary to combine different learning methods. However, in each case you’ll perhaps spend a lot of your time on Medium, where you’ll find many useful resources that you’re most comfortable with.

And so, just as when you figure out which technologies to learn, take a step back, give your mind some space, and determine what type of learning resources you’d like to consume. When you’ve found something that is right for you, go for it!

Here are a few great categorized tutorials that I found super useful:

Video

LearnCode.academy Tutorials

Traversy Media Tutorials

Academind Tutorials

The New Boston Tutorials

LearnWebCode Tutorials

Rally Coding Tutorials

LevelUpTuts Tutorials

DevTips Tutorials

Coding Tech Tutorials

freeCodeCamp Tutorials

Coding Tech: tech conferences

MOOCs (Paid online courses)

Udemy per-course subscription

TreeHouse monthly subscription

Books

In-depth knowledge

Eloquent JavaScript

You Don’t Know JS

Tech & Design

The Phoenix Project

Don’t Make Me Think

The Design of Everyday Things

Surround yourself with tech

As mentioned above, whatever resource you choose to start with, you’ll probably need to rely on different mediums. And that’s the interesting part of being a developer. By surrounding yourself with tech stuff, you will be “learning while relaxing”.

Imagine that…

You wake up early and start the day by continuing your online tutorial. After almost an hour of concentrating, you decide to take a break. A Netflix episode? No. You realize that there’s no way you would spend an hour watching TV, and instead open YouTube. You decide to spend time on a 30-minute talk on Coding Tech.

The video you viewed received over a hundred thousand views. The guy was talking about the future of CSS thanks to Grid. Interesting! “The time for remembering or checking documentation for Bootstrap grid classes is over”, you murmur. Let’s see how it works!

You Google CSS Grid, then pick a blog post published in the freeCodeCamp Medium publication. Thanks to this post, you grasp some key points and can’t wait to open VS Code to try it out. It is amazing! Oops, something goes wrong. You go through a few questions from folks on Stack Overflow and some more tutorials on CSS-Tricks. You go around and then finally get it to work.

During lunch, you open a podcast and listen to the latest freeCodeCamp episode, which is about how a self-taught developer landed his first tech job. After lunch, you decide to continue with the React tutorial on Udemy. You suddenly find a problem that you’re not clear about, and the Q&A section doesn’t help.

Tired of being stuck for half an hour, you decide to temporarily give up and hope to resolve it later. Then you go to surf through the Dev community on Codeburst to see tips and trends from fellow tech enthusiasts. This is truly a place where people join to share their knowledge and discuss with others.

You then think: “Maybe I should start writing something, whatever it is that I have observed and learned throughout my journey… then I can share it with everyone.” Opening a Google Docs page, you excitedly type: “Do you want to be a developer…?”

Does this story somehow motivate you? If yes then what are you waiting for? Let’s jump into the world where all of us are developing technological applications to make the world a better place.

Practice, practice and practice

Okay so now that you have some idea of where to start, it is probably a good idea to start right now. However, being good at something really requires time. To be great, you need to put in tons of work. It is impossible to fill the gap between being a developer that is just starting out and being an experienced developer without sweat and tears.

In other words, to be proficient in a programming language, you’d need to put in hours — years — of practice. How, you ask?

Follow along with tutorials and actively search Google or Stack Overflow for the bugs that you meet along the way.

Dedicate a certain period of time per day only to coding.

If you’re tired, take a break and go surf around, visiting forums and platforms where tech leaders and seasoned developers share what is happening in the tech world. Basically, surround yourself with tech stuff.

Remember, you’re moving up to the next important steps of the success ladder. The more work you put in, the more confident and enthusiastic you’ll probably feel. Just keep in mind that there is absolutely no shortcut. There is no language, library, or extension that can help you achieve overnight success. Keep hustling, learn from failures, stay responsible and committed to your schedule, and believe in yourself. The day your dream comes true might be just around the corner!

That’s the end of this blog! Thanks for reading!

Say hello on social media: Facebook, Twitter, LinkedIn, or my personal site.

Stay tuned for upcoming tech blogs!

See you soon!

August 27, 2018: Open Source Daily, Issue 172

August 27, 2018

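Today's recommended open-source project: Front-End-Performance-Checklist, ways to improve web page performance (GitHub link)

Why we recommend it: This project mainly covers how to improve web page performance on the front end, working through HTML, CSS, fonts, images, JS, the server, and JS frameworks. Performance shouldn't always be the back end's job; well-written front-end code can achieve the same goal.

Today's recommended English article: "Closures Explained, Simply" by Daniel Lempesis

Original article: https://medium.com/@daniellempesis/closures-explained-simply-e83680793e4f

Why we recommend it: As the name suggests, a simple introduction to the concept of closures in JS.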

Closures Explained, Simply

One of the most widely-covered, important topics in JavaScript is the concept of closures. You certainly hear about them a lot. You may have even been asked to explain what they are in an interview. Of all the explanations I’ve read, though, none have been newbie-friendly, and many have been downright confusing.

I’ll be honest: When I first tried learning about closures, I had absolutely no idea what the explanations I was reading were trying to say. Many seem to be riddled with unclear pronoun references and suffer from gross overuse of the word “function”. I realized only much later I’d been using them for months; closures just seemed like a logical extension of what I’d already been doing, and I hadn’t given it a single thought, much less known there was a special name for what I’d been doing.

Diving In

Closures are actually an incredibly simple concept. If you already have a basic understanding of how JavaScript’s scope works, it’s going to be a breeze. If not, it will be a little more complicated, but still not too hard.

I like to use the word “enclosure” when explaining closures, because in a sense a closure both exists in an enclosure (its parent function) and is an enclosure itself: Housing its own variables and potentially its own closures, everything inside of it is invisible to the rest of your application.

A JavaScript closure is, very simply, any function that exists inside another function. It looks like any other function, and requires no special steps to ‘turn it into’ a closure, except that it must exist within another function. It may be declared with function, or with const, or if you’re feeling nostalgic, with var. (The one exception is new Function: a function created that way is always created in the global scope, so it cannot see the local variables of the function it was created in.)

A closure has, perhaps unsurprisingly, full access to any variables declared within itself. It also has, due to the nature of JavaScript’s scope, access to any variable or function which exists in either the global scope, or in the hierarchy of functions within which it is nested. Conversely, all variables and any functions inside the closure (this would make them closures as well) are inaccessible to any function the closure is inside of, in addition to being inaccessible to anything in the global scope.
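The access rules described above can be sketched in a few lines. This is only an illustration, and the names used here (`globalGreeting`, `outer`, `inner`, and so on) are hypothetical rather than taken from the article.

```javascript
// Hypothetical names for illustration; not from the original article.
const globalGreeting = "hello"; // lives in the outermost scope

function outer() {
  const outerValue = "from outer";

  function inner() { // the closure
    const innerValue = "from inner";
    // inner() can read its own variables, its parent's, and the outermost scope's
    return [globalGreeting, outerValue, innerValue].join(", ");
  }

  // outer() cannot see innerValue; typeof on an undeclared name yields "undefined"
  const outerSeesInner = typeof innerValue !== "undefined";

  return { message: inner(), outerSeesInner };
}

const result = outer();
console.log(result.message);        // hello, from outer, from inner
console.log(result.outerSeesInner); // false
console.log(typeof inner);          // undefined: inner() is invisible out here
```

Using `typeof` for the visibility checks avoids the ReferenceError that reading an undeclared name directly would throw.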

Below is a non-generic example of a closure, finding all the prime numbers between 2 and 17; the closure itself is marked with a comment.

Note that to keep things as simple as possible so brand new coders (or those coming from languages like Python or Ruby) can easily follow, I’m implementing the simplest (and slowest) prime solver I know how to, declaring the numbers 2–17 as an array, and using a while loop with more Python/Ruby-like syntax. This implementation is in no way recommended 🙂
```
function primesUpToSeventeen() {
  const numbers = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17];
  const primes  = [];

  function isPrime(number) { //this is our closure.
    if (number < 2)
      return false;
    else if (number === 2)
      return true;

    let divisor = 2;

    while (divisor < number) {
      if (number%divisor === 0)
        return false;
      else
        divisor += 1
    };
    return true
  };

  numbers.forEach(number=> {
    if (isPrime(number)) //isPrime() closure is being called here
      primes.push(number)
  });
  return primes
};

//returns [2, 3, 5, 7, 11, 13, 17]
```
In the above example, isPrime() is a closure function housed within its parent function, primesUpToSeventeen(). Its parent doesn’t know or care what’s happening inside of isPrime(); it doesn’t know anything about its internal variables, what functions (closures!) it may contain (in this case, it doesn’t have any), or even if there are variables declared inside isPrime() which share names with variables in primesUpToSeventeen()’s own scope. All it knows is what isPrime() tells it when it completes its work; in this case, isPrime() is going to return either true or false. That’s all its parent function really knows.*

So, so far so good; we’re getting somewhere. But the above function is actually a pretty unhelpful example. We could move isPrime() out of primesUpToSeventeen() like this:
```
function primesUpToSeventeen() {
  //primesUpToSeventeen() without isPrime() code here
}

function isPrime(number) {
  //isPrime code here
}
```
…and it would behave identically.

Let’s do just that, and then add another step to demonstrate one of the concepts we’ve covered so far: that a closure has access to any variable in any of the functions it’s nested in. Here, we’ll only return numbers if a.) they’re prime, and b.) the result of adding 6 to them is also prime. This is a bit contrived for the sake of keeping it simple and illustrating this concept, so just pretend our addition is some highly complicated function doing interesting work on the numbers it receives.
```
function isPrime(number) { //exists in global scope; not a closure
  //...code here
};

function plusSixPrimesUpToSeventeen() { //our outer function
  const numbers = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17];
  const primes  = [];

  numbers.forEach(number=> { //add primes we find to primes array
    if (isPrime(number))
      primes.push(number)
  });

  function findPlusSixPrimes() { //this is our closure
    const plusSixPrimes = [];

    primes.forEach(prime=> {
      const primePlusSix = prime + 6

      if (isPrime(primePlusSix)) {
        plusSixPrimes.push(prime)
      };
    });
    return plusSixPrimes
  };

  return findPlusSixPrimes() //we call our closure here
};

//returns [5, 7, 11, 13, 17] because 11, 13, 17, 19 and 23 are prime
```
Notice we didn’t pass any variables to our closure. We didn’t have to; isPrime() exists in the global scope, so our closure can use it whenever it wants, and the primes array we populated earlier at the top of our main (“outer”) function (plusSixPrimesUpToSeventeen()) exists in the same space (scope) our closure does.

And that’s pretty much it! Closures have many uses not covered here, but you now understand what they are and how they work.

*Note that this does not hold true for reassignment. Declaring a variable within a closure, even one whose name is already used outside of the closure, will not result in any namespace issues; however, reassigning or mutating an outer variable (e.g. numbers = [] or numbers.length = 0) will modify that outer variable. In this particular case, numbers can’t be reassigned anyway as it’s a constant, and even if it weren’t, I used a forEach loop, so reassigning numbers wouldn’t actually affect the function’s output. But it’s important to remember that a closure absolutely can modify any variable it has access to (which is a good thing!).
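The difference between declaring a same-named variable inside a closure and modifying an outer one can be seen in a small sketch. The functions `shadow` and `modify` and the variable `total` are made up for this illustration and do not appear in the article.

```javascript
// Hypothetical example; the names below are not from the original article.
function outer() {
  const numbers = [1, 2, 3];
  let total = 0;

  function shadow() {
    const numbers = ["a", "b"]; // a new, separate variable that shadows the outer one
    return numbers.length;
  }

  function modify() {
    numbers.push(4);        // mutates the outer array (allowed even though it is const)
    total = numbers.length; // reassigns the outer variable declared with let
  }

  const shadowLength = shadow();
  const before = numbers.length; // still 3: shadow() did not touch the outer array
  modify();

  return { shadowLength, before, after: numbers.length, total };
}

console.log(outer()); // { shadowLength: 2, before: 3, after: 4, total: 4 }
```

Note that const only prevents reassignment of the name; the array it points to can still be mutated from inside the closure.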


August 26, 2018: Open Source Daily, Issue 171

August 26, 2018


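Today's recommended open-source project: InterviewMap, an interview knowledge map (GitHub link)

Why we recommend it: Knowledge you will need for interviews, covering front-end development, general computer science, and career skills. Even if you aren't interviewing, it is worth reading to fill gaps in your knowledge.

Today's recommended English article: "The Best Machine Learning Resources" by Vishal Maini

Original article: https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1

Why we recommend it: A roundup of machine learning resources.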

The Best Machine Learning Resources

This article is an addendum to the series Machine Learning for Humans, a guide for getting up-to-speed on machine learning concepts in 2-3 hours.

General advice on crafting a curriculum

Going to school for a formal degree program isn’t always possible or desirable. For those considering an autodidactic alternative, this is for you.

1. Build foundations, and then specialize in areas of interest.

You can’t go deeply into every machine learning topic. There’s too much to learn, and the field is advancing rapidly. Master foundational concepts and then focus on projects in a specific domain of interest — whether it’s natural language understanding, computer vision, deep reinforcement learning, robotics, or whatever else.

2. Design your curriculum around topics that personally excite you.

Motivation is far more important than micro-optimizing a learning strategy for some long-term academic or career goal. If you’re having fun, you’ll make fast progress. If you’re trying to force yourself forward, you’ll slow down.

We’ve included resources that we explored personally or came highly recommended. This list is not meant to be exhaustive. There are endless options, and too much choice is counterproductive. But if we’re missing a great resource that belongs here, please reach out!

Foundations

Programming

Syntax and basic concepts: Google’s Python Class, Learn Python the Hard Way.

Practice: Coderbyte, Codewars, HackerRank.

Linear algebra

Deep Learning Book, Chapter 2: Linear Algebra. A quick review of the linear algebra concepts relevant to machine learning.

A First Course in Linear Model Theory by Nalini Ravishanker and Dipak Dey. Textbook introducing linear algebra in a statistical context.

Probability & statistics

MIT 18.05, Introduction to Probability and Statistics, taught by Jeremy Orloff and Jonathan Bloom. Provides intuition for probabilistic reasoning & statistical inference, which is invaluable for understanding how machines think, plan, and make decisions.

All of Statistics: A Concise Course in Statistical Inference, by Larry Wasserman. Introductory text on statistics.

Calculus

Khan Academy: Differential Calculus. Or, any introductory calculus course or textbook.

Stanford CS231n: Derivatives, Backpropagation, and Vectorization, prepared by Justin Johnson.

Machine learning

Courses

Andrew Ng’s Machine Learning course on Coursera (or, for more rigor, Stanford CS229).

Data science bootcamps: Galvanize (full-time, 3 months, $$$$), Thinkful (flexible schedule, 6 months, $$).

Textbook

An Introduction to Statistical Learning by Gareth James et al. Excellent reference for essential machine learning concepts, available free online.

Deep learning

Courses

Deeplearning.ai, Andrew Ng’s introductory deep learning course.

CS231n: Convolutional Neural Networks for Visual Recognition, Stanford’s deep learning course. Helpful for building foundations, with engaging lectures and illustrative problem sets.

Projects

Fast.ai, a fun and hands-on project-based course. Projects include classifying images of dogs vs. cats and generating Nietzschean writing.

MNIST handwritten digit classification with TensorFlow. Classify handwritten digits with >99% accuracy in 3 hours with this tutorial by Google.

Try your hand at a Kaggle competition. Implement a deep learning paper that you found interesting, using other versions on GitHub as reference material.

Reading

Deep Learning Book, a.k.a. the Bible of Deep Learning, authored by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Neural Networks and Deep Learning, a clear and accessible online deep learning text by Michael Nielsen. Ends with commentary on reaching human-level intelligence.

Deep Learning Papers Reading Roadmap, a compilation of key papers organized by chronology and research area.

Reinforcement learning

Courses

John Schulman’s CS 294: Deep Reinforcement Learning at Berkeley.

David Silver’s Reinforcement Learning course at University College London.

Deep RL Bootcamp, organized by OpenAI and UC Berkeley. Applications are currently closed, but it’s worth keeping an eye out for future sessions.

Projects

Andrej Karpathy’s Pong from Pixels. Implement a Pong-playing agent from scratch in 130 lines of code.

Arthur Juliani’s Simple Reinforcement Learning with Tensorflow series. Implement Q-learning, policy-learning, actor-critic methods, and strategies for exploration using TensorFlow.

See OpenAI’s requests for research for more project ideas.

Reading

Richard Sutton’s book, Reinforcement Learning: An Introduction.

Artificial intelligence

Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

Sebastian Thrun’s Udacity course, Intro to Artificial Intelligence.

Fellowships: Insight AI Fellows Program, Google Brain Residency Program

Artificial intelligence safety

For the short version, read: (1) Johannes Heidecke’s Risks of Artificial Intelligence, (2) OpenAI and Google Brain’s collaboration on Concrete Problems in AI Safety, and (3) Wait But Why’s article on the AI Revolution.

For the longer version, see Nick Bostrom’s Superintelligence.

Check out the research published by the Machine Intelligence Research Institute (MIRI) and Future of Humanity Institute (FHI) on AI safety.

Keep up-to-date with /r/ControlProblem on Reddit.

Newsletters

Import AI, weekly AI newsletter covering the latest developments in the industry. Prepared by Jack Clark of OpenAI.

Machine Learnings, prepared by Sam DeBrule. Frequent guest appearances from experts in the field.

Nathan.ai, covering recent news and commenting on AI/ML from a venture capital perspective.

The Wild Week in AI by Denny Britz. The title says it all.

Advice from others

“What is the best way to learn machine learning without taking any online courses?”, answered by Eric Jang, Google Brain

“What are the best ways to pick up deep learning skills as an engineer?”, answered by Greg Brockman, CTO of OpenAI

A16z's AI Playbook, a more code-based introduction to AI

AI safety syllabus, designed by 80,000 Hours

“You take the blue pill, the story ends. You wake up in your bed and believe whatever you want to believe. You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes.” — Morpheus

Good luck!

If you're interested in sponsoring future work, we appreciate any amount you are able to contribute: paypal.me/ml4h

August 25, 2018: Open Source Daily, Issue 170

August 25, 2018

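Today's recommended open-source project: Learn Python 3, a Python tutorial (GitHub link)

Why we recommend it: This is a set of Python 3 teaching notebooks that you need Jupyter to open locally. You can of course also browse them directly in the browser, though you should make sure your VPN is on. Anyone interested in Python 3 is welcome to take a look.

Today's recommended English article: "Machine Learning is Fun!" by Adam Geitgey

Original article: https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471

Why we recommend it: This is a very old article by now, but you can still read it as an introductory machine learning tutorial. If reading English is a struggle, there is a Chinese version too (in fact it exists in many languages, but other than Chinese and Japanese, your editor can't read any of them).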

Machine Learning is Fun!

Update: This article is part of a series. Check out the full series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8! You can also read this article in 日本語, Português, Português (alternate), Türkçe, Français, 한국어 , العَرَبِيَّة‎‎, Español (México), Español (España), Polski, Italiano, 普通话, Русский, 한국어 , Tiếng Việt or فارسی.

Bigger update: The content of this article is now available as a full-length video course that walks you through every step of the code. You can take the course for free (and access everything else on Lynda.com free for 30 days) if you sign up with this link.

Have you heard people talking about machine learning but only have a fuzzy idea of what that means? Are you tired of nodding your way through conversations with co-workers? Let’s change that!

This guide is for anyone who is curious about machine learning but has no idea where to start. I imagine there are a lot of people who tried reading the Wikipedia article, got frustrated and gave up wishing someone would just give them a high-level explanation. That’s what this is.

The goal is to be accessible to anyone, which means that there are a lot of generalizations. But who cares? If this gets anyone more interested in ML, then mission accomplished.

What is machine learning?

Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.

For example, one kind of algorithm is a classification algorithm. It can put data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code. It’s the same algorithm but it’s fed different training data so it comes up with different classification logic.

This machine learning algorithm is a black box that can be re-used for lots of different classification problems.
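As a rough sketch of "same algorithm, different training data", the snippet below reuses one generic classifier on two unrelated problems. The choice of a 1-nearest-neighbor classifier, the function name `trainNearestNeighbor`, and all of the data are assumptions made for illustration; the article does not name a specific algorithm here.

```javascript
// A sketch of one generic classifier (1-nearest-neighbor) reused on two problems.
// The algorithm choice and all data below are made up for illustration.
function trainNearestNeighbor(examples) {
  // "Training" here just stores the labeled examples.
  return function classify(features) {
    let best = null;
    let bestDistance = Infinity;
    for (const example of examples) {
      // Squared Euclidean distance between the two feature vectors
      const distance = example.features.reduce(
        (sum, value, i) => sum + (value - features[i]) ** 2, 0);
      if (distance < bestDistance) {
        bestDistance = distance;
        best = example.label;
      }
    }
    return best;
  };
}

// Problem 1: spam vs. not-spam, features = [link count, uses of the word "free"]
const classifyEmail = trainNearestNeighbor([
  { features: [8, 5], label: "spam" },
  { features: [6, 7], label: "spam" },
  { features: [0, 0], label: "not-spam" },
  { features: [1, 0], label: "not-spam" },
]);

// Problem 2: house size category, features = [bedrooms, square feet / 1000]
const classifyHouse = trainNearestNeighbor([
  { features: [1, 0.6], label: "small" },
  { features: [2, 0.9], label: "small" },
  { features: [4, 2.5], label: "large" },
  { features: [5, 3.2], label: "large" },
]);

console.log(classifyEmail([7, 4]));   // spam
console.log(classifyHouse([2, 1.0])); // small
```

Not a line of `trainNearestNeighbor` changes between the two problems; only the training data does.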

“Machine learning” is an umbrella term covering lots of these kinds of generic algorithms.

Two kinds of Machine Learning Algorithms

You can think of machine learning algorithms as falling into one of two main categories — supervised learning and unsupervised learning. The difference is simple, but really important.

Supervised Learning

Let’s say you are a real estate agent. Your business is growing, so you hire a bunch of new trainee agents to help you out. But there’s a problem — you can glance at a house and have a pretty good idea of what a house is worth, but your trainees don’t have your experience so they don’t know how to price their houses.

To help your trainees (and maybe free yourself up for a vacation), you decide to write a little app that can estimate the value of a house in your area based on its size, neighborhood, etc., and what similar houses have sold for.

So you write down every time someone sells a house in your city for 3 months. For each house, you write down a bunch of details — number of bedrooms, size in square feet, neighborhood, etc. But most importantly, you write down the final sale price:

This is our “training data.”

Using that training data, we want to create a program that can estimate how much any other house in your area is worth:

We want to use the training data to predict the prices of other houses.

This is called supervised learning. You knew how much each house sold for, so in other words, you knew the answer to the problem and could work backwards from there to figure out the logic.

To build your app, you feed your training data about each house into your machine learning algorithm. The algorithm is trying to figure out what kind of math needs to be done to make the numbers work out.

This is kind of like having the answer key to a math test with all the arithmetic symbols erased:

Oh no! A devious student erased the arithmetic symbols from the teacher’s answer key!

From this, can you figure out what kind of math problems were on the test? You know you are supposed to “do something” with the numbers on the left to get each answer on the right.

In supervised learning, you are letting the computer work out that relationship for you. And once you know what math was required to solve this specific set of problems, you could answer any other problem of the same type!
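The erased-answer-key analogy can be turned into a toy program. This is only a sketch: the answer key, the list of candidate operations, and the function name `learnOperation` are invented for illustration, and real supervised learning algorithms search a far larger space of possible relationships than four arithmetic operators.

```javascript
// Hypothetical answer key: the operator between left and right has been erased.
const answerKey = [
  { left: 2, right: 4, answer: 6 },
  { left: 5, right: 3, answer: 8 },
  { left: 7, right: 2, answer: 9 },
];

const candidateOperations = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// "Learning": keep only the operations that are consistent with every known answer
function learnOperation(examples) {
  return Object.keys(candidateOperations).filter(symbol =>
    examples.every(({ left, right, answer }) =>
      candidateOperations[symbol](left, right) === answer));
}

const learned = learnOperation(answerKey);
console.log(learned); // [ '+' ]

// Use the learned operation on a problem that was not in the answer key
const predict = candidateOperations[learned[0]];
console.log(predict(10, 5)); // 15
```

The known answers play the role of the sale prices in the house example: they let the program work backwards to the rule.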

Unsupervised Learning

Let’s go back to our original example with the real estate agent. What if you didn’t know the sale price for each house? Even if all you know is the size, location, etc. of each house, it turns out you can still do some really cool stuff. This is called unsupervised learning.

Even if you aren’t trying to predict an unknown number (like price), you can still do interesting things with machine learning.

This is kind of like someone giving you a list of numbers on a sheet of paper and saying “I don’t really know what these numbers mean but maybe you can figure out if there is a pattern or grouping or something — good luck!”

So what could you do with this data? For starters, you could have an algorithm that automatically identified different market segments in your data. Maybe you’d find out that home buyers in the neighborhood near the local college really like small houses with lots of bedrooms, but home buyers in the suburbs prefer 3-bedroom houses with lots of square footage. Knowing about these different kinds of customers could help direct your marketing efforts.

Another cool thing you could do is automatically identify any outlier houses that were way different than everything else. Maybe those outlier houses are giant mansions and you can focus your best sales people on those areas because they have bigger commissions.
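
As a rough sketch of the outlier idea, here is one simple way to flag unusual houses without knowing any sale prices. The listings are hypothetical, and the rule used (more than two standard deviations from the mean size) is just one common heuristic chosen for illustration, not the only way to do it.

```python
import statistics

# Hypothetical listings: (label, size in square feet). No prices needed.
HOUSES = [
    ("A", 1400), ("B", 1550), ("C", 1600), ("D", 1450),
    ("E", 1500), ("F", 1650), ("G", 1480), ("H", 9000),
]


def find_outliers(listings, threshold=2.0):
    """Flag listings whose size is more than `threshold` standard
    deviations away from the mean size."""
    sizes = [sqft for _, sqft in listings]
    mean = statistics.mean(sizes)
    spread = statistics.pstdev(sizes)
    return [name for name, sqft in listings if abs(sqft - mean) > threshold * spread]


print(find_outliers(HOUSES))  # ['H']
```

Nothing here was labeled with a "correct answer"; the algorithm only looks at how the sizes relate to each other.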

Supervised learning is what we’ll focus on for the rest of this post, but that’s not because unsupervised learning is any less useful or interesting. In fact, unsupervised learning is becoming increasingly important as the algorithms get better because it can be used without having to label the data with the correct answer.

Side note: There are lots of other types of machine learning algorithms. But this is a pretty good place to start.

That’s cool, but does being able to estimate the price of a house really count as “learning”?

As a human, your brain can approach most any situation and learn how to deal with that situation without any explicit instructions. If you sell houses for a long time, you will instinctively have a “feel” for the right price for a house, the best way to market that house, the kind of client who would be interested, etc. The goal of Strong AI research is to be able to replicate this ability with computers.

But current machine learning algorithms aren’t that good yet. They only work when focused on a very specific, limited problem. Maybe a better definition for “learning” in this case is “figuring out an equation to solve a specific problem based on some example data”.

Unfortunately “Machine Figuring out an equation to solve a specific problem based on some example data” isn’t really a great name. So we ended up with “Machine Learning” instead.

Of course if you are reading this 50 years in the future and we’ve figured out the algorithm for Strong AI, then this whole post will all seem a little quaint. Maybe stop reading and go tell your robot servant to go make you a sandwich, future human.

Let’s write that program!

So, how would you write the program to estimate the value of a house like in our example above? Think about it for a second before you read further.

If you didn’t know anything about machine learning, you’d probably try to write out some basic rules for estimating the price of a house like this:
```
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # In my area, the average house costs $200 per sqft
  price_per_sqft = 200

  if neighborhood == "hipsterton":
    # but some areas cost a bit more
    price_per_sqft = 400
  elif neighborhood == "skid row":
    # and some areas cost less
    price_per_sqft = 100

  # start with a base price estimate based on how big the place is
  price = price_per_sqft * sqft

  # now adjust our estimate based on the number of bedrooms
  if num_of_bedrooms == 0:
    # Studio apartments are cheap
    price = price - 20000
  else:
    # places with more bedrooms are usually
    # more valuable
    price = price + (num_of_bedrooms * 1000)

  return price
```
If you fiddle with this for hours and hours, you might end up with something that sort of works. But your program will never be perfect and it will be hard to maintain as prices change.

Wouldn’t it be better if the computer could just figure out how to implement this function for you? Who cares what exactly the function does as long as it returns the correct number:
```
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = <computer, plz do some math for me>

  return price
```
One way to think about this problem is that the price is a delicious stew and the ingredients are the number of bedrooms, the square footage and the neighborhood. If you could just figure out how much each ingredient impacts the final price, maybe there’s an exact ratio of ingredients to stir in to make the final price.

That would reduce your original function (with all those crazy if’s and else’s) down to something really simple like this:
```
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # a little pinch of this
  price += num_of_bedrooms * .841231951398213

  # and a big pinch of that
  price += sqft * 1231.1231231

  # maybe a handful of this
  price += neighborhood * 2.3242341421

  # and finally, just a little extra salt for good measure
  price += 201.23432095

  return price
```
Notice the magic numbers: .841231951398213, 1231.1231231, 2.3242341421, and 201.23432095. These are our weights. If we could just figure out the perfect weights to use that work for every house, our function could predict house prices!
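
The same function can be written so the weights are data rather than hard-coded lines, which is handy once you want to try different weights. This is a sketch under a few assumptions: `weighted_estimate` and `NEIGHBORHOOD_CODES` are hypothetical names, and the numeric codes for neighborhoods are made up, since multiplying `neighborhood` by a weight only works if it is a number.

```python
# Weights copied from the example above, in the order
# (num_of_bedrooms, sqft, neighborhood); BIAS is the "extra salt".
WEIGHTS = [0.841231951398213, 1231.1231231, 2.3242341421]
BIAS = 201.23432095

# Hypothetical numeric codes for neighborhoods (an assumption, see lead-in).
NEIGHBORHOOD_CODES = {"skid row": 0, "other": 1, "hipsterton": 2}


def weighted_estimate(features, weights=WEIGHTS, bias=BIAS):
    """Multiply each input by its weight, add them up, then add the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


house = [3, 2000, NEIGHBORHOOD_CODES["hipsterton"]]
print(weighted_estimate(house))
```

One caveat about the design: coding neighborhoods as 0, 1, 2 quietly tells the function that they sit on a scale. Giving each neighborhood its own 0-or-1 input is a common alternative.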

A dumb way to figure out the best weights would be something like this:

Step 1:

Start with each weight set to 1.0:
```
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
  price = 0

  # a little pinch of this
  price += num_of_bedrooms * 1.0

  # and a big pinch of that
  price += sqft * 1.0

  # maybe a handful of this
  price += neighborhood * 1.0

  # and finally, just a little extra salt for good measure
  price += 1.0

  return price
```
Step 2:

Run every house you know about through your function and see how far off the function is at guessing the correct price for each house:

Use your function to predict a price for each house.

For example, if the first house really sold for $250,000, but your function guessed it sold for $178,000, you are off by $72,000 for that single house.

Now add up the squared amount you are off for each house you have in your data set. Let’s say that you had 500 home sales in your data set and the square of how much your function was off for each house was a grand total of $86,123,373. That’s how “wrong” your function currently is.

Now, take that sum total and divide it by 500 to get an average of how far off you are for each house. Call this average error amount the cost of your function.

If you could get this cost to be zero by playing with the weights, your function would be perfect. It would mean that in every case, your function perfectly guessed the price of the house based on the input data. So that’s our goal — get this cost to be as low as possible by trying different weights.
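
Step 2 can be sketched in a few lines. The `cost` helper and the sample sales below are hypothetical; the only numbers taken from the text are the $250,000 sale and the $178,000 guess.

```python
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    """Step 1 starting point: every weight set to 1.0.
    `neighborhood` is assumed to be a numeric code here."""
    return num_of_bedrooms * 1.0 + sqft * 1.0 + neighborhood * 1.0 + 1.0


def cost(predict, sales):
    """Average of the squared differences between guesses and real prices."""
    squared_errors = [(predict(*features) - price) ** 2 for features, price in sales]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sales: ((bedrooms, sqft, neighborhood code), sale price).
SALES = [((3, 2000, 1), 250000), ((2, 800, 0), 150000), ((4, 2500, 2), 400000)]
print(cost(estimate_house_sales_price, SALES))

# The single-house example from the text: guessed $178,000, sold for $250,000.
one_house = [((3, 2000, 1), 250000)]
print(cost(lambda *_: 178000, one_house))  # 5184000000.0 (72,000 squared)
```

Squaring keeps positive and negative misses from cancelling out, and it punishes big misses much more than small ones.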

Step 3:

Repeat Step 2 over and over with every single possible combination of weights. Whichever combination of weights makes the cost closest to zero is what you use. When you find the weights that work, you’ve solved the problem!
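
Here is what the "dumb" search might look like if you limit it to a handful of candidate weights. Everything in this sketch is hypothetical: the sales are built so that price = 1000 * bedrooms + 200 * sqft exactly, the neighborhood is left out to keep the grid small, and the candidate lists are chosen so the right answer is among them.

```python
import itertools

# Hypothetical sales: (bedrooms, sqft, sale price).
SALES = [(3, 2000, 403000), (2, 800, 162000), (4, 2500, 504000)]

# A tiny grid of candidate values for each weight.
CANDIDATES = {
    "bedrooms": [0, 500, 1000, 1500],
    "sqft": [100, 150, 200, 250],
    "bias": [0, 10000],
}


def cost(weights, sales):
    """Average squared error for one (bedroom weight, sqft weight, bias) combo."""
    bedroom_w, sqft_w, bias = weights
    errors = [bedroom_w * beds + sqft_w * sqft + bias - price for beds, sqft, price in sales]
    return sum(e * e for e in errors) / len(errors)


def brute_force(sales):
    """Step 3: try every combination on the grid and keep the cheapest one."""
    grid = itertools.product(
        CANDIDATES["bedrooms"], CANDIDATES["sqft"], CANDIDATES["bias"]
    )
    return min(grid, key=lambda weights: cost(weights, sales))


best = brute_force(SALES)
print(best, cost(best, SALES))  # (1000, 200, 0) 0.0
```

This grid has only 4 * 4 * 2 = 32 combinations. Add more weights or finer steps and the count explodes, which is why this approach does not scale.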

Mind Blowage Time

That’s pretty simple, right? Well think about what you just did. You took some data, you fed it through three generic, really simple steps, and you ended up with a function that can guess the price of any house in your area. Watch out, Zillow!

But here are a few more facts that will blow your mind:
1. Research in many fields (like linguistics/translation) over the last 40 years has shown that these generic learning algorithms that “stir the number stew” (a phrase I just made up) out-perform approaches where real people try to come up with explicit rules themselves. The “dumb” approach of machine learning eventually beats human experts.
2. The function you ended up with is totally dumb. It doesn’t even know what “square feet” or “bedrooms” are. All it knows is that it needs to stir in some amount of those numbers to get the correct answer.
3. It’s very likely you’ll have no idea why a particular set of weights will work. So you’ve just written a function that you don’t really understand but that you can prove will work.
4. Imagine that instead of taking in parameters like “sqft” and “num_of_bedrooms”, your prediction function took in an array of numbers. Let’s say each number represented the brightness of one pixel in an image captured by a camera mounted on top of your car. Now let’s say that instead of outputting a prediction called “price”, the function outputted a prediction called “degrees_to_turn_steering_wheel”. You’ve just made a function that can steer your car by itself!
Pretty crazy, right?

What about that whole “try every number” bit in Step 3?

Ok, of course you can’t just try every combination of all possible weights to find the combo that works the best. That would literally take forever since you’d never run out of numbers to try.

To avoid that, mathematicians have figured out lots of clever ways to quickly find good values for those weights without having to try very many. Here’s one way:

First, write a simple equation that represents Step #2 above:

This is your cost function.

Now let’s re-write exactly the same equation, but using a bunch of machine learning math jargon (that you can ignore for now):

θ is what represents your current weights. J(θ) means the ‘cost for your current weights’.

This equation represents how wrong our price estimating function is for the weights we currently have set.
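
Written out, Step 2 might look like the following. This is a sketch based only on the description above (sum the squared errors over the 500 houses, then divide by 500). The symbols m (number of houses), h_θ(x) (the guess for a house x) and y (the real price) are the usual textbook names and are assumptions here. Some courses also add an extra factor of 1/2, which does not change where the lowest point is.

```latex
\text{cost} = \frac{1}{500} \sum_{i=1}^{500} \bigl(\text{guess}_i - \text{price}_i\bigr)^2
\qquad\Longleftrightarrow\qquad
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
```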

If we graph this cost equation for all possible values of our weights for num_of_bedrooms and sqft, we’d get a graph that might look something like this:

The graph of our cost function looks like a bowl. The vertical axis represents the cost.

In this graph, the lowest point in blue is where our cost is the lowest — thus our function is the least wrong. The highest points are where we are most wrong. So if we can find the weights that get us to the lowest point on this graph, we’ll have our answer!

So we just need to adjust our weights so we are “walking down hill” on this graph towards the lowest point. If we keep making small adjustments to our weights that are always moving towards the lowest point, we’ll eventually get there without having to try too many different weights.

If you remember anything from Calculus, you might remember that if you take the derivative of a function, it tells you the slope of the function’s tangent at any point. In other words, it tells us which way is downhill for any given point on our graph. We can use that knowledge to walk downhill.

So if we calculate a partial derivative of our cost function with respect to each of our weights, then we can subtract that value from each weight. That will walk us one step closer to the bottom of the hill. Keep doing that and eventually we’ll reach the bottom of the hill and have the best possible values for our weights. (If that didn’t make sense, don’t worry and keep reading).

That’s a high-level summary of batch gradient descent, one way to find the best weights for your function. Don’t be afraid to dig deeper if you are interested in learning the details.
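
To make the downhill walk concrete, here is a minimal sketch with one weight and a bias. The data is hypothetical (sizes in thousands of square feet, prices in thousands of dollars, built so that price = 200 * size). One detail is an addition to the description above: each derivative is multiplied by a small `learning_rate` before being subtracted, a common practice that keeps the steps from overshooting the bottom of the bowl.

```python
# Hypothetical data: size (thousands of sqft) -> price (thousands of dollars).
SIZES = [1.0, 1.5, 2.0, 2.5]
PRICES = [200.0, 300.0, 400.0, 500.0]


def cost(weight, bias):
    """Average squared error of the line `weight * size + bias`."""
    return sum((weight * x + bias - y) ** 2 for x, y in zip(SIZES, PRICES)) / len(SIZES)


def gradient(weight, bias):
    """Partial derivatives of the cost with respect to weight and bias."""
    m = len(SIZES)
    errors = [weight * x + bias - y for x, y in zip(SIZES, PRICES)]
    d_weight = 2 / m * sum(e * x for e, x in zip(errors, SIZES))
    d_bias = 2 / m * sum(errors)
    return d_weight, d_bias


def gradient_descent(learning_rate=0.1, steps=5000):
    """Repeatedly step downhill, using every house on every step ("batch")."""
    weight, bias = 1.0, 1.0  # same starting point as Step 1
    for _ in range(steps):
        d_weight, d_bias = gradient(weight, bias)
        weight -= learning_rate * d_weight
        bias -= learning_rate * d_bias
    return weight, bias


weight, bias = gradient_descent()
print(weight, bias, cost(weight, bias))
```

On this toy data the weight ends up very close to 200 and the bias very close to 0, without ever enumerating a grid of candidates. The learning rate and step count were picked by hand for this data set; real data usually needs feature scaling and some tuning.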

When you use a machine learning library to solve a real problem, all of this will be done for you. But it’s still useful to have a good idea of what is happening.

What else did you conveniently skip over?

The three-step algorithm I described is called multivariate linear regression. You are estimating the equation for a line that fits through all of your house data points. Then you are using that equation to guess the sales price of houses you’ve never seen before based on where that house would appear on your line. It’s a really powerful idea and you can solve “real” problems with it.

But while the approach I showed you might work in simple cases, it won’t work in all cases. One reason is because house prices aren’t always simple enough to follow a continuous line.

But luckily there are lots of ways to handle that. There are plenty of other machine learning algorithms that can handle non-linear data (like neural networks or SVMs with kernels). There are also ways to use linear regression more cleverly that allow for more complicated lines to be fit. In all cases, the same basic idea of needing to find the best weights still applies.

Also, I ignored the idea of overfitting. It’s easy to come up with a set of weights that always works perfectly for predicting the prices of the houses in your original data set but never actually works for any new houses that weren’t in your original data set. But there are ways to deal with this (like regularization and using a cross-validation data set). Learning how to deal with this issue is a key part of learning how to apply machine learning successfully.
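
A deliberately extreme sketch shows why held-out data matters. All numbers and names here are hypothetical: `memorizer` is a caricature of an overfit model that simply looks up the training answers, and `simple_line` is a fixed rule of thumb rather than a fitted model.

```python
# Hypothetical sales: (sqft, sale price), roughly $200 per sqft.
TRAINING = [(1000, 205000), (1500, 290000), (2000, 410000), (2500, 495000)]
HELD_OUT = [(1200, 238000), (1800, 365000), (2200, 436000)]


def cost(predict, sales):
    """Average squared error of `predict` over a list of sales."""
    return sum((predict(sqft) - price) ** 2 for sqft, price in sales) / len(sales)


def memorizer(sqft):
    """Overfit caricature: perfect on houses it has seen, clueless otherwise."""
    return dict(TRAINING).get(sqft, 0)


def simple_line(sqft):
    """Plain rule of thumb: $200 per square foot."""
    return 200 * sqft


for name, model in [("memorizer", memorizer), ("simple_line", simple_line)]:
    print(name, cost(model, TRAINING), cost(model, HELD_OUT))
```

The memorizer scores a perfect zero on the training data and falls apart on houses it has never seen, while the simple line is slightly wrong everywhere but holds up. Checking the cost on data the model was not trained on is how you catch this.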

In other words, while the basic concept is pretty simple, it takes some skill and experience to apply machine learning and get useful results. But it’s a skill that any developer can learn!

Is machine learning magic?

Once you start seeing how easily machine learning techniques can be applied to problems that seem really hard (like handwriting recognition), you start to get the feeling that you could use machine learning to solve any problem and get an answer as long as you have enough data. Just feed in the data and watch the computer magically figure out the equation that fits the data!

But it’s important to remember that machine learning only works if the problem is actually solvable with the data that you have.

For example, if you build a model that predicts home prices based on the type of potted plants in each house, it’s never going to work. There just isn’t any kind of relationship between the potted plants in each house and the home’s sale price. So no matter how hard it tries, the computer can never deduce a relationship between the two.

You can only model relationships that actually exist.

So remember, if a human expert couldn’t use the data to solve the problem manually, a computer probably won’t be able to either. Instead, focus on problems where a human could solve the problem, but where it would be great if a computer could solve it much more quickly.

How to learn more about Machine Learning

In my mind, the biggest problem with machine learning right now is that it mostly lives in the world of academia and commercial research groups. There isn’t a lot of easy to understand material out there for people who would like to get a broad understanding without actually becoming experts. But it’s getting a little better every day.

If you want to try out what you’ve learned in this article, I made a course that walks you through every step of this article, including writing all the code. Give it a try!

If you want to go deeper, Andrew Ng’s free Machine Learning class on Coursera is pretty amazing as a next step. I highly recommend it. It should be accessible to anyone who has a Comp. Sci. degree and who remembers a very minimal amount of math.

Also, you can play around with tons of machine learning algorithms by downloading and installing scikit-learn. It’s a Python framework that has “black box” versions of all the standard algorithms.

Also, please check out the full-length course version of this article. It covers everything in this article in more detail, including writing the actual code in Python. You can get a free 30-day trial to watch the course if you sign up with this link.

You can also follow me on Twitter at @ageitgey, email me directly or find me on linkedin. I’d love to hear from you if I can help you or your team with machine learning.
