
Open Source Daily (开源日报)

  • January 19, 2019: Open Source Daily Issue 317

January 19, 2019
Open Source Daily recommends one quality GitHub open-source project and one curated English-language tech or programming article every day. Keep reading Open Source Daily to build a good daily learning habit.
Today's recommended open-source project: awesome-dynamic-analysis
Today's recommended English article: "Will AI reduce the need for technical writers?"

Today's recommended open-source project: awesome-dynamic-analysis. Project page: GitHub link
Why we recommend it: This project is a curated list of tools for the dynamic analysis of software, that is, analysis performed by actually running the programs. (Its static analysis counterpart next door is rather more popular.) Many entries on this list deal with memory errors, so they can also be used to check your own code if needed. As always, though, it's best if you can avoid writing code that causes memory errors in the first place.
Today's recommended English article: "Will AI reduce the need for technical writers?" by James Scott
Original link: https://medium.com/@scottydocs/will-ai-reduce-the-need-for-technical-writers-79ccfb53429f
Why we recommend it: What impact will AI have on writers? In practice, AI certainly cannot replace writers entirely.

    Will AI reduce the need for technical writers?

The late Stephen Hawking famously said that artificial intelligence would be “either the best, or the worst thing, ever to happen to humanity.” As a technical writer documenting AI technology, I’d like to believe it will be the former, and it’s fair to say we have already seen positive signs of how AI might shape and assist with documentation in the future.

    A number of tech companies have already dipped their toes into the water, with some developing AI-assisted, predictive content generation and others harnessing machine learning to predict the help content the end-user is looking for.

    AI-assisted content writing

In May 2018, Google introduced its natural language processing feature, Smart Compose, to help Gmail users write emails. It combines a bag-of-words (BoW) model with a recurrent neural network (RNN) model to predict the next word or word sequence the user will type, based on the prefix word sequence typed so far.

    The Google Smart Compose RNN-LM model architecture.
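Google hasn't released Smart Compose's code, but the core RNN language model idea, predicting the next word from the typed prefix, can be sketched minimally in PyTorch. All names and sizes below are illustrative assumptions, not Google's implementation:

```python
# Minimal next-word-prediction RNN language model, in the spirit of (but
# vastly simpler than) Smart Compose. Names and sizes are illustrative.
import torch
import torch.nn as nn

class NextWordRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prefix_ids):
        # prefix_ids: (batch, seq_len) token ids of the text typed so far
        h, _ = self.rnn(self.embed(prefix_ids))
        return self.out(h[:, -1, :])  # logits over the next word

vocab_size = 10_000
model = NextWordRNN(vocab_size)
prefix = torch.randint(0, vocab_size, (1, 5))  # a 5-word typed prefix
next_word_id = model(prefix).argmax(dim=-1)    # most likely next word
```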

Smart Compose was trained on a corpus of billions of words, phrases and sentences, and Google carried out rigorous testing to make sure the model only memorised the common phrases used by its many users. The Google team admits it has more work to do and is working on incorporating personal language models that will more accurately emulate each individual’s style of writing.

Arguably one of the biggest challenges they face is reducing the human-like biases, and the resulting unwanted and prejudicial word associations, that AI inherits from a corpus of written text. Google cited research by Caliskan et al. which found that machine learning models absorb stereotyped biases. At the most basic level, the models associated floral words with something pleasant and insect words with something unpleasant. More worryingly, they also found the machine learning models adopted racial and gender biases.

    Caliskan et al found AI models absorbed gender and racial biases from a large body of text.

    The researchers found that a group of European American names were more readily associated with pleasant than unpleasant terms when compared to a batch of African American names. They also found inherited biases included associating female names and words with family and the arts while male names were associated with career and science words.

    Yonghui Wu, the principal engineer from the Google Brain team, said: “…these associations are deeply entangled in natural language data, which presents a considerable challenge to building any language model. We are actively researching ways to continue to reduce potential biases in our training procedures.”

    AI-assisted spelling and grammar

With 6.9 million daily users, one of the most common tools people use to improve the accuracy of their spelling and grammar is Grammarly. The company is experimenting with AI techniques, including machine learning and natural language processing, so the software can essentially understand human language and come up with writing enhancements.


    Grammarly has been training different algorithms to measure the coherence of naturally-written text using a corpus of text compiled from public sources including Yahoo Answers, Yelp Reviews and government emails. The models they have experimented with include:
    • Entity-based model — Tracks specific entities in the text. For example, if it finds the word “computer” in multiple sentences it assumes they are related to each other.
• Lexical coherence graph — Treats sentences as nodes in a graph with connections (“edges”) for sentences that contain pairs of similar words. For example, it connects sentences containing “macbook” and “chromebook” because they are both probably about laptop computers (a minimal sketch of this idea follows the list).
    • Deep learning model — Neural networks that capture the meaning of each sentence and are able to combine these sentence representations to learn the overall meaning of a document.
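To make the lexical coherence graph concrete, here is a toy sketch of the idea. It is my own illustration, not Grammarly's implementation: sentences become nodes, and an edge links any two sentences that share a word.

```python
# Toy lexical coherence graph: sentences are nodes; an edge connects two
# sentences that share at least one word. Illustrative only; Grammarly's
# actual models use far richer notions of similarity.
from itertools import combinations

def coherence_graph(sentences):
    tokenized = [set(s.lower().split()) for s in sentences]
    edges = []
    for (i, a), (j, b) in combinations(enumerate(tokenized), 2):
        shared = a & b
        if shared:
            edges.append((i, j, shared))
    return edges

sentences = [
    "My macbook is fast",
    "A chromebook is cheaper",
    "I bought some tea",
]
# A real system would use word embeddings so that "macbook" and
# "chromebook" count as similar even without an exact string match.
print(coherence_graph(sentences))  # sentences 0 and 1 share "is"
```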
Although this is still a work in progress, Grammarly’s long-term goal isn’t just to point out spelling and grammatical mistakes. They hope their machine-learning models will be able to tell you how coherent your writing is and also highlight which passages are difficult to follow.

    AI-assisted help content

    Some companies have also started to look at ways that AI can help with predicting and directing readers to the exact content they are looking for. London-based smart bank Monzo launched a machine-learning powered help system for their mobile app in August 2017.

Their data science team trained a recurrent neural network (RNN) model on commonly asked customer support questions to make predictions based on a sequence of actions, or “event time series”. For example:

    User logs in → goes to Payments → goes to Scheduled payments → goes to Help.

    At this point, the help system provides suggestions relating to payments and as the user starts typing, returns common questions and answers relating to scheduled payments. Their initial tests showed they were able to reach 53% accuracy when determining the top three potential categories that users were looking for out of 50 possible support categories. You can read more about their help search algorithm here.
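Monzo hasn't published this model, but the general shape of the idea, an RNN that maps an event sequence to a ranked list of support categories, might look something like this sketch (all names and sizes are my own illustrative assumptions):

```python
# Illustrative sketch: classify a sequence of in-app events into one of
# 50 support categories with a GRU. Not Monzo's actual model.
import torch
import torch.nn as nn

EVENTS = ["login", "payments", "scheduled_payments", "help"]  # toy vocabulary
NUM_CATEGORIES = 50

class EventClassifier(nn.Module):
    def __init__(self, num_events, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(num_events, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, NUM_CATEGORIES)

    def forward(self, event_ids):
        h, _ = self.rnn(self.embed(event_ids))
        return self.out(h[:, -1, :])  # scores for each support category

model = EventClassifier(len(EVENTS))
# "User logs in -> goes to Payments -> Scheduled payments -> Help"
sequence = torch.tensor([[0, 1, 2, 3]])
top3 = model(sequence).topk(3, dim=-1).indices  # three most likely categories
```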

    What does the future hold?

    I think we will see more content composition tools like Smart Compose emerge but it will take a lot of time and work before they can be trained to effectively assist with the complex and often unpredictable user-oriented content that technical writers are tasked with producing on a daily basis.

I’m sure some technical writers are already using Grammarly to assist with their spelling and grammar. It can be a really powerful tool for ensuring your text is accurate, and in the future it may be able to measure whether your writing is actually coherent and readable. I’ve dabbled with Grammarly but found it either wasn’t compatible with certain tools or prevented some of my applications from working, so it became a bit of a hindrance rather than an assistant for me personally. No doubt these are kinks they will iron out at some point down the line.
I do see the benefits of AI-assisted help like Monzo has created, so it would be awesome to see more development in this area. It could potentially save customer support and documentation teams a lot of time by predicting and directing end-users to answers before they’ve even asked a question.
So are we there yet? Not quite… but I think some very promising foundations have been laid. While some technical writers might be concerned, I think it will be a very long time before AI is advanced enough to supplant our role in development teams. So don’t be afraid of AI; for the time being, these tools are only going to make our lives easier!
Download the Open Source Daily app: https://opensourcedaily.org/2579/
Join us: https://opensourcedaily.org/about/join/
Follow us: https://opensourcedaily.org/about/love/
  • January 18, 2019: Open Source Daily Issue 316

January 18, 2019
Open Source Daily recommends one quality GitHub open-source project and one curated English-language tech or programming article every day. Keep reading Open Source Daily to build a good daily learning habit.
Today's recommended open-source project: awesome-static-analysis
Today's recommended English article: "Competing with AI for Your Design Job"

Today's recommended open-source project: awesome-static-analysis. Project page: GitHub link
Why we recommend it: This project is a curated list of tools for the static analysis of software, that is, analysis performed without actually running the programs. If you want to find potential security vulnerabilities in your code, something on this list may come in handy. Of course, don't rely on these tools too heavily; writing better, less buggy code yourself is always the best option.
Today's recommended English article: "Competing with AI for Your Design Job" by Cosmin Serban
Original link: https://blog.prototypr.io/competing-with-ai-for-your-design-job-8903e3cbd96e
Why we recommend it: What would happen to designers if one day AI took over design work?

    Competing with AI for Your Design Job


    In opposition to natural intelligence demonstrated by (most) humans, AI — or artificial intelligence — is a series of cognitive processes (like understanding, learning, decision making and, if needed, self-correction) imitated by computer systems.

    A robotic revolution has already started and most of us are enjoying the perks without necessarily thinking about all the ways this change could affect the way we make a living.

    Drones, self-driving cars, Pandora, voice to text, smart personal assistants (like Cortana, Alexa, Siri or Google Now), these things are not a glimpse into the future anymore. Most of us are using some, if not all of them on a regular basis.

    So, I guess it’s safe to say that the “future” is not near, but already here. Robots and AI are consistently used in most fields and are being prepared to take over even more tasks.

And because no one likes lengthy introductions, let me get straight to the point of this article and explain why some people have the right to feel uncomfortable about the speed at which AI is replacing humans and taking their jobs, and who can rest easy, at least for a while longer.

The short, easy answer is this: AI can easily take your job (and do it far better than you ever could) if that job consists of several repetitive, systematic steps or sequences. Such steps can be simplified and taken over by AI, something that is already happening and is predicted to replace around two billion people by 2030.

    Since design and art in general is what we’re interested in, we are lucky. We can relax for now. Art and creative thinking tend to be more complex and these areas are not something AI can deal with. Not in the near future, at least. What separates and puts us, human designers, well above AI is not our ability to think but how we do it. We’re emotionally driven creatures.

    Sketching out ideas for the illustration.

    An especially good day or a terrible one can make a huge difference in our work. The same exact idea can be translated in a million different ways by a million different designers just because of that — emotions.

    That’s what makes us unpredictable and that’s exactly why it works.

As a designer, I can’t tell you how many times I added or deleted something by mistake, or hit a wrong key that made me rethink the whole concept. Or how many times I took an obvious mistake and made it the main focus of my campaign. Even if AI gets to a point where it can do our jobs, are we willing to give up the possibility of “happy accidents”, which occur so often in the creative field?

    “When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something.” — Steve Jobs

AI needs rules to follow. It needs problems it can find solutions to. It can take over tasks linked to speed and optimization: taking care of basic prototypes, analyzing massive amounts of data, suggesting design modifications and even translating your work into multiple languages, if needed. Netflix is already using AI for jobs like artwork and banner translation. The system takes the main version and translates it almost instantly. All the design team has to do is go through the graphics and approve or adjust them.

Taking a different approach, Nutella came up with the idea of launching a campaign intended to sell 7 million jars, every single one with a unique label. AI handled the task wonderfully, creating the 7 million versions in a matter of hours, and the jars sold out in less than a month. What an insanely boring job for a designer, and what a perfect job for AI, don’t you think?

The limitations we have as humans, AI can find ways to surpass. But the limitations AI has in the artistic field… well, that’s a whole different story, and one that might take a while to solve. That thing that makes you, you, is not so easily replaced. Instead of fearing the future and all the ways AI could change or, much worse, steal our jobs, how about we try to look at it as teamwork? Like it or not, AI still needs us, much more than we need it.

    In conclusion

I believe the healthiest approach, for now, is to look at AI as an assistant or collaborator rather than the enemy. For designers, some areas of the work are truly boring and unstimulating, and they can turn a creative day into a creative blockage. This is where AI will come in handy.

    You can spend your time dealing with the innovative, exciting part of your job and let AI deal with the tedious, repetitive tasks. Wouldn’t that make your life much easier?
  • January 17, 2019: Open Source Daily Issue 315

January 17, 2019
Open Source Daily recommends one quality GitHub open-source project and one curated English-language tech or programming article every day. Keep reading Open Source Daily to build a good daily learning habit.
Today's recommended open-source project: super-inspire-end (a free, instant Linux)
Today's recommended English article: "How I document — 7 tips for starting, writing and maintaining your documentation"

Today's recommended open-source project: super-inspire-end (a free, instant Linux). Project page: GitHub link
Why we recommend it: This project gives you a quick, temporary Linux system that works out of the box after only some basic setup. It lets you use a clean Linux system through the web, which is very handy when you happen to need one briefly, for example to try out a project from GitHub or to help a friend who doesn't have Linux installed yet. It has plenty of other uses too, and more system varieties will be added later, so keep an eye on it if you're curious where it goes next.
Today's recommended English article: "How I document — 7 tips for starting, writing and maintaining your documentation" by Curtis Stanier
Original link: https://medium.com/@crstanier/how-i-document-7-tips-for-starting-writing-and-maintaining-your-documentation-6e858af64c0
Why we recommend it: Seven practical tips for writing good documentation.

    How I document — 7 tips for starting, writing and maintaining your documentation

Documentation is one of those topics that elicits a groan whenever it’s mentioned. We all know we need to do it, we all know we should do it, and we will get around to it — after we’ve finished this task, or tomorrow, probably.

Documentation is the thing that we will do eventually. But many teams are wasting time and focus by delaying it. Ask yourself: how many times have you answered the same question via Slack? What about the ticket that was delayed because we had to wait for someone to get back from vacation? Or the service that went down in the middle of the night when no one knew how to start fixing it? Add up how much time was lost through all those occurrences, and think through how many times a week, month or year that is happening in your organisation.

    Patrick Kua touched on the topic of documentation in one of his talks and he phrased it wonderfully:

Although the agile manifesto preaches “Working software over comprehensive documentation”, it doesn’t preach it over no documentation.

Far too many organisations and teams continue to function through tribal knowledge. Tribal knowledge is unwritten information not widely known by others, and it results in a slower pace of delivery and a lower quality of product. Good documentation is not a silver bullet for all your problems, but it is an underutilised building block for getting there.

Documentation can be anything: API contracts, initiative requirements, business cases or meeting minutes — anything that has value or meaning to others inside (or outside) the organisation.

Documentation is one of those topics that elicits a groan precisely because we feel guilty for not doing it sooner. We feel guilty because we’re very aware that everything I’ve mentioned above is true. Often the task seems overwhelming, but it doesn’t have to be painful to accomplish. In this article I share 7 tips for producing more useful documentation, faster, and embedding it into your working routine.

1. Just start — I promise this is not an easy-out answer to open the list; I mean it. Starting can be as simple as creating a new page with the title — this is the most basic building block you need. The simple act of creating a starting point can be enough for you to start filling in the basic details you need to get down. The file is already open, so it’s no huge step to contribute something now. Your first small win can lead you on to another — giving you a feeling of accomplishment.

    2. Start simple — people are often daunted by the scope of what needs to be produced. Documentation can be extensive with introductions, glossaries, detailed descriptions, diagrams and references. However, it doesn’t all have to be there from the start. One of the many mistakes people make is feeling they have to produce everything in one go. When we build products we aim to start with an MVP — the same should be done for documentation.

    Someone needs to get from A to B. The first iteration isn’t perfect but it’s better than walking… Credit — Henrik Kniberg

Most Product Managers are now familiar with the above. It was produced by Henrik Kniberg in an attempt to explain what an MVP should actually be. A summary of the topic at hand is a good starting point — throw in a few links to other important resources (screenshots, designs) and you’ve got yourself the basics of documentation. It may just be a skateboard — but it will get them on their way.

3. Update when someone asks you a question — this is probably my biggest piece of advice on the topic of documentation. It requires a bit more discipline, but I found it dramatically improved the quality of the documentation I was writing and made my life easier. I discovered this tip quite by accident. One day, I was in the process of updating a page when I was asked a question by a colleague. The question was a good one and the answer wasn’t documented anywhere. Incidentally, the page I was working on was actually the best place for it, so I quickly added a section and sent them a link. A few days later someone else asked me a very similar question, and instead of rewriting my answer I sent them the same link again. It was at that moment I realised that if someone has a question, it is likely someone else will have it too. Updating content in response to questions has additional benefits — you’re getting direct feedback from the audience on what they need, so you can tailor the content to them. Once people know where to find your (awesome) documentation, they will likely share the links and even reference it before asking you. Double win!

4. Ask others to contribute — as the old saying goes, many hands make light work. You should not be expected to solely produce every piece of documentation. Save in one-person operations, you’re going to be working in a team where everyone has something to contribute. If there is a topic that needs adding but you don’t have the detail, ask one of your colleagues to add a paragraph or two of their expertise. I can assure you, they will likely do a better job because the content is being produced by the person who truly understands it. Similarly, if someone lets you know something is out of date, kindly ask them to amend the document with the right information. The majority of our documentation should be a living thing — you’re not the only life-support machine around.

5. Let it grow organically — this really is the outcome of the three pieces of advice above. If something needs adding, prioritise adding it now — otherwise, keep coming back and contributing as things occur to you and when you find time. Updating documentation should not be a laborious, several-hour task but something you can do in small chunks of time. For example, if a meeting finishes early I will try to use the spare time to quickly update any docs that were affected by it. This approach means you’re constantly breaking down the task into easy, manageable chunks around your existing workflows. My only exception to this method is if I am producing a diagram or drawings. These are more involved and require a higher time investment, but they dramatically improve the effectiveness of the document. Be sure to carve out some time for them. This leads us nicely onto Tip #6…

6. Apply formatting — this becomes even more important as a document or page gets larger. A wall of text is difficult to scan for relevant information and simply results in people asking you directly (again). We’re not writing Shakespeare — your job is to be as efficient and effective at communicating information as possible. You have a vast array of communication tools at your disposal beyond prose, so be sure to use them:
• Sections & Headings — make the most of nesting content in a structured way. Add a Table of Contents to the start of the document (most tools allow you to generate these on the fly; a toy generator is sketched after this list). It makes it really quick for someone to jump to the relevant section and discover their answer.
• Images — a picture is worth a thousand words, particularly when referencing designs or flows; it benefits the reader to see what you’re referencing.
• Process diagrams — an excellent way of helping others understand a flow or decision tree in a way that paragraphs of text will not. Don’t forget to include a key and a clear scope so your reader knows what the diagram does or does not cover.
    • Tables are perfect for displaying structured data. I often use them to support lists with multiple dimensions (e.g. outlining the project team — name, role, contact details and responsibilities).
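To illustrate the kind of on-the-fly table of contents mentioned above, here is a toy generator. It assumes Markdown-style headings; real wiki and documentation tools build their TOCs natively.

```python
# Toy table-of-contents generator for a Markdown document.
# Real documentation tools generate TOCs natively; this just shows the idea.
import re

def table_of_contents(markdown_text):
    toc = []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)", line)  # heading lines: #, ##, ...
        if match:
            depth = len(match.group(1)) - 1
            toc.append("  " * depth + "- " + match.group(2))
    return "\n".join(toc)

doc = """# Project Team
## Roles
## Contact Details
# Process
"""
print(table_of_contents(doc))
```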
7. Know when to prune — if you apply the tips above, your documentation is going to grow. As it does, you need to know when to step in. Some sections will become outdated and need marking as such. I’m averse to removing them entirely, because older knowledge is the type most easily lost through organisational churn, so I do my best to protect it. Other sections will become so large you should split them into their own documents (or sub-documents). It’s much harder to give specific examples of when you need to do this, as it will depend very much on your particular case. A rough gauge, however, may be when you find people starting to ask you and the team questions about something that is already covered in your documentation — it may be there, but your audience no longer finds it accessible.

So that’s it: the 7 simple tips I use to improve the coverage and quality of the documentation for the products I own. Since I started with this method, I have authored over 50 pages and made more than 300 edits on our team’s Confluence space alone. But those aren’t the metrics that really matter. The real measure of success is the increasing number of people referencing our team’s documentation, fewer people asking us the same questions and, most satisfyingly of all, individuals reaching out to thank us for the helpfulness of our docs.
  • January 16, 2019: Open Source Daily Issue 314

January 16, 2019
Open Source Daily recommends one quality GitHub open-source project and one curated English-language tech or programming article every day. Keep reading Open Source Daily to build a good daily learning habit.
Today's recommended open-source project: iCSS (CSS challenges)
Today's recommended English article: "Intro to Deep Learning"

Today's recommended open-source project: iCSS (CSS challenges). Project page: GitHub link
Why we recommend it: This project works through all kinds of CSS challenges in order to uncover new details about CSS. Some of the challenges are quite fun, such as drawing diagonal lines in pure CSS or implementing a navigation bar's switching effect; there are also plenty of serious questions about CSS properties. Learning the finer points of CSS by completing small tasks is a good approach, and anyone who wants to study CSS in depth should give it a try.
Today's recommended English article: "Intro to Deep Learning" by Anne Bonner
Original link: https://towardsdatascience.com/intro-to-deep-learning-c025efd92535
Why we recommend it: An introduction to neural networks for deep learning beginners.

    Intro to Deep Learning



    We live in a world where, for better and for worse, we are constantly surrounded by deep learning algorithms. From social network filtering to driverless cars to movie recommendations, and from financial fraud detection to drug discovery to medical image processing (…is that bump cancer?), the field of deep learning influences our lives and our decisions every single day.

    In fact, you’re probably reading this article right now because a deep learning algorithm thinks you should see it.


    If you’re looking for the basics of deep learning, artificial neural networks, convolutional neural networks, (neural networks in general…), backpropagation, gradient descent, and more, you’ve come to the right place. In this series of articles, I’m going to explain what these concepts are as simply and comprehensibly as I can.

    If you get into this, there’s an incredible amount of really in-depth information out there! I’ll make sure to provide additional resources along the way for anyone who wants to swim a little deeper into these waters. (For example, you might want to check out Efficient BackProp by Yann LeCun, et al., which is written by one of the most important figures in deep learning. This paper looks specifically at backpropagation, but also discusses some of the most important topics in deep learning, like gradient descent, stochastic learning, batch learning, and so on. It’s all here if you want to take a look!)

    For now, let’s jump right in!


    What is deep learning?

    Really, it’s just learning from examples. That’s pretty much the deal.

    At a very basic level, deep learning is a machine learning technique that teaches a computer to filter inputs (observations in the form of images, text, or sound) through layers in order to learn how to predict and classify information.

    Deep learning is inspired by the way that the human brain filters information!

Essentially, deep learning is a part of the machine learning family that’s based on learning data representations (rather than task-specific algorithms). Deep learning is actually closely related to a class of theories about brain development proposed by cognitive neuroscientists in the early ’90s. Just like in the brain (or, more accurately, in the theories and models put together by researchers in the ’90s regarding the development of the human neocortex), neural networks use a hierarchy of layered filters in which each layer learns from the previous layer and then passes its output to the next layer.

    Deep learning attempts to mimic the activity in layers of neurons in the neocortex.

    In the human brain, there are about 100 billion neurons and each neuron is connected to about 100,000 of its neighbors. Essentially, that is what we’re trying to create, but in a way and at a level that works for machines.


    The purpose of deep learning is to mimic how the human brain works in order to create some real magic.

    What does this mean in terms of neurons, axons, dendrites, and so on? Well, the neuron has a body, dendrites, and an axon. The signal from one neuron travels down the axon and is transferred to the dendrites of the next neuron. That connection (not an actual physical connection, but a connection nonetheless) where the signal is passed is called a synapse.


    Neurons by themselves are kind of useless, but when you have lots of them, they work together to create some serious magic. That’s the idea behind a deep learning algorithm! You get input from observation, you put your input into one layer that creates an output which in turn becomes the input for the next layer, and so on. This happens over and over until your final output signal!

    So the neuron (or node) gets a signal or signals (input values), which pass through the neuron, and that delivers the output signal. Think of the input layer as your senses: the things you see, smell, feel, etc. These are independent variables for one single observation. This information is broken down into numbers and the bits of binary data that a computer can use. (You will need to either standardize or normalize these variables so that they’re within the same range.)
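That standardization step might look like this minimal NumPy sketch (toy data, purely illustrative):

```python
# Minimal sketch: put input variables on the same scale so that, e.g.,
# age and salary contribute comparably to the weighted sums downstream.
import numpy as np

X = np.array([[25, 40_000.0],
              [37, 65_000.0],
              [52, 90_000.0]])  # rows: observations; columns: age, salary

# Standardize: zero mean, unit variance per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Or normalize: squash each column into [0, 1]
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```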

What can our output value be? It can be continuous (for example, price), binary (yes or no), or categorical (cat, dog, moose, hedgehog, sloth, etc.). If it’s categorical, remember that your output won’t be just one variable but several output variables (for example, one indicator variable per category).


    Also, keep in mind that your output value will always be related to the same single observation from the input values. If your input values were, for example, an observation of the age, salary, and vehicle of one person, your output value would also relate to the same observation of the same person. This sounds pretty basic, but it’s important to keep in mind.

    What about synapses? Each of the synapses gets assigned weights, which are crucial to Artificial Neural Networks (ANNs). Weights are how ANNs learn. By adjusting the weights, the ANN decides to what extent signals get passed along. When you’re training your network, you’re deciding how the weights are adjusted.

    What happens inside the neuron? First, all of the values that it’s getting are added up (the weighted sum is calculated). Next, it applies an activation function, which is a function that’s applied to this particular neuron. From that, the neuron understands if it needs to pass along a signal or not.
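In code, that weighted sum followed by an activation function can be written as a minimal sketch like this (toy numbers; sigmoid is chosen just as one common activation):

```python
# A single artificial neuron: weighted sum of inputs plus bias, then an
# activation function that decides how strongly to pass the signal on.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, 0.1, 0.9])    # one observation's input values
weights = np.array([0.4, -0.2, 0.7])  # synapse weights (adjusted in training)
bias = 0.1

weighted_sum = np.dot(inputs, weights) + bias
output = sigmoid(weighted_sum)        # the neuron's output signal
```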

    This is repeated thousands or even hundreds of thousands of times!


    We create an artificial neural net where we have nodes for input values (what we already know/what we want to predict) and output values (our predictions) and in between those, we have a hidden layer (or layers) where the information travels before it hits the output. This is analogous to the way that the information you see through your eyes is filtered into your understanding, rather than being shot straight into your brain.
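Stacking such neurons gives the input layer, hidden layer, and output layer just described; a minimal forward pass might look like this (random weights, purely illustrative):

```python
# Minimal feedforward pass: input layer -> hidden layer -> output layer.
# Weights are random here; training would adjust them.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, 0.1, 0.9])   # input values (what we already know)
W1 = rng.normal(size=(3, 4))    # input -> hidden weights
W2 = rng.normal(size=(4, 1))    # hidden -> output weights

hidden = relu(x @ W1)           # information travels through the hidden layer
prediction = hidden @ W2        # the output value (our prediction)
```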


    Deep learning models can be supervised, semi-supervised, and unsupervised.

    Say what?

    Supervised learning
    Are you into psychology? This is essentially the machine version of “concept learning.” You know what a concept is (for example an object, idea, event, etc.) based on the belief that each object/idea/event has common features.

    The idea here is that you can be shown a set of example objects with their labels and learn to classify objects based on what you have already been shown. You simplify what you’ve learned from what you’ve been shown, condense it in the form of an example, and then you take that simplified version and apply it to future examples. We really just call this “learning from examples.”


    (Dress that baby up a little and it looks like this: concept learning refers to the process of inferring a Boolean-valued function from training examples of its input and output.)

    In a nutshell, supervised machine learning is the task of learning a function that maps an input to an output based on example input-output pairs. It works with labeled training data made up of training examples. Each example is a pair that’s made up of an input object (usually a vector) and the output value that you want (also called the supervisory signal). Your algorithm supervises the training data and produces an inferred function which can be used to map new examples. Ideally, the algorithm will allow you to classify examples that it hasn’t seen before.

    Basically, it looks at stuff with labels and uses what it learns from the labeled stuff to predict the labels of other stuff.
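Here is supervised learning in miniature, using scikit-learn and made-up toy data: labeled input-output pairs go in, an inferred function comes out, and it labels an example it hasn't seen.

```python
# Supervised learning sketch: fit a classifier on labeled pairs, then
# predict labels for new, unseen inputs. Toy data, purely illustrative.
from sklearn.linear_model import LogisticRegression

X_train = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]  # input vectors
y_train = ["spam", "spam", "not spam", "not spam"]          # supervisory signal

model = LogisticRegression().fit(X_train, y_train)
print(model.predict([[0.95, 0.15]]))  # classify an example it hasn't seen
```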

    Classification tasks tend to depend on supervised learning. These tasks might include
    • Detecting faces, identities, and facial expressions in images
    • Identifying objects in images like stop signs, pedestrians, and lane markers
    • Classifying text as spam
    • Recognizing gestures in videos
    • Detecting voices and identifying sentiment in audio recordings
    • Identifying speakers
    • Transcribing speech-to-text
    Semi-supervised learning
    This one is more like the way you learned from the combination of what your parents explicitly told you as a child (labeled information) combined with what you learned on your own that didn’t have labels, like the flowers and trees that you observed without naming or counting them.


    Semi-supervised learning does the same kind of thing as supervised learning, but it’s able to make use of both labeled and unlabeled data for training. In semi-supervised learning, you’re often looking at a lot of unlabeled data and a little bit of labeled data. There are a number of researchers out there who have found that this process can provide more accuracy than unsupervised learning but without the time and costs associated with labeled data. (Sometimes labeling data requires a skilled human being to do things like transcribe audio files or analyze 3D images in order to create labels, which can make creating a fully labeled data set pretty unfeasible, especially when you’re working with those massive data sets that deep learning tasks love.)

    Semi-supervised learning can be referred to as transductive (inferring correct labels for the given data) or inductive (inferring the correct mapping from X to Y).

    In order to do this, deep learning algorithms have to make at least one of the following assumptions:
    • Points that are close to each other probably share a label (continuity assumption)
    • The data like to form clusters and the points that are clustered together probably share a label (cluster assumption)
    • The data lie on a manifold of lower dimension than the input space (manifold assumption). Okay, that’s complicated, but think of it as if you were trying to analyze someone talking — you’d probably want to look at her facial muscles moving her face and her vocal cords making sound and stick to that area, rather than looking in the space of all images and/or all acoustic waves.
    Unsupervised learning (aka Hebbian Learning)
    Unsupervised learning involves learning the relationships between elements in a data set and classifying the data without the help of labels. There are a lot of algorithmic forms that this can take, but they all have the same goal of mimicking human logic by searching for hidden structures, features, and patterns in order to analyze new data. These algorithms can include clustering, anomaly detection, neural networks, and more.

Clustering is essentially the detection of similarities or anomalies within a data set and is a good example of an unsupervised learning task. Clustering can produce highly accurate search results by comparing documents, images, or sounds for similarities and anomalies. Being able to go through a huge amount of data to cluster “ducks” or perhaps the sound of a voice has many, many potential applications. Being able to detect anomalies and unusual behavior accurately can be extremely beneficial for applications like security and fraud detection.
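As a minimal illustration of clustering without labels, here is k-means from scikit-learn grouping toy points:

```python
# Unsupervised clustering sketch: k-means groups points by similarity,
# with no labels involved. Toy data, purely illustrative.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # e.g. [0 0 1 1]: two discovered groups
```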


    Back to it!

    Deep learning architectures have been applied to social network filtering, image recognition, financial fraud detection, speech recognition, computer vision, medical image processing, natural language processing, visual art processing, drug discovery and design, toxicology, bioinformatics, customer relationship management, audio recognition, and many, many other fields and concepts. Deep learning models are everywhere!

    There are, of course, a number of deep learning techniques that exist, like convolutional neural networks, recurrent neural networks, and so on. No one network is better than the others, but some are definitely better suited to specific tasks.

    Deep Learning and Artificial Neural Networks

    The majority of modern deep learning architectures are based on Artificial Neural Networks (ANNs) and use multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer for its input. What they learn forms a hierarchy of concepts where each level learns to transform its input data into a slightly more abstract and composite representation.


    That means that for an image, for example, the input might be a matrix of pixels, then the first layer might encode the edges and compose the pixels, then the next layer might compose an arrangement of edges, then the next layer might encode a nose and eyes, then the next layer might recognize that the image contains a face, and so on. While you may need to do a little fine tuning, the deep learning process learns which features to place in which level on its own!


The “deep” in deep learning just refers to the number of layers through which the data is transformed (these networks have a substantial credit assignment path (CAP), which is the chain of transformations from input to output). For a feedforward neural network, the depth of the CAP is that of the network: the number of hidden layers plus one (for the output layer). For a recurrent neural network, a signal might propagate through a layer more than once, so the CAP depth is potentially unlimited! Most researchers agree that deep learning involves CAP depth > 2.

    Convolutional Neural Networks

    One of the most popular types of neural networks is convolutional neural networks (CNNs). The CNN convolves (not convolutes…) learned features with input data and uses 2D convolutional layers, which means that this type of network is ideal for processing (2D) images. The CNN works by extracting features from images, meaning that the need for manual feature extraction is eliminated. The features are not trained! They’re learned while the network trains on a set of images, which makes deep learning models extremely accurate for computer vision tasks. CNNs learn feature detection through tens or hundreds of hidden layers, with each layer increasing the complexity of the learned features.
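A minimal sketch of such a 2D convolutional layer in PyTorch (sizes chosen just for illustration):

```python
# Minimal 2D convolution sketch: a Conv2d layer slides learned filters
# over an image and emits one feature map per filter.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
image = torch.randn(1, 1, 28, 28)  # one grayscale 28x28 image
feature_maps = conv(image)         # shape: (1, 8, 28, 28)
# The 8 filters' weights are learned during training, not hand-designed.
```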

    (Want to learn more? Check out Introduction to Convolutional Neural Networks by Jianxin Wu and Yann LeCun’s original article, Gradient-Based Learning Applied to Document Recognition.)

    Recurrent neural networks

    While convolutional neural networks are typically used for processing images, recurrent neural networks (RNNs) are used for processing language. RNNs don’t just filter information from one layer into the next, they have built-in feedback loops where the output from one layer might be fed back into the layer preceding it. This actually lends the network a sort of memory.
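That "sort of memory" is a hidden state fed back in at every step; here is a bare-bones recurrent step in NumPy (random weights, purely illustrative):

```python
# Bare-bones recurrent step: the hidden state h carries information from
# earlier inputs forward, giving the network a sort of memory.
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))  # input -> hidden weights
W_h = rng.normal(size=(4, 4))  # hidden -> hidden weights (the feedback loop)

h = np.zeros(4)
for x in [rng.normal(size=3) for _ in range(5)]:    # a 5-step input sequence
    h = np.tanh(W_x @ x + W_h @ h)  # new state depends on input AND past state
```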

    Generative adversarial networks

    In generative adversarial networks (GANs), two neural networks fight it out. The generator network tries to create convincing “fake” data while the discriminator tries to tell the difference between the fake data and the real stuff. With each training cycle, the generator gets better at creating fake data and the discriminator gets sharper at spotting the fakes. By pitting the two against each other during training, both networks improve. (Basically, shirts vs. skins here. The home team is playing itself to improve its game.) GANs can be used for extremely interesting applications, including generating images from written text. GANs can be tough to work with, but more robust models are constantly being developed.
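A heavily simplified sketch of one training cycle of that fight, in PyTorch (toy networks and data; real GANs are far more elaborate):

```python
# Toy GAN training step: the discriminator D learns to score real vs.
# fake, and the generator G learns to fool it. Purely illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # noise -> fake point
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # point -> realness score
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

real = torch.randn(16, 2) + 3.0  # stand-in for real data
noise = torch.randn(16, 8)

# Discriminator step: push scores toward 1 for real, 0 for fake
fake = G(noise).detach()
d_loss = loss_fn(D(real), torch.ones(16, 1)) + loss_fn(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make D score its fakes as real
g_loss = loss_fn(D(G(noise)), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```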

    Deep Learning in the Future

    The future is full of potential for anyone interested in deep learning. The most remarkable thing about a neural network is its ability to deal with vast amounts of disparate data. That becomes more and more relevant now that we’re living in an era of advanced smart sensors which can gather an unbelievable amount of data every second of every day. It’s estimated that we are currently generating 2.6 quintillion bytes of data every single day. This is an enormous amount of data. While traditional computers have trouble dealing with and drawing conclusions from so much data, deep learning actually becomes more efficient as the amount of data grows larger. Neural nets are capable of discovering latent structures within vast amounts of unstructured data, like raw media for example, which are the majority of data in the world.

    The possibilities are endless!
