
Open Source Daily

  • Open Source Daily Issue 634: “A Special Trick: stickyfill”

    December 9, 2019
    Open Source Daily recommends one high-quality GitHub open source project and one hand-picked English tech or programming article every day. Keep reading Open Source Daily to maintain the good habit of learning something daily.
    Today's recommended open source project: “A Special Trick: stickyfill”
    Today's recommended English article: “Teach What You Know to Learn It Better”

    Today's recommended open source project: “A Special Trick: stickyfill” (GitHub link)
    Why we recommend it: Tired of the usual CSS layout techniques? Perhaps it's time to try something new. This project is a polyfill that strengthens support for CSS sticky positioning. Sticky positioning reacts to page scrolling: as the page scrolls, the element sticks to the edge of its container to produce a particular display effect. Very few browsers used to support this, so this project simulates the behavior to make those browsers act as if they do. Now that the effect can be used reliably across browsers, it is well worth considering for certain pages.

    Today's recommended English article: “Teach What You Know to Learn It Better” by Garrett Vargas
    Original link: https://medium.com/better-programming/teach-what-you-know-to-learn-it-better-9b6c8765964d
    Why we recommend it: Teaching others is another way of sharing knowledge, and sharing sparks an exchange of ideas that deepens understanding.

    Teach What You Know to Learn It Better

    What I learned by leading an Alexa-development workshop

    I’ve been building Alexa skills for the past few years, with over a dozen published skills in the Amazon Alexa store.

    I’m a lifelong learner and first got involved with Alexa three years ago. It was a great way to learn a new programming language and play with emerging technology. And this love of learning has kept me involved as new voice features and patterns have emerged. A few months ago, I decided to share this passion by hosting an in-person course. I partnered with a local Seattle company, Mindspand, to list and promote it.

    It can be intimidating putting on a workshop. There are plenty of free professional articles, tutorials, and videos available. Why should someone pay to learn from me in person?

    What I kept reminding myself was I wanted to do something different to share my knowledge. There are many different ways that people learn. I learn through a combination of self-exploration and small in-person sessions. This lets me try things hands-on with someone who’s been through it themselves. It was this element I wanted to bring into my workshop — and that I felt I could teach in my own authentic style.

    It was a lot of fun putting on this course. The class had a small-group setting to give people 1:1 attention. The loose presentation style allowed me to make it an interactive environment. I used a series of exercises to take people from a “Hello, World!” application to a full rental-car search skill. Along the way, I demonstrated some nuances I’d worked through and shared tools I’d used, like Jargon, to simplify content management.

    One thing that surprised me was how much people helped each other during the session. There was a variety of skill levels in the audience. I had tailored the lessons for beginners but included some exercises to challenge more veteran developers. The small, let’s-focus-on-learning environment helped bring about that collaborative learning.

    Always learning

    Whatever your passion, teaching others can be not only satisfying but also a way to learn more yourself. I got several questions about features or nuances that I'd long since worked around. But when they were asked from a fresh perspective, I was forced to investigate and explain why something was the way it was. In at least one case, I found a new solution as a result.

    Seeing the questions students asked helped me refine the course for the next group. Last month, I modified this workshop for a business-focused hackathon at the University of Washington. Even when you're the one teaching, it's a continuous learning process.

    So what’s your passion? And what’s stopping you from sharing with others? It’s a rewarding way to teach others something you love, while giving you deeper insight and understanding.
  • Open Source Daily Issue 633: “Everybody Knows…? 0.30000000000000004”

    December 8, 2019
    Open Source Daily recommends one high-quality GitHub open source project and one hand-picked English tech or programming article every day. Keep reading Open Source Daily to maintain the good habit of learning something daily.
    Today's recommended open source project: “Everybody Knows…? 0.30000000000000004”
    Today's recommended English article: “Python's Advantages and Disadvantages Summarized”

    Today's recommended open source project: “Everybody Knows…? 0.30000000000000004” (GitHub link)
    Why we recommend it: Everybody knows that 1 + 2 = 3 and 10 + 20 = 30, so naturally 0.1 + 0.2 = 0.3. That is of course true on paper, but your computer may beg to differ: try it and you will quickly find something odd trailing after the 0.3. Such is the fate of inexact floating point arithmetic (sigh). This project collects many examples of these puzzling miscalculations; it pays to be careful whenever you work with floating point numbers.
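
    A minimal Python sketch of the kind of surprise the project catalogs, along with the usual workarounds (comparing with a tolerance, or switching to decimal arithmetic):

        from decimal import Decimal
        import math

        # Binary floating point cannot represent 0.1 or 0.2 exactly,
        # so their sum picks up a tiny error.
        print(0.1 + 0.2)              # 0.30000000000000004
        print(0.1 + 0.2 == 0.3)       # False

        # Workaround 1: compare with a tolerance instead of exact equality.
        print(math.isclose(0.1 + 0.2, 0.3))       # True

        # Workaround 2: use decimal arithmetic when exactness matters.
        print(Decimal("0.1") + Decimal("0.2"))    # 0.3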

    Today's recommended English article: “Python's Advantages and Disadvantages Summarized” by Jun Wu
    Original link: https://medium.com/better-programming/pythons-advantages-and-disadvantages-summarized-212b5fdf8883
    Why we recommend it: a summary of Python's strengths and weaknesses.

    Python’s Advantages and Disadvantages Summarized

    Are you a Python programmer? What are your thoughts?

    Python’s been gaining popularity year over year for the past few years. In a 2019 Stack Overflow survey, Python was named the second-most loved language among developers.

    Python’s often cited as being multipurpose and easy to be productive in. Its domination in machine learning and data science is well-known.

    In recent years, Python’s web-development frameworks — such as Django and Flask — have been gaining popularity. For many developers, Python is living up to its hype as it’s selected for a wider variety of projects across organizations. For new developers, Python is increasingly the first language they learn to enter the job market.

    But for all of its hype, developers who’ve worked with Python for some time have noticed some limitations.

    This article is a summary of some of the observations about Python from the development community without injecting my own experiences. Popular discussion threads in Quora, Stack Overflow, and various blog posts are the references for this article. For a complete list of references, see the end.

    Advantages of Python

    There are many advantages of Python. I’ve only listed the top few.

    Python is multipurpose

    You’ll find Python being used for front end, back end, data science, machine learning, web development, and mobile-app development. It’s one of the most multipurpose languages around. The fact that it supports so many programming paradigms, from object-oriented to functional, makes it very versatile.

    Python’s also often named as a great scripting language for people who don’t develop software but need to use scripts to retrieve information.

    Python is used by many in education to teach programming basics

    Python is increasingly used at the university level to teach programming basics to students.

    This is mainly due to its ease of use and ease of learning. There’s a lower barrier to entry for new programmers to learn Python. This has also allowed many self-taught developers to transition into development roles.

    Python has a huge, supportive community

    One of the most cited advantages of Python is its inclusive community of programmers.

    At the entry level, you can often find answers to your questions very easily in Python developer forums. There are various expert blogs online that are dedicated to spreading knowledge about Python — not only to learn it but to master it.

    Python’s also open source, which allows it to continuously improve with the help of long-time community members.

    Python’s dominance in data science and machine learning

    With the integration of data science and machine learning into today’s systems, Python’s robust, mature libraries — such as scikit-learn, Keras, and pandas — are unmatched. These libraries allow engineers who work with data to become productive much more quickly.

    Python’s a language of choice for prototyping

    The simplicity of the Python language makes it a perfect choice to use for prototyping.

    Often, when a prototype comes together quickly and with little effort, it’s then easier to select Python for developing the actual product as well.

    Python code is simple, readable, and can be more maintainable

    Python’s easy-to-use explicit syntax allows code to be more readable.

    If used correctly, codebases for Python can be more maintainable than those done in other languages.

    Python’s ability to integrate into other enterprise applications

    Python’s extensible. You can write part of your project in Python and part of it in C++ or C. This is why many use it to plug into enterprise applications. You can also embed Python into the code of other programming languages.
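
    As a minimal sketch of that extensibility (the shared library name and the dot function below are hypothetical, not taken from the article), the standard-library ctypes module can call straight into compiled C code:

        import ctypes

        # Hypothetical shared library built from C, e.g.:
        #   gcc -shared -fPIC fastmath.c -o libfastmath.so
        lib = ctypes.CDLL("./libfastmath.so")

        # Declare the C signature: double dot(const double *a, const double *b, int n);
        lib.dot.restype = ctypes.c_double
        lib.dot.argtypes = (ctypes.POINTER(ctypes.c_double),
                            ctypes.POINTER(ctypes.c_double),
                            ctypes.c_int)

        a = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
        b = (ctypes.c_double * 3)(4.0, 5.0, 6.0)
        print(lib.dot(a, b, 3))   # the heavy lifting happens in C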

    Python’s matured

    In the last few years, Python has matured into a language that’s chosen for large projects by Google, Yahoo, YouTube, Dropbox, etc. — not to mention by nontechnology companies in finance, healthcare, education, etc.

    Disadvantages of Python

    Python has several disadvantages that developers often cite.

    Python’s memory consumption and garbage collection

    Python’s memory usage is high. Memory consumption has to be carefully tracked throughout a project. It’s often essential to follow best coding practices to sidestep potential memory issues.

    Python’s garbage collection relies on reference counting, which is often misunderstood; reference counting alone cannot reclaim objects caught in reference cycles, so a separate cyclic collector has to find and free those.
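
    A small illustration of how reference counts behave (sys.getrefcount reports one extra reference because the call itself briefly holds one), and why cycles need the separate collector:

        import gc
        import sys

        x = [1, 2, 3]
        print(sys.getrefcount(x))   # typically 2: the name x plus the call argument
        y = x                       # a second name for the same list
        print(sys.getrefcount(x))   # 3
        del y                       # dropping a name decrements the count
        print(sys.getrefcount(x))   # back to 2

        # Reference counting alone never frees a cycle; the cyclic collector must.
        a = []
        a.append(a)                 # the list refers to itself
        del a                       # refcount stays above zero because of the cycle
        print(gc.collect() > 0)     # True: the collector found and freed unreachable objects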

    Python’s dynamically typed

    Many in data science and machine learning prefer statically typed languages.

    Type errors are the last thing you want to worry about when working with a lot of data. Using a statically typed language can potentially prevent a whole class of bugs in the system.
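
    Type hints plus a static checker are the usual mitigation inside Python itself; a minimal sketch (mypy is one such checker, and the bad call is left commented out so the script runs cleanly):

        def mean(values: list[float]) -> float:    # list[float] annotations need Python 3.9+
            # A static checker such as mypy can flag wrong argument types
            # before the code runs; at runtime Python itself won't stop you.
            return sum(values) / len(values)

        print(mean([1.0, 2.5, 4.0]))   # fine
        # mean("not a list")           # mypy flags this statically; at runtime it raises TypeError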

    Multithreading in Python is not really multithreading

    Due to the global interpreter lock (GIL), Python’s multithreading model doesn’t truly have threads running at the same time. Only one thread can hold the GIL at a time, which means you never achieve true parallel execution of Python code.

    Many work around this by using a different implementation of Python — such as IronPython or Jython — by moving the heavy lifting into C extensions that release the GIL, or by using multiple processes instead of threads.
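
    A rough sketch of the trade-off: the same CPU-bound function gains little from threads under the GIL but does scale across processes (exact timings vary by machine):

        import time
        from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

        def busy(n: int) -> int:
            # Pure-Python, CPU-bound work: the GIL serializes it across threads.
            total = 0
            for i in range(n):
                total += i * i
            return total

        def timed(executor_cls) -> float:
            start = time.perf_counter()
            with executor_cls(max_workers=4) as pool:
                list(pool.map(busy, [2_000_000] * 4))
            return time.perf_counter() - start

        if __name__ == "__main__":
            print("threads:  ", timed(ThreadPoolExecutor))    # roughly serial
            print("processes:", timed(ProcessPoolExecutor))   # roughly parallel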

    Python in functional programming

    Python’s functional programming can be difficult to read, which defeats the purpose of using a language like Python that’s known for being simple and easy to read. The interpreter also doesn’t perform the optimizations common in functional languages, such as tail-call elimination, and some functional-programming features are missing and have to be implemented manually or pulled in from libraries.
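
    A small example of the readability complaint: the same total computed with functools.reduce and nested lambdas versus a plain generator expression, which most Python reviewers find easier to scan:

        from functools import reduce

        orders = [12.5, 7.0, 30.0, 4.25]

        # Functional style: correct, but the nested lambdas take a moment to parse.
        total_fp = reduce(lambda acc, x: acc + x,
                          map(lambda x: x * 1.1, orders), 0.0)

        # Idiomatic Python: a generator expression says the same thing more plainly.
        total_py = sum(x * 1.1 for x in orders)

        print(round(total_fp, 2), round(total_py, 2))   # identical totals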

    Conclusion

    With many obvious advantages over disadvantages, we’ll likely see Python grow further in the years to come.

    Where Python usage is headed is anyone’s guess.

    Data science and machine learning are where Python has the potential to dominate. But in these areas, where large amounts of data often need to be processed in a functional style, Python’s disadvantages, such as speed, dynamic typing, multithreading, and memory consumption, will become more prominent.
  • Open Source Daily Issue 632: “Read It Every Year: Annual-Reading-List”

    December 7, 2019
    Open Source Daily recommends one high-quality GitHub open source project and one hand-picked English tech or programming article every day. Keep reading Open Source Daily to maintain the good habit of learning something daily.
    Today's recommended open source project: “Read It Every Year: Annual-Reading-List”
    Today's recommended English article: “How To Write a Readable README”

    Today's recommended open source project: “Read It Every Year: Annual-Reading-List” (GitHub link)
    Why we recommend it: This project is the list of books its author plans to reread every year; with this year drawing to a close, he is about to finish the round. Reading is a way of using your own mind to understand someone else's. For books that pass on knowledge, we understand them through the knowledge we already have and come away with the skills they describe. For books that pass on a way of thinking, we naturally understand them through our own thinking, and as time passes and that thinking changes, the same book yields different rewards. That is why some books are worth reading again and again.
    Today's recommended English article: “How To Write a Readable README” by Jackson Z.
    Original link: https://medium.com/better-programming/how-to-write-a-readable-readme-590ae6124f69
    Why we recommend it: before diving into a project, read its README first.

    How To Write a Readable README

    Stop confusing developers with READMEs

    A README is a project’s first impression for developers.

    A well-written README can bring traction and support to a project, but README quality gets far less attention than code quality. As a result, developers usually put the least effort into their READMEs.

    A README should achieve four goals with as few words as possible:
    • State the objective: State the problem that the project is trying to solve.
    • Define the audience: Define who can/should use the project.
    • Demo usage: Demonstrate how to start using the project.
    • Clarify workflow (optional): Clarify how to collaborate and contribute.

    Step 1. State the Objective

    I suggest using one sentence of the following form:
    • my awesome project is a utility/tool/framework/etc. to help my target audience do some task.
    Here are a few examples from successful projects:
    • PyTorch: An open-source machine learning framework that accelerates the path from research prototyping to production deployment. (If converted to our form, it will be: “PyTorch is a machine learning framework to help (everyone) accelerate the path from research prototyping to production deployment.”)
    • React: React is a JavaScript library for building user interfaces. (React’s objective statement is almost the same as our format except their audience is everyone).

    Step 2. Define the Audience

    • Define the group of users who can use it: operating system, programming language, and framework limitations.
    • Define the group of users that can/can’t benefit from the project.
    Here is an example from project Moby:

    credit: example target audience from Moby

    Step 3. Demo Usage

    • Give users intuition about how the project works.
    • Help users get started using the project.
    To achieve both, I prefer to use examples (code that works) close to real-world use cases with only the basics (leave out the fancy configurations).

    The users can understand the project from the example code and they can copy-paste the code to get started using the project.

    Here is an example from TensorFlow:

    credit: tensorflow.com

    Step 4. Clarify the Workflow (Optional, Only if the Project Accepts Contributors)

    The directory/project structure:

    credit: example directory structure from algorithm-visualizer

    Developer setup:

    credit: example developer setup from algorithm-visualizer

    Best practices: Define the standard for the quality of work.

    credit: example best practices from scikit-learn

    Submission process: Define the process of submitting code/review/documentation.

    credit: example submission process from scikit-learn

    Bonus: A Tool I Built to Improve README Readability Automagically

    The README is code, so it deserves a linter and continuous integration too.

    The readable-readme project (https://github.com/tianhaoz95/readable-readme) is a continuous integration tool based on GitHub Actions that checks the readability and quality of READMEs.

    When added to a workflow, readable-readme will generate a quality report for all README files on every push or pull request.

    credit: example usage from readable-readme

    This is what the generated report looks like:

    credit: readme quality report from readable-readme

    Note: The readable-readme project is at a super early stage. All kinds of contributions are welcome. Let’s make READMEs great again!

    Opinions are my own and not the views of my employer.
  • Open Source Daily Issue 631: “Game Engine GDevelop”

    December 6, 2019
    Open Source Daily recommends one high-quality GitHub open source project and one hand-picked English tech or programming article every day. Keep reading Open Source Daily to maintain the good habit of learning something daily.
    Today's recommended open source project: “Game Engine GDevelop”
    Today's recommended English article: “Is Learning Feasible?”

    Today's recommended open source project: “Game Engine GDevelop” (GitHub link)
    Why we recommend it: Want to design a game even though you have no programming background? Give GDevelop a try. A seasoned programmer instead? No problem: GDevelop is completely free and open source, so you can modify its source code to add whatever features you need. Its game-building interface is easy to use and requires zero code; basic logic is all it takes to create all kinds of interesting games. GDevelop uses a drag-and-drop, event-driven approach that lets you quickly add objects to a game and attach event behaviors to them, so you can build games in a highly visual way, with support for both a native (SFML) engine and an HTML5 (web) engine.
    Today's recommended English article: “Is Learning Feasible?” by Eugen Hotaj
    Original link: https://towardsdatascience.com/is-learning-feasible-8e9c18b08a3c
    Why we recommend it: With 2020 around the corner, machine learning has long been heavily hyped, but has it really advanced as far as people imagine? Eugen Hotaj asks whether learning is actually feasible and works through the question starting from machine learning's foundations.

    Is Learning Feasible?

    A quick foray into the foundations of Machine Learning

    It’s late 2019 and the hype around Machine Learning has grown to unreasonable proportions. It seems like every week a new state of the art result is reported, a slicker Deep Learning library surfaces on GitHub, and OpenAI releases a GPT-2 model with more parameters. With the mind-bending results we’ve seen so far, it’s hard not to get swept up in the hype.

    Others, however, warn that Machine Learning has overpromised and underdelivered. They worry that continued overpromising could cause research funding to dry up, leading to another Artificial Intelligence winter. This would be bad news indeed. Therefore, in order to curb the enthusiasm around Machine Learning, and single-handedly prevent the inevitable AI winter, I will convince you that learning is not feasible.

    This article was adapted from the book “Learning from Data” [1].

    The Learning Problem

    Fundamentally, the goal of Machine Learning is to find a function g which most closely approximates some unknown target function f.

    For example, in Supervised Learning, we are given the value of f at some points X, and we use these values to help us find g. More formally, we are given a dataset D = {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)} where yᵢ = f(xᵢ) for xᵢ ∈ X. We can use this dataset to approximate f by finding a function g such that g(x) ≈ f(x) on D. However, the goal of learning is not to simply approximate f well on D, but to approximate f well everywhere. That is, we want to generalize. To drive this point home, take a look at the figure below.

    (Two different approximations to the function f.)
    Both g and g’ perfectly match f on the training data (denoted by the “x”s in the figure). However, g is clearly a better approximator of f than g’. What we want is to find a function like g, not g’.
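
    A tiny sketch of the same idea in code, using an assumed target f(x) = x (my choice, not the article's): both hypotheses match f on every training point, yet only one of them generalizes:

        def f(x):                      # assumed target function
            return x

        x_train = [0.0, 1.0, 2.0]      # the "x"s in the figure

        def g(x):                      # agrees with f on the training points and beyond
            return x

        def g_prime(x):                # also agrees with f on every training point...
            return x + 10 * x * (x - 1) * (x - 2)   # ...but strays badly in between

        print(all(g(x) == f(x) for x in x_train))         # True
        print(all(g_prime(x) == f(x) for x in x_train))   # True
        x_test = [0.5, 1.5, 2.5, 3.0]
        print(max(abs(g(x) - f(x)) for x in x_test))        # 0.0: g generalizes
        print(max(abs(g_prime(x) - f(x)) for x in x_test))  # 60.0: g' does not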

    Why Learning is not Feasible

    Now that we’ve set up the learning problem, it’s worth stressing that the target function f is genuinely unknown. If we knew the target function, we wouldn’t need to do any learning at all, instead we would just use it directly. And, since we don’t know what f is, regardless of what g we ultimately chose, there is no way for us to verify how well it approximates f. This may seem like a trivial observation, but the rest of this article will hopefully demonstrate its ramifications.

    Suppose that the target function f is a Boolean function with a three-dimensional input space, i.e. f: X → {0, 1}, X = {0, 1}³. This is a convenient setup for us to analyze since it’s easy to enumerate all possible functions in the space [2]. We want to approximate f with a function g by making use of the training data below.

    (Training data available to approximate f. Here, x is the input and y = f(x). For clarity, we use ○ to indicate an output of 0 and ● to indicate an output of 1.)
    Since we want to find the best possible approximation to f, let’s only keep the functions in the space which agree with the training data and get rid of all the others.

    (All possible target functions which agree with the training data.)
    Great, now we’re down to only 8 possible target functions, labeled f₁ to f₈, in the figure above. For convenience we’ve also kept around the labels of the training data. Notice that we don’t have access to the out of sample labels since we don’t know what the target function is.

    Now the question becomes, which function do we choose as g? Since we don’t know which f is the real target function, maybe we can try to hedge our bets by choosing a g which agrees with the most potential target functions. We can even use the training data to guide our choice (learning!). This sounds like a promising approach. But first, let’s define what we mean by “agrees with the most potential target functions.”

    Learning the best hypothesis

    In order to determine how good a hypothesis (i.e. a choice of g) is, we first need to define an objective. One straightforward objective is to give a hypothesis one point each time it agrees with one of the fᵢ on an out-of-sample input. For example, if the hypothesis agrees with f₁ on all three out-of-sample inputs, it gets 3 points; if it agrees on 2 inputs, it gets 2 points; and so on for all fᵢ. It’s easy to see that the maximum number of points a hypothesis can score is 24. However, to score 24 points, the hypothesis would have to agree with all possible target functions on all possible inputs. This is, of course, impossible, since the hypothesis would have to output both 0 and 1 for the same input. If we ignore impossible scenarios, then the maximum number of points is only 12.

    The only thing left to do now is pick a hypothesis and evaluate it. Since we’re smart humans, we can look at our training data and “learn” if there is a pattern there. The XOR function seems to be a good candidate: output 1 if the input has an odd number of 1s, otherwise output 0. It agrees with the training data perfectly, so let’s see how well it does on the objective.

    (Computing the objective value for the XOR hypothesis.)
    If we go through the exercise, we see that, on the out of sample data, XOR agrees with one function exactly (+3 points), three functions on two inputs (+6 points), three functions on one input (+3 points), and does not agree at all with one function (+0 points), giving a grand total of 12 points. Perfect! We were able to find a hypothesis which achieves the highest possible score. There may be other hypotheses that are as good, but we’re guaranteed that there is no hypothesis which is better than XOR. There may even be hypotheses that are worse and get fewer than 12 points. In fact, let’s find one such bad hypothesis just so we can sanity check that XOR is indeed a good candidate for the target function.

    The worst possible hypothesis?

    If XOR is one of the best hypotheses, an obvious candidate for a bad hypothesis is its exact opposite, ¬XOR. For each input, ¬XOR outputs 0 if XOR outputs 1 and vice versa.

    (Computing the objective value for the ¬XOR hypothesis.)
    Going through the exercise again, we now see that, on the out of sample data, ¬XOR agrees with one function exactly (+3 points), three functions on two inputs (+6 points), three functions on one input (+3 points), and does not agree at all with one function (+0 points), giving a perfect score… again. In fact, any function we choose as our hypothesis will get a perfect score. This is because any function must agree with one — and only one — of the possible target functions exactly. From there, it’s easy to see that any fᵢ will match three of the other fᵢ on two inputs, three on one input, and one on no inputs.
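
    A short enumeration backing this up. The particular split of the eight inputs into training and out-of-sample points below is my own assumption (the article's figure isn't reproduced here), but the conclusion is the same for any split:

        from itertools import product

        inputs = list(product((0, 1), repeat=3))   # all 8 points of X = {0, 1}^3
        train, test = inputs[:5], inputs[5:]       # hypothetical 5/3 train/out-of-sample split

        def xor(x):
            return sum(x) % 2

        def not_xor(x):
            return 1 - xor(x)

        # Candidate targets f_1..f_8: fixed training labels (taken from XOR, so every
        # candidate agrees with the training data) combined with every possible
        # labeling of the 3 out-of-sample inputs.
        train_labels = {x: xor(x) for x in train}
        candidates = [{**train_labels, **dict(zip(test, outs))}
                      for outs in product((0, 1), repeat=len(test))]

        def score(hypothesis):
            # One point per agreement with a candidate target on an out-of-sample input.
            return sum(hypothesis(x) == fi[x] for fi in candidates for x in test)

        print(score(xor), score(not_xor))   # 12 12: both "perfect"

        # Every possible Boolean hypothesis on {0,1}^3 scores exactly 12 out of sample.
        tables = [dict(zip(inputs, outs)) for outs in product((0, 1), repeat=8)]
        print({score(lambda x, h=h: h[x]) for h in tables})   # {12}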

    What’s shocking is that we can choose a perfect hypothesis without even looking at the training data. Even worse, we can choose a hypothesis which completely disagrees with the training data, and it would still achieve a perfect score. In fact, this is exactly what ¬XOR did.

    This means that all functions are equally likely to agree with the target function on the out of sample data, regardless of whether they agree on the training data.

    Put another way, the fact that a function agrees with the training data gives no information whatsoever on how well it will agree with the target function out of sample. Since all we actually care about is performance out of sample, why do we even need to learn? This isn’t just true for Boolean functions either, but for any possible target function whatsoever. Here’s a figure which illustrates the point more clearly.
    (Valid values the function f could take for input > n.)
    Knowing f on (-∞, n) tells us nothing about its behavior on [n, ∞). Any of the dotted lines could be valid continuations of f.

    Probability, the Saving Grace

    Maybe this is the part where you expect me to say that it’s all been a huge joke and explain how I pulled the wool over your eyes. Well, it’s no joke, but clearly Machine Learning works in practice, so there must be something we’re missing.

    The reason learning “works” is due to a crucial, fundamental assumption known as the i.i.d assumption: both the training data and the out of sample data are independent and identically distributed. This simply means that our training data should be representative of the out of sample data. This is a reasonable assumption to make: we can’t possibly expect what we learn on the training data to generalize out of sample if the out of sample data differs significantly. Put more simply, if all our training data comes from y = 2x, then we can’t possibly learn anything about y = -9x² + 5.

    The i.i.d assumption also roughly holds in practice — or at least does not break down too significantly. For example, if a bank wants to build a model using information from past clients (training data), it can reasonably assume that its new clients (out of sample data) won’t be too different. We didn’t make this assumption in the Boolean function example above, so we couldn’t rule out any fᵢ from being the true target function. If we do, however, assume that the output of the target function on the out of sample data is similar to the training data, then clearly f₁ or f₂ is most likely to be the target function. But of course, we can never know for sure.

    Eugen Hotaj, November 29, 2019