Top 5 Exercise Essentials for More Effective Crossfit Training

http://ift.tt/2tAhqKN

Exercise Essentials: Great Gadgets and Gear for More Effective Crossfit Training

CrossFit is a strength and conditioning program that combines a varied range of movements performed at high intensity. It is popular with anyone who wants to boost their level of fitness and is widely used by people in the military, firefighters, and police. Not only will CrossFit improve […]

The post Top 5 Exercise Essentials for More Effective Crossfit Training appeared first on AskTheTrainer.com.



from AskTheTrainer.com http://ift.tt/1Cm42Yj
via IFTTT

A Libertarian Universal Basic Income

http://ift.tt/2suGHkR

Nobel prize winner Vernon Smith (our emeritus colleague at GMU) is a bold thinker. I have long proposed selling “government” land in the West, but Vernon takes it a step further: privatize the highway network to create a permanent income fund. Essentially, what Vernon is proposing is a libertarian method to fund a universal basic income. Can’t we all agree on that?

Even more than in the United States, there are many countries in the world today where the government holds trillions of dollars in assets that are underutilized. Selling those assets to create a permanent income fund would be good for efficiency, liberty, and equality.

…[T]he richly interconnected highway network really could be auctioned. Between major highway intersections there are alternative routes that could be auctioned to different bidders, assuring drivers of a choice of toll roads, along with state and local freeway alternatives. That competition would keep tolls affordable.

 Perhaps most important, surface transportation rights of way would be opened to new mass-transit innovations at a time when driverless vehicles are making their entrance. A few autobahns might also compete more effectively with short- to medium-haul airline routes, but you will need to resist airline opposition.

You should also consider auctioning off the Bureau of Land Management’s extensive grazing lands. Better incentives through ownership, or long-term leases, mean better stewardship and innovation. But neighboring farmers and ranchers won’t like the impact on their land prices.

How could you use the money from highway and land sales to benefit all Americans—and improve your own popularity? By creating a new Permanent Citizens Fund, invested in stocks, bonds and real estate world-wide. Every citizen would hold an equal share, with annual dividends paid in cash.

Better highways, more land for productive development plus a permanent fund sending checks to every citizen. A guaranteed basic income financed from public assets waiting to be monetized and put to work. You might even get the progressives’ vote. Have you ever made such a great deal?

If you think it’s pie in the sky, ask an Alaskan. The Alaska Permanent Fund, initiated in 1976 to distribute oil revenue, has a market value I estimate at $72,000 for each Alaskan citizen. Annual dividends began in 1982, when the public corporation that administers the fund cut the first checks for $1,000. Little wonder that Alaska is second among all the states in income equality.

The post A Libertarian Universal Basic Income appeared first on Marginal REVOLUTION.



from Marginal Revolution http://ift.tt/oJndhY
via IFTTT

How my student dropped 20 shots from her scores in 12 months

http://ift.tt/2s8wbjD

How would you like to go from shooting in the 90s to the 70s within one year? You might not believe me when I say that I have a student who accomplished this unlikely feat.

Most golfers never break 100, let alone 90, in their entire lifetime, much less in 12 months. My player did it and swept the field in a two-day tournament, shooting 76-72 in windy and wet conditions… and she did it with less than 20 hours of golf instruction over the course of a year.

I’m going to give you the secret to her success.

As a full-time golf instructor, I work with players who have a variety of athletic aptitudes, body types, motivation levels, and cognitive abilities. I can say with confidence that no one formula works for everyone, but at the same time, I know there is a formula that works. If you can manage to get over yourself, or whatever excuses you create, and just do it, you’ll see results.

My student followed the formula to a T, and because of this, she was able to pull off such a dramatic improvement in her game. I believe that everyone can improve, including you. Perhaps a 20-shot improvement in a tournament is a stretch, but it’s possible.

Here are the three key elements in my student’s success that are markers in every other success story I’ve been a part of.

1. Find Weaknesses and Fix Them

My student was positive by nature and optimistic about her improvement, but she wasn’t delusional about her successes. She enjoyed playing well, and when she did, I recognized it. But her focus in our time together was largely on errors. We identified them, worked on ways to feel the difference between the error and the correction, and then I sent her off with drills to improve her patterns and eliminate the errors.

When her body was unable to produce the patterns due to lack of strength, I needed to outsource the job to a trusted fitness professional. She started regular sessions with Jason Meisch, my go-to for strength, conditioning, and 3D work (Jason is a TPI MP-3, GB-2, Owner of PEAK Golf Fitness). Jason and I coordinated our plan so that she was working specifically on the areas that I had outlined, beginning with his full 3D and movement evaluation. She joined his six-month, off-season program designed to structure a student’s workouts in five hours per week. The program also gave her access to a bio-feedback golf device from K-Vest called K-Player for unsupervised practice and workouts.

Her practice program was developed with her swing goals in mind. They were incorporated into workouts, which were built to work on weaknesses in her movement patterns as well as other performance limitations that were affecting her swing goals.

2. Practice with Feedback

It’s so easy to hit balls until you think you “have it,” but by then you’ve already rehearsed the wrong way for the entire time it took you to figure it out. Some people get lucky and actually do figure it out (see: Ben Hogan or Bubba Watson), but for the rest of us mere mortals, figuring it out on our own is not a wise idea. The reason is simple: golf is a very complex movement with many moving parts. The likelihood that you’re going to simply figure it out is low. You might, but in my 1,000+ hours of lessons a year, I rarely see it happen.

My student was not about figuring it out entirely on her own. Yes, she needed to feel the swing on her own, but she was open and interested in my guidance. She used 3D feedback through K-Vest that Jason and I set up for her, and she was at his gym several times a week working on her movements. Through this method of feedback, she could learn how to do some of the movements that she was supposed to do.

3. Leave Your Excuses at the Door

I can think of two-dozen reasons why I should eat more raw vegetables, but until I actually do it, it’s just a dream. The students who improve actually do what is on their game plan. They don’t make excuses for not working on it, and they don’t cut corners. My player didn’t push back along the way to her 20-shot drop. She was focused, accepted the tasks I gave her, didn’t find reasons why she couldn’t do it, and didn’t show frustration. I’m sure motivation or drive plays a big part in this piece, so perhaps the next time you’re wondering whether you really want to improve, ask yourself honestly if you’re ready to put some sweat into it. Nothing good comes for free, and that saying absolutely applies to golf.

I wish there was a magic pill that someone could take and his or her game would instantly improve. Sometimes, small technical improvements do seem like magic pills, and it’s fun when those breakthroughs happen. Keep in mind, however, that old habits die hard. Just because your swing is working in a lesson or on Sunday of your practice round doesn’t guarantee it will work all the time.

In order to keep the new movement or movements that you’re working on, you’ll have to repeat it with enough reps and in a way that your brain will remember and store as a recent memory. Often we forget, and we revert to the same old, same old. Be sure to keep an eye on whatever it is you’re supposed to be attending to in your game until you know you can repeat it under pressure in a relatively permanent way. If you do, a drop in your scores is sure to come.



from GolfWRX http://www.golfwrx.com
via IFTTT

Enthought: What’s New in the Canopy Data Import Tool Version 1.1

http://ift.tt/2t6vIzJ

New features in the Canopy Data Import Tool Version 1.1.0:
Support for Pandas v0.20, Excel / CSV export capabilities, and more

We’re pleased to announce a significant new feature release of the Canopy Data Import Tool, version 1.1.0. The Data Import Tool allows users to quickly and easily import CSVs and other structured text files into Pandas DataFrames through a graphical interface, manipulate the data, and create reusable Python scripts to speed future data wrangling. Here are some of the notable updates in version 1.1.0:

1. Support for Python 3 and PyQt
The Data Import Tool now supports Python 3 and both PyQt and PySide backends.

2. Exporting DataFrames to csv/xlsx file formats
We understand that data exploration and manipulation are only one part of your data analysis process, which is why the Data Import Tool now provides a way for you to save the DataFrame as a CSV/XLSX file. This way, you can share processed data with your colleagues or feed this processed file to the next step in your data analysis pipeline.
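In plain Pandas terms, the export amounts to a one-line call; a minimal sketch (file names are placeholders, and the XLSX export assumes an Excel writer such as openpyxl is installed):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})    # stand-in for a DataFrame built in the Tool
df.to_csv('processed_data.csv', index=False)     # share as CSV
df.to_excel('processed_data.xlsx', index=False)  # share as XLSX (requires openpyxl or xlsxwriter)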

3. Column Sort Indicators
In earlier versions of the Data Import Tool, it was not obvious that clicking on the right end of a column header sorted the columns. With this release, we added sort indicators on every column, which can be pressed to sort the column in ascending or descending order. And given the complex nature of the data we get, we know sorting based on a single column is never enough, so we also made sorting columns in the Data Import Tool stable (i.e., sorting preserves any existing order in the DataFrame).
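In Pandas terms, a stable sort is what you get by explicitly picking a stable algorithm; a minimal sketch with made-up data:

import pandas as pd

df = pd.DataFrame({'Temperature': [22.3, 21.1, 22.3], 'Occupancy': [1, 0, 1]})
# 'mergesort' is the stable sorting algorithm in Pandas: rows with equal
# Temperature keep their existing relative order
df = df.sort_values('Temperature', kind='mergesort')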

4. Support for Pandas versions 0.19.2 – 0.20.1
Version 1.1.0 of the Data Import Tool now supports 0.19.2 and 0.20.1 versions of the Pandas library.

5. Column Name Selection
If duplicate column names exist in the data file, Pandas automatically mangles them to create unique column names. This mangling can be buggy at times, especially if there is whitespace around the column names. The Data Import Tool corrects this behavior to give a consistent user experience. Until the last release, this was being done under the hood by the Tool. With this release, we changed the Tool’s behavior to explicitly point out what columns are being renamed and how.

6. Template Discovery
With this release, we updated how a Template file is chosen for a given input file. If multiple template files are discovered to be relevant, we choose the latest. We also sped up loading data from files if a relevant Template is used.

For those of you new to the Data Import Tool, a Template file contains all of the commands you executed on the raw data using the Data Import Tool. A Template file is created when a DataFrame is successfully imported into the IPython console in the Canopy Editor. Further, a unique Template file is created for every data file.

Using Template files, you can save your progress and when you later reload the data file into the Tool, the Tool will automatically discover and load the right Template for you, letting you start off from where you left things.

7. Increased Cell Copy and Data Loading Speeds
Copying cells has been sped up significantly. We also sped up loading data from large files (>70MB in size).


Using the Data Import Tool in Practice: a Machine Learning Use Case

In theory, we could look at the various machine learning models that can be used to solve our problems and jump right to training and testing the models.

However, in reality, a large amount of time is invested in the data cleaning and data preparation process. More often than not, real-life data cannot simply be fed to a machine learning model directly: there could be missing values, and the data might need further processing to remove unnecessary details or join columns to generate a clean and concise dataset.

That’s where the Data Import Tool comes in. The Pandas library has made data cleaning and processing easier, and now the Data Import Tool makes it a lot easier still. By letting you visually clean your dataset, whether that means removing, converting, or joining columns, the Data Import Tool allows you to operate on the DataFrame and see the outcome of each operation. Not only that, the Data Import Tool is stateful, meaning that every command can be reverted and changes can be undone.

To give you a real world example, let’s look at the training and test datasets from the Occupancy detection dataset. The dataset contains 8 columns of data, the first column contains index values, the second column contains DateTime values and the rest contain numerical values.

As soon as you try loading the dataset, you might get an error. This is because the dataset contains a header row with names for only 7 columns, while the rest of the dataset contains 8 columns of data, including the index column. Because of this, we have to skip the first row of data, which can be done from the Edit Command pane of the ReadData command.

After we set `Number of rows to skip` to `1` and click `Refresh Data`, we should see the DataFrame we expect from the raw data. You might notice that the Data Import tool automatically converted the second column of data into a `DateTime` column. The DIT infers the type of data in a column and automatically performs the necessary conversions. Similarly, the last column was converted into a Boolean column because it represents the Occupancy, with values 0/1.

As we can see from the raw data, the first column in the data contains index values. We can access the `SetIndex` command from the right-click menu on the `ID` column.

Alongside automatic conversions, the DIT generates the relevant Python/Pandas code, which can be saved from the `Save -> Save Code` sub menu item. The complete code generated when we loaded the training data set can be seen below:

# -*- coding: utf-8 -*-
import pandas as pd


# Pandas version check
from pkg_resources import parse_version
if parse_version(pd.__version__) != parse_version('0.19.2'):
    raise RuntimeError('Invalid pandas version')


from catalyst.pandas.convert import to_bool, to_datetime
from catalyst.pandas.headers import get_stripped_columns

# Read Data from datatest.txt
filename = 'occupancy_data/datatest.txt'
data_frame = pd.read_table(
    filename,
    delimiter=',', encoding='utf-8', skiprows=1,
    keep_default_na=False, na_values=['NA', 'N/A', 'nan', 'NaN', 'NULL', ''], comment=None,
    header=None, thousands=None, skipinitialspace=True,
    mangle_dupe_cols=True, quotechar='"',
    index_col=False
)

# Ensure stripping of columns
data_frame = get_stripped_columns(data_frame)

# Type conversion for the following columns: 1, 7
for column in ['7']:
    valid_bools = {0: False, 1: True, 'true': True, 'f': False, 't': True, 'false': False}
    data_frame[column] = to_bool(data_frame[column], valid_bools)
for column in ['1']:
    data_frame[column] = to_datetime(data_frame[column])

As you can see, the generated script shows how the training data can be loaded into a DataFrame using Pandas, how the relevant columns can be converted to Bool and DateTime type and how a column can be set as the Index of the DataFrame. We can trivially modify this script to perform the same operations on the other datasets by replacing the filename.

Finally, not only does the Data Import Tool generate and autosave a Python/Pandas script for each of the commands applied, it also saves them into a nifty Template file. The Template file aids in reproducibility and speeds up the analysis process.

Once you successfully modify the training data, every subsequent time you load the training data using the Data Import Tool, it will automatically apply the commands/operations you previously ran. Not only that, we know that the training and test datasets are similar and we need to perform the same data cleaning operations on both files.

Once we cleaned the training dataset using the Data Import Tool, if we load the test dataset, it will intelligently understand that we are loading a file similar to the training dataset and will automatically perform the same operations that we performed on the training data.

The datasets are available at – http://ift.tt/2smby7Z


Ready to try the Canopy Data Import Tool?

Download Canopy (free) and click on the icon to start a free trial of the Data Import Tool today.

(NOTE: The free trial is currently only available for Python 2.7 users. Python 3 users may request a free trial by emailing canopy.support@enthought.com. All paid Canopy subscribers have access to the Data Import Tool for both Python 2 and Python 3.)


We encourage you to update to the latest version of the Data Import Tool in Canopy’s Package Manager (search for the “catalyst” package) to make the most of the updates.

For a complete list of changes, please refer to the Release Notes for Version 1.1.0 of the Tool. Refer to the Enthought Knowledge Base for Known Issues with the Tool.

Finally, if you would like to provide us feedback regarding the Data Import Tool, write to us at canopy.support@enthought.com.


Additional resources:


Watch a 2-minute demo video to see how the Canopy Data Import Tool works:

See the Webinar “Fast Forward Through Data Analysis Dirty Work” for examples of how the Canopy Data Import Tool accelerates data munging:

The post What’s New in the Canopy Data Import Tool Version 1.1 appeared first on Enthought Blog.



from Planet Python http://ift.tt/1dar6IN
via IFTTT

Show HN: My Python Solver for the On-Time Arrival Problem in Traffic Congestion

http://ift.tt/2s6Zinl

What is SOTA-Py?

SOTA-Py is a Python-based solver for the policy- and path-based "SOTA" problems, using the algorithm(s) described in Tractable Pathfinding for the Stochastic On-Time Arrival Problem (also in the corresponding arXiv preprint) and previous works referenced therein.

What is the SOTA problem? Read on...

Theory (in plain English)

What is the Stochastic On-Time Arrival problem (SOTA)?

It's the reliable routing problem:

How do you travel from point A to point B in T time under traffic?

For example, you might have a meeting in San Jose at 3pm, and one to reach in San Francisco at 4pm.
Or you might need to get from your house to the airport in less than 1 hour.

Doesn't Google Maps already solve this?

No. It doesn't let you specify a time budget. It only lets you specify a departure or arrival time, but not both.

What it (probably) gives you is the path with the least expected (average) time to your destination.

But so what? 30 minutes or 60 minutes—isn't there a single best path?

No. That would only be the case if traffic was perfectly predictable.

If you don't have a lot of time, you might need to take a riskier path (e.g. a highway), otherwise you might have no chance of reaching your destination on time. But if you have a lot of time, you might take a safer path (like side roads) that no one uses, to avoid suddenly getting stuck in the middle of, say, a highway, due to traffic.

That means your time budget can affect your route.

Policy- vs. Path-based Routing

What is the policy-based SOTA problem?

It is the case of the SOTA problem where you decide which road to take based on how much time you have left. You'd probably need a navigation device for this, since there are too many possibilities in the "policy" to print on paper.

This is what you'd prefer to do, because it can potentially give better results depending on whether you get lucky/unlucky with traffic.

This is a dynamic-programming problem: the probability of reaching your destination on time from any intersection is just the maximum, over the outgoing roads, of the probability of reaching it after taking each of those roads.
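A minimal way to state this (a sketch following the formulation in the paper linked above): let u_j(t) be the probability of reaching the destination from node j within remaining time budget t, and let p_ij be the travel-time distribution of the road from i to j. Then

u_i(t) = max over outgoing roads (i, j) of  ∫₀ᵗ p_ij(s) · u_j(t − s) ds

with u_d(t) = 1 for all t ≥ 0 at the destination d. Each term is a convolution of the road's travel-time distribution with the downstream reliability, which is why convolutions dominate the computation.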

What is the path-based SOTA problem?

It is the SOTA problem in the case where you statically decide on the entire path before you depart.
You can just print out a map for this on paper, the old-fashioned way.

This is—counterintuitively!—a much tougher problem than finding the policy. Even though the solution looks simpler (it's just a path rather than a policy), it's much harder to compute. Why? Intuitively, it's because after you travel a bit, you won't necessarily be on the most optimal path anymore, so you can't make that assumption to simplify the problem initially.
By contrast, in the policy-based scenario, you always assume that your future actions are optimal, so you have an optimal subproblem structure to exploit.

The Algorithm

The (unhelpful) ultra-short version is that Dijkstra's algorithm is used for policy-based SOTA and A* is used for path-based SOTA.

The (more helpful) short version is:

  • For the policy computation, a re-visiting variant of Dijkstra's algorithm is used to obtain an optimal ordering for computing the reliability of each node, and a technique known as zero-delay convolution is used to perform cross-dependent convolutions incrementally to keep the time complexity quasilinear in the time budget. (A naive FFT would not do this.)
  • For the path computation, the computed policy is used as an (admissible) heuristic in A*. Note that this choice of a heuristic is critical. A poor heuristic can easily result in exponential running time.

For the long version, please see the paper linked above, and others referenced inside. The paper should (hopefully) be quite easy to follow and understand, especially as far as research papers go.

Note that the pre-processing algorithms from the paper (such as arc-potentials) are not implemented, but they should be far easier to implement than the pathfinding algorithms themselves.

The Traffic Model

This code models the travel time across every road as a mixture of Gaussian distributions (GMM) ("censored" to strictly positive travel times). It discretizes the distributions and solves the discrete problem in discrete-time.

Obviously, realistic travel times are not normally distributed. But that's the model of the data I had. Getting good traffic data is hard, and encoding it efficiently is also hard. If you don't like the current model, you'd have to change the code to accommodate other models.
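As a rough illustration of what that discretization amounts to (a sketch with made-up component parameters, not the solver's actual code):

import numpy as np

dt = 2.0                      # discretization interval (seconds)
t = np.arange(1, 200) * dt    # strictly positive travel times on the time grid

# Two-component Gaussian mixture, e.g. "go" and "stop" modes
means, stds, probs = np.array([12.0, 45.0]), np.array([3.0, 8.0]), np.array([0.85, 0.15])

# Evaluate the mixture density on the grid, then renormalize so the
# "censored" (positive-time) discrete distribution sums to 1
density = sum(p * np.exp(-0.5 * ((t - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
              for m, s, p in zip(means, stds, probs))
pmf = density * dt
pmf /= pmf.sum()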

The Code

Inputs

Dependencies

  • NumPy is the only hard external dependency.
  • Numba, if available, is used for compiling Python to native code (≈ 3× speedup).
  • PyGLET, if available, is used for rendering the results on the screen via OpenGL.
  • SciPy, if available, is used for slightly faster FFTs.

Map File Format

The road network and traffic data are assumed to be a concatenation of JSON objects, each as follows:

{
        "id": [10000, 1],
        "startNodeId": [1000, 0],
        "endNodeId": [1001, 0],
        "geom": { "points": [
                {"lat": 37.7, "lon": -122.4},
                {"lat": 37.8, "lon": -122.5}
        ]},
        "length": 12,
        "speedLimit": 11.2,
        "lanes": 1,
        "hmm": [
                {"mode": "go", "mean": 1.2, "cov": 1.5, "prob": 0.85},
                {"mode": "stop", "mean": 7, "cov": 0.1, "prob": 1.5E-1}
        ]
}

Note the following:

  • The HMM directly represents travel times for various "modes" of travel (stop, go, etc.) for the Gaussian mixture model.
  • The HMM is "optional". If missing, pseudo-random data is generated.
  • The length and speed limit are divided to obtain the minimum travel time across each edge (we assume an ideal world where everyone abides by the speed limit). Therefore, their individual values are not relevant; only their ratio is relevant.
  • The number of lanes is only for rendering purposes.
  • Every ID is assumed to be of the form [primary, secondary], where the secondary number is small.
    The secondary component is intended to distinguish different segments of the same road for each edge.
  • A minimum covariance is enforced in the code. (If your variance is too low, you may need to change this.)
  • No comma or brackets should delimit these objects, so the full file isn't strictly JSON (see the reader sketch after this list).
  • For hand-checking simple cases, I recommend you set the length to be a multiple of the speed limit in order to avoid floating-point round-off error.
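Because the file is a stream of JSON objects rather than a single JSON document, a minimal reader could look like the following sketch (the repository's actual loader may differ):

import json

def read_edges(path):
    """Yield each JSON object from a file of concatenated objects."""
    decoder = json.JSONDecoder()
    with open(path) as f:
        text = f.read()
    pos = 0
    while pos < len(text):
        # Skip any whitespace between objects, then decode the next one
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj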

Maintenance (or: why is the code ugly?)

This code isn't intended to finish any job for you. It's certainly not production-quality. It's just meant to help any researchers working on this topic get started and/or cross-check their algorithm correctness.

Given that it's not meant to be used in any production, I don't plan on actively maintaining it unless I encounter bugs (or if I see enough interest from others).

Example

There's no short "getting started" code example, sorry. The main startup file is basically a (very) long example.

Usage

It's pretty self-explanatory:

python Main.py --source=65280072.0 --dest=65345534.0 --budget=1800 --network="data/sf.osm.json"

The time discretization interval is automatically chosen to be the globally minimum travel time across any edge in the network, since it should be as large as possible (for speed) and smaller than the travel time of every edge. You would need to change this in the code for greater accuracy.

Note that a time budget that is too high can cause the pathfinding algorithm to thrash exponentially, because the reliability of every path reaches 100% as your time budget increases, and the algorithm ends up trying all of them.
However, realistically, you would not need to run this algorithm for very high time budgets. A classical path would already be reliable enough.

Demo

Note that you (obviously) need both a map and traffic data to run this code. Unfortunately I can't release the dataset I used in the paper, but I have a sample map from OpenStreetMap, and the code attempts to naively fill in missing traffic data, so that should be good enough to get started.

Here's an example of what one can get in 15 seconds on my machine. The code runs in two phases:

  • As time increases, the optimal policy is computed for reachable roads farther and farther from the destination (highlighted), until the source is reached.
    Roads that can never be used to reach the destination on time are not examined.
  • Once the policy is determined, the optimal path for each time budget up to the one requested is determined, in order from high to low time budget.
    This is to demonstrate the fact that the optimal path can change depending on the time budget.

Animation

Contact

Licensing

Please refer to the license file.

For attribution, a reference to the aforementioned article (which this code is based on) would be kindly appreciated.

Questions/Comments

If you find a bug, have questions, would like to contribute, or the like, feel free to open a GitHub issue/pull request/etc.

For private inquiries (e.g. commercial licensing requests), you can find my contact information if you search around (e.g. see the paper linked above).



from Hacker News http://ift.tt/YV9WJO
via IFTTT

How to build a simple neural network in 9 lines of Python code

http://ift.tt/1grxzY5


How to build a simple neural network in 9 lines of Python code

As part of my quest to learn about AI, I set myself the goal of building a simple neural network in Python. To ensure I truly understand it, I had to build it from scratch without using a neural network library. Thanks to an excellent blog post by Andrew Trask I achieved my goal. Here it is in just 9 lines of code:
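The snippet itself was embedded in the original post; a sketch of the same nine-line idea, updated for Python 3 and using a training set consistent with the pattern described below, looks like this:

from numpy import exp, array, random, dot
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
random.seed(1)
synaptic_weights = 2 * random.random((3, 1)) - 1
for iteration in range(10000):
    output = 1 / (1 + exp(-(dot(training_set_inputs, synaptic_weights))))
    synaptic_weights += dot(training_set_inputs.T, (training_set_outputs - output) * output * (1 - output))
print(1 / (1 + exp(-(dot(array([1, 0, 0]), synaptic_weights)))))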

In this blog post, I’ll explain how I did it, so you can build your own. I’ll also provide a longer, but more beautiful version of the source code.

But first, what is a neural network? The human brain consists of 100 billion cells called neurons, connected together by synapses. If sufficient synaptic inputs to a neuron fire, that neuron will also fire. We call this process “thinking”.

Diagram 1

We can model this process by creating a neural network on a computer. It’s not necessary to model the biological complexity of the human brain at a molecular level, just its higher level rules. We use a mathematical technique called matrices, which are grids of numbers. To make it really simple, we will just model a single neuron, with three inputs and one output.

We’re going to train the neuron to solve the problem below. The first four examples are called a training set. Can you work out the pattern? Should the ‘?’ be 0 or 1?

Diagram 2

You might have noticed that the output is always equal to the value of the leftmost input column. Therefore the answer is that the ‘?’ should be 1.

Training process

But how do we teach our neuron to answer the question correctly? We will give each input a weight, which can be a positive or negative number. An input with a large positive weight or a large negative weight, will have a strong effect on the neuron’s output. Before we start, we set each weight to a random number. Then we begin the training process:

  1. Take the inputs from a training set example, adjust them by the weights, and pass them through a special formula to calculate the neuron’s output.
  2. Calculate the error, which is the difference between the neuron’s output and the desired output in the training set example.
  3. Depending on the direction of the error, adjust the weights slightly.
  4. Repeat this process 10,000 times.
Diagram 3

Eventually the weights of the neuron will reach an optimum for the training set. If we allow the neuron to think about a new situation, that follows the same pattern, it should make a good prediction.

This process is called back propagation.

Formula for calculating the neuron’s output

You might be wondering, what is the special formula for calculating the neuron’s output? First we take the weighted sum of the neuron’s inputs, which is:
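Σ weightᵢ × inputᵢ = weight₁ × input₁ + weight₂ × input₂ + weight₃ × input₃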

Next we normalise this, so the result is between 0 and 1. For this, we use a mathematically convenient function, called the Sigmoid function:
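Sigmoid(x) = 1 / (1 + e⁻ˣ)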

If plotted on a graph, the Sigmoid function draws an S shaped curve.

Diagram 4

So by substituting the first equation into the second, the final formula for the output of the neuron is:
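Output of neuron = 1 / (1 + e^(−Σ weightᵢ × inputᵢ))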

You might have noticed that we’re not using a minimum firing threshold, to keep things simple.

Formula for adjusting the weights

During the training cycle (Diagram 3), we adjust the weights. But how much do we adjust the weights by? We can use the “Error Weighted Derivative” formula:
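Adjustment = error × input × SigmoidCurveGradient(output)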

Why this formula? First we want to make the adjustment proportional to the size of the error. Secondly, we multiply by the input, which is either a 0 or a 1. If the input is 0, the weight isn’t adjusted. Finally, we multiply by the gradient of the Sigmoid curve (Diagram 4). To understand this last one, consider that:

  1. We used the Sigmoid curve to calculate the output of the neuron.
  2. If the output is a large positive or negative number, it signifies the neuron was quite confident one way or another.
  3. From Diagram 4, we can see that at large numbers, the Sigmoid curve has a shallow gradient.
  4. If the neuron is confident that the existing weight is correct, it doesn’t want to adjust it very much. Multiplying by the Sigmoid curve gradient achieves this.

The gradient of the Sigmoid curve can be found by taking the derivative:
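SigmoidCurveGradient(output) = output × (1 − output)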

So by substituting the second equation into the first equation, the final formula for adjusting the weights is:
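Adjustment = error × input × output × (1 − output)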

There are alternative formulae, which would allow the neuron to learn more quickly, but this one has the advantage of being fairly simple.

Constructing the Python code

Although we won’t use a neural network library, we will import four methods from a Python mathematics library called numpy. These are:

  • exp — the natural exponential
  • array — creates a matrix
  • dot — multiplies matrices
  • random — gives us random numbers

For example we can use the array() method to represent the training set shown earlier:
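# (repeating the training set from the sketch above)
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T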

The ‘.T’ function transposes the matrix from horizontal to vertical, so the computer is storing the numbers like this.

Ok. I think we’re ready for the more beautiful version of the source code. Once I’ve given it to you, I’ll conclude with some final thoughts.

I have added comments to my source code to explain everything, line by line. Note that in each iteration we process the entire training set simultaneously. Therefore our variables are matrices, which are grids of numbers. Here is a complete working example written in Python:
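A self-contained sketch in the spirit of the full listing (which is linked just below):

from numpy import exp, array, random, dot


class NeuralNetwork():
    def __init__(self):
        # Seed the random number generator so it produces the same numbers every run.
        random.seed(1)
        # We model a single neuron with 3 input connections and 1 output.
        # Weights form a 3 x 1 matrix with values between -1 and 1 (mean 0).
        self.synaptic_weights = 2 * random.random((3, 1)) - 1

    def __sigmoid(self, x):
        # The Sigmoid function: normalises the weighted sum to a value between 0 and 1.
        return 1 / (1 + exp(-x))

    def __sigmoid_derivative(self, x):
        # Gradient of the Sigmoid curve; indicates how confident the neuron is.
        return x * (1 - x)

    def think(self, inputs):
        # Pass inputs through the single neuron.
        return self.__sigmoid(dot(inputs, self.synaptic_weights))

    def train(self, inputs, outputs, iterations):
        for _ in range(iterations):
            # Pass the whole training set through the neuron at once.
            output = self.think(inputs)
            # Error is the difference between desired and predicted output.
            error = outputs - output
            # Error Weighted Derivative: error x input x gradient of the Sigmoid curve.
            adjustment = dot(inputs.T, error * self.__sigmoid_derivative(output))
            self.synaptic_weights += adjustment


if __name__ == '__main__':
    neural_network = NeuralNetwork()
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]]).T
    # Train 10,000 times, then consider a new situation.
    neural_network.train(training_set_inputs, training_set_outputs, 10000)
    print(neural_network.think(array([1, 0, 0])))  # should print a value close to 1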

Also available here: http://ift.tt/2tu8cQt

Final thoughts

Try running the neural network using this Terminal command:

python main.py

You should get a result that looks like:

We did it! We built a simple neural network using Python!

First the neural network assigned itself random weights, then trained itself using the training set. Then it considered a new situation [1, 0, 0] and predicted 0.99993704. The correct answer was 1. So very close!

Traditional computer programs normally can’t learn. What’s amazing about neural networks is that they can learn, adapt and respond to new situations. Just like the human mind.

Of course that was just 1 neuron performing a very simple task. But what if we hooked millions of these neurons together? Could we one day create something conscious?

In my next blog post, I expand our neural network’s capabilities by adding a second layer of neurons.



from Hacker News http://ift.tt/YV9WJO
via IFTTT

How HBO’s Silicon Valley Built “Not Hotdog” with TensorFlow, Keras and React Native

http://ift.tt/2sSKlH1


How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native

The HBO show Silicon Valley released a real AI app that identifies hotdogs — and not hotdogs — like the one shown on season 4’s 4th episode (the app is now available on Android as well as iOS!)

To achieve this, we designed a bespoke neural architecture that runs directly on your phone, and trained it with Tensorflow, Keras & Nvidia GPUs.

While the use-case is farcical, the app is an approachable example of both deep learning, and edge computing. All AI work is powered 100% by the user’s device, and images are processed without ever leaving their phone. This provides users with a snappier experience (no round trip to the cloud), offline availability, and better privacy. This also allows us to run the app at a cost of $0, even under the load of a million users, providing significant savings compared to traditional cloud-based AI approaches.

The author’s development setup with the attached eGPU used to train Not Hotdog’s AI.

The app was developed in-house by the show, by a single developer, running on a single laptop & attached GPU, using hand-curated data. In that respect, it may provide a sense of what can be achieved today, with a limited amount of time & resources, by non-technical companies, individual developers, and hobbyists alike. In that spirit, this article attempts to give a detailed overview of steps involved to help others build their own apps.


  1. The App
  2. From Prototype to Production
    V0: Prototype
    V1: Tensorflow, Inception & Transfer Learning
    V2: Keras & SqueezeNet
  3. The DeepDog Architecture
    Training
    Running Neural Networks on Mobile Phones
    Changing App Behavior by Injecting Neural Networks on The fly
    What We Would Do Differently
  4. UX, DX, Biases & The Uncanny Valley of AI

1. The App

If you haven’t seen the show or tried the app (you should!), the app lets you snap a picture and then tells you whether it thinks that image is of a hotdog or not. It’s a straightforward use-case, that pays homage to recent AI research and applications, in particular ImageNet.

While we’ve probably dedicated more engineering resources to recognizing hotdogs than anyone else, the app still fails in horrible and/or subtle ways.

Conversely, it’s also sometimes able to recognize hotdogs in complex situations… According to Engadget, “It’s incredible. I’ve had more success identifying food with the app in 20 minutes than I have had tagging and identifying songs with Shazam in the past two years.”


2. From Prototype to Production

Have you ever found yourself reading Hacker News, thinking “they raised a 10M series A for that? I could build it in one weekend!” This app probably feels a lot like that, and the initial prototype was indeed built in a single weekend using Google Cloud Platform’s Vision API, and React Native. But the final app we ended up releasing on the app store required months of additional (part-time) work, to deliver meaningful improvements that would be difficult for an outsider to appreciate. We spent weeks optimizing overall accuracy, training time, inference time, iterating on our setup & tooling so we could have faster development iterations, and spent a whole weekend optimizing the user experience around iOS & Android permissions (don’t even get me started on that one).

All too often technical blog posts or academic papers skip over this part, preferring to present the final chosen solution. In the interest of helping others learn from our mistakes & choices, we will present an abridged view of the approaches that didn’t work for us, before we describe the final architecture we ended up shipping in the next section.

V0: Prototype

Example image & corresponding API output from Google Cloud Vision’s documentation

We chose React Native to build the prototype as it would give us an easy sandbox to experiment with, and would help us quickly support many devices. The experience ended up being a good one and we kept React Native for the remainder of the project: it didn’t always make things easy, and the design for the app was purposefully limited, but in the end React Native got the job done.

The other main component we used for the prototype — Google Cloud’s Vision API was quickly abandoned. There were 3 main factors:

  1. First and foremost, its accuracy in recognizing hotdogs was only so-so. While it’s great at recognizing a large amount of things, it’s not so great at recognizing one thing specifically, and there were various very common examples that would fail during our experiments with it in 2016.
  2. Because of its nature as a cloud service, it was necessarily slower than running on device (network lag is painful!), and unavailable offline. The idea of images leaving the device could also potentially trigger privacy & legal concerns.
  3. Finally, if the app took off, the cost of running on Google Cloud could have become prohibitive.

For these reasons, we started experimenting with what’s trendily called “edge computing”, which for our purposes meant that after training our neural network on our laptop, we would export it and embed it directly into our mobile app, so that the neural network execution phase (or inference) would run directly inside the user’s phone.

V1: TensorFlow, Inception & Transfer Learning

Through a chance encounter with Pete Warden of the TensorFlow team, we had become aware of its ability to run TensorFlow directly embedded on an iOS device, and started exploring that path. After React Native, TensorFlow became the second fixed part of our stack.

It only took a day of work to integrate TensorFlow’s Objective-C++ camera example in our React Native shell. It took slightly longer to use their transfer learning script, which helps you retrain the Inception architecture to deal with a more specific image problem. Inception is the name of a family of neural architectures built by Google to deal with image recognition problems. Inception is available “pre-trained” which means the training phase has been completed and the weights are set. Most often for image recognition networks, they have been trained on ImageNet, a yearly competition to find the best neural architecture at recognizing over 20,000 different types of objects (hotdogs are one of them). However, much like Google Cloud’s Vision API, the competition rewards breadth as much as depth here, and out-of-the-box accuracy on a single one of the 20,000+ categories can be lacking. As such, retraining (also called “transfer learning”) aims to take a full-trained neural net, and retrain it to perform better on the specific problem you’d like to handle. This usually involves some degree of “forgetting”, either by excising entire layers from the stack, or by slowly erasing the network’s ability to distinguish a type of object (e.g. chairs) in favor of better accuracy at recognizing the one you care about (i.e. hotdogs).

While the network (Inception in this case) may have been trained on the 14M images contained in ImageNet, we were able to retrain it on just a few thousand hotdog images to get drastically enhanced hotdog recognition.

The big advantage of transfer learning is that you get better results much faster, and with less data, than if you train from scratch. A full training might take months on multiple GPUs and require millions of images, while retraining can conceivably be done in hours on a laptop with a couple thousand images.

One of the biggest challenges we encountered was understanding exactly what should count as a hotdog and what should not. Defining what a “hotdog” is ends up being surprisingly difficult (do cut up sausages count, and if so, which kinds?) and subject to cultural interpretation.

Similarly, the “open world” nature of our problem meant we had to deal with an almost infinite number of inputs. While certain computer-vision problems have relatively limited inputs (say, x-rays of bolts with or without a mechanical defect), we had to prepare the app to be fed selfies, nature shots and any number of foods.

Suffice it to say, this approach was promising and did lead to some improved results; however, it had to be abandoned for a couple of reasons.

First, the nature of our problem meant a strong imbalance in the training data: there are many more examples of things that are not hotdogs than things that are hotdogs. In practice this means that if you train your algorithm on 3 hotdog images and 97 non-hotdog images, and it recognizes 0% of the former but 100% of the latter, it will still score 97% accuracy by default! This was not straightforward to solve out of the box using TensorFlow’s retrain tool, and basically necessitated setting up a deep learning model from scratch, importing weights, and training in a more controlled manner.

At this point we decided to bite the bullet and get something started with Keras, a deep learning library that provides nicer, easier-to-use abstractions on top of TensorFlow, including pretty awesome training tools, and a class_weights option which is ideal for dealing with the sort of dataset imbalance we were facing.
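As an illustration, weighting the rare class in Keras is a single argument to fit; a minimal sketch with a toy stand-in model (not the show's actual code):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy stand-in model and data; the point is the class_weight argument
model = Sequential([Dense(1, activation='sigmoid', input_dim=128)])
model.compile(optimizer='adam', loss='binary_crossentropy')

x_train = np.random.rand(100, 128)
y_train = (np.random.rand(100) < 0.02).astype(int)   # heavily imbalanced labels

# Weight the rare "hotdog" class 49x more than the common "not hotdog" class
model.fit(x_train, y_train, epochs=1, batch_size=32, class_weight={0: 1.0, 1: 49.0})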

We used that opportunity to try other popular neural architectures like VGG, but one problem remained. None of them could comfortably fit on an iPhone. They consumed too much memory, which led to app crashes, and would sometimes take up to 10 seconds to compute a result, which was not ideal from a UX standpoint. Many things were attempted to mitigate that, but in the end these architectures were just too big to run efficiently on mobile.

V2: Keras & SqueezeNet

SqueezeNet vs. AlexNet, the grand-daddy of computer vision architectures. Source: SqueezeNet paper.

To give you a sense of the timeline, this was roughly the mid-way point of the project. By that time, the UI was 90%+ done and very little of it was going to change. But in hindsight, the neural net was at best 20% done. We had a good sense of the challenges & a good dataset, but 0 lines of the final neural architecture had been written, none of our neural code could reliably run on mobile, and even our accuracy was going to improve drastically in the weeks to come.

The problem directly ahead of us was simple: if Inception and VGG were too big, was there a simpler, pre-trained neural network we could retrain? At the suggestion of the always excellent Jeremy P. Howard (where has that guy been all our life?), we explored Xception, Enet and SqueezeNet. We quickly settled on SqueezeNet due to its explicit positioning as a solution for embedded deep learning, and the availability of a pre-trained Keras model on GitHub (yay open-source).

So how big of a difference does this make? An architecture like VGG uses about 138 million parameters (essentially the number of numbers necessary to model the neurons and values between them). Inception is already a massive improvement, requiring only 23 million parameters. SqueezeNet, in comparison, requires only 1.25 million.

This has two advantages:

  1. During the training phase, it’s much faster to train a smaller network. There are fewer parameters to map in memory, which means you can parallelize your training a bit more (larger batch size), and the network will converge (i.e., approximate the idealized mathematical function) more quickly.
  2. In production, the model is much smaller and much faster. SqueezeNet would require less than 10MB of RAM, while something like Inception requires 100MB or more. The delta is huge, and particularly important when running on mobile devices that may have less than 100MB of RAM available to run your app. Smaller networks also compute a result much faster than bigger ones.

There are tradeoffs of course:

  1. A smaller neural architecture has less available “memory”: it will not be as efficient at handling complex cases (such as recognizing 20,000 different objects), or even handling complex subcases (like say, appreciating the difference between a New York-style hotdog and a Chicago-style hotdog)
    As a corollary, smaller networks are usually less accurate overall than big ones. When trying to recognize ImageNet’s 20,000 different objects, SqueezeNet will only score around 58%, whereas VGG will be accurate 72% of the time.
  2. It’s harder to use transfer learning on a small network. Technically, there is nothing preventing us from using the same approach we used with Inception & VGG: have SqueezeNet “forget” a little bit, and retrain it specifically for hotdogs vs. not hotdogs. In practice, we found it hard to tune the learning rate, and results were always more disappointing than training SqueezeNet from scratch. This could also be caused or worsened by the open-world nature of our problem.
  3. Supposedly, smaller networks rarely overfit, but this happened to us with several “small” architectures. Overfitting means that your network specializes too much, and instead of learning how to recognize hotdogs in general, it learns to recognize exactly & only the specific hotdog images you were training with. A human analogue would be visually memorizing exactly which of the images presented to you were of a “hotdog” without abstracting that a hotdog is usually composed of a sausage in a bun, possibly with condiments, etc. If you were presented with a brand new hotdog image that wasn’t one of the ones you memorized, you would be inclined to say it’s not a hotdog. Because smaller networks usually have less “memory”, it’s easy to see why it would be harder for them to specialize. But in several cases, our small networks’ accuracy jumped up to 99% and suddenly became unable to recognize images they had not seen in training. This usually disappeared once we added enough data augmentation (stretching/distorting input images semi-randomly, so that instead of being trained 1,000 times on the same 1,000 images, the network is trained on meaningful variations of those 1,000 images, making it unlikely the network will memorize exactly the 1,000 images; instead it has to learn to recognize the “features” of a hotdog (bun, sausage, condiments, etc.) while staying fluid/general enough not to get overly attached to specific pixel values of specific images in the training set).
Data Augmentation example from the Keras Blog.

During this phase, we started experimenting with tuning the neural network architecture. In particular, we started using Batch Normalization and trying different activation functions.

  • Batch Normalization helps your network learn faster by “smoothing” the values at various stages in the stack. Exactly why this works is seemingly not well-understood yet, but it has the effect of helping your network converge much faster, meaning it achieves higher accuracy with less training, or higher accuracy after the same amount of training, often dramatically so.
  • Activation functions are the internal mathematical functions determining whether your “neurons” activate or not. Many papers still use ReLU, the Rectified Linear Unit, but we had our best results using ELU instead.

After adding Batch Normalization and ELU to SqueezeNet, we were able to train neural networks that achieved 90%+ accuracy when training from scratch; however, they were relatively brittle, meaning the same network would overfit in some cases or underfit in others when confronted with real-life testing. Even adding more examples to the dataset and playing with data augmentation failed to deliver a network that met expectations.

So while this phase was promising, and for the first time gave us a functioning app that could work entirely on an iPhone, in less than a second, we eventually moved to our 4th & final architecture.


3. The DeepDog Architecture

Design

Our final architecture was spurred in large part by the publication on April 17 of Google’s MobileNets paper, promising a new neural architecture with Inception-like accuracy on simple problems like ours, with only 4M or so parameters. This meant it sat in an interesting sweet spot between a SqueezeNet that had maybe been overly simplistic for our purposes, and the possibly overwrought elephant-trying-to-squeeze-in-a-tutu of using Inception or VGG on mobile. The paper introduced some capacity to tune the size & complexity of the network specifically to trade memory/CPU consumption against accuracy, which was very much top of mind for us at the time.

With less than a month to go before the app had to launch we endeavored to reproduce the paper’s results. This was entirely anticlimactic as within a day of the paper being published a Keras implementation was already offered publicly on GitHub by Refik Can Malli, a student at Istanbul Technical University, whose work we had already benefitted from when we took inspiration from his excellent Keras SqueezeNet implementation. The depth & openness of the deep learning community, and the presence of talented minds like R.C. is what makes deep learning viable for applications today — but they also make working in this field more thrilling than any tech trend we’ve been involved with.

Our final architecture ended up making significant departures from the MobileNets architecture or from convention, in particular:

  • We do not use Batch Normalization & Activation between depthwise and pointwise convolutions, because the Xception paper (which discussed depthwise convolutions in detail) seemed to indicate it would actually lead to less accuracy in architectures of this type (as helpfully pointed out by the author of the QuickNet paper on Reddit). This also has the benefit of reducing the network size.
  • We use ELU instead of ReLU. Just like with our SqueezeNet experiments, it provided superior convergence speed & final accuracy when compared to ReLU
  • We did not use PELU. While promising, this activation function seemed to fall into a binary state whenever we tried to use it. Instead of gradually improving, our network’s accuracy would alternate between ~0% and ~100% from one batch to the next. It’s unclear why this happened, and might just come down to an implementation error or user error. Fusing the width/height axes of our images had no effect.
  • We did not use SELU. A short investigation between the iOS & Android release led to results very similar to PELU. It’s our suspicion that SELU should not be used in isolation as a sort of activation function silver bullet, but rather — as the paper’s title implies — as part of a narrowly-defined SNN architecture.
  • We maintain the use of Batch Normalization with ELU. There are many indications that this should be unnecessary, however, every experiment we ran without Batch Normalization completely failed to converge. This could be due to the small size of our architecture.
  • We used Batch Normalization before the activation (see the sketch after this list). While this is a subject of some debate these days, our experiments placing BN after the activation on small networks failed to converge as well.
  • To optimize the network we used Cyclical Learning Rates and (fellow student) Brad Kenstler’s excellent Keras implementation. CLRs take the guessing game out of finding the optimal learning rate for your training. Even more importantly, by adjusting the learning rate both up & down throughout your training, they help achieve a final accuracy that’s, in our experience, better than a traditional optimizer. For both of these reasons, we can’t conceive of using anything other than CLRs to train a neural network in the future.
  • For what it’s worth, we saw no need to adjust the α or ρ values from the MobileNets architecture. Our model was small enough for our purposes at α = 1, and computation was fast enough at ρ = 1, and we preferred to focus on achieving maximum accuracy. However, this could be helpful when attempting to run on older mobile devices, or embedded platforms.
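To make the BN-before-ELU ordering concrete, here is a minimal Keras 2 sketch of one depthwise-separable building block (layer sizes are placeholders, not the actual DeepDog stack). Note that SeparableConv2D fuses the depthwise and pointwise convolutions with nothing in between, which matches the first departure above:

from keras.models import Sequential
from keras.layers import SeparableConv2D, BatchNormalization, Activation

# One depthwise-separable block: convolution, then Batch Normalization, then ELU
model = Sequential()
model.add(SeparableConv2D(32, (3, 3), padding='same', input_shape=(128, 128, 3)))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.summary()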

So how does this stack work exactly? Deep Learning often gets a bad rap for being a “black box”, and while it’s true many components of it can be mysterious, the networks we use often leak information about how some of their magic work. We can look at the layers of this stack and how they activate on specific input images, giving us a sense of each layer’s ability to recognize sausage, buns, or other particularly salient hotdog features.

Training

Data quality was of the utmost importance. A neural network can only be as good as the data that trained it, and improving training set quality was probably one of the top 3 things we spent time on during this project. The key things we did to improve this were:

  • Sourcing more images, and more varied images (height/width, background, lighting conditions, cultural differences, perspective, composition, etc.)
  • Matching image types to expected production inputs. Our guess was people would mostly try to photograph actual hotdogs, other foods, or would sometimes try to trick the system with random objects, so our dataset reflected that.
  • Give lots of examples of similar-looking things that may trip up your network. Some of the things that look most similar to hotdogs are other foods (such as hamburgers, sandwiches, or, in the case of naked hotdogs, baby carrots or even cooked cherry tomatoes). Our dataset reflected that.
  • Expect distortions: in mobile situations, most photos will be worse than the “average” picture taken with a DSLR or in perfect lighting conditions. Mobile photos are dim, noisy, taken at an angle. Aggressive data augmentation was key to counter this.
  • Additionally, we figured that users may lack access to real hotdogs, so they may try photographing hotdogs from Google search results, which leads to its own types of distortion (skewing if the photo is taken at an angle, flash reflection on the screen, visible moiré effect caused by taking a picture of an LCD screen with a mobile camera). These specific distortions had an almost uncanny ability to trick our network, not unlike what recently-published papers describe about Convolutional Networks’ (lack of) resistance to noise. Using Keras’ channel shift feature resolved most of these issues.
Example distortion introduced by moiré and a flash. Original photo: Wikimedia Commons.
  • Some edge cases were hard to catch. In particular, images of hotdogs taken with a soft focus or with lots of bokeh in the background would sometimes trick our neural network. This was hard to defend against as a) there just aren’t that many photographs of hotdogs in soft focus (we get hungry just thinking about it) and b) it could be damaging to spend too much of our network’s capacity training for soft focus, when realistically most images taken with a mobile phone will not have that feature. We chose to leave this largely unaddressed as a result.

The final composition of our dataset was 150k images, of which only 3k were hotdogs: there are only so many hotdogs you can look at, but there are many not hotdogs to look at. The 49:1 imbalance was dealt with by setting a Keras class weight of 49:1 in favor of hotdogs. Of the remaining 147k images, most were of food, with just 3k photos of non-food items, to help the network generalize a bit more and not get tricked into seeing a hotdog if presented with an image of a human in a red outfit.
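As a sketch, this is roughly how such a class weighting can be expressed in Keras. It assumes class index 1 corresponds to “hotdog”; the actual index depends on how your data generator encodes classes (check `class_indices` on the generator).

```python
# Weight the rare "hotdog" class 49x more heavily than "not hotdog".
# Assumes class index 1 is "hotdog"; verify with train_generator.class_indices.
class_weight = {0: 1.0, 1: 49.0}

model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=240,
    class_weight=class_weight,
    validation_data=val_generator,
    validation_steps=val_steps,
)
```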

Our data augmentation rules were as follows:

  • We applied rotations within ±135 degrees — significantly more than average, because we coded the application to disregard phone orientation.
  • Height and width shifts of 20%
  • Shear range of 30%
  • Zoom range of 10%
  • Channel shifts of 20%
  • Random horizontal flips to help the network generalize

These numbers were derived intuitively, from quick experiments and our understanding of how the app would be used in real life, rather than from careful systematic experimentation.
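For reference, here is an approximate translation of the rules above into a Keras `ImageDataGenerator`. Treat it as a sketch: the exact units for shear and channel shifts differ between Keras versions (degrees vs. radians, raw pixel values vs. fractions), so the mapping from the percentages above is illustrative.

```python
from keras.preprocessing.image import ImageDataGenerator

# Approximate mapping of the augmentation rules listed above.
train_datagen = ImageDataGenerator(
    rotation_range=135,              # rotations within +/- 135 degrees
    width_shift_range=0.2,           # 20% width shifts
    height_shift_range=0.2,          # 20% height shifts
    shear_range=0.3,                 # ~30% shear (units vary by Keras version)
    zoom_range=0.1,                  # 10% zoom
    channel_shift_range=0.2 * 255,   # ~20% channel shifts, in pixel units
    horizontal_flip=True,            # random horizontal flips
)

train_generator = train_datagen.flow_from_directory(
    "data/train",                    # hypothetical dataset path
    target_size=(224, 224),
    batch_size=128,
    class_mode="binary",
)
```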

The final key to our data pipeline was using Patrick Rodriguez’s multiprocess image data generator for Keras. While Keras does have a built-in multi-threaded and multiprocess implementation, we found Patrick’s library to be consistently faster in our experiments, for reasons we did not have time to investigate. This library cut our training time to a third of what it used to be.

The network was trained using a 2015 MacBook Pro and attached external GPU (eGPU), specifically an Nvidia GTX 980 Ti (we’d probably buy a 1080 Ti if we were starting today). We were able to train the network on batches of 128 images at a time. The network was trained for a total of 240 epochs, meaning we ran all 150k images through the network 240 times. This took about 80 hours.

We trained the network in 3 phases:

  • Phase 1 ran for 112 epochs (7 full CLR cycles with a step size of 8 epochs), with a learning rate between 0.005 and 0.03, on a triangular 2 policy (meaning the max learning rate was halved every 16 epochs).
  • Phase 2 ran for 64 more epochs (4 CLR cycles with a step size of 8 epochs), with a learning rate between 0.0004 and 0.0045, on a triangular 2 policy.
  • Phase 3 ran for 64 more epochs (4 CLR cycles with a step size of 8 epochs), with a learning rate between 0.000015 and 0.0002, on a triangular 2 policy.

While learning rates were identified by running the linear experiment recommended by the CLR paper, they seem to intuitively make sense, in that the max for each phase is within a factor of 2 of the previous minimum, which is aligned with the industry standard recommendation of halving your learning rate if your accuracy plateaus during training.
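To make this concrete, here is a sketch of what Phase 1 might look like using the CLR callback mentioned above, assuming the `CyclicLR` Keras callback from Brad Kenstler’s repository (where `step_size` is counted in iterations rather than epochs). The numbers mirror the phase description; the rest is illustrative glue.

```python
from clr_callback import CyclicLR  # Brad Kenstler's Keras CLR implementation

steps_per_epoch = 150000 // 128  # ~1172 iterations per epoch

# Phase 1: LR cycling between 0.005 and 0.03, step size of 8 epochs,
# triangular2 policy (max LR halved after every full 16-epoch cycle).
clr = CyclicLR(
    base_lr=0.005,
    max_lr=0.03,
    step_size=8 * steps_per_epoch,  # CLR measures step size in iterations
    mode="triangular2",
)

model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=112,               # 7 full CLR cycles
    callbacks=[clr],
)
```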

We were able to perform some runs on a Paperspace P5000 instance in the interest of time. In those cases, we were able to double the batch size, and found that optimal learning rates for each phase were roughly double as well.

Running Neural Networks on Mobile Phones

Even having designed a relatively compact neural architecture, and having trained it to handle situations it may find in a mobile context, we had a lot of work left to make it run properly. Trying to run a top-of-the-line neural net architecture out of the box can quickly burn hundreds of megabytes of RAM, which few mobile devices can spare today. Beyond network optimizations, it turns out the way you handle images or even load TensorFlow itself can have a huge impact on how quickly your network runs, how little RAM it uses, and how crash-free the experience will be for your users.

This was maybe the most mysterious part of this project. Relatively little information can be found about it, possibly due to the dearth of production deep learning applications running on mobile devices as of today. However, we must commend the Tensorflow team, and particularly Pete Warden, Andrew Harp and Chad Whipkey for the existing documentation and their kindness in answering our inquiries.

  • Rounding the weights of our network helped compress the network to ~25% of its size. Essentially, instead of using the arbitrary stock values derived from your training, this optimization picks the N most common values and sets all parameters in your network to these values, which drastically reduces the size of your network when zipped (a sketch of this step is shown after this list). This however has no impact on the uncompressed app size, or memory usage. We did not ship this improvement to production as the network was small enough for our purposes, and we did not have time to quantify how much of a hit the rounding would have on the accuracy of the app.
  • Optimize the TensorFlow lib by compiling it for production with -Os
  • Removing unnecessary ops from the TensorFlow lib: TensorFlow is in some respects a virtual machine, able to interpret a number of arbitrary TensorFlow operations: addition, multiplication, concatenation, etc. You can get significant weight (and memory) savings by removing unnecessary ops from the TensorFlow library you compile for iOS.
  • Other improvements might be possible. For example unrelated work by the author yielded 1MB improvement in Android binary size with a relatively simple trick, so there may be more areas of TensorFlow’s iOS code that can be optimized for your purposes.
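For reference, here is a sketch of how the weight-rounding step above can be applied with TensorFlow’s graph transform tool from Python. It assumes a frozen GraphDef file and a TF 1.x installation; the file paths and input/output node names are hypothetical.

```python
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen graph (path is illustrative).
with tf.gfile.GFile("frozen_hotdog_model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

transforms = [
    "strip_unused_nodes",                 # drop ops not needed for inference
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "round_weights(num_steps=256)",       # the rounding step: much better zip compression
]

# Input/output node names below are illustrative and depend on your export.
optimized_graph_def = TransformGraph(graph_def, ["input"], ["output"], transforms)

with tf.gfile.GFile("rounded_hotdog_model.pb", "wb") as f:
    f.write(optimized_graph_def.SerializeToString())
```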

Instead of using TensorFlow on iOS, we looked at using Apple’s built-in deep learning libraries instead (BNNS, MPSCNN and later on, CoreML). We would have designed the network in Keras, trained it with TensorFlow, exported all the weight values, re-implemented the network with BNNS or MPSCNN (or imported it via CoreML), and loaded the parameters into that new implementation. However, the biggest obstacle was that these new Apple libraries are only available on iOS 10+, and we wanted to support older versions of iOS. As iOS 10+ adoption and these frameworks continue to improve, there may not be a case for using TensorFlow on device in the near future.
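Had we gone the CoreML route on iOS 11+, the conversion from Keras might have looked roughly like the sketch below. The exact arguments of coremltools’ Keras converter depend on the library version, and the class labels and scaling here are illustrative.

```python
import coremltools

# Sketch: convert the trained Keras model to a CoreML model (iOS 11+ only).
coreml_model = coremltools.converters.keras.convert(
    model,
    input_names="image",
    image_input_names="image",        # treat the input as an image, not a raw tensor
    image_scale=1.0 / 255.0,          # match whatever preprocessing was used in training
    class_labels=["hotdog", "not_hotdog"],  # illustrative label names
)
coreml_model.save("NotHotdog.mlmodel")
```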

Changing App Behavior by Injecting Neural Networks on the Fly

If you think injecting JavaScript into your app on the fly is cool, try injecting neural nets into your app! The last production trick we used was to leverage CodePush and Apple’s relatively permissive terms of service, to live-inject new versions of our neural networks after submission to the app store. While this was mostly done to help us quickly deliver accuracy improvements to our users after release, you could conceivably use this approach to drastically expand or alter the feature set of your app without going through an app store review again.

What We Would Do Differently

There are a lot of things that didn’t work or we didn’t have time to do, and these are the ideas we’d investigate in the future:

  • More carefully tune our data-augmentation parameters.
  • Measure accuracy end-to-end, i.e. the final determination made by the app, accounting for things like whether the app has 2 or many more categories, what the final threshold for hotdog recognition is (we ended up having the app say “hotdog” only when confidence was above 0.90, as opposed to the default of 0.5; see the sketch after this list), the effect of weight rounding, etc.
  • Building a feedback mechanism into the app — to let users vent frustration if results are erroneous, or actively improve the neural network.
  • Use a larger resolution for image recognition than 224 x 224 pixels — essentially using a MobileNets ρ value > 1.0
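As a minimal sketch of what “end-to-end” measurement with the production decision rule could look like: assume `probs` holds the model’s hotdog probability for each validation image and `labels` the true 0/1 labels; the function and threshold below are illustrative.

```python
import numpy as np

def end_to_end_accuracy(probs, labels, threshold=0.90):
    """Accuracy using the app's actual decision rule, not the raw 0.5 cutoff."""
    predictions = (np.asarray(probs) >= threshold).astype(int)
    return float(np.mean(predictions == np.asarray(labels)))
```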

UX/DX, Biases, and The Uncanny Valley of AI

Finally, we’d be remiss not to mention the obvious and important influence of User Experience, Developer Experience and built-in biases in developing an AI app. Each probably deserves its own post (or its own book) but here are the very concrete impacts of these 3 things in our experience.

UX (User Experience) is arguably more critical at every stage of the development of an AI app than for a traditional application. There are no Deep Learning algorithms that will give you perfect results right now, but there are many situations where the right mix of Deep Learning + UX will lead to results that are indistinguishable from perfect. Proper UX expectations are irreplaceable when it comes to setting developers on the right path to design their neural networks, setting the proper expectations for users when they use the app, and gracefully handling the inevitable AI failures. Building AI apps without a UX-first mindset is like training a neural net without Stochastic Gradient Descent: you will end up stuck in the local minima of the Uncanny Valley on your way to building the perfect AI use-case.

Source: New Scientist.

DX (Developer Experience) is extremely important as well, because deep learning training time is the new horsing around while waiting for your program to compile. We suggest you heavily favor DX first (hence Keras), as it’s always possible to optimize runtime for later runs (manual GPU parallelization, multi-process data augmentation, TensorFlow pipeline, even re-implementing for Caffe2 / PyTorch).

With apologies to xkcd.

Even projects with relatively obtuse APIs & documentation like TensorFlow greatly improve DX by providing a highly-tested, highly-used, well-maintained environment for training & running neural networks.

For the same reason, it’s hard to beat both the cost and the flexibility of having your own local GPU for development. Being able to look at and edit images locally, and to edit code with your preferred tool without delays, greatly improves the quality & speed of developing AI projects.

Most AI apps will hit more critical cultural biases than ours, but as an example, even our straightforward use-case caught us flat-footed with built-in biases in our initial dataset that made the app unable to recognize French-style hotdogs, Asian hotdogs, and other oddities we did not have immediate personal experience with. It’s critical to remember that AI does not make “better” decisions than humans — it is infected by the same human biases we fall prey to, via the training sets humans provide.


Please do get in touch if you have any questions or comments: timanglade@gmail.com

Thanks to: Mike Judge, Alec Berg, Clay Tarver, Todd Silverstein, Jonathan Dotan, Lisa Schomas, Amy Solomon, Dorothy Street & Rich Toyon, and all the writers of the show — the app would simply not exist without them.
Meaghan, Dana, David, Jay, and everyone at HBO. Scale Venture Partners & GitLab. Rachel Thomas and Jeremy Howard & Fast AI for all that they have taught me, and for kindly reviewing a draft of this post. Check out their free online Deep Learning course, it’s awesome! JP Simard for his help on iOS. And finally, the TensorFlow team & r/MachineLearning for their help & inspiration.

… And thanks to everyone who used & shared the app! It made staring at pictures of hotdogs for months on end totally worth it 😅



from Hacker News http://ift.tt/YV9WJO
via IFTTT

The unconventional way I choose my startup investments

http://ift.tt/2t5Mh0E



I recently retired after working for 40 years at publicly held companies where my fiduciary duty was to my shareholders. For those 40 years I was tasked with delivering the highest possible return to my investors, without engaging in anything illegal or unethical. While I balanced short-term results with long-term value creation considering my community, customers, and employees, I always prioritized my shareholders. And while each of them had their own priorities, foremost among them were company profitability, growth, and share-price appreciation.

When I started investing for myself in startups, I decided to create my own criteria and priorities. This led to a balanced approach that equally weights four criteria, with the goal of investing in opportunities that score well across the board. I am not advocating this approach for everyone, as investing is never one-size-fits-all, but by sharing my criteria I hope to inspire you to come up with your own set of priorities.

My criteria:

1. Customers — Are the products or services delivered going to benefit the customer (e.g. improves the lives of customers, reasonable pricing)?

2. Community — Is the business model going to benefit the community? Are there potential harmful effects?

3. Returns — Is the business likely to generate an attractive return for investors (e.g. profitable, growth prospects, competitive advantages, driven or seasoned management)?

4. Employees — Are the business leaders committed to creating a great environment for their employees? (e.g. employees treated as valued team members, supportive work environment, diversity practices, success trickles down to employees)

For me, the ultimate test of how well my investments are working is how well they meet my four criteria over time and whether they deliver attractive financial outcomes. As only one investment has had an exit so far, I can only gauge the performance of the rest based on interim indicators of success or failure. I can also compare the performance of investments using these criteria against those that I made solely to achieve outsized financial gains.

Here’s how I used these criteria, with real examples of a few investments:

Opportunity A:  A startup taking an innovative approach to education for autistic children, founded by a friend with a passion for creating a better world for those with autism and success in starting, growing, and exiting a company in the financial analytics space. The startup aimed to succeed by leveraging the latest academic research and deploying software and animation tools.

Opportunity B: A startup aiming to incrementally innovate in the mortgage space by combining traditional approaches with new fintech-type technologies and offshore processing that would speed up the process, simplify the customer interfaces, and reduce origination cost and errors. The founder had proven experience and a history of successful incremental innovation in the traditional mortgage lending space.

Opportunity C: A relaunch of a failed curated produce and grocery delivery business with a newly hired CEO; the startup had initially raised and overspent a lot of capital, overextending itself. The market for food delivery was attracting new entrants, and a couple of players were starting to gain dominance. The new CEO had experience in the industry and promised to fix the mistakes made by the earlier executive team and narrow down the business model while reducing the burn rate.

I had limited information and little or no history on these startups — except in some cases the founders’ previous track records — so evaluating them required a lot of judgment and may seem arbitrary. I initially tried to rate each investment “High, Medium, or Low” but ended up modifying my approach to rating my first two criteria (benefit to customers and community) as “Yes” or “No” — trying to avoid businesses that would only succeed by taking advantage of less sophisticated consumers or be negative for the community (or country) at large. I also introduced an “Unclear” rating rather than force a conclusion where one was difficult to draw. Ultimately, the evaluation did more to help me organize and structure my thinking than to serve as the final say in an investment decision. The exercise proved valuable when combined with investor presentations, financial and return projections, and conversations with the founding teams and other knowledgeable experts from my network. Nobody said this was easy.

Below is the performance of these opportunities, again keeping in mind that it is not over until it is over (meaning an IPO, an exit, or a failure):

Opportunity A

  • My investment decision was heavily driven by knowing the founder and social good considerations.
  • The company’s projections fell short and new money needed to be raised to continue. The new money, under the circumstances, came at terms unfavorable to the first-round investors.
  • I invested in the second round based on (a) my assessment of the potential going forward and (b) the feeling that I had a chance to recoup my initial investment.
  • The company is now doing well and I am hoping for an exit in the next couple of years with a 1 to 2X return.
  • Conclusion: I would still invest, but a lot less than I did. Every aspect of the business, from developing products and hiring the right people to mastering the sales process, turned out to be difficult.

Opportunity B

  • This is one of my largest investments. I invested early and then alongside highly respected private equity money that came in as the company’s model proved itself and stood in contrast to other struggling fintech startups in the same space.
  • Steady success from the start – perhaps not dramatic enough for those who funded other potentially disruptive (but at this point less successful) startups.
  • Ironically, this company initially had a challenge in raising money because the founder was a seasoned and proven executive.
  • Conclusion: This is a difficult business with economic cycles and intensive competition, but so far I am happy with the size and prospects of my investment.

Opportunity C

  • I invested after meeting with the new CEO and appreciating his approach to turning around the company; it’s difficult to understand why startups in non-tech spaces do not manage the burn rate more carefully.
  • Success from the start, and in this case I benefitted along with the other rescue investors as the new CEO quickly reduced the burn rate and refocused the company on critical success drivers.
  • Conclusion: It could turn out to be my most lucrative investment, as the company is growing revenues at an unbelievable rate. It is up around 10X in a short period based on a new capital raise to fund the fast growth; I wish I had invested more – a lot more.

Investing in startups can be risky and rewarding. You have to think through prospects carefully, seek advice from those with experience, assess the economics, and appreciate the risks, but that still won’t guarantee success. One of the most critical factors driving success is the founding team. I now look for passion for growth and innovation combined with solid experience in dealing with the areas of challenge that lie ahead. An ideal team mixes someone not so bogged down with experience that it keeps them from disrupting, with enough relevant experience to navigate the maze of complexity without holding back innovation. For fintech, I find experience on the team to be very important, while for consumer products it’s less important, unless success depends on conventional distribution channels (i.e. stores), where relevant experience is critical.

Another complicating factor is navigating forces outside the control of the business. Managing factors in the control of the company is difficult enough, but navigating external factors such as economic and credit cycles or dealing with the changing demand for loans in the secondary market does require experience. For businesses that need in-house employees (vs. outsourced resources), previous experience in recruiting talent and scaling up can also be a differentiator.

Ultimately, though, even seemingly great opportunities with dream teams may not succeed. I have learned to not overinvest in any one opportunity, only invest what I am prepared to lose, keep accumulating experience and insights, and remember that a portfolio of carefully screened, non-correlated investments may produce the best results. You can listen to others, read about approaches and results but must realize that, in the final analysis, you alone make the call and will bear the success or the failure.

S.A. Ibrahim has almost 40 years of leadership experience in the fintech sector, most recently as CEO of Radian Group. He is also active in several educational, religious, and policy related non-profit organizations. Currently, he works as an advisor and angel investor focused on socially-responsible startups.



from VentureBeat https://venturebeat.com
via IFTTT