Dear Lance,

The fact you're referring to is proved as Proposition 1.14 in Levin, Peres and Wilmer's book "Markov Chains and Mixing Times" (2nd edition). They show the slightly more general fact that if

\pi(y) = E_z (number of visits to y before returning to z),

where E_z denotes the expectation when the chain begins from z, then \pi(y) is a stationary measure for the transition matrix P (assuming it is irreducible and the state space is finite).

To make a probability distribution out of this, they divide by the expected return time from z to z. In particular, \pi(z) = 1, since the only visit to z before the first return is the one at time 0. So the normalized value at z is \pi(z)/E_z(first return to z) = 1/E_z(first return to z), which proves the result.
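A small exact check of the consequence \pi(z) = 1/E_z(first return to z) can be sketched in Python. The 3-state chain P below is an arbitrary example chosen for illustration (it is not from the book), and the helpers `solve`, `stationary` and `expected_return_time` are our own; exact `Fraction` arithmetic is used so the products come out as exactly 1 rather than approximately.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def stationary(P):
    """Solve pi P = pi with sum(pi) = 1, for an irreducible chain on a finite space."""
    n = len(P)
    # Equation y: sum_x pi(x) (P[x][y] - [x == y]) = 0; drop one (redundant) equation
    # and replace it with the normalization sum_x pi(x) = 1.
    A = [[P[x][y] - (1 if x == y else 0) for x in range(n)] for y in range(n - 1)]
    A.append([F(1)] * n)
    b = [F(0)] * (n - 1) + [F(1)]
    return solve(A, b)

def expected_return_time(P, z):
    """E_z(first return to z): take one step, then the expected time to hit z."""
    others = [x for x in range(len(P)) if x != z]
    # Hitting times h(x) for x != z satisfy h(x) = 1 + sum_{y != z} P[x][y] h(y).
    A = [[(1 if x == y else 0) - P[x][y] for y in others] for x in others]
    h = dict(zip(others, solve(A, [F(1)] * len(others))))
    return 1 + sum(P[z][y] * h[y] for y in others)

# Arbitrary irreducible 3-state example chain (rows sum to 1).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(1, 3), F(2, 3)]]

pi = stationary(P)
for z in range(len(P)):
    ret = expected_return_time(P, z)
    print(f"z={z}: pi(z)={pi[z]}, E_z(return)={ret}, product={pi[z] * ret}")
```

For this chain the stationary distribution is (2/9, 4/9, 1/3) and the expected return times are 9/2, 9/4 and 3, so each product \pi(z) E_z(first return to z) is exactly 1.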
Amitabha, I find it hard to believe an official transcript wouldn't have the name of the college. Can you name some examples?

The engineers in the movie were looking for theoretical solutions rather than iterative numerical approximations, hence the reference to old math rather than new math. They wanted a human computer who could handle analytic geometry. When she made the suggestion, light bulbs went off for the other engineers; she made the intuitive leap. Their situation was different from that of students who studied Runge-Kutta and wrote programs in the early 60s: the IBM 7090 was not online at NASA until 1961, and Fortran programming on the IBM machines was available in 1960. In the time frame of this movie, writing Fortran code on IBM machines was not yet the widespread common practice it would soon become. A year or two makes a big difference in the transition from innovative application to common practice.

I would think having their project decided, and even started, a little before they show up would make sense, but my question is whether you see students taking too long to get started on their eventual project and then getting less done. If you do, then set the expectation that they all show up ready to start, and have everyone make some contact beforehand to make that happen.

In the movie, it wasn't that she was the only one who knew about it; she was the one who thought outside the box to use it in this problem. She actually went and got a book about it off the shelf to check herself and help solve the problem. I am sure everyone knew about it; they just didn't think of using it to calculate the coordinates.

Are you saying that while it's good for people to know how an African American woman was working for NASA doing math, that if racism or sexism or any number of other things had stood in the way of her having that job that someone else with a math background would have gotten it and also been able to do what she did?

There is no problem in having interacting machinery. For example, our "Turing machine" (or computer, if you like) can call an arbitrarily complex function by pasting the name of the function into a cell of the tape and waiting for the answer to arrive in another cell. The processing will be provided by other machinery using these two cells of the tape as shared memory.
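The two-cell protocol described above can be sketched with Python threads. This is a minimal sketch under our own assumptions: the tape is a Python list, the cell indices `REQ` and `ANS`, the function table `FUNCTIONS` and `call_via_tape` are hypothetical names, and `threading.Event` stands in for whatever signalling the real machinery would use to say "request posted" and "answer written".

```python
import math
import threading

REQ, ANS = 0, 1  # indices of the two tape cells used as shared memory

# Hypothetical table of functions the "other machinery" knows how to evaluate.
FUNCTIONS = {"square": lambda n: n * n, "factorial": math.factorial}

def call_via_tape(tape, name, arg):
    """Main machine: paste a request into cell REQ, then wait for cell ANS."""
    request_ready = threading.Event()
    answer_ready = threading.Event()

    def helper():
        # The external machinery: block until a request appears, evaluate it,
        # write the result into the answer cell, and signal completion.
        request_ready.wait()
        fname, x = tape[REQ]
        tape[ANS] = FUNCTIONS[fname](x)
        answer_ready.set()

    worker = threading.Thread(target=helper)
    worker.start()
    tape[REQ] = (name, arg)  # paste the function name (and argument) onto the tape
    request_ready.set()
    answer_ready.wait()      # the main machine idles until the answer arrives
    worker.join()
    return tape[ANS]

tape = [None] * 8  # a short stand-in for the (unbounded) tape
print(call_via_tape(tape, "factorial", 10))  # 3628800
```

The main machine never computes the function itself; it only reads and writes the two cells, which is the point of the shared-memory view.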

There has been intense interest in concurrent processing since the late 1960's (Dijkstra, Wirth, for example). In 1964, three men spent a weekend at a motel developing a concurrent operating system for a computer with 4 Kbytes of memory. They included the ex-Xerox and Microsoft luminary Charles Simonyi (then 16 years old) and Per Brinch Hansen, famous in the 1970's for his work on concurrent programming.

For this reason, it seems odd to me that Wegner has more recently been promoting interactive computing as a new paradigm. Wegner is another luminary from the early 1970's and is extremely well informed on the Turing-completeness of computing devices.

For me, Wegner's comments serve as a coded attack on managerialism. A coded cry against the horrors of bureaucracy.

Richard Mullins
Brisbane, Australia

The unexpected power of interaction...

I saw the movie yesterday. As a white Southerner, I appreciated the struggle that blacks had to endure in the South, which was doubly difficult for black women. That said, the only unbelievable part of the movie was the claim that only Katherine Johnson knew anything about the Runge-Kutta Method, or more specifically Euler's Method, which is the one referenced in the movie. When I heard that, memory bells went off.

If you have two functions and they intersect somewhere, the iterative Runge-Kutta Method, which steps the solution of a differential equation forward in small increments, is a simple way of getting increasingly accurate estimates of the coordinates where they intersect. The intersection in the movie would have been the re-entry point for the space capsule, which was the intersection of the elliptical orbit with the parabolic descent path. Basically, I wrote the same code as an undergraduate Chemistry student in the early 1960s on the vacuum tube IBM computer at Florida State University, using the same Fortran language that the women in the movie used. Ironically, I was coding it at about the same time as the time frame of the movie. Everyone studying applied mathematics at that time would have known the Runge-Kutta Method.
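For readers who haven't seen them, here is a minimal sketch of forward Euler and classical fourth-order Runge-Kutta for an ordinary differential equation y' = f(t, y). It uses a toy problem of our own choosing (y' = y with y(0) = 1, whose exact value at t = 1 is e), not the re-entry calculation from the movie; it only shows how both methods improve as the step count n grows.

```python
import math

def euler(f, t0, y0, t1, n):
    """Forward Euler with n equal steps: y_{k+1} = y_k + h f(t_k, y_k)."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

def rk4(f, t0, y0, t1, n):
    """Classical fourth-order Runge-Kutta with n equal steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# Test problem with a known answer: y' = y, y(0) = 1, so y(1) = e.
f = lambda t, y: y
for n in (10, 100, 1000):
    e_err = abs(euler(f, 0.0, 1.0, 1.0, n) - math.e)
    r_err = abs(rk4(f, 0.0, 1.0, 1.0, n) - math.e)
    print(f"n={n:5d}  Euler error={e_err:.2e}  RK4 error={r_err:.2e}")
```

Euler's error shrinks roughly tenfold for each tenfold increase in n (it is first order), while RK4's error shrinks far faster (fourth order), which is why Runge-Kutta was the workhorse once programs could be run.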
Mason Kelsey

Re "how much of an art machine learning still is": a Deep Neural Network involves a lot of engineering choices, not just gradient descent methods and threshold functions but also the large-scale architecture of the connections among layers, such as whether there's a CNN, an RNN, an LSTM, etc. In comparison, SVMs, Random Forests, and regression have just a handful of hyper-parameters to tweak.
Mitch

"Where were all these cool tools when I was a kid?" - When you were a kid, the disgruntled gray hairs were envious of your tools like a keyboard and screen. 

When 'kids these days' are old they'll be marveling at the youngsters that seem to just twitch their eyelids to pilot their interstellar spacecraft.

I'm not saying you have gray hair.
Mitch

As a follow-up, the above concrete computational considerations in regard to rank-jumping in tensor network representations are surveyed abstractly in "Yellow Book" comment #91 on Scott Aaronson's Shtetl Optimized essay "My 116-page survey article on P vs. NP" (of Jan 03 2017). Such attempted compositions, from me or anyone, can rightly be appreciated as tributes to a small-yet-vital community, namely the proprietors of mathematical weblogs.

Math weblogs require of their proprietors a sustained personal commitment that (as it seems to me and many) crucially nourishes the vitality of the 21st century's diverse STEAM enterprises. This New Year's appreciation of math weblogs, and heartfelt gratefulness for the sustained efforts of their oft-underappreciated proprietors, is therefore extended.
John Sidles

------
Lance asks "How many nodes should you have in your network? How many levels? Too many may take too long to train and could cause overfitting. Too few and you don't have enough parameters to create the function you need."
------
Algorithmic answers to these questions center upon the notion of "rank-jumping" (as at least some portions of the literature call it). 

Specifically in regard to the rank-jumping literature, a notably student-friendly multi-reference multi-example survey is Vin de Silva and Lek-Heng Lim's "Tensor rank and the ill-posedness of the best low-rank approximation problem" (SIAM Journal on Matrix Analysis and Applications, 2008).
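A standard illustration of rank-jumping can be sketched in plain Python, with our own choice of vectors a = (1, 0) and b = (0, 1). The tensor T = a⊗a⊗b + a⊗b⊗a + b⊗a⊗a has rank 3, yet it is the limit of the rank-2 tensors (1/eps)((a + eps b)⊗(a + eps b)⊗(a + eps b) - a⊗a⊗a) as eps goes to 0. So T has no best rank-2 approximation: the error can be made arbitrarily small but never zero. The dict-based helpers `outer3`, `add`, `scale` and `rank2_approx` are hypothetical stand-ins for a real tensor library.

```python
import itertools

def outer3(u, v, w):
    """Rank-1 tensor u⊗v⊗w, stored as a dict from index triples to values."""
    return {(i, j, k): u[i] * v[j] * w[k]
            for i, j, k in itertools.product(range(len(u)), range(len(v)), range(len(w)))}

def add(*tensors):
    """Entrywise sum of tensors with the same index set."""
    return {idx: sum(t[idx] for t in tensors) for idx in tensors[0]}

def scale(c, t):
    """Multiply every entry of tensor t by the scalar c."""
    return {idx: c * val for idx, val in t.items()}

a, b = (1.0, 0.0), (0.0, 1.0)

# Target: a rank-3 tensor (its border rank is only 2).
T = add(outer3(a, a, b), outer3(a, b, a), outer3(b, a, a))

def rank2_approx(eps):
    """(1/eps)((a + eps b)^{⊗3} - a^{⊗3}): a combination of two rank-1 tensors."""
    x = tuple(ai + eps * bi for ai, bi in zip(a, b))
    return add(scale(1 / eps, outer3(x, x, x)), scale(-1 / eps, outer3(a, a, a)))

for eps in (1e-1, 1e-2, 1e-3):
    err = max(abs(rank2_approx(eps)[idx] - T[idx]) for idx in T)
    print(f"eps={eps:g}  max entrywise error={err:.1e}")
```

The maximum entrywise error equals eps (up to rounding), so the rank-2 approximants converge to a tensor of rank 3; an algorithm that fixes the rank at 2 and minimizes error will see its factors blow up like 1/eps, which is the practical face of the ill-posedness.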

The de Silva/Lim survey has been concretely helpful (to me) in upgrading quantum simulation codes that, dynamically and adaptively, raise and lower the ranks of tensor representations. Algorithms that once were ad hoc evolve to be more nearly universal and natural (and stable, too).

Sweet! Hoorah for "Team Yellow Book"! :)

Further suggestions in regard to this "Yellow Book" literature — whether in the language of "rank jumps" or "topological closure" or any other GAGA-esque terminology — would be welcome to me and many. It's been plenty challenging (for me at least) to reduce this literature's beautiful insights to concrete algorithmic practice.
John Sidles

The idea behind a convolutional net is as follows. Think of an image and a small box within it. Define a feature over the pixels in the box, like the existence of a vertical line, and have a neural network for it. The location of the box doesn't matter for the feature, so if you are looking for vertical lines anywhere in the image, you can use the same network at every position; that is, you can share the weights among the networks for that feature. This saves a lot of weights and makes training and inference practical.
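The weight sharing described above can be sketched as a single 3x3 kernel slid over every box position of a small image. The kernel values, the image, and the name `conv2d` are our own illustrative choices; in a trained CNN the weights would be learned. Like most deep-learning libraries, this computes a sliding-window cross-correlation, which the field conventionally calls convolution.

```python
def conv2d(image, kernel):
    """'Valid' 2-D sliding-window product: apply one shared kernel at every box position."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Same kernel weights at every (r, c): this is the weight sharing.
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# Hand-picked (hypothetical) 3x3 vertical-line detector; a CNN would learn these weights.
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]

# 5x6 image containing a vertical line in column 3.
image = [[0, 0, 0, 1, 0, 0] for _ in range(5)]

for row in conv2d(image, VERTICAL):
    print(row)  # [0, -3, 6, -3] on each of the 3 output rows

shared_weights = sum(len(r) for r in VERTICAL)  # 9, whatever the image size
dense_weights = (5 * 6) * (3 * 4)              # 360: one per (input pixel, output unit) pair
print(shared_weights, dense_weights)
```

The response peaks only where the box is centred on the line, and the same nine weights serve every position, whereas a dense layer connecting the 30 input pixels to the 12 outputs would need 360 weights (ignoring biases). The gap grows with image size.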

Many important papers in machine learning are about intelligent ways of saving computation time. You really don't want the number of computation steps to grow superlinearly with respect to network depth, input size, etc.; that would make training and inference infeasible in practice. CNNs are the reason deep learning worked in practice and beat all previous algorithms in image recognition by a large margin. Machine learning requires a good deal of engineering to get practical algorithms that you can actually run and test, and even constant factors matter.
https://arxiv.org/abs/1611.01578

https://arxiv.org/abs/1505.00521

https://media.nips.cc/Conferences/2015/tutorialslides/wood-nips-probabilistic-programming-tutorial-2015.pdf

Have you tried TF's playground?
Yuval

"Convolution nets has a special first layer that captures features of pieces of the image."

This is not correct. Convolutional neural networks have many convolutional layers, and they can appear anywhere in the network, not only as the first layer. These layers exploit locality and translation invariance, two important properties of image-like data. Here is a stylized example showing how convolutional neurons can recognize higher- and higher-level abstractions, from edges through noses to faces:

https://i.stack.imgur.com/Hl2H6.png
Dániel Varga

Small correction: recurrent nets represent time dependencies (or more generally, dependencies along any DAG), not feedback loops. Each computation of a recurrent net can be unfolded into a feed-forward net of depth O(input size).
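The unfolding can be sketched for the simplest case, a chain over an input sequence (the general DAG case is not covered). The scalar cell, its weights `W`, `U`, `B`, and the helper names are hypothetical; the point is only that running the recurrence and evaluating the explicitly composed feed-forward network, with one layer per input and hence depth len(xs), give the same output.

```python
import math
from functools import reduce

# Hypothetical scalar RNN: hidden state h, input x, shared weights W, U and bias B.
W, U, B = 0.5, 1.0, 0.1

def cell(h, x):
    """One recurrent step: h_t = tanh(W h_{t-1} + U x_t + B)."""
    return math.tanh(W * h + U * x + B)

def run_recurrent(xs, h0=0.0):
    """The 'loop' view: the same cell is fed its own previous output."""
    h = h0
    for x in xs:
        h = cell(h, x)
    return h

def unroll(xs):
    """The feed-forward view: one layer per input position, composed in order.

    Returns (network, depth); the depth equals len(xs), i.e. O(input size).
    """
    layers = [lambda h, x=x: cell(h, x) for x in xs]  # bind each x now, not at call time
    network = reduce(lambda f, g: (lambda h: g(f(h))), layers, lambda h: h)
    return network, len(layers)

xs = [0.3, -1.2, 0.8, 0.05, 2.0]
net, depth = unroll(xs)
print(depth, net(0.0), run_recurrent(xs))  # depth 5; the two outputs agree
```

Because the unrolled layers all reuse the same W, U and B, the feed-forward network is deep but has no more parameters than the single recurrent cell.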
Fernando Pereira

The Simpsons did it: https://www.youtube.com/watch?v=w4zqR7GhrqQ
Billy Te

AH: in terms of being testable, no, it's not a prediction.
To make it into a testable prediction, pick parameters you care about (unemployment rate, inflation, global temperature, number of mass shootings) and see if they go in a bad direction. If there were some measure of conflict of interest or corruption, I would use that for my prediction. I think the economy will also go bad, so I would include some economic parameter as well.

Sorry, this is still not really a prediction.
GASARCH

I said it would get worse and worse. I did not say it should be banned. I don't know what the answer is.

As for the New York Times vs Pizzagate:
The New York Times (and much of the rest of the mainstream media) helped lead us into the Iraq War, which was pointless and led to ISIS. The Pizzagate story has only led (so far) to one deranged lunatic almost shooting up a pizza place. Gee, if you want to give a provable example of how the NYT and others are fake news, the Iraq War is a far better example than anything about Russia.
GASARCH

Number 1 isn't really much of a prediction either....

"Fake News will become worse and worse": wasn't it you who warned that the "fake news" argument might be used to silence (US) dissidents? Did you change your mind?

Among the worst "fake news" is the kind coming from US propaganda machines such as the NYT, CNN and the like, which smear Russia.
Why are you silent about this? American patriotism?

So sad to see Aaronson's blog (and thinking) overtaken by politics this past year. Now this. It's a helluva drug.
carl