And a brief overview of each, written by myself.

Also take a look at the MIT Probabilistic Computing Project's Reading List

- Introduction to Mathematical Logic
- by Alonzo Church
- Set Theory and Its Logic
- by Willard Van Orman Quine
- The Calculi of Lambda-Conversion
- by Alonzo Church
- Introduction to Probability
- by Dimitri P. Bertsekas & John N. Tsitsiklis
- Markov Chains and Mixing Times [pdf]
- by David A. Levin & Yuval Peres

Formal logic is the foundation of most modern scientific breakthroughs. This book provides a strong and thorough stepping stone into mathematics.

This book delves into the mathematical theory of classes. I'll
leave a quote by Zermelo, which Quine mentions in the
introduction: * "Set theory is that branch of mathematics whose
task is to investigate mathematically the fundamental notions of
'number', 'order', and 'function' taking them in their pristine,
simple form, and to develop thereby the logical foundations of all
of arithmetic and analysis."*

The content of this work has heavily served for advancement in the
field of computation. At its core lies functions and abstraction
as concepts for a formal system. Also, it's *extremely* well
written.

This book effectively communicates basic concepts in probability theory and thoroughly describes them by both motivation and formal expression.

This is a comprehensive text on Markov chains, focused on determining the rate of convergence to the stationary distribution.

- Introduction to the Theory of Computation
- by Michael Sipser
- Introduction to Algorithms
- by Thomas H. Cormen, Ronald L. Rivest, Charles E. Leiserson, & Clifford Stien
- The Structure and Interpretation of Computer Programs
- by Harold Abelson, Gerald Sussman, & Julie Sussman
- Principles of Computer System Design
- by Jerome H. Saltzer & M. Franz Kaashoek
- Artifical Intelligence: A Modern Approach
- by Stuart Russel & Peter Norvig
- Elements of Statistical Learning: Data Mining, Inference, and Prediction
- by Trevor Hastie, Robert Tibshirani, & Jerome Friedman
- Machine Learning: A Probabilistic Perspective
- by Kevin P. Murphy
- Reinforcement Learning: An Introduction [pdf]
- by Richard S. Sutton & Andrew G. Barto
- Deep Learning in Neural Networks: An Overview [arXiv]
- by Jürgen Schmidhuber

This book covers the foundations of computation: automata & languages, computability theory, and complexity theory.

This book covers algorithms and algorithmic design techniques that are crucial to efficiently solving large classes of mathematical problems.

This book teaches principles of computer programming. Its focus is on abstraction, recursion, modularity, and interpreters. It can be appreciated more after reading The Calculi of Lambda-Conversion by Alonzo Church (see above).

This book analyzes fundamental principles and abstractions in
computer system design. It teaches how to analyze computer systems
and how to design *good* computer systems.

This book is standard for understanding what AI typically refers to, and covers ideas for solving problems, including the roles of learning, reasoning, planning, and predicting in solving goal-oriented tasks.

This book covers the field of statistics as it applies to learning from data. It explains concepts for learning in a statistical framework as a mean for solving variants of a basic learning problem. It covers topics including regression, classification, neural networks, inference, and graphical models.

This book applies the tools of probability theory to machine learning, where the goal of machine learning is "to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest." It covers topics including generative models, Bayesian variants of regressors and classifiers, Gaussian processes, {exact, variational, & Monte Carlo} inference, clustering, structure learning, and deep learning.

This book covers ideas and algorithms of reinforcement learning
(as both planning and learning). It discusses dynamic
programming for perfect models, Monte Carlo methods, and
temporal-difference learning methods such as Q-learning. It
expands on these methods for approximation in large or infinite
state spaces. For more reinforcement learning, check out
*Deep Reinforcement Learning: An Overview* by Yuxi Li
(on arXiv).

This preprint of an overview of deep learning covers the historical progression of artificial neural networks and their applications, primarily focused in supervised an unsupervised learning settings with some coverage of reinforcement learning. From data handling to backpropagation, feedforward networks to recurrent networks, and shallow to deep networks, this work serves as a handbook for deep learning, particularly in an historical context.

- Society of Mind
- by Marvin Minky
- Gödel, Escher, Bach: An Eternal Golden Braid
- by Douglas R. Hofstadter
- The Origin of Concepts
- by Susan Carey

This classic book models intelligence as a well-organized
collection of simple interacting agents. It develops theories for
learning, memory, action, and the processing of language within
this system. It is *extremely* well written, and a must-read
for any study of intelligence.

This classic book discusses themes in mathematics, art, and music
as metaphors for (a theory of) the emergence of cognition. Its
focus on the philosophy of mind is maintained primarily as an
undertone, though the analogies become apparent as the book
progresses. It is *extremely* well written.

*Overview TODO*

- Combining Q-Learning and Search with Amortized Value Estimates [arXiv]
- (2019) Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Pfaff, T., Weber, T., Buesing, L., & Battaglia, P.W.
- Write, Execute, Assess: Program Synthesis with a REPL [arXiv]
- (2019) Ellis, K., Nye, M.I., Pu, Y., Sosa, F., Tenenbaum, J.B., & Solar-Lezama, A.
- Synthetic Datasets for Neural Program Synthesis [pdf]
- (2019) Shin, R., Kant, N., Gupta, K., Bender, C., Trabucco, B., Singh, R., & Song, D.X.
- Relational inductive biases, deep learning, and graph networks [arXiv]
- (2018) Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V.F., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gülçehre, Ç., Song, H.F., Ballard, A.J., Gilmer, J., Dahl, G.E., Vaswani, A., Allen, K.R., Nash, C., Langston, V., Dyer, C., Heess, N.M., Wierstra, D., Kohli, P., Botvinick, M.M., Vinyals, O., Li, Y., & Pascanu, R.
- Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis [arXiv]
- (2018) Bunel, R., Hausknecht, M.J., Devlin, J., Singh, R., & Kohli, P.
- Tree-to-tree Neural Networks for Program Translation [arXiv]
- (2018) Chen, X., Liu, C., & Song, D.
- Learning Explanatory Rules from Noisy Data [arXiv]
- (2017) Evans, R., & Grefenstette, E.
- RobustFill: Neural Program Learning under Noisy I/O [arXiv]
- (2017) Devlin, J., Uesato, J., Bhupatiraju, S., Singh, R., Mohamed, A. R., & Kohli, P.
- DeepCoder: Learning to Write Programs [arXiv]
- (2017) Balog, M., Gaunt, A.L., Brockschmidt, M., Nowozin, S., & Tarlow, D.
- Mastering the game of Go with deep neural networks and tree search [link]
- (2016) Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Driessche, G.V., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T.P., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D.
- Example-Directed Synthesis: A Type-Theoretic Interpretation [pdf]
- (2016) Frankle, J., Osera, P., Walker, D., & Zdancewic, S.
- The computational origin of representation and conceptual change [pdf]
- (2016) Piantadosi, S.T.
- Neuro-Symbolic Program Synthesis [arXiv]
- (2016) Parisotto, E., Mohamed, A., Singh, R., Li, L., Zhou, D., & Kohli, P.
- Probabilistic data analysis with probabilistic programming [arXiv]
- (2016) Saad, F., & Mansinghka, V.
- Building Machines that Learn and Think Like People [arXiv]
- (2016) Lake, B. M., Ullman, T. D., Tenenbaum. J. B., & Gershman, S. J.
- Do you see what I mean? Visual resolution of linguistic ambiguities [arXiv]
- (2016) Berzak, Y., Barbu, A., Harari, D., Katz, B., & Ullman, S.
- Understanding visual concepts with continuation learning [arXiv]
- (2016) Whitney, W. F., Chang, M., Kulkarni, T., & Tenenbaum, J. B.
- Neural Programmer-Interpreters [arXiv]
- (2015) Reed, S. E., & de Freitas, N.
- Human-level concept learning through probabilistic program induction [pdf]
- (2015) Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B.
- Deep convolutional inverse graphics network [arXiv]
- (2015) Kulkarni, T. D., Whitney, W. F., Kohli, P., & Tenenbaum, J. B.
- Concepts in a probabilistic language of thought [pdf]
- (2014) Goodman, N. D., Tenenbaum, J. B., & Gerstenberg, T.
- Bootstrap learning via modular concept discovery [pdf]
- (2013) Dechter, E., Malmaud, J., Adams, R. P., & Tenenbaum, J. B.
- Structure Discovery in Nonparametric Regression through Compositional Kernel Search [pdf]
- (2013) Duvenaud, D. K., Lloyd, J. R., Grosse, R. B., Tenenbaum, J. B., & Ghahramani, Z.
- Bootstrapping in a language of thought: A formal model of numerical concept learning [link]
- (2012) Piantadosi S. T., Goodman, N. D., & Tenenbaum, J. B.
- Theory learning as stochastic search in a language of thought [pdf]
- (2012) Ullman, T. D., Goodman, N. D., & Tenenbaum, J. B.
- Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis [pdf]
- (2011) McDermott, J. H., & Simoncelli, E. P.
- Church: a language for generative models [pdf]
- (2008) Goodman, N. D., Mansinghka, V. K., Roy, D. M., Bonawitz, K., & Tenenbaum, J. B.
- A rational analysis of rule-based concept learning [pdf]
- (2008) Goodman, N. D., Tenenbaum J. B., Feldman J., & Griffiths, T. L.
- The rational basis of representativeness [pdf]
- (2001) Tenenbaum, J. B. & Griffiths, T. L.

Overview TODO

Overview TODO

Overview TODO

Graph networks provide a general framework for encoding relational inductive biases in structured representations. "Blocks" that compute on graphs are comprised of three update functions (on edges, vertices, and a global attribute) and three aggregation functions (on edges incident each vertex, all edges, and all vertices). The authors demonstrated that this framework with neural nets for the update functions could e.g. learn to find the shortest path on graphs and could learn to perform physical simulations with transfer to systems that had different numbers of entities. Notable gaps on GNs are that control flow and recursion are difficult to represent and require seperate components for interpretation.

Overview TODO.. Introduces the Karel dataset..

Differentiable inductive logic programming by treating ILP problems as binary classification. Robust to noise and capable of working with raw perceptual input (e.g. pixels). It restricts a lot of structure (e.g. all predicates have arity less than three and (WLOG) are defined with exactly two clauses. There are other explicit restrictions), allowing for a neural architecture which generically operates on logic programming. See the classification likelihood function in Section 4.2 for a high-level understanding of the architecture. It takes a "world" (language and background assumptions), rule templates (which specify ranges of clauses that can be generated), and positive and negative examples to produce a target classifier.

Input/output examples are inputs to a double-attentional LSTM that produces hidden state. Each I/O example for a program has its own such layers with shared weights, and the hidden states of those layers are pooled at each timestep before sent to a softmax yielding a real vector in the number of tokens in the program vocabulary. This architecture is trained for 256 million random programs with 4 I/O examples each. Finally, the output of the softmax is passed into a beam search decoder that outputs the highest-scoring program. The researchers also experiment with an approach that does not decode a program and instead gets the output directly from the network architecture via latent program representation.

Uses an encoder/decoder to learn a distribution over programs conditioned on input/output examples. The encoder maps examples to a latent real-valued vector, and the decoder maps that vector to logits of each primitive DSL function appearing in the source code. These logits are used to guide program enumeration favoring likely programs based on presence of the primitives.

Learns to play Go with reinforcement learning. Actions are sampled using a policy network and evaluated using a separate value network. Monte Carlo tree search is used to select actions that maximize value with some bias for exploration. The policy network's state is initialized using the result of supervised learning on expert data and subsequently learned via self-play. The later iteration of this work, AlphaGo Zero, is completely self-play based where domain knowledge is limited to rules (used during tree search), scoring, grid representation (19x19 is baked into the NN architecture), and board symmetry (for training invariance and action sampling).

The authors formalize examples as refinement types, and describe a synthesis procedure for synthesizing programs that satisfy such examples based on theorem-proving. Their formalism allows for rich specifications in the type system that don't necessitate concrete values and permits negative constraints. Synthesis of programs converts a single type problem to potentially many subproblems by either manipulating the environment ("left-rule") or by constructing an expression ("right-rule").

*Overview TODO*

This introduces a technique for synthesizing programs in a domain-specific language from input-output examples. It maps the examples to a continuous space using a selection of encoders with deep recurrent networks, and sends those encoded examples to neural networks that generate and encode partial solution trees.

This is a thorough introduction and overview of probabilistic programming and its applications with the conceptual and mathematical formalism for Composable Generative Population Models (CGPMs). Figure 8 (p. 23) has a really useful example and a diagram describing the flow of logic at a high level.

This reviews progress in cognitive science and AI research, and argues for future direction to get close to building machines that learn and think like people. This primarily involves the building of explanatory and causal models rather than pattern-recognizers, initializing systems with intuitive theories of physics and psychology, and build machings that support learning-to-learn to accelerate the learning process for related tasks.

This describes a model that uses visual context to disambiguate a sentence. It constructs an HMM for each predicate interpreted from a query sentence, which are connected to variable trackers that are themselves HMMs for potential bounding boxes. A sort-of linguistic encoder then constrains the system to resolve the ambiguities.

This describes a model for disentangling visual concepts by learning the transformations of input over time, yielding a symbolic representation for objects and motion. It uses a gating mechanism which receives from two consecutive frames. The goal is essentially to learn concepts like lighting and azimuth, which were given in a supervised manner in Kulkarni's deepconv paper (see below).

Learns to represent and execute programs by training on execution traces. The traces are used per-subroutine, permitting composition with previously learned subroutines. Equations 1-2 are critical to understanding the system's operation. The system is parameterized by domain-specific differentiable state encoders, typically multilayer perceptrons, which effectively allow use of intermediate computation. The execution traces used for training are tuples of the environment, arguments (both inputs to the state encoder) and program-ID for input, and the next program-ID with a return value if the program should terminate for output.

This describes a model for program induction that aims to perform well in one-shot learning scenarios — to successfully generalize from a single example. It introduces Bayesian program learning. In the context of handwritten characters (the focus of the paper), it achieves human-level performance and outperforms deep learning.

This describes a system centered on performing analysis by synthesis. It uses a hybrid encoder-decoder model to learn a disentangled representation for transformations like lighting at azimuthal rotation. After the encoding step, it uses a generic Stochastic Gradient Variational Bayes estimator to backpropagate reconstruction error when decoding. Clamping is used for (and against) training on each transformation to disentangle the representation.

This describes knowledge of concepts in a probabilistic language of thought, formally represented as functions in an enriched stochastic lambda calculus. Primary features are compositionality and symbolic scaffolding for productive thought, and probabilistic inference for successful reasoning under uncertainty.

This describes a model for concept learning as a problem of program induction. It represents programs as a subset of simply-typed lambda calculus restricted to primitive combinators, and defines a stochastic grammer over programs. The frontier of promising programs are split into partial solutions which are preferentially enumerated to decide on modular subtrees which yield the most compressive set of solutions. The experiments are in symbolic regression and Boolean function learning.

This presents an automatic statistician, focused on regression problems. It uses kernel families and their attributes, and compositions of those kernels, as an expressive language of models. It does stochastic search over an implementation of Bayesian Occam's Razor on the model language. Also check out the follow-up paper and source code for more info.

This describes a model for making elegant inductive leaps on a logical system. It is built on a lambda calculus with a collection of sort-of cognitive primitives, and inference is done with a Markov-chain Monte-Carlo algorithm to find the highest probability lexicon.

This describes a hierarchical Bayesian framework for theory acquisition, aimed at being an algorithmic model for the development of theories in children. It uses a construction of statements with probabilistic Horn clause grammars starting with foundational predicates, then with theories as simple laws relating them, then with concrete models to extend the predicates to objects in the given domain, then finally tying to the observations to yield constraints for inference.

This describes a model for processing real-world sound textures. Figure 1 shows an overview of the process for analyzing waveforms to obtain values of statistical representations for the sounds. Sound is then synthesized by applying an iterative procedure to random knows which converges to meeting the same original statistics. There are discrepancies in realism when sound perception involves pitch, rhythm, and reverberation. Introducing feedback mechanisms to perpetuate perceptions would certainly improve the system, and also aligns with typical musical recognition.

This introduces a universal language for describing stochastic generative processes. The most remarkable features are its querying capabilities with conditional sampling, stochastic memoization, and generic schemes for inference. The language, Church, is a generalization of Scheme which introduces probabilistic computation to the language. The most useful resource for probabilistic models of cognition using Church is at probmods.org.

This describes a model for concept learning with rational analysis. It starts with a context-free grammar comprising of a set of feature predicates and logical connectives, and defines a syntactic prior over formulae of the language to capture simplicity bias and a likelihood corresponding to the explanatory completeness of a formula. It goes on to define the Rational Rules model which joins these priors and likelihoods to yield generalized inference consistent with the standard practices of Bayesian modeling.

This defines representativeness in the context of Bayesian inference. It analyzes how Bayesian inference stands compared to statistical and inductive methods in addressing the task of categorization. It features the famous example of coin flips and addressing whether a coin is biased.

- No compromises: distributed transactions with consistency, availability, and performance [pdf]
- (2015) Dragojević, A., Narayanan, D., Nightingale, E. B., Renzelmann, M., ... & Castro, M.
- (this is FaRM)
- transactions: hardware-specific
- one-sided RDMA RPC to non-volatile DRAM
- commit: execute, lock, validate, commit backup, commit primary, truncate
- client = transaction coordinator
- Large-scale cluster management at Google with Borg [pdf]
- (2015) Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., & Wilkes, J.
- cluster management, pre-kubernetes
- cells of slave/master (borglet/borgmaster)
- jobs of tasks, with updatable configs, RPC exposure
- specific resource requests, priority-based allocation
- Wormhole: reliable pub-sub to support geo-replicated internet services [pdf]
- (2015) Sharma, Y., Ajoux, P., Ang, P., Callies, D., Choudhary, A., ... & Kumar, S.
- publishers for subsets of data (many shards)
- sequential shard update delivery (flow)
- periodic datamarker (txn log index) calibrate

- MCRD/SCRD (multi/single copy reliable delivery)
- SCRD has local datamarker, MCRD uses Zookeeper

- caravan (one reader per flow), ideally only one
- disparate latency -> another caravan

- In search of an understandable consensus algorithm [pdf]
- (2014) Ongaro, D., & Ousterhout, J.
- (this is Raft)
- state machine replication (via consensus)
- only the leader can add to the replicated log
- confirms log entry when majority of followers have committed

- bounded random election timeouts
- confirmed log entries never roll back
- The tail at scale [pdf]
- (2013) Dean, J., & Barroso, L. A.
- tail latency from shared resources, daemons, queuing, global resource sharing, gc, energy
- X{upper, lower}-level task <-> X-size queue
- granularize, many small coordinated requests
- synchronize disruption of background activities
- load-balancing micro-partitions, selective replication
- request to multiple replicas, use first
- latency-induced probation
- Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing [pdf]
- (2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., ... & Stoica, I.
- (this is Spark)
- distributed memory abstraction for iterative algorithms
- manipulates immutable RDDs (resilient distributed datasets)
- RDD dependency: narrow (pipelining), wide (data reuse)
- ZooKeeper: wait-free coordination for Internet-scale systems [pdf]
- (2010) Hunt, P., Konar, M., Junqueira, F. P., & Reed, B.
- generic master service
- centralized KV, messaging (locks, watches)
- FT with state machine replication
- PNUTS: Yahoo!'s hosted data serving platform [pdf]
- (2008) Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., ... & Yerneni, R.
- geographically distributed db
- full db copies per region (fast reads)
- record-level: mastering, versioning, seq. consistency
- relaxed consistency (stale reads)
- Dynamo: amazon's highly available key-value store [pdf]
- (2007) DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., ... & Vogels, W.
- no master -> always writable
- eventually consistent -> highly available
- ring membership, gossip failures
- partitioning, replication, versioning
- coordinator nodes maintain preference list
- forward put/get to all N pref nodes

- sloppy quorum, logical clocks (vector)
- Chord: A scalable peer-to-peer lookup service for internet applications [pdf]
- (2001) Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., & Balakrishnan, H.
- dynamic decentralized lookup protocol
- key-based (hashing)

Copyright (c) 2016, 2017, 2018, 2019 Lucas Morales