Data Science Knowledge Repo
A central knowledge resource for data scientists and analytics experts
A prevailing characteristic of data scientists is deep intellectual curiosity, a trait that drives them to be passionate learners who pick up new skills of their own volition. Many of the fascinating but difficult techniques of data science are grounded in hard math and machine learning, such as Bayesian inference, nonparametric regression, neural net classifiers, hidden Markov models, evolutionary algorithms, content-based and collaborative filters, and NLP. Data science is so broad and deep that even the most seasoned experts always have something new to learn; there is simply too much collective knowledge for any one person to master.
The purpose of the "Data Science Knowledge Repo" is to provide a central resource that data scientists can revisit frequently to refresh their knowledge or learn new skills. If you have recommended additions (guides, technical papers, and other resources), email frank@datajobs.com.
A
Auto-Regressive Models
B
Bayesian Inference
- The Philosophy of Bayesian Statistics, Gelman & Shalizi
- Bayesian Inference Guide, Statisticat
- Bayesian Statistics Basics, Harvey Thornburg
- Bayesian Statistics Basics, Patrick Lam
- Conjugate Priors Summary, Alexandre Tchourbanov
- Bayesian Inference in Machine Learning, Michael Tipping
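As a small illustration of the conjugate-prior idea covered in the resources above, the sketch below updates a Beta prior on a coin's probability of heads after observing binomial data. It assumes independent coin flips and a Beta(a, b) prior; the helper names `beta_binomial_update` and `beta_mean` are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def beta_binomial_update(a, b, heads, tails):
    """Posterior Beta parameters for a Beta(a, b) prior after observing coin flips.

    The Beta prior is conjugate to the binomial likelihood, so the posterior
    is Beta(a + heads, b + tails).
    """
    return a + heads, b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, a / (a + b), as an exact fraction."""
    return Fraction(a, a + b)


# Uniform Beta(1, 1) prior, then observe 7 heads and 3 tails.
post_a, post_b = beta_binomial_update(1, 1, heads=7, tails=3)
print(post_a, post_b)             # 8 4
print(beta_mean(post_a, post_b))  # 2/3
```

Because the Beta family is conjugate to the binomial likelihood, no numerical integration is needed: the posterior parameters are simply the prior parameters plus the observed counts.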
 
C
Collaborative Filtering
Clustering Methods
- Clustering Methods Guide, Rokach & Maimon
- Example Clustering Heuristic, Foursquare
- Markov Clustering Technical Paper, Stijn van Dongen
 
D
Decision Tree Learning
- Decision Tree Guide, Rokach & Maimon
- Classification and Regression Tree Basics, Wei-Yin Loh
- Classification and Regression Tree Guide, CMU
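One common splitting criterion in CART-style trees is Gini impurity. The sketch below assumes a single numeric feature and uses the hypothetical helpers `gini` and `best_split`: it scores every candidate threshold by the size-weighted impurity of the two child nodes and keeps the lowest.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Try each observed value t as a threshold (x <= t goes left) and return
    the (threshold, weighted impurity) pair with the lowest weighted Gini."""
    n = len(xs)
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weight each child's impurity by the fraction of samples it receives.
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(gini(ys))            # 0.5
print(best_split(xs, ys))  # (3.0, 0.0)
```

A full tree learner would apply this search recursively to each child node, across all features, until a stopping rule such as a maximum depth or minimum node size is met.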
 
Dominance Analysis
E
Ensemble Methods
- Ensemble Methods Guide, Lior Rokach
- Boosting and Bagging, Barutcuoglu & Alpaydın
- Random Forest Guide, Frederick Livingston
- Random Forest in R, Liaw & Wiener
 
Expectation-Maximization Algorithm
- Expectation-Maximization Basic Primer, Do & Batzoglou
- Expectation-Maximization Guide, Frank Dellaert
- Expectation-Maximization for Clustering, Avinash Kak
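The E-step/M-step alternation described in these primers can be sketched for a two-component, one-dimensional Gaussian mixture. This is a minimal illustration that assumes a fixed iteration count and has no safeguard against a component's variance collapsing to zero; `em_gmm_1d` and `normal_pdf` are hypothetical names.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mu, var, weight, iters=50):
    """EM for a two-component 1D Gaussian mixture.

    mu, var: initial means and variances (two entries each).
    weight: initial mixing weight of component 0.
    """
    mu, var = list(mu), list(var)
    n = len(xs)
    for _ in range(iters):
        # E-step: responsibility r_i = P(component 0 | x_i) under current parameters.
        r = []
        for x in xs:
            p0 = weight * normal_pdf(x, mu[0], var[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], var[1])
            r.append(p0 / (p0 + p1))
        # M-step: weighted maximum-likelihood updates using the responsibilities.
        n0 = sum(r)
        n1 = n - n0
        mu[0] = sum(ri * x for ri, x in zip(r, xs)) / n0
        mu[1] = sum((1 - ri) * x for ri, x in zip(r, xs)) / n1
        var[0] = sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0
        var[1] = sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1
        weight = n0 / n
    return mu, var, weight


xs = [0.0, 0.5, 1.0, -0.5, -1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
mu, var, weight = em_gmm_1d(xs, mu=[1.0, 8.0], var=[1.0, 1.0], weight=0.5)
print([round(m, 3) for m in mu], [round(v, 3) for v in var], round(weight, 3))
```

On well-separated toy data like this, the estimated means should settle near 0 and 10. Practical implementations usually also track the log-likelihood and stop once it no longer improves.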
 
F
Factor Analysis
Fixed Effects Models
G
Genetic Algorithms
Gradient Descent
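A minimal batch gradient descent sketch, applied here to fitting a straight line by minimizing mean squared error. The fixed learning rate `lr` and step count are illustrative assumptions, and `gradient_descent_linear` is a hypothetical name; a learning rate that is too large makes the updates diverge instead of converge.

```python
def gradient_descent_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b (approximately) by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent_linear(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

For least squares a closed-form answer also exists; gradient descent earns its keep on models where no closed form is available.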
H
Hidden Markov Models
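For hidden Markov models, the forward algorithm computes the probability of an observation sequence by summing over all hidden-state paths with dynamic programming. The sketch below assumes a small discrete HMM given as Python dictionaries; the states, probabilities, and the name `forward` are illustrative choices.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence) under a discrete HMM.

    After processing t observations, alpha[s] = P(o_1..o_t, hidden state at t = s).
    """
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over every previous state, then multiply by the emission probability.
        alpha = {
            s: emit_p[s][o] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
print(round(forward(["walk", "shop", "clean"], states, start_p, trans_p, emit_p), 6))  # 0.033612
```

The cost grows linearly with sequence length (times the square of the number of states), whereas enumerating every hidden path grows exponentially.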
Hierarchical Bayes Models
I
Independent Component Analysis (ICA)
J
K
K-Means Clustering
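K-means is usually fit with Lloyd's algorithm, which alternates an assignment step and an update step. The sketch below assumes Euclidean distance, user-supplied starting centroids, and points given as tuples; `kmeans` is a hypothetical name, and it needs Python 3.8 or later for `math.dist`.

```python
import math


def kmeans(points, centroids, iters=100):
    """Lloyd's algorithm: alternate assigning each point to its nearest centroid
    and moving each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iters):
        # Assignment step: group points by nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: coordinate-wise mean of each cluster; an empty cluster keeps its centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.17, 1.5), (8.5, 8.33)]
```

Results depend on the starting centroids, which is why practical implementations typically use several random restarts or a seeding scheme such as k-means++.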
L
Linear Algebra
Linear Discriminant Analysis (LDA)
M
Machine Learning
Markov Chain Monte Carlo (MCMC)
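A random-walk Metropolis sampler is one of the simplest MCMC methods. This sketch assumes a one-dimensional target whose log-density is known up to an additive constant and a Gaussian proposal with a hand-picked `step_size`; `metropolis` is a hypothetical name.

```python
import math
import random


def metropolis(log_density, x0, steps, step_size, seed=0):
    """Random-walk Metropolis sampler for a 1D target known up to a constant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Symmetric proposal, so accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Target: standard normal, log-density -x^2 / 2 up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, steps=20000, step_size=1.0)
kept = samples[1000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

For a standard normal target the sample mean and variance should come out close to 0 and 1. Discarding an initial burn-in and tuning `step_size` are the usual practical concerns.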
N
Naive Bayes
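A minimal multinomial naive Bayes text classifier, one common variant, assuming whitespace-tokenized documents and add-one (Laplace) smoothing. The helper names `train_naive_bayes` and `predict` and the toy documents are hypothetical. Scores are summed in log space to avoid numerical underflow on long documents.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns label counts, per-label word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(tokens, label_counts, word_counts, vocab, alpha=1.0):
    """Pick the label maximising log P(label) + sum of log P(word | label),
    with add-alpha smoothing; words outside the vocabulary are ignored."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:
                score += math.log((word_counts[label][w] + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ("free prize money now".split(), "spam"),
    ("win free money".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch meeting monday".split(), "ham"),
]
model = train_naive_bayes(docs)
print(predict("free money".split(), *model))      # spam
print(predict("monday meeting".split(), *model))  # ham
```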
Natural Language Processing (NLP)
- NLP Lecture, Peter Norvig
- NLP Background, SU
- NLP Approach with Python, Nitin Madnani
- NLP Approach - Maximum Entropy, Berger et al.
 
Neural Nets
- Neural Nets Primer, Carlos Gershenson
- Neural Nets in R, Günther & Fritsch
- ImageNet Deep Convolutional Neural Net, Krizhevsky et al.
 
O
Ordinary Least-Squares
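For a single predictor, the ordinary least-squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations in x, and the fitted line passes through the point of means. The sketch below assumes plain Python lists with at least two distinct x values; `ols_simple` is a hypothetical name.

```python
def ols_simple(xs, ys):
    """Closed-form least-squares fit of y = a + b * x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope that minimises the sum of squared residuals
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 5.0, 5.0, 7.0]
print(ols_simple(xs, ys))  # (1.0, 1.5)
```

With several predictors the same idea generalizes to solving the normal equations, which is what statistical packages do internally (usually via a QR decomposition for numerical stability).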
P
Principal Component Analysis (PCA)
Probability Theory
Q
R
R (Statistical Computing Software)
Recommender Systems
- Recommender Systems / Matrix Factorization, Netflix
- Recommender Systems / Linear Classifiers, Zhang & Iyengar
- Recommender Systems / Collaborative Filtering, Amazon
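Item-to-item collaborative filtering ranks items by how similarly users have rated them. The sketch below uses cosine similarity between item rating vectors, one common choice, and assumes ratings stored as nested dictionaries (item to user to rating); `cosine`, `most_similar`, and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as {user: rating}."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(item, ratings):
    """Rank all other items by cosine similarity to `item`, most similar first."""
    others = [other for other in ratings if other != item]
    return sorted(others, key=lambda o: cosine(ratings[item], ratings[o]), reverse=True)


ratings = {
    "A": {"u1": 5, "u2": 3, "u3": 4},
    "B": {"u1": 4, "u2": 3, "u3": 5},
    "C": {"u4": 5, "u5": 2},
}
print(most_similar("A", ratings))  # ['B', 'C']
```

Because item similarities change slowly, a system built this way can precompute them offline and serve recommendations with simple lookups.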
 
Regression Analysis
- Intro to Regression Analysis, Alan Sykes
- Interpreting Regression Weights, Nathans et al.
- Logistic Regression Modeling, Peng et al.
- Generalized Linear Models, Andrew Ng
 
S
SAS (Statistical Computing Software)
Singular Value Decomposition (SVD)
Supervised Learning
- Supervised Learning Comparison, Caruana & Niculescu-Mizil
- Supervised Classification Methods, S. B. Kotsiantis
 
Support Vector Machines (SVM)
- Support Vector Machine Guide, Andrew Ng
- Support Vector Machine Basic Tutorial, Jason Weston
- Support Vector Machines in R, David Meyer
- Multiclass Support Vector Machines, Hsu & Lin
 
T
Time-Series Analysis
U
Unsupervised Learning
V
W
X
Y
Z
