Data Science Knowledge Repo
A central knowledge resource for data scientists / analytics experts
A prevailing characteristic of data scientists is deep intellectual curiosity, a trait that drives them to be passionate learners, always picking up new skills of their own volition. Many of the fascinating but difficult techniques of data science are grounded in hard math and machine learning, e.g., Bayesian inference, nonparametric regression, neural net classifiers, hidden Markov models, evolutionary algorithms, content/collaborative filters, and NLP. Data science is so broad and deep that even the most seasoned experts always have something new to learn; there is simply too much collective knowledge out there.
The purpose of the "Data Science Knowledge Repo" is to provide a central resource that data scientists can revisit frequently to refresh their knowledge or learn new skills. If you have recommended additions (guides, technical papers, or other resources), email frank@datajobs.com.
A
Auto-Regressive Models
B
Bayesian Inference
- The Philosophy of Bayesian Statistics Gelman & Shalizi
- Bayesian Inference Guide Statisticat
- Bayesian Statistics Basics Harvey Thornburg
- Bayesian Statistics Basics Patrick Lam
- Conjugate Priors Summary Alexandre Tchourbanov
- Bayesian Inference in Machine Learning Michael Tipping
C
Collaborative Filtering
Clustering Methods
- Clustering Methods Guides Rokach & Maimon
- Example Clustering Heuristic Foursquare
- Markov Clustering Technical Paper Stijn van Dongen
D
Decision Tree Learning
- Decision Tree Guide Rokach & Maimon
- Classification and Regression Tree Basics Wei-Yin Loh
- Classification and Regression Tree Guide CMU
Dominance Analysis
E
Ensemble Methods
- Ensemble Methods Guide Lior Rokach
- Boosting and Bagging Barutcuoglu & Alpaydın
- Random Forest Guide Frederick Livingston
- Random Forest in R Liaw & Wiener
Expectation-Maximization Algorithm
- Expectation Maximization Basic Primer Do & Batzoglou
- Expectation Maximization Guide Frank Dellaert
- Expectation Maximization for Clustering Avinash Kak
F
Factor Analysis
Fixed Effects Models
G
Genetic Algorithms
Gradient Descent
H
Hidden Markov Models
Hierarchical Bayes Models
I
Independent Component Analysis (ICA)
J
K
K-Means Clustering
L
Linear Algebra
Linear Discriminant Analysis (LDA)
M
Machine Learning
Markov Chain Monte Carlo (MCMC)
N
Naive Bayes
Natural Language Processing (NLP)
- NLP Lecture Peter Norvig
- NLP Background SU
- NLP Approach with Python Nitin Madnani
- NLP Approach - Maximum Entropy Berger et al.
Neural Nets
- Neural Nets Primer Carlos Gershenson
- Neural Nets in R Günther & Fritsch
- ImageNet Deep Convolutional Neural Net Krizhevsky, Sutskever & Hinton
O
Ordinary Least-Squares
P
Principal Component Analysis (PCA)
Probability Theory
Q
R
R (Statistical Computing Software)
Recommender Systems
- Recommender Systems / Matrix Factorization Netflix
- Recommender Systems / Linear Classifiers Zhang & Iyengar
- Recommender Systems / Collaborative Filtering Amazon
Regression Analysis
- Intro to Regression Analysis Alan Sykes
- Interpreting Regression Weights Nathans et al.
- Logistic Regression Modeling Peng et al.
- Generalized Linear Models Andrew Ng
S
SAS (Statistical Computing Software)
Singular Value Decomposition (SVD)
Supervised Learning
- Supervised Learning Comparison Caruana & Niculescu-Mizil
- Supervised Classification Methods S. B. Kotsiantis
Support Vector Machines (SVM)
- Support Vector Machine Guide Andrew Ng
- Support Vector Machine Basic Tutorial Jason Weston
- Support Vector Machines in R David Meyer
- Multiclass Support Vector Machines Hsu & Lin
T
Time-Series Analysis
U
Unsupervised Learning
V
W
X
Y
Z