Statistical Analysis and Modeling in Python¶
Error Propagation¶
- astropy.uncertainty
- Provides a Distribution object to represent statistical distributions in a form that acts as a drop-in replacement for Quantity or a regular numpy.ndarray. Still a work in progress.
- uncertainties - Transparent calculations with uncertainties on the quantities involved
- The uncertainties package is a free, cross-platform program that transparently handles calculations with numbers with uncertainties (like 3.14±0.01). It can also yield the derivatives of any expression.
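The first-order (linear) error propagation that a package like uncertainties automates can be sketched by hand. The block below is a minimal illustration and not the package's API: `propagate` is a hypothetical helper, it assumes the inputs are independent, and it estimates derivatives with central finite differences. The second quantity (2.00±0.05) is an invented value for the example.

```python
import math

def propagate(f, values, sigmas, eps=1e-6):
    """First-order error propagation for independent inputs (hypothetical helper).

    Returns the nominal value f(*values) and the propagated standard deviation
    sqrt(sum_i (df/dx_i * sigma_i)^2).
    """
    nominal = f(*values)
    variance = 0.0
    for i, (value, sigma) in enumerate(zip(values, sigmas)):
        step = eps * max(abs(value), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        # Central finite-difference estimate of the partial derivative.
        derivative = (f(*upper) - f(*lower)) / (2.0 * step)
        variance += (derivative * sigma) ** 2
    return nominal, math.sqrt(variance)

# Product of 3.14 +/- 0.01 and 2.00 +/- 0.05 (illustrative numbers).
product, product_err = propagate(lambda x, y: x * y, [3.14, 2.00], [0.01, 0.05])
print(f"{product:.2f} +/- {product_err:.2f}")
```

For a product of independent quantities this reproduces the textbook formula sigma^2 = (y*sigma_x)^2 + (x*sigma_y)^2. A dedicated package is preferable once variables are correlated or expressions become long.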
Modeling Tools¶
- spotpy - A Statistical Parameter Optimization Tool
- SPOTPY is a Python framework that enables the use of computational optimization techniques for the calibration, uncertainty analysis, and sensitivity analysis of almost any (environmental) model.
- BayesianOptimization - A Python implementation of global optimization with Gaussian processes
- A constrained global optimization package built on Bayesian inference and Gaussian processes that attempts to find the maximum of an unknown function in as few iterations as possible.
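Both tools above treat the model as a black box: propose parameters, run the model, score the result, repeat. The simplest version of that loop is a uniform random search, sketched below on a toy linear model. This is an illustration of the calibration idea only; it does not use the spotpy or BayesianOptimization APIs, and `model`, `rmse`, and `random_search` are hypothetical names. Bayesian optimization differs in that it chooses each new candidate from a Gaussian-process surrogate instead of drawing it uniformly.

```python
import math
import random

def model(x, slope, intercept):
    """Toy model to calibrate (an assumption made for this example)."""
    return slope * x + intercept

def rmse(params, xs, observed):
    """Root-mean-square error between the model and the observations."""
    residuals = [model(x, *params) - y for x, y in zip(xs, observed)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def random_search(objective, bounds, n_iter=5000, seed=0):
    """Draw parameters uniformly within bounds and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        candidate = [rng.uniform(low, high) for low, high in bounds]
        score = objective(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [model(x, 2.0, 1.0) for x in xs]  # synthetic, noise-free observations

best_params, best_score = random_search(
    lambda p: rmse(p, xs, observed),
    bounds=[(0.0, 4.0), (-2.0, 4.0)],  # (slope range, intercept range)
)
print(best_params, best_score)
```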
Sampling Tools and Bayesian Analysis¶
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC
- By Dan Foreman-Mackey. emcee is a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010).
- dynesty - Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
- By Josh Speagle. Pure Python.
- nestle - Pure Python, MIT-licensed implementation of nested sampling algorithms for evaluating Bayesian evidence
- By Kyle Barbary.
- nnest - Neural network accelerated nested and MCMC sampling
- By Adam Moss. Based on this paper
- sampyl - MCMC samplers for Bayesian estimation in Python, including Metropolis-Hastings, NUTS, and Slice
- Sampyl is a package for sampling from probability distributions using MCMC methods. Whereas PyMC3 uses Theano to compute gradients, Sampyl uses autograd.
- PyMC3 - Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
- PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning, focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.
- Getting started with PyMC3 and the Example Notebooks are good starting points.
- PyMC4 - A high-level probabilistic programming interface for TensorFlow Probability
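To make the affine-invariant ensemble idea behind emcee concrete, here is a minimal NumPy sketch of the Goodman & Weare (2010) stretch move. It is a hand-rolled illustration and not emcee's API: `stretch_move_sampler` is a hypothetical function, walkers are updated one at a time (serially), and the standard-normal target, the walker count, and the burn-in length are arbitrary choices for the example.

```python
import numpy as np

def log_prob(theta):
    """Log-density of a standard normal target (illustrative choice)."""
    return -0.5 * np.sum(theta ** 2)

def stretch_move_sampler(log_prob, start, n_steps, a=2.0, seed=0):
    """Serial stretch-move ensemble sampler (hypothetical helper).

    start has shape (n_walkers, n_dim); the returned chain has shape
    (n_steps, n_walkers, n_dim).
    """
    rng = np.random.default_rng(seed)
    walkers = np.array(start, dtype=float)
    n_walkers, n_dim = walkers.shape
    logp = np.array([log_prob(w) for w in walkers])
    chain = np.empty((n_steps, n_walkers, n_dim))
    for step in range(n_steps):
        for k in range(n_walkers):
            # Pick a complementary walker j != k uniformly at random.
            j = rng.integers(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw the stretch factor z from g(z) ~ 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            logp_new = log_prob(proposal)
            # Acceptance probability min(1, z^(n_dim - 1) * p(new) / p(old)).
            log_accept = (n_dim - 1) * np.log(z) + logp_new - logp[k]
            if np.log(rng.random()) < log_accept:
                walkers[k] = proposal
                logp[k] = logp_new
        chain[step] = walkers
    return chain

rng = np.random.default_rng(42)
start = rng.normal(size=(20, 2))          # 20 walkers in 2 dimensions
chain = stretch_move_sampler(log_prob, start, n_steps=1500)
samples = chain[300:].reshape(-1, 2)      # discard the first 300 steps as burn-in
print(samples.mean(axis=0), samples.std(axis=0))
```

The sample mean and standard deviation should land near 0 and 1 for this target. In practice a tested package also provides convergence diagnostics such as autocorrelation-time estimates, which this sketch omits.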
Gaussian Process¶
- A full introduction to the theory of Gaussian Processes is available for free online: Rasmussen & Williams (2006).
- An Astronomer’s Introduction to Gaussian Processes
- Very good introduction by Dan Foreman-Mackey.
- sklearn.gaussian_process - The Gaussian Processes module in scikit-learn
- GPy - Gaussian processes framework in Python
- Gaussian processes underpin a range of modern machine learning algorithms. In GPy, we’ve used Python to implement a range of machine learning algorithms based on GPs. Online documentation is here
- Jupyter notebooks to introduce GPy
- gpflow - Gaussian processes in TensorFlow
- GPflow is a package for building Gaussian process models in Python, using TensorFlow.
- GPflow implements modern Gaussian process inference for composable kernels and likelihoods.
- GPflow uses TensorFlow for running computations, which allows fast execution on GPUs, and uses Python 3.5 or above.
- Online documentation is here
- gpytorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch
- GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian process models with ease.
- george - Fast and flexible Gaussian Process regression in Python
- By Dan Foreman-Mackey. George is a fast and flexible Python library for Gaussian Process (GP) regression.
- Unlike some other GP implementations, george is focused on efficiently evaluating the marginalized likelihood of a dataset under a GP prior, even as this dataset gets big.
- Example applications:
- celerite - Scalable 1D Gaussian Processes in C++, Python, and Julia
- By Dan Foreman-Mackey. Online documentation is here
- Based on Fast and scalable Gaussian process modeling with applications to astronomical time series
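All of the libraries above build on the same core computation: condition a GP prior on noisy data to get a predictive mean and variance, and evaluate the marginal likelihood of the data. The NumPy sketch below follows the Cholesky-based recipe described in Rasmussen & Williams (2006). It is an illustration under stated assumptions, not any library's API: the function names are hypothetical, the kernel is a fixed squared-exponential, and the hyperparameters and sine-wave data are invented for the example.

```python
import numpy as np

def sq_exp_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1D inputs."""
    diff = x1[:, None] - x2[None, :]
    return amplitude ** 2 * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """Zero-mean GP regression: predictive mean, std, and log marginal likelihood."""
    n = len(x_train)
    K = sq_exp_kernel(x_train, x_train) + noise ** 2 * np.eye(n)
    K_cross = sq_exp_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)
    prior_var = np.diag(sq_exp_kernel(x_test, x_test))
    var = prior_var - np.sum(v ** 2, axis=0)        # variance of the latent function
    log_like = (
        -0.5 * y_train @ alpha
        - np.sum(np.log(np.diag(L)))
        - 0.5 * n * np.log(2.0 * np.pi)
    )
    return mean, np.sqrt(np.maximum(var, 0.0)), log_like

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 10.0, 25)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=x_train.size)  # synthetic data
x_test = np.linspace(0.0, 10.0, 50)

mean, std, log_like = gp_predict(x_train, y_train, x_test)
print(log_like)
```

The Cholesky factorization costs O(n^3) in the number of data points, which is the bottleneck that packages such as george and celerite are designed to reduce.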
Survival Analysis¶
- Survival analysis was originally developed to measure the lifespans of individuals, but it applies to any duration, not just the time between birth and death.
- Survival function: the probability that the death event has not yet occurred at time t or, equivalently, the probability of surviving past time t.
- Hazard function: the instantaneous rate of the death event at time t, given that it has not occurred before time t. The hazard function can be estimated non-parametrically.
- Kaplan-Meier estimator for the survival function: survival analysis assumes that upper limits have the same underlying distribution as the detections, and the Kaplan-Meier estimator further assumes that detections and upper limits are mutually independent.
- lifelines - Implementation of survival analysis in Python
- Handles right-censored data.
- Example of astrophysical usage: radio SED of high-z SF galaxies
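The Kaplan-Meier estimator described above is short enough to write out. The sketch below is a plain-Python illustration for right-censored data and not the lifelines API: `kaplan_meier` is a hypothetical helper and the durations are invented. At each distinct event time t_i it multiplies the running survival probability by (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of subjects still at risk.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve for right-censored data (hypothetical helper).

    durations: time of the event or of censoring for each subject.
    observed:  True if the event was seen, False if the subject was censored.
    Returns a list of (event_time, survival_probability) pairs.
    """
    event_times = sorted({t for t, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects whose duration is at least t are still at risk at time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, seen in zip(durations, observed) if seen and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

durations = [2, 3, 3, 5, 6, 8, 8, 9]
observed = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, observed)
for time, probability in curve:
    print(time, round(probability, 3))
```

For these data the curve steps down at t = 2, 3, 5, and 8 to 0.875, 0.75, 0.6, and 0.2. Censored subjects never cause a step; they only shrink the at-risk count for later event times.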