Is this really a causal relationship? Here's an example of questionable statistical language in the news: a Brazilian breast-feeding study.
Another one (definitely no link to the actual study in here):
http://www.mirror.co.uk/news/uk-news/breastfed-children-earn-more-money-5358181
"Breastfeeding generally was found to increase adult intelligence, length of schooling and adult earnings."
From Andrew Gelman, some "important methods and concepts related to statistics that are not as well known as they should be."
I was looking into the use of Stan for Hamiltonian Monte Carlo. On page 23 of the Stan reference (stan-reference-2.9.0.pdf), I found this excellent and brief summary of HMC:

"HMC accelerates both convergence to the stationary distribution and subsequent parameter exploration by using the gradient of the log probability function. The unknown quantity vector θ is interpreted as the position of a fictional particle. Each iteration generates a random momentum and simulates the path of the particle with potential energy determined [by] the (negative) log probability function. Hamilton's decomposition shows that the gradient of this potential determines change in momentum and the momentum determines the change in position. These continuous changes over time are approximated using the leapfrog algorithm, which breaks the time into discrete steps which are easily simulated. A Metropolis reject step is then applied to correct for any simulation error and ensure detailed balance of the resulting Markov chain transitions (Metropolis et al., 1953; Hastings, 1970)."

Immediately after that, the tuning parameters are discussed:

"Basic Euclidean Hamiltonian Monte Carlo involves three 'tuning' parameters to which its behavior is quite sensitive. Stan's samplers allow these parameters to be set by hand or set automatically without user intervention."

http://www.math.uah.edu/stat/dist/Density.html (and related pages)
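To make the leapfrog-plus-Metropolis description above concrete, here is a minimal sketch of a single HMC transition in R. This is not Stan's implementation (Stan actually uses NUTS with adaptive tuning and is quite sensitive to these settings, as the quote says); log_p, grad_log_p, eps, and L are hypothetical names standing in for the target log density, its gradient, the step size, and the number of leapfrog steps.

    # One HMC transition: random momentum, leapfrog simulation of the
    # particle's path, then a Metropolis accept/reject step.
    # Assumes a unit (identity) mass matrix; eps and L are the other
    # two tuning parameters the Stan manual refers to.
    hmc_step <- function(theta, log_p, grad_log_p, eps, L) {
      p <- rnorm(length(theta))          # fresh random momentum each iteration
      theta_new <- theta
      # Leapfrog: half step for momentum, alternating full steps, half step.
      # Potential energy is -log_p, so dp/dt = +grad_log_p(theta).
      p_new <- p + 0.5 * eps * grad_log_p(theta_new)
      for (l in seq_len(L)) {
        theta_new <- theta_new + eps * p_new       # full step for position
        if (l < L) p_new <- p_new + eps * grad_log_p(theta_new)
      }
      p_new <- p_new + 0.5 * eps * grad_log_p(theta_new)
      # Metropolis correction for discretization error:
      # Hamiltonian H = -log_p(theta) + |p|^2 / 2 (potential + kinetic).
      h_old <- -log_p(theta) + sum(p^2) / 2
      h_new <- -log_p(theta_new) + sum(p_new^2) / 2
      if (log(runif(1)) < h_old - h_new) theta_new else theta
    }

    # Toy usage: sample from a standard bivariate normal.
    log_p      <- function(theta) -sum(theta^2) / 2
    grad_log_p <- function(theta) -theta
    draws <- matrix(0, nrow = 1000, ncol = 2)
    for (i in 2:1000) {
      draws[i, ] <- hmc_step(draws[i - 1, ], log_p, grad_log_p,
                             eps = 0.2, L = 10)
    }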
Part 1 and Part 2. Also another view here. Another PPT, from Ingmar Schuster (Universität Leipzig), appears to be very good (attached and viewable below).
Below is from Wikipedia (here), on regressions:
In regression analysis, the distinction between errors and residuals is subtle and important, and leads to the concept of studentized residuals. Given an unobservable function that relates the independent variable to the dependent variable – say, a line – the deviations of the dependent variable observations from this function are the unobservable errors. If one runs a regression on some data, then the deviations of the dependent variable observations from the fitted function are the residuals.

However, a terminological difference arises in the expression mean squared error (MSE). The mean squared error of a regression is a number computed from the sum of squares of the computed residuals, and not of the unobservable errors. If that sum of squares is divided by n, the number of observations, the result is the mean of the squared residuals. Since this is a biased estimate of the variance of the unobserved errors, the bias is removed by multiplying the mean of the squared residuals by n / df where df is the number of degrees of freedom (n minus the number of parameters being estimated). This latter formula serves as an unbiased estimate of the variance of the unobserved errors, and is called the mean squared error.[1]

However, because of the behavior of the process of regression, the distributions of residuals at different data points (of the input variable) may vary even if the errors themselves are identically distributed. Concretely, in a linear regression where the errors are identically distributed, the variability of residuals of inputs in the middle of the domain will be higher than the variability of residuals at the ends of the domain: linear regressions fit endpoints better than the middle. This is also reflected in the influence functions of various data points on the regression coefficients: endpoints have more influence.

Thus to compare residuals at different inputs, one needs to adjust the residuals by the expected variability of residuals, which is called studentizing. This is particularly important in the case of detecting outliers: a large residual may be expected in the middle of the domain, but considered an outlier at the end of the domain.
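As a quick illustration of both points (the n / df correction in the MSE, and studentizing residuals before comparing them), here is a small sketch in R using only base functions; the simulated data and variable names are my own.

    # Simulated data where the true errors are known to be i.i.d. normal.
    set.seed(1)
    x <- 1:20
    y <- 2 + 0.5 * x + rnorm(20)
    fit <- lm(y ~ x)

    # MSE: sum of squared residuals divided by n - p (degrees of freedom),
    # not by n. This matches the squared residual standard error.
    n <- length(y); p <- 2
    mse <- sum(residuals(fit)^2) / (n - p)
    all.equal(mse, summary(fit)$sigma^2)   # TRUE

    # Leverage is highest at the endpoints, so raw residuals there have
    # lower variance even though the errors are identically distributed.
    # rstandard() divides each residual by its estimated standard
    # deviation, making residuals comparable across the domain.
    h <- hatvalues(fit)
    cbind(x, leverage = h, raw = residuals(fit), studentized = rstandard(fit))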
About: This blog is mainly for statistics, R, or Duke-related stuff that is not directly related to research activity.