Bayesian Statistics for Beginners: A Step-By-Step Approach, 1st edition, by Therese M. Donovan, Ruth M. Mickey – Ebook PDF Instant Download/Delivery. ISBN: 0192578259, 9780192578259
Full download of Bayesian Statistics for Beginners: A Step-By-Step Approach, 1st edition, is available after payment.
Product details:
ISBN-10 : 0192578259
ISBN-13 : 9780192578259
Authors: Therese M. Donovan, Ruth M. Mickey
Bayesian statistics is currently undergoing something of a renaissance. At its heart is a method of statistical inference in which Bayes’ theorem is used to update the probability of a hypothesis as more evidence or information becomes available. The approach is ideally suited to making initial assessments based on incomplete or imperfect information; as that information is gathered and disseminated, the Bayesian approach corrects or replaces the earlier assumptions and alters its decision-making accordingly, generating a new set of probabilities. As new data or evidence become available, the probability of a particular hypothesis can therefore be steadily refined and revised. The approach is very well suited to the scientific method in general and is widely used across the social, biological, medical, and physical sciences.
Key to this book’s novel and informal perspective is its question-and-answer pedagogy, which uses accessible language, humor, plentiful illustrations, and frequent references to online resources. Bayesian Statistics for Beginners is an introductory textbook suitable for senior undergraduate and graduate students, professional researchers, and practitioners seeking to improve their understanding of the Bayesian statistical techniques they routinely use for data analysis in the life and medical sciences, psychology, public health, business, and other fields.
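As a minimal illustration of this updating idea (an assumed example, not an excerpt from the book), the short Python sketch below applies Bayes’ theorem to two competing hypotheses; the hypothesis names and the prior and likelihood values are made up for the demonstration.

priors = {"H1": 0.5, "H2": 0.5}        # prior probability of each hypothesis (assumed)
likelihoods = {"H1": 0.8, "H2": 0.3}   # Pr(observed data | hypothesis), assumed values

# Bayes' theorem: each posterior is proportional to prior * likelihood,
# normalized by the total probability of the data over all hypotheses.
evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

print(posteriors)  # roughly {'H1': 0.73, 'H2': 0.27}

Collecting more data simply repeats this step, with the current posteriors serving as the priors for the next update.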
Bayesian Statistics for Beginners: A Step-By-Step Approach 1st Table of contents:
SECTION 1. Basics of Probability
CHAPTER 1. Introduction to Probability
What is probability?
Should you play?
How can we get a good estimate of Pr(four) for this particular die?
Is one roll good enough?
What would we expect if the die were fair?
How would you change the table and probability distribution if the die were loaded in favor of a four?
What would the probability distribution be for the bet?
Do Bayesians think of probability as long-run averages?
What’s next?
CHAPTER 2. Joint, Marginal, and Conditional Probability
What is an eyeball event?
Why is it called a Venn diagram?
What is the probability that a person in universe U is in group A?
What about people who are not in group A?
I’m sick of eyeballs. Can we consider another characteristic?
Can we look at both characteristics simultaneously?
Is it possible to have Morton’s toe AND be a lefty?
Is it possible NOT to have Morton’s toe if you are a lefty?
What if five lefties also have Morton’s toe?
Of the four events (A, ¬A, B, and ¬B), which are not mutually exclusive?
Are any events mutually exclusive?
If you were one of the lucky 100 people included in the universe, where would you fall in this diagram?
What does this have to do with probability?
What is the probability that a person selected at random is a righty and has Morton’s toe?
What does the word “marginal” mean?
Can you fill in the empty cells in Table 2.7?
Quickly: What is the marginal probability of having Morton’s toe with this conjoint table?
Can you express the marginal probability of having Morton’s toe as the sum of joint probabilities?
Can we look at this problem from the Venn diagram perspective again?
If you have Morton’s toe, does that influence your probability of being a lefty?
What is conditional probability?
How exactly do you calculate the probability that a person is a lefty, given the person has Morton’s toe?
So if you have Morton’s toe, does that influence your probability of being a lefty?
Does Pr(A | B) = Pr(B | A)?
Can you calculate the conditional probability of being a lefty, given you have Morton’s toe, from the Venn diagram?
Can you calculate the conditional probability of having Morton’s toe, given you are a lefty, from our conjoint table?
If we know the conditional and marginal probabilities, can we calculate the joint probabilities?
Are Pr(A|B) and Pr(B|A) related in some way?
SECTION 2. Bayes’ Theorem and Bayesian Inference
CHAPTER 3. Bayes’ Theorem
First, who is Bayes?
Is that really a picture of Thomas Bayes in Figure 3.1?
Ok, what exactly is Bayes’ Theorem?
What does this have to do with Bayes’ Theorem?
What is so remarkable about this?
If you have a member of B, what is the probability that he/she is also a member of A?
So, when would we need to use Bayes’ Theorem?
Is that all there is to it?
CHAPTER 4. Bayesian Inference
What exactly is science?
How do we go about actually conducting science?
How on earth did Thomas Bayes make a connection between probability and scientific inference?
What is Bayesian inference?
How does Bayesian inference work?
How can we turn this into a Bayesian inference problem?
Is there a pattern in the denominator of this new version?
Does anything else about this equation strike you as notable?
So, why all the fuss?
How does this relate to science?
Ok, what exactly is the difference between the two interpretations of Bayes’ Theorem?
What if there are more than two hypotheses?
One more time…what is Bayesian inference again?
What if I collect more data?
What other sort of questions have been tackled using Bayesian inference approaches?
CHAPTER 5. The Author Problem: Bayesian Inference with Two Hypotheses
What is step 1?
What is step 2?
What is step 3?
What is step 4?
How exactly do we compute the likelihood?
Which of the two hypotheses more closely matches the observed rate?
If likelihood is a probability, how do we quantify this “consistency” in terms of probability?
What is step 5?
Where are the priors in this equation?
Where is the posterior probability of the Hamilton hypothesis in this equation?
Where are the likelihoods of the observed data under each hypothesis in this equation?
So, what is the posterior probability of the Hamilton hypothesis?
How do we set the prior probabilities?
What if we found more papers known to be authored by Hamilton and Madison?
Do the likelihoods of the data have to add to 1.0?
Did Mosteller and Wallace really use this approach?
Can we summarize this problem?
How does this problem differ from the Breast Cancer Problem in the last chapter?
CHAPTER 6. The Birthday Problem: Bayesian Inference with Multiple Discrete Hypotheses
Should Bobbie and Reggie use an informative prior?
What data do we need then?
Is the divisor of Bayes’ Theorem always a constant?
What if the non-informative prior were used instead of the “When You Were Born” prior?
So, the choice of the prior really affects the results?
Are there other times when the prior drives the results?
What is so tricky about setting the prior?
I’ve heard the terms “objective” and “subjective” with reference to Bayesian analysis. What do these terms mean?
What really happened to Bobbie and Mary?
Isn’t that nice?
CHAPTER 7. The Portrait Problem: Bayesian Inference with Joint Likelihood
Who won the bet?
Why did men wear wigs in the 1700s?
So, how can we determine the probability that the man in the photo is Thomas Bayes?
Paint?
And how can lead white help us with dating the Thomas Bayes portrait?
Great! Can we get started?
Step 1. What are the hypotheses?
Step 2. What are the prior probabilities that each hypothesis is true?
Step 3. What are the data?
OK, then. What are the observed data with respect to wigs?
And what are the observed data with respect to similarity?
So what is our final dataset for step 3?
Step 4. What is the likelihood of the observed data under each hypothesis?
Should we start with wigs?
What about similarity between “Thomas Bayes” and Joshua Bayes?
How do we calculate the likelihood under each hypothesis?
OK, then, what is the likelihood of observing a similarity score of 55 or greater under each hypothesis?
So how do we combine both results into one likelihood for each hypothesis?
Step 5. What is the posterior probability that the portrait is of Thomas Bayes?
Are the two pieces of information really independent?
Can we add on more independent pieces of information?
What if our information is not independent?
Are there any assumptions in this analysis?
What is the main take-home point for this chapter?
Looking back at the portrait, who was Barrett, developer of the columnar method?
What’s next?
SECTION 3. Probability Functions
CHAPTER 8. Probability Mass Functions
What is a function?
What is a random variable?
Can you show an example?
Is a random variable a function?
Where do we go from here?
What is the probability of observing y = 3 heads?
How do we move from the probability of a given value of Y to the probability distribution for all possible values of Y?
Is this an example of a probability distribution?
Is this also a probability mass function?
What if we had flipped the coin 10 times?
Really?
OK, what does “binomial” mean?
When do we use binomial probability?
What does the binomial probability mass function look like?
What notation should I use to describe a binomial process like coin flipping?
What is a binomial distribution?
How about the probability of observing 2.5 heads out of 3 coin flips, given that the coin is fair?
What is a parameter?
What are the assumptions of the binomial probability mass function?
Are there other probability mass functions besides the binomial?
What do all of these functions have in common?
All right then . . . what is the Bernoulli distribution?
Likelihood
OK, what exactly is likelihood?
Why is this important?
Are there any other key points to bear in mind regarding likelihood?
Can we quickly confirm that the likelihoods do not need to sum to 1.0 here?
How would this be used in a Bayesian inference problem?
Can we depict this problem graphically?
Can we compare this problem with the authorship problem?
What if we considered all possible hypotheses for p between 0 and 1 instead of just two specific hypotheses?
Can we summarize the main points of this chapter?
OK, what’s next?
CHAPTER 9. Probability Density Functions
What is a function?
Can you give me an example of a continuous random variable?
What is the probability that a bacterium lives exactly 5 hours?
So, what can we do?
Can we see an example of a probability density function?
I see . . . and what distribution would result from this pdf?
Why is the density 0.5 in this example?
Can we formally define a uniform pdf?
What is the probability that x is between 4.5 and 5.5 hours for our uniform distribution?
What is the probability that x is exactly 5 hours?
Are there other examples of continuous probability density functions?
What exactly is the normal distribution?
And what does “Gaussian” refer to?
What does a normal (Gaussian) distribution look like?
So, the normal distribution has two parameters?
How were these distributions generated?
What exactly is the normal (Gaussian) pdf?
Can you give me an example of how to use the normal pdf?
So, we plug in multiple values for x and generate the distribution?
How do we go from probability density to probability with a normal pdf?
What is the probability that a bacterium has a lifespan between 4.5 and 5.5 hours?
Is the total area equal to 1.0?
How would one express this mathematically?
With 41 slivers, what is the probability that a bacterium has a lifespan between 4.5 and 5.5 hours?
What is the total area if our rectangles became really, really skinny?
Do all probability density functions have an area under the curve = 1.0?
What would the integral look like for the normal pdf?
OK, one more time! What is the probability that a bacterium has a lifespan between 4.5 and 5.5 hours
How does one go about integrating?
Are there refresher courses that review this material?
What other probability density functions are there?
OK, what exactly is likelihood?
What if you don’t know that σ is 0.5?
How can this be used in a Bayesian inference problem?
Can you estimate the probability of a specific hypothesis for theta?
So, there are no specific hypotheses?
Can we see a Bayesian inference problem with infinite hypotheses?
Can we depict this problem graphically?
If we couldn’t integrate the normal distribution, how on earth are we going to integrate the denominator of Bayes’ Theorem?
SECTION 4. Bayesian Conjugates
CHAPTER 10. The White House Problem: The Beta-Binomial Conjugate
What do YOU think Shaq’s probability of getting into the White House is?
What probability function would be appropriate for Shaq’s bet?
Are these trials independent?
But what about Shaq’s friend?
Step 1. What are the hypotheses for p?
Step 2. What are the prior densities for these hypotheses?
What is the beta distribution?
Can you show other examples of beta distributions?
I don’t have a good feeling for what the α and β parameters do in terms of controlling the shape of the beta distribution.
Can I see an example of how this would be used?
What prior distribution did Shaq and his friend settle on?
Hyperparameters?
Step 3. Now what?
Step 4. And then?
Do you see a problem here?
Now what?
How do we make headway?
Really?
Would the posterior be different if Shaq used a different prior distribution?
Is the flat prior really non-informative?
What if Shaq makes a second attempt?
How is this shortcut possible?
Could he have just tried twice before we updated the prior?
What exactly does the word “conjugate” mean?
Why does the shortcut work?
Can you show me the proof?
Can the beta distribution be used as a conjugate prior for data other than binomial data?
Will we have more practice with conjugate priors?
OK. I’ll wait. Suppose I do an analysis and have a new posterior distribution. How should I present my results?
How should I describe my confidence in the hypothesized values for p?
Can we summarize this chapter?
What really happened to Shaq?
Will we see Shaq again?
CHAPTER 11. The Shark Attack Problem: The Gamma-Poisson Conjugate
So, what exactly is the Poisson distribution?
To what does “Poisson” refer?
OK, what exactly is the function?
What are the parameters of the function?
What is the probability of 3 attacks if λ = 2.1?
What is so “bursty” about a Poisson process?
What does the Poisson pmf have to do with a Bayesian analysis?
What is your estimate of lambda?
Sounds good! Where do I start?
OK. Is there some sort of probability distribution I can use to help me set the prior distribution?
Gamma distribution?
Uh-huh. Can you simplify?
Do α and β have anything to do with the α and β from the beta distribution in Chapter 10?
Why are there three ways to parameterize this function?
How do we use this distribution to estimate probability?
Looks Greek to me!
But what do we do about Γ(2)?
OK, so how do we use the gamma distribution for the Shark Attack Problem?
Step 1. What are the hypotheses for lambda?
Step 2. What are the prior densities for these hypotheses?
OK, what values for α and β should we use?
But the mean of that dataset is 2.1. How do I convert this to the α and β parameters for a gamma distribution?
Which gamma distribution should we use for this problem?
Hyperparameters?
Step 3. Now what?
Step 4. And then?
And Step 5?
How do we make headway?
How is this shortcut possible?
Can I see the proof?
What if I used a different prior distribution?
Why do I need to use a Bayesian approach? Couldn’t I just use the data from Table 11.1 and add in the new data?
OK, what happens to the posterior if I collect new data?
What parameters do you use for α0 and β0 if you want a vague prior?
OK, how should I present my Bayesian results?
Can I use a different prior distribution than the gamma distribution?
Can we summarize this chapter?
CHAPTER 12. The Maple Syrup Problem: The Normal-Normal Conjugate
What exactly is maple syrup?
Why does Canada have a maple-syrup cartel?
I see. And how does this apply to Bayesian inference?
So how would we capture the essence of this with a probability distribution?
What is the probability density associated with 4.8 million gallons of syrup?
What does the normal pdf have to do with the Canadian syrup cartel?
OK. Where do I start?
Step 1. What are the hypotheses for μ and σ?
Step 2. What are the prior probabilities for each hypothesis?
Step 3. Now what?
Step 4. And then?
And Step 5?
Is this a realistic approach?
Back to the drawing board?
OK, I’ve got it. Now what?
How do we make headway?
I’m confused!
What is the parameter τ?
Why is this chapter called “The normal-normal conjugate”?
So, what is the conjugate shortcut?
Why does the shortcut work?
Can you show me the proof?
What happens to the posterior if we collect new data?
Why do we use τ instead of σ directly?
What if I know μ, but want to estimate σ or τ instead?
Can we summarize this chapter?
What’s next?
SECTION 5. Markov Chain Monte Carlo
CHAPTER 13. The Shark Attack Problem Revisited: MCMC with the Metropolis Algorithm
What was the Shark Attack Problem again?
Step 1. What are the hypotheses for λ?
Step 2. What were the prior probabilities for each hypothesis?
Step 3. Now what?
Step 4. And then?
And step 5?
How would you solve the posterior distribution with a Markov Chain Monte Carlo (MCMC) approach?
Where can I learn about the mathematics of the Metropolis algorithm?
Can we see an example of these operations in action?
What are some key characteristics of the Metropolis algorithm?
Are there certain terms associated with this process?
How do we depict our results graphically?
And how does this process lead to the posterior distribution?
How many trials do you need to generate an adequate posterior distribution?
How do we summarize the posterior distribution?
Are these the statistics that we report for our analysis?
But shouldn’t we express the MCMC results as a gamma distribution?
Would we get different results if we used a different prior distribution?
Are there other algorithms we can use in an MCMC analysis?
How do we know that our MCMC posterior really hits the spot?
And what’s the big picture?
OK, what’s next?
CHAPTER 14. MCMC Diagnostic Approaches
How do we know that our MCMC posterior really hits the spot?
What was the Shark Attack Problem again?
Step 1. What are the hypotheses for λ?
Step 2. What were the prior probabilities for each hypothesis?
Step 3. Now what?
Step 4. And then?
And step 5?
What can we do to hit our target posterior distribution?
Is the number of trials the only thing to consider?
What acceptance rate should we target?
Uh. . . and what might that be?
What do we do in such cases?
OK, once I have tuned my tuning parameter properly, am I finished with diagnostics?
Can we see an example of these challenges?
OK, I’m afraid to ask, but what else should concern me?
Is there anything else to worry about with MCMC analysis?
Logs?
Goodness! That’s a lot of diagnosing!
Can we summarize this chapter?
What’s next?
CHAPTER 15. The White House Problem Revisited: MCMC with the Metropolis–Hastings Algorithm
What were the analytic steps again?
Step 1. What are the hypotheses for p?
Step 2. What were the prior probabilities for each hypothesis?
Step 3. Now what?
Step 4. And then?
And step 5?
How would you solve the posterior distribution with an MCMC approach?
OK then, what is the Metropolis–Hastings algorithm?
Why do we need a correction factor?
How does the correction factor work?
What if we want to use a proposal distribution that is not symmetrical?
Do you have to center the distribution on the mean?
Why can’t you just use the normal distribution as the proposal distribution?
OK, I think I’ve got it. Can we walk through the White House MCMC analysis using the Metropolis–Hastings algorithm?
Should we start with a table?
What is operation 1?
What is operation 2?
What is operation 3?
What is operation 4?
What is operation 5?
What is operation 6?
What is operation 7?
What is operation 8?
Can we see an example of operations 6 and 7, please?
Can anything go wrong with this approach?
Can we review the terms associated with this process?
Can we see the posterior distribution after multiple trials?
How do we summarize the posterior distribution?
But shouldn’t we express the MCMC results as a beta distribution?
Where can I find more information on this algorithm?
Are there other algorithms we can use?
CHAPTER 16. The Maple Syrup Problem Revisited: MCMC with Gibbs Sampling
Can we make headway on this problem?
What is Gibbs sampling?
What is so special about Gibbs sampling?
What would this look like visually?
So we’ll be collecting dots or sand grains across many trials?
OK! Can we start now?
What is step 1?
And step 2?
Is it time for step 3?
And steps 4 and 5?
How do we begin?
Who the heck is Gibbs?
Did Josiah Gibbs come up with the Gibbs sampler?
Why did they name their algorithm the Gibbs sampler?
Do you have a reference for the Geman brothers’ paper?
Can we finish our trials please?
How do we display the results of a Gibbs MCMC?
How do we use the MCMC results to draw conclusions about the posterior distributions for μ and τ?
What conclusions do I draw about τ?
What’s the difference between our maple syrup estimation conjugate approach and the Gibbs sampling approach?
Can we look at the joint posterior distribution too?
You don’t do these calculations by hand, do you?
Why is the Gibbs sampler known as a special case of the Metropolis–Hastings algorithm?
Can we solve the Maple Syrup Problem using the Metropolis–Hastings algorithm instead?
Why not just use the Metropolis–Hastings algorithm instead?
How do we know if our MCMC results are any good?
Can we summarize the main points of this chapter?
What’s next?
SECTION 6. Applications
CHAPTER 17. The Survivor Problem: Simple Linear Regression with MCMC
What is a function?
Can we see another example?
This function looks vaguely familiar . . . is this a linear function?
So what does this have to do with regression analysis?
Can we see an example?
Can you summarize the relationship between Var1 and Var3 with a straight line?
What’s the second equation written in black?
So our goal is to find the signal within the data?
What exactly is science?
How do we go about actually conducting science?
What flavor of science will we explore in this chapter?
Great! What will we be analyzing?
Grit?
Can you really measure a person’s grittiness?
Can we have a peek at their dataset?
How do I get started with my analysis?
OK, so how do I find patterns between variables?
Where were we?
Is this a statistical model?
Why is it incomplete? What exactly is a statistical model?
What assumptions do we make about how the Survivor data were generated?
Is that all there is to it?
OK, where do we start?
And what is step 2?
All of these prior distributions are vague. Couldn’t we have used existing Survivor data to help set the priors?
What is step 3?
On to steps 4 and 5?
How do we get started?
What is Gibbs sampling again?
How do we make this modification?
Brilliant! Can we get started now?
Now can we see the Gibbs sampler in action?
Goodness! That is a lot of subscripting to track and calculating to do!
And where does Bayes’ Theorem fit in?
Are the Geman brothers responsible for this approach?
Why the emphasis on marginal densities?
What do the results look like for all 10 trials?
How do we know if our MCMC results are any good?
And how do we summarize our results?
What about the Bayesian credible intervals?
Can we do something similar for τ?
So what is our linear equation?
What would Bayes do?
Does the posterior predictive distribution help us assess model fit?
Do you get different results if you use a different prior?
Can we summarize this chapter?
What’s next?
CHAPTER 18. The Survivor Problem Continued: Introduction to Bayesian Model Selection
Grit?
Shall we get started?
OK, phew! What were the results again?
Are there other ways to assess fit besides the posterior predictive distributions?
Model fit?
OK, shall we evaluate the grit model?
So which model is the better model?
Then what?
Simplicity?
So why not go with the red line?
What’s wrong with that?
OK, so how do we quantify this trade-off between “fit” and “simplicity” for a given model?
How is this metric computed?
Do these criteria somehow help us to compare models?
How do you use these two results to draw conclusions?
Can we summarize the chapter so far?
Since each model is a hypothesis regarding success, can’t we use Bayes’ Theorem in some way and compute the posterior probability of each model?
This looks familiar. Have I seen this before?
Well, how do we calculate the marginal distribution of the data for a given model?
How do I get below the surface?
What’s next?
CHAPTER 19. The Lorax Problem: Introduction to Bayesian Networks
How could this tragedy have been averted?
Sustainable manner?
What is a Bayesian belief network?
Aack! Can we see an example?
So, is this a probabilistic model?
Directed acyclic graph?
OK. Got it! Who created the sprinkler–rain–grass network?
All right then. What about the SPRINKLER node of our diagram?
Are there any other probabilities stored in this table?
Remind me again . . . how do you calculate conditional probability?
Um. . . the tables consist of conditional probabilities?
What do you notice about the setup of the SPRINKLER table?
And what about the GRASS WET variable?
How do we calculate the values in this table?
What if you don’t have any data for the necessary calculations?
Can we use this particular network to answer questions?
How do we start?
Can you tell me more about the chain rule?
Can we try another one?
Why is this important?
How do we actually use the Bayes’ network for addressing practical problems?
And where does Bayes’ Theorem come into play?
And how does Bayesian inference fit into this network?
What about the hypothesis that the grass is wet because the sprinkler is on?
What is the fourth approach?
OK, I think I’ve got it. Can we try setting up the Once-ler’s network?
Is there a way to create an influence diagram without drawing it like you did?
What about the conditional probability tables (CPTs)?
Where are the CPTs?
Can we look at one more CPT?
OK, now what?
The arrows do not show that water quality influences the Thneed business, so what is going on here?
What if you have more observations?
Who came up with the idea of Bayesian networks?
Can we summarize this chapter?
I noticed decision-making is on the list. Can we see an example of this?
CHAPTER 20. The Once-ler Problem: Introduction to Decision Trees
How could this tragedy have been averted?
Decision tree?
Why is it called a tree?
Is that all there is to it?
What kind of probabilities do these represent?
Now what’s the best decision?
Does this answer tell the Once-ler what he should do?
Utility?
How is utility determined?
Do people really use utility in practice in decision-making?
What about the Bar-ba-loots, Swammy-Swans, and Humming-Fish?
Is that how Bayes’ Theorem is used in decision trees?
Did you forget Bayes’ Theorem?
Who can we credit for developing the decision tree analysis?
Can we summarize this chapter?
One more question…Did Yogi Berra really deliver the quotes in this chapter?
APPENDIX 1. The Beta-Binomial Conjugate Solution
The prior distribution
The observed data
Bayes’ Theorem
The conjugate proof
APPENDIX 2. The Gamma-Poisson Conjugate Solution
The prior distribution
The observed data
Bayes’ Theorem
Conjugate proof
APPENDIX 3. The Normal-Normal Conjugate Solution
The prior distribution
The observed data
Bayes’ Theorem
Conjugate proof
APPENDIX 4. Conjugate Solutions for Simple Linear Regression
The likelihood of the data
The b0 parameter
The b1 parameter
The τ parameter
APPENDIX 5. The Standardization of Regression Data
Standardizing datasets
Back-transforming standardized coefficients to the original scale
Bibliography
Hyperlinks Accessed August 2017