Final Course Survey

Thanks so much for your participation in Probability for Scientists. As instructors, we enjoyed this class and learned a lot from teaching it. Please help us improve by answering the following anonymous survey.

The final survey is available here. You have a week to complete it.

Administrative notes

  • Due to low interest, Drew will *not* be available this Thurs 12 Dec.
  • We will be emailing you feedback on your written report. Please email the class address if you would like to arrange to pick up the physical copy.

Final Project Posters

Posters from the final project are now available as pdfs here. If you have any updates, email a new pdf to the class address and we will post it.


Final Project Details

The final poster session will be on Thursday, the 5th of December. Feel free to invite friends/family to this class.

Key points to keep in mind as you work on the written projects:

  • The written report is *separate* from your poster. We strongly encourage you to complete your written reports early so you have ample time to focus on your poster.
  • Mark the corresponding author and include an email address.

  • Your report should be between 500 and 1,000 words total (including captions). Use simple, declarative statements - avoid "flowery" language like "we intend to", "we hope to", "it might be the case that", etc.

  • When there are multiple authors, use "We" rather than "I".

  • Aim for 3-5 final figures. Each figure should be numbered, have sensible axis labels w/units, and a 2-3 sentence caption.

  • Always spell check.

  • Bibliography: you can use any accepted style. An online tool like this can be helpful.

Poster details:
  • We will be printing posters for you. Thus, we need a final version by 5pm Tues 3 Dec.
  • To submit your poster, export your finished poster from PowerPoint as a pdf and email it to the class address.
  • I strongly suggest that each group read this post about poster design. It includes several templates that you can use to start your poster. The final poster size should be 36" wide by 24" tall.


Week 13: Working with RStudio and Data

For class this Thursday, each group should have a laptop with RStudio installed, and a dataset that's ready to be imported.


  • First, you need to install R from here.
  • Next, install RStudio from here.
  • There's very good documentation on RStudio here. Learning a few shortcuts and understanding syntax highlighting can make your life much easier.

Importing Data

  • First, make sure you can open your data in a spreadsheet program like Excel or OpenOffice.
  • If there are multiple tables or "sheets", identify the most important ones, and think about how you might combine several tables together into a single table.
  • From your spreadsheet program, select "Save as" and save the spreadsheet as a CSV (comma-separated value) file.
  • Finally, load the CSV file using RStudio (nice instructions here).

For class, please download this R script and data file to the same directory.


The Monty Hall Problem

There are several different valid specifications of the Monty Hall problem. All use Bayes' theorem to incorporate the initial choice of the contestant and the information that the host "gives away" to reach the same conclusion.
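To make the Bayes' theorem step concrete, here is a minimal sketch in Python (the class itself uses R). It assumes the standard specification: the car is equally likely to be behind each of 3 doors, the host never opens the contestant's door or the car's door, and the host chooses at random when two doors are allowed. The function name and door numbering are made up for illustration.

```python
from fractions import Fraction

def posterior_after_host_opens(pick=1, opened=3, doors=(1, 2, 3)):
    """Return P(car behind each door | contestant picked `pick`, host opened `opened`)."""
    # Prior: the car is equally likely to be behind any door.
    prior = {d: Fraction(1, len(doors)) for d in doors}
    likelihood = {}
    for car in doors:
        # Doors the host may open: not the contestant's pick and not the car.
        allowed = [d for d in doors if d != pick and d != car]
        # Assumption: the host picks uniformly at random among the allowed doors.
        likelihood[car] = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
    # Bayes' theorem: posterior = prior * likelihood / evidence.
    evidence = sum(prior[c] * likelihood[c] for c in doors)
    return {c: prior[c] * likelihood[c] / evidence for c in doors}

posterior = posterior_after_host_opens(pick=1, opened=3)
for door, prob in posterior.items():
    print(f"door {door}: {prob}")
```

Under these assumptions, staying with door 1 wins with probability 1/3 and switching to door 2 wins with probability 2/3. Changing the host's behavior (the `likelihood` step) is exactly what distinguishes the different specifications of the problem.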

This link provides a concise explanation of the set-up and the math.

The Wikipedia article for the problem is long, but contains a very good introduction, as well as some interesting history on the problem, and a detailed list of solutions.

Finally, there's a New York Times interview with Monty Hall himself here that describes some of the intricacies of the actual game show.

Annotated Outline Instructions

The annotated outline (Due Thurs 14 Nov) is an expanded version of your proposal, and will form the outline of your poster. This should be typed, and written to address a non-technical audience (can a high school senior understand it?). As noted in class, probability and statistics Wikipedia pages will be considered valid sources for this project.

The format for the annotated outline is as follows:

  • Introduction (200 words max, at least 2 sources)
    • Background of the research system
    • Problem statement
    • Significance of problem
  • Methods (200 words max, at least 2 sources)
    • Describe data, including source, number of samples, variables, whether data collection is complete.
    • Analysis: What will you do to answer the question, and how did you choose these methods? How will you interpret the results? Describe any (nontrivial) assumptions you'll be making.
  • Discussion (200 words max)
    • Expected outcome and reasoning.
    • Significance of results.

Proposal Feedback and Revisions

We've reviewed all of your final project proposals and prepared feedback.
We need a few things from you:

  • Please choose one corresponding author per group. We will email feedback to the corresponding author ASAP.
  • For Tuesday 12 November, please revise your proposals, taking our comments into account.

Revised proposals should be typed, and addressed to a non-technical audience (would a high school senior understand it without further explanation?). When revising your proposals, please use the following format:

  • Title
  • List of authors (mark corresponding author and provide preferred email)
  • Proposal text. Text should be less than 200 words. It should explain the research question and significance. Briefly describe the data that will be used, and whether it has been collected already. Describe the methods you will use, and how they answer the research question. Finally, include a brief description of your expected results and their practical significance.


Final Project Proposal

Final project proposals are due Thurs 7 Nov (one per group, hard copy).
Consider the following criteria when choosing a project.

Projects should be:
  • Interesting (to group members, and hopefully to others)
  • Significant in the real world
  • Tractable (you can make progress during the course of the semester)

The proposal should include the group members, a title, and a 3-5 sentence abstract. The abstract should explain the problem, and the proposed method for solving or exploring the problem.


Quiz make-up

As I mentioned last week, you can make up a single quiz (Due Tues 26 Nov) by doing the following:

1. Find a probability/statistics article on Wikipedia.
2. Read as much of it as you can. I expect you to spend somewhere between 20 and 40 minutes reading the article. Use links in the article to look up some of the terms you're unfamiliar with.
3. Write down 5 things you learned or found interesting, one sentence each.

The following 2 links are Wikipedia "Outline" articles. They show the organization of all the content related to that topic on Wikipedia, and can be a good place to start.



Data entry: Number of Heads and Switches

You can find the survey to enter data from today's activity here.


Week 9: Confidence Intervals

In class this week we are looking at confidence intervals, starting with the proportion of successes (p) of a binomial process. p̂ is the sample estimate of p. The following figures illustrate what we did for a range of values.

This figure shows how the standard error (SE) of p̂ changes with respect to p̂ and the sample size.

This figure shows how the width of the confidence interval changes with respect to p̂, the sample size, and α. The panel headings show the "confidence", i.e. (1-α).
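For anyone who wants to reproduce numbers like those in the figures, here is a minimal Python sketch (the class uses R). It assumes the usual normal-approximation interval, p̂ ± z·SE with SE = sqrt(p̂(1-p̂)/n). The figures may have been made with a different interval formula, so treat this as an illustration and not as the exact method behind the plots. The function names are made up.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(p_hat, n):
    """Standard error of a sample proportion p_hat from n trials."""
    return sqrt(p_hat * (1 - p_hat) / n)

def normal_approx_interval(p_hat, n, alpha=0.05):
    """Normal-approximation (1 - alpha) confidence interval for p."""
    # z is the standard normal quantile that leaves alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * standard_error(p_hat, n)
    # Clip to [0, 1], since p is a proportion.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

for n in (10, 100, 1000):
    low, high = normal_approx_interval(0.5, n, alpha=0.05)
    print(f"n={n:5d}  SE={standard_error(0.5, n):.4f}  CI=({low:.3f}, {high:.3f})")
```

The interval narrows as the sample size grows, and widens as α gets smaller (higher confidence) or as p̂ moves toward 0.5.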


Change in Readings

The schedule has been updated to switch the reading for the next two weeks. Next week's quiz is on CGS Ch 8, and the week after is CGS Ch 7.


Lab 4: Discrete and Continuous Distributions

Lab 4 is available here. It's due at the beginning of class on Tues, 22 Oct 2013.

Week 7: Hypothesis testing, class videos

This week we're looking at hypothesis testing. We started out using the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) to test whether samples were drawn from different populations.

The world is full of statistical (hypothesis) tests. Each one generates a test statistic. The key to understanding a test is understanding what the distribution of the test statistic would be if the null hypothesis were true.

The test statistic of the rank sum test is U: the sum of the ranks in one sample minus a correction factor that depends on that sample's size.
For the rank sum test, the null hypothesis is (approximately) that the two samples are drawn from populations with the same distribution. The following figures show the distribution of U, assuming the null hypothesis is true. The area of the shaded region sums to alpha. The vertical red lines show our critical values of U. If the null hypothesis is true, values of U more extreme than these critical values are unlikely to occur by chance. Thus, if we observe a U value this extreme, we can reject the null hypothesis.

If we lower alpha, we see the area in the tails get smaller.

For larger sample sizes, we see the values of U get much larger, but the same pattern holds.
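Here is a minimal Python sketch of how U and its null distribution can be computed for small samples (the class uses R, and the figures were not necessarily produced this way). It assumes U is defined for the first sample as its rank sum minus n(n+1)/2, and that under the null hypothesis every assignment of ranks to the first sample is equally likely. Some textbooks report the smaller of the two samples' U values instead. The function names are made up.

```python
from collections import Counter
from itertools import combinations

def u_statistic(sample_a, sample_b):
    """U for sample_a: its rank sum minus n_a * (n_a + 1) / 2."""
    pooled = sorted(list(sample_a) + list(sample_b))
    # Tied values share the average of the (1-based) positions they occupy.
    positions = {}
    for i, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(i)
    rank = {value: sum(p) / len(p) for value, p in positions.items()}
    n_a = len(sample_a)
    return sum(rank[v] for v in sample_a) - n_a * (n_a + 1) / 2

def null_distribution(n_a, n_b):
    """Exact distribution of U when every rank assignment is equally likely (no ties)."""
    counts = Counter()
    for ranks_a in combinations(range(1, n_a + n_b + 1), n_a):
        counts[sum(ranks_a) - n_a * (n_a + 1) // 2] += 1
    total = sum(counts.values())
    return {u: c / total for u, c in sorted(counts.items())}

print(u_statistic([1.2, 2.3, 3.1], [4.0, 5.5, 6.1]))  # every value in A is below every value in B
for u, prob in null_distribution(3, 3).items():
    print(u, round(prob, 3))
```

Enumerating all rank assignments is only practical for small samples. For larger samples, U is usually compared against a normal approximation.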

For a devil's advocate view of what p-values mean, we turn to the internet:
What the p-value

The Mann-Whitney U test (also known as the Wilcoxon rank sum test) is a non-parametric test: it does not assume the data follow any particular distribution. Most common statistical tests are parametric, and usually assume the data (or something about the data) is normally distributed. The t-test is the parametric sibling of the rank sum test. It assumes the data is normally distributed.

This video describes hypothesis tests in general, and walks through the t-test.
What is a t-test?

By the end of this week, this comic should make sense.
XKCD: Significant


Week 6 survey results: reaction times

Here's a familiar histogram of each person's reaction times (in distance), along with everyone combined together (heading "All").

Overall, the whole class (heading "All") appears approximately normally distributed, though somewhat right-skewed. Why might we expect this distribution to be right (upwards) skewed?

With this 3-column layout, it is difficult to compare individual performances. We could use one column and 15 rows, but that would make a very long figure. The following figure condenses each person's data, making comparisons easier. This is called a box-and-whisker plot, or boxplot. The black dot shows the median, and the box shows the interquartile range (which measures the variability, similar to standard deviation). The individual points are considered outliers. For more information, see the boxplot Wikipedia page.
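The numbers behind a boxplot are easy to compute by hand. Here is a minimal Python sketch (the class uses R). It assumes the common convention that a point is an outlier if it lies more than 1.5 times the interquartile range beyond the box. Plotting software differs in how it computes quartiles, so your values may not match a given figure exactly. The data below are made up for illustration.

```python
from statistics import median, quantiles

def boxplot_summary(values, whisker=1.5):
    """Median, quartiles, IQR, and outliers under the 1.5 * IQR convention."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical reaction distances for one person (not class data).
distances = [10, 12, 13, 14, 15, 16, 18, 40]
print(boxplot_summary(distances))
```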

From this, I can easily see some people's responses vary quite a bit, while others are much more consistent. I also notice that 2 people appear faster (lower distance) than the average, and one person appears slower. How might we test if these are significantly different from the rest of the class?


Drew's Central Limit Theorem Demonstration

Here's the demo application that we used in class. It simulates adding up dice rolls similar to what we did in class today.
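If you want to experiment on your own, here is a minimal Python sketch of the same idea. It is not the demo application from class, just a small simulation written for illustration, and the function name is made up. It adds up rolls of fair six-sided dice many times and compares the mean and variance of the sums to the theoretical values (3.5 and 35/12 per die).

```python
import random
from collections import Counter
from statistics import mean, pvariance

def simulate_dice_sums(n_dice, n_trials, seed=0):
    """Roll n_dice fair six-sided dice n_trials times and return the sums."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_trials)]

n_dice = 10
sums = simulate_dice_sums(n_dice, n_trials=20000)
print("simulated mean:", round(mean(sums), 2), " theory:", 3.5 * n_dice)
print("simulated variance:", round(pvariance(sums), 2), " theory:", round(35 / 12 * n_dice, 2))

# Crude text histogram: one '#' per 100 occurrences of each sum.
for total, count in sorted(Counter(sums).items()):
    print(f"{total:3d} {'#' * (count // 100)}")
```

Try `n_dice = 1`, then 2, then 10. The histogram goes from flat, to triangular, to bell-shaped.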


Data entry survey: reaction times

The reaction times survey is here.
It is worth 5 points.

Week 5 survey results: number of heads (binomial distribution)

I've made a movie out of the coin-flip results. Each frame of the movie adds a single coin-flip.

I plotted 2 histograms: the top one shows the total number of each result, and the bottom shows the proportion or density of each result. You can see the Y-axis of the top histogram change as we add flips.
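The difference between the two histograms can be sketched in a few lines of Python (the class uses R). This is a simulation, not the class data. It assumes each "result" is the number of heads in 10 flips of a fair coin, which may not match the activity exactly, and the function names are made up.

```python
import random
from collections import Counter

def heads_in_flips(n_flips, rng):
    """Number of heads in n_flips flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

def histograms(results):
    """Return (counts, proportions) for a list of results."""
    counts = Counter(results)
    total = len(results)
    proportions = {k: c / total for k, c in counts.items()}
    return dict(counts), proportions

rng = random.Random(1)  # fixed seed so the run is repeatable
results = []
for n_results in (20, 200, 2000):
    while len(results) < n_results:
        results.append(heads_in_flips(10, rng))
    counts, proportions = histograms(results)
    print(n_results, "results: tallest count bar =", max(counts.values()),
          " tallest proportion bar =", round(max(proportions.values()), 3))
```

The count bars keep growing as results are added, while the proportion bars always sum to 1.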

What would this movie look like if we added another 1,000 flips?

Movie Link


Lab 2

The Lab 2 answer key is available here.

A lab rewrite can earn you up to 50% of missed points. Rewrite questions are marked with @@.

Identify errors/typos in the key for extra credit. See answer key for details.

Note that Lab 2 is worth 95 points (it says 100 on the assignment - typo), with a possible 5 points of extra credit.


Class Feedback Survey

The class feedback survey is now available here.
As usual, you need to register with a gmail address, but your responses will be anonymous.

This survey is worth 5 points.


Lab 1 Answer Key (Updated)

The answer key to Lab 1 is now available here.

Edit: The link is now working.


Week 3 Survey: More dice rolls.

The data entry survey for dice tosses from class on 3 Sep is now available here.

Extra: Venn Diagram Deconstruction

This is a smart and funny blog post about Venn Diagrams that I found while researching material for week 2. Highly recommended.

Prostitutes, Doctors, and TSA Agents

Lab 2: Rules of probability, games, and DNA.

Lab 2 is now available here.
5 points extra credit for identifying any technical errors (i.e., not grammar or spelling).

We will make sure everyone has a collaborator by the end of Thurs class.


Week 2 Survey: Dice rolls

The data entry survey for the dice rolls activity from class on Thurs 29 Aug is now available here.


Quick admin notes

The intro survey is here: http://survey.x14n.org/index.php/128369. Please complete it if you haven't yet done so.

Also, Lab 1 was updated this evening. If you downloaded the lab before midnight Tues Aug 27, please re-download it.


Lab 1: Fractions and Pigeons

Lab 1 is now available here.
For this lab, you will be assigned a collaborator. Each person should turn in their own write-up as a hard copy.

This lab is due at the beginning of class next Tues, 3 Sep 2013.

Survey Results: Survey timings, coin flip simulations, and a word cloud.

You can find a pdf of results from the coin flips class survey here.


Week 2 - Aug 27th

Laws of Probability 
Introduction to Sample Space
Lab 2 (Sep 10)
DW Chapter 1


Class Activity: Coin Flips

Today in class we generated 3 sequences of 0s and 1s. The first 2 sequences were generated by humans. The final sequence was generated by flipping a single coin 50 times. We then chose one of the 3 sequences at random (using the coin flips) to write on the board. Drew, Ara, and Christian each guessed whether the chosen sequence was generated by a human or a coin, based on 3 different criteria:

* Total number of heads
* Total number of transitions
* Number of runs of length 5 or 6
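If you'd like to check your own sequence, the three criteria can be computed with a minimal Python sketch like the one below (the class uses R). It assumes 1 means heads, that a transition is any place where consecutive flips differ, and that a run is a maximal stretch of identical flips. The function name and the example sequence are made up.

```python
from itertools import groupby

def sequence_stats(bits):
    """Heads, transitions, and number of runs of length 5 or 6 in a 0/1 sequence."""
    # groupby splits the sequence into maximal runs of identical values.
    run_lengths = [len(list(group)) for _, group in groupby(bits)]
    return {
        "heads": sum(bits),  # assumes 1 = heads
        "transitions": max(len(run_lengths) - 1, 0),
        "runs_of_5_or_6": sum(1 for length in run_lengths if length in (5, 6)),
    }

# Hypothetical short sequence (not class data).
example = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(sequence_stats(example))
```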

Before the next class, please enter your data into the survey here:
Next week in class, we'll take a look at the data.

Administrative notes:
* Lab 1 will be assigned on Tues Aug 27.
* The weekly reading quiz will be on Thurs Aug 29.


Class Survey

Please fill out the class survey here:

Please use your gmail address to sign up for the survey. After you sign up, you should receive an email at your gmail address sent by probforsci at x14n dot org. If you need a gmail account, you can sign up here:


Week 1

We have a few details for you before classes start next week.

1. We will be using two books in class that are available from Amazon.com for a combined total of $25. We recommend you order these as soon as is convenient.
  * The Drunkard's Walk: How Randomness Rules Our Lives
  * The Cartoon Guide to Statistics

2. There is a survey to take as well if you are registered for the class.


Probability for Scientists Bio 409 / Stat 479 Tu Thu 12:30-1:45 pm

Probability for Scientists is a hands-on, interdisciplinary introduction to probability theory that emphasizes big-picture thinking. Problems are introduced via in-class exploration of physical objects such as coins, dice, and cards. Learn what randomness really looks like, how to read a histogram, and why the normal distribution is so common.

Probability for Scientists is appropriate for students from a range of backgrounds and disciplines, including undergraduate and graduate students. This course is cross-listed in Biology & Statistics, and has no prerequisites other than curiosity.