Friday, March 12, 2010

R code for Propensity Score Weighting

Here is R code for propensity scores, covered in the last couple of lectures (thanks to Chris Tausanovitch).

Monday, March 8, 2010

Problem Set 3

Here is the final problem set, which covers weighting-based corrections for nonresponse and self-selection. To make things simpler, I have created an .Rdata file which has a self-weighting subset of the November 2008 CPS. This was created by sampling the entire CPS file described in the problem set with probability proportional to the weights. The file is smaller--10,000 rows--and does not need to be weighted, which makes it easier to use with either glm or the quantile functions. The .Rdata file also includes the ANES Internet panel data, and all categorical variables in both data frames have been made into factors (with appropriate labels).
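For anyone curious why the subset needs no weights, here is a minimal sketch of the idea in Python (not the R code actually used to build the file). It assumes sampling with replacement; the toy data, the function name self_weighting_subset, and the seed are all hypothetical. Rows are drawn with probability proportional to their weights, so the unweighted mean of the subset should land close to the weighted mean of the full file.

```python
import random


def self_weighting_subset(rows, weights, n, seed=0):
    """Draw n rows with replacement; each draw picks row i with
    probability weights[i] / sum(weights). (Hypothetical helper.)"""
    rng = random.Random(seed)
    return rng.choices(rows, weights=weights, k=n)


# Hypothetical toy data standing in for the full file (not the CPS):
# a 0/1 outcome for each respondent and that respondent's survey weight.
rows = [1, 0, 1, 0, 0, 1]
weights = [3.0, 1.0, 2.0, 1.0, 1.0, 2.0]

# Weighted estimate computed from the full file.
weighted_mean = sum(w * y for y, w in zip(rows, weights)) / sum(weights)

# Unweighted estimate computed from the self-weighting subset.
subset = self_weighting_subset(rows, weights, n=10000)
unweighted_mean = sum(subset) / len(subset)

print(weighted_mean, unweighted_mean)
```

The two numbers agree only up to sampling error, which is the price of the convenience: the subset is itself a random sample from the original file.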

Almost done...

Here and here are the lecture notes from Wednesday and Monday, respectively, though the latter cover the material I should have done before the former.

Thursday, March 4, 2010

References on Bayesian Statistics

Here are a few references that will give you the basic ideas and some simple examples of Bayesian statistics in under an hour:
  • Evans and Rosenthal, Probability and Statistics: The Science of Uncertainty, 2nd ed. (Freeman, 2010), chapter 7, sections 1-4. (This is the textbook I'm planning on using in 350A in the fall. Unfortunately, it's rather expensive.)
  • Ron Christensen has posted chapters of a forthcoming textbook on Bayesian inference on his website. Chapter 1 provides the basic ideas with just enough of the mathematics to get by.
  • Gelman et al., Bayesian Data Analysis, 2nd ed. (Chapman & Hall, 2005) is one of my favorite books, but it will take you a lot longer than an hour to cover the basic material in chapters 1-2.
  • The same comment applies to Simon Jackman's Bayesian Analysis for the Social Sciences (Wiley, 2009), introduction and chapters 1-2. Simon also has an excellent chapter on hierarchical modeling (chapter 7), so you should definitely buy the book. I will try to give you the Esperanto version.

Monday, March 1, 2010

Raking and Calibration

Here are the notes on raking (and R code) and here are the notes on calibration (and R code).

Monday, February 22, 2010

More notes

Notes on PPS sampling are here and today's notes (the first installment on nonresponse) are here. These are both a bit rough.

Friday, February 19, 2010

Problem Set 2

The data for problem set 2 is here for 2008 and here for 2004.

Tuesday, February 16, 2010

More lecture notes and R files

It's been a while since I've uploaded lecture notes and R files, so here's a dump of recent stuff:

See you tomorrow.

Tuesday, January 26, 2010

Notes on Stratified Sampling and Ratio Estimation

In addition to falling further behind schedule, I've been a bit slow to post the lecture notes. Here are the notes on stratified sampling and ratio estimation.

Wednesday, January 13, 2010

Finite Population Inference

Here are the notes from Lecture 3. They are a little shorter than the previous sets, though I will quickly go over the material on unequal sampling that was not covered in Lecture 2 (or Lecture 1). This is fairly theoretical material, though not difficult, and most people find it interesting.

The first problem set is now available. It's due in class next Wednesday.

Monday is a holiday, so the next class will be Wednesday January 20. The pace will pick up substantially at that point, since we will be having class twice a week.

Wednesday, January 6, 2010

Simple Random Sampling Notes

The notes on simple random sampling (Lecture 2) are posted here. The Horvitz-Thompson material from Lecture 1, with some edits, appears at the end of these notes, though I doubt we'll get to it tomorrow.

The notes are a bit more technical than I'd intended, but most of this material should be familiar (aside maybe from finite population corrections). Some background references on finite probability are in the footnotes, but this really should be stuff that you know. (I remain eternally optimistic.)

There will be no class on Monday January 11. The next class will be on Wednesday January 13.

Monday, January 4, 2010

Syllabus and Notes from Lecture 1

Here are the syllabus and the notes from the first lecture. I covered sections 1 and 2 of the notes in class and will resume with section 3 on Wednesday. (I've corrected some typos in the notes, so they're a little different from what was distributed in class.) Please read chapter 2 of Lohr before class on Wednesday.

Sunday, January 3, 2010

First Class Meeting


The class meets on Monday and Wednesday mornings from 9:00-10:30 a.m. in Wallenberg Hall (Building 160), Room 329 (the cave-like room pictured above).

A few notes about the course: the approach to survey sampling in this course will be statistical and practical. By "statistical," I mean that it's about the effective use of quantitative data and includes such issues as model building, design, estimation, and inference (though not necessarily in that order!). By "practical," I refer to the attention that will be paid to the large and small imperfections that occur in real-world surveys. It is generally impossible to implement the sampling designs exactly, substantial amounts of nonresponse and self-selection are inevitable, and the models employed will be, at best, approximations.

I will not be discussing "survey methodology." How to write a questionnaire, how to train interviewers, and how to manage a Web panel are all important skills for conducting a survey. This is largely an art (as indicated by the title of Stanley Payne's The Art of Asking Questions, still one of my favorites) and best learned by doing. In recent years, numerous studies have been done testing various hypotheses about survey methods, but this literature tends to be an ad hoc collection of results, often of limited generality, and not, despite claims to the contrary, a coherent "new science." At least, that's my opinion.

The applications that will be covered in the course come largely from surveys of U.S. elections. This reflects primarily my personal interests, but the methods and results have much wider applicability. Between campaign and media polls, exit polls, academic surveys (such as the American National Election Studies), and Internet panels, we encounter all of the common designs (simple random samples, stratified samples, one- and multi-stage cluster samples, probability proportional to size, systematic, and balanced selection), estimation methods (ratio and regression estimators, post-stratification, raking, propensity scores, matching, hierarchical and empirical Bayes), and problems (frame imperfections, nonresponse, self-selection).