Public Opinion Quarterly 2007 71(5):750-771; doi:10.1093/poq/nfm050
© The Author 2007. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Evaluating the Effects of Screening for Telephone Service in Dual Frame RDD Surveys

Courtney Kennedy

e-mail: ckkenned{at}umich.edu.


    Abstract
The high costs and largely unknown error properties of cellular telephone interviews make screening for cell-only adults a potentially attractive option in dual frame RDD surveys. Screening out adults with landline telephones from the cellular sample does not affect the coverage properties of a dual frame survey, but it may affect other sources of error, especially nonresponse. In this study, data from a 2006 dual frame RDD survey conducted for the Pew Research Center, the Associated Press, and AOL are used to evaluate the effects of implementing a cell-only screener on both the bias and variance of weighted survey estimates. The effect of screening appears to be minimal so long as an adjustment for telephone service is included in the weighting method. Results of an attempt to correct for residual nonresponse due to inaccessibility are also discussed.


The potential for coverage error stemming from the exponential growth of the cell phone only population (Blumberg and Luke 2007) has led to the development of dual frame, random digit dial (RDD) surveys. In these dual frame designs, a traditional sample from the landline RDD frame is supplemented with an independent sample from the banks of numbers designated for cellular phones. The emergence of this new approach to telephone survey design has raised numerous statistical questions as well as operational ones. A critical decision, with implications for both survey costs and errors, is whether or not to screen for cell-only persons (or households) in the cellular sample (Fleeman 2006).

A cell-only screener is implemented by asking respondents reached in the cellular sample whether or not they also have a landline. Persons with both a landline and a cellular phone (dual users) have a chance of being sampled in both frames. Consequently, screening them out of the cellular sample does not affect the coverage properties of the survey. A cell-only screener may, however, affect other sources of error. If dual users reached in a cellular sample are unlikely to respond in a landline sample and they differ from landline respondents on characteristics measured in the survey, then screening may increase nonresponse error. Conversely, if dual users reached in cellular and landline samples tend to be similar and cell phone interviews are relatively expensive, then screening may reduce costs without compromising data quality. Understanding the effect of screening is, thus, an important step in advancing the design of dual frame surveys.

In terms of estimation, it may be useful to recognize a key consequence of screening. Surveys that implement a cell-only screener are technically not "dual frame," as the term is commonly used in the statistical literature (Hartley 1962, 1974; Lund 1968; Fuller and Burmeister 1972; Bankier 1986). Instead, the samples from such designs are stratified, with each frame constituting its own stratum. Every adult in the telephone population has a nonzero probability of being interviewed in exactly one stratum. By contrast, dual frame RDD surveys that do not use a screener facilitate composite (rather than stratified) estimation. When no screener is used, dual users can be interviewed in either sample, creating an overlap. This allows researchers to compute two separate estimates (one from each sample) for the segment of the population belonging to that overlap (i.e., dual users). The two estimates for the dual users can be composited and added to the estimates for the landline-only population and the cell-only population to produce a single estimate for the entire population.

While the multiple frame literature offers some guidance, several statistical issues have yet to be resolved for this nascent RDD design. In particular, there is uncertainty as to the nature of the weighting adjustment that should be used to correct for nonresponse bias. Brick and his colleagues (2006) experimented with five different methods, none of which was clearly superior. This state of affairs complicates efforts, such as the one undertaken here, to investigate the quality of estimates produced from dual frame designs. I address this uncertainty by using the weighting methods proposed by Brick et al. and exploring the use of a new method. The primary focus of the research described below is assessing the effect of screening, but as a secondary goal, I also attempt to replicate and expand the work done by Brick et al. concerning the performance of post-survey adjustments for dual frame designs.


    Factors Influencing the Decision to Screen
Ideally, researchers would base the decision to use a cell-only screener on three factors: (i) the variance of the estimators, (ii) the bias (from nonresponse and possibly measurement error) in key estimates, and (iii) differential costs of data collection in the two frames, including the cost of a screener relative to a full interview.

The decision to use a cell-only screener in the cellular sample has two main implications for the variance of the sample estimates. The first implication is the potential increase or decrease in standard errors from the weights (the design effect due to weighting). To date, no studies have addressed whether the design effect due to weighting tends to be larger for dual frame surveys that employ a cell-only screener than for those that do not. This is one of the issues explored in the current study. The second implication pertains to the potential increase in sample size as a function of data collection costs. If a completed full interview costs more in the cellular frame, then a cell-only screener permits a larger total sample size by maximizing the number of interviews conducted in the less expensive landline frame. In turn, the larger sample size would yield lower sampling variance, holding other factors constant.
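One widely used approximation of the design effect due to weighting is Kish's formula, n·Σw²/(Σw)², which equals 1 plus the squared coefficient of variation of the weights. The article does not state which approximation it relies on, so the Python sketch below is offered only under that assumption; the function names and the weights in the example are hypothetical.

```python
from typing import Sequence


def kish_deff(weights: Sequence[float]) -> float:
    """Kish's approximate design effect due to unequal weighting:
    n * sum(w^2) / (sum(w))^2, i.e. 1 + CV^2 of the weights."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("need at least one weight and a positive weight total")
    return n * sum(w * w for w in weights) / total ** 2


def effective_sample_size(weights: Sequence[float]) -> float:
    """Nominal sample size deflated by the weighting design effect."""
    return len(weights) / kish_deff(weights)


if __name__ == "__main__":
    equal = [1.0] * 100                 # equal weights: no loss of precision
    unequal = [1.0] * 50 + [3.0] * 50   # hypothetical unequal weights
    print(kish_deff(equal), kish_deff(unequal))  # 1.0 1.25
    print(effective_sample_size(unequal))        # 80.0
```

Computing this quantity for the final weights under the screener and no screener designs is one direct way to compare the first implication across designs; the effective sample size also shows how a larger nominal sample can be partly offset by more variable weights.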

With respect to the bias of survey estimates, two potential error sources are relevant to the screener decision – nonresponse and measurement error. Brick and his colleagues (2006) reported substantial nonresponse bias on questions related to telephone usage in a 2004 dual frame RDD survey. They attribute the bias to topic saliency and lower accessibility of dual users sampled through the cellular frame as compared to cell-only persons. The latter problem of differential accessibility is potentially endemic to all cellular RDD samples.

Relatedly, there is evidence that cell-only persons and dual users who primarily use their cell phone are over-represented in cellular samples. Brick and his colleagues found that dual users from the cellular sample were significantly more likely to report being a "frequent" cellular phone user (45 percent) than dual users from the landline sample (26 percent) and dual users interviewed in the 2004 Current Population Survey (CPS) (33 percent), which uses a national area-probability frame rather than RDD. Steeh (2004) reports similar findings. It should be noted, however, that while such nonresponse bias undermines estimates of telephone usage, it may not affect other survey estimates. If dual users reached in a cellular sample are similar to those reached in a landline sample on all the key survey measures, then interviewing them in both frames gains nothing from a coverage and nonresponse bias standpoint.

Measurement error, however, may not be the same in both frames. Several characteristics of cellular phones have led researchers to speculate that measurement errors could be larger when a respondent is interviewed on a cellular phone than on a landline phone (Steeh 2004; Lavrakas and Shuttles 2005; Brick et al. 2007). In particular, cellular phone respondents may be contacted at virtually any location, unlike landline respondents, who are almost certainly at home.1 In addition, some have asserted that cellular phone conversations tend to be shorter than those conducted on landline phones, implying a lower tolerance for lengthy questionnaires. Nonetheless, preliminary studies have found no support for these hypotheses and instead support the assumption of equal levels of measurement quality. Steeh (2004) and Brick et al. (2007) report equivalent rates of item nonresponse in cellular and landline RDD samples. Using multilevel models with interviewer as a random effect and device (cell/landline) as a fixed effect, Brick et al. (2007) also report no main effect of device on the length of responses to an open-ended question. It should be emphasized, however, that these studies consider only indirect indicators of measurement error. More rigorous designs, ideally featuring random assignment to phone type and/or a record check, are needed to address this question more fully.

Cost is the third critical factor influencing the decision to employ a cell-only screener. Incentives compensating respondents for cellular phone charges and lower cooperation rates in the cellular frame (Brick et al. 2007; Keeter et al. 2007) potentially make screening a cost-effective option. Telephone numbers belonging to dual users, however, constitute a large segment of the cellular RDD frame. Consequently, screening for cell-only adults may be quite inefficient depending on the cost parameters. Aside from work by Link et al. (2007) based on a pilot study for the Behavioral Risk Factor Surveillance System, little is known about the cost differential between a screen-out and a full interview in dual frame RDD designs.

The current study assesses the impact of implementing a cell-only screener in several stages. First, characteristics of those who would be screened out (i.e., cellular sample dual users) are considered and compared to those of their counterparts in a landline sample. This is followed by a discussion of the process used to compute weighted estimates in dual frame surveys. Next, the bias in weighted estimates produced under both the screener and no screener scenarios is estimated using external benchmarks. Finally, the impact on variance is considered, and the magnitude of this error is evaluated relative to the error from bias.


    Research Design
The Pew Research Center, in partnership with the Associated Press and AOL, conducted a national dual frame RDD survey March 8–28, 2006. Content of the survey focused on technology, telephone usage, and political attitudes. The survey introduction identified the Associated Press as the sponsor, and the topic of the survey was not mentioned. Interviews were attempted with all eligible persons in both samples; no screener was used. A major thrust of the analysis is comparing estimates based on all the cases to estimates produced when simulating a cell-only screener by dropping the dual users in the cellular sample.
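The screener simulation amounts to a simple filter on the respondent file. The Python sketch below assumes a hypothetical record layout with one field for the sample a case came from and one for its telephone service; the field and function names are illustrative, not those of the study's data files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Respondent:
    sample: str   # "landline" or "cell": the RDD sample the case came from
    service: str  # "landline_only", "dual", or "cell_only"


def simulate_cell_only_screener(cases: List[Respondent]) -> List[Respondent]:
    """Drop dual users interviewed in the cellular sample, as a cell-only
    screener would have done; all landline-sample cases are kept."""
    return [c for c in cases if not (c.sample == "cell" and c.service == "dual")]


if __name__ == "__main__":
    cases = [
        Respondent("landline", "landline_only"),
        Respondent("landline", "dual"),
        Respondent("cell", "dual"),       # would have been screened out
        Respondent("cell", "cell_only"),
    ]
    print(len(simulate_cell_only_screener(cases)))  # 3
```

Weighted estimates for the screener scenario are then produced by running the same weighting steps on the filtered file.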

Both the cellular and landline samples were drawn by Survey Sampling International LLC, and the data collection was conducted by Schulman, Ronca, & Bucuvalas, Inc. Numbers for the landline sample were drawn with equal probability from 100-banks (blocks of 100 consecutive numbers) with three or more listed residential numbers. The cellular sample was not list-assisted, as there is no database of listed cellular phone numbers. Instead, the numbers for the cellular sample were drawn through a systematic equal probability sample from 1000-banks dedicated to cellular service according to the Telcordia database.

By design, the responding sample sizes were nearly identical. For the landline sample, 752 interviews were completed. This compares with 751 for the cellular sample. Among the landline sample respondents, 29 percent were landline-only and 71 percent were dual users; and by coincidence, among the cellular sample respondents, 26 percent were cell phone only and 74 percent were dual users.

In the landline sample, interviews were conducted with the youngest male aged 18 or older at home at the time of the call or, absent any adult male, with the youngest female at home. While this selection technique departs from probability sampling, it seems unlikely to have a systematic effect on the results reported below. In the cellular sample, interviews were conducted with the person who answered the phone. Interviewers verified that the person was aged 18 or older and in a safe place to talk before administering the survey. All cellular sample respondents were offered a $10 postpaid mailed remuneration for their participation. A message was left only once on the voicemail of cellular sample cases. No messages were left on answering machines for landline sample cases. The interviewing protocol was essentially a 10-call attempt design, although a few numbers (less than 1.5 percent of all numbers dialed for either sample) were attempted up to 13 times. Refusal conversion was attempted once for soft refusals in both samples.

The cellular sample proved more efficient in terms of eligibility and reaching phone owners, but, given contact, the participation rate was lower for the cellular sample. Of the 8,414 numbers dialed for the cellular sample, 4,946 (59 percent) were coded as working residential numbers. Calls were placed to a total of 6,662 numbers for the landline sample, 2,859 (43 percent) of which were coded as working residential numbers. The AAPOR response rate (RR3) was 30 percent for the landline sample and 20 percent for the cellular sample. While the cellular sample achieved a somewhat higher contact rate (76 percent versus 68 percent for the landline sample), the cooperation rate for the cellular sample (28 percent) was approximately half that achieved for the landline sample (50 percent).
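The percentages in this paragraph can be re-derived from the reported counts. The short Python check below uses only figures given in the text (the function names are illustrative); it confirms the working-residential rates and that the cellular cooperation rate is a little over half the landline rate. The RR3 figures themselves cannot be recomputed here, because the full disposition counts they require are not reported.

```python
# Counts and rates as reported in the text for the Pew/AP/AOL survey.
DIALED = {"cell": 8414, "landline": 6662}
WORKING_RESIDENTIAL = {"cell": 4946, "landline": 2859}
COOPERATION_RATE = {"cell": 0.28, "landline": 0.50}


def working_residential_pct(sample: str) -> int:
    """Share of dialed numbers coded as working residential, in whole percent."""
    return round(100 * WORKING_RESIDENTIAL[sample] / DIALED[sample])


def cooperation_ratio() -> float:
    """Cellular cooperation rate relative to the landline cooperation rate."""
    return COOPERATION_RATE["cell"] / COOPERATION_RATE["landline"]


if __name__ == "__main__":
    for s in ("cell", "landline"):
        print(s, working_residential_pct(s))  # cell 59, then landline 43
    print(round(cooperation_ratio(), 2))      # 0.56
```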


    Results
Comparison of Dual Users Interviewed in the Two Frames
In theory, interviewing dual users in the cellular sample is unnecessary, since dual users are represented in the landline sample. As highlighted in Brick et al. (2006), however, nonresponse can produce respondent samples that look quite dissimilar, at least with respect to telephone service. A comparison of the two sets of dual users interviewed in the Pew/AP/AOL study, presented in table 1, shows that they differ with respect to demographic and attitudinal variables as well.


Table 1. Comparison of Dual Users from the Landline and Cellular Pew/AP/AOL Samples to NHIS Benchmarks

 
The differences between the dual users in the two samples reflect heavier usage of cellular phones among young people. Of the dual users interviewed in the cellular sample, 18 percent were aged 18 to 25, compared with 8 percent in this age range in the landline sample (t = 4.68, df = 1,060, p < .001). Consistent with this age difference, the cellular sample dual users are more likely never to have been married (t = 3.01, df = 1,067, p < .01). Similarly, the differences in having low income (t = 1.45, df = 919, p = .15) and being a homeowner (t = -1.45, df = 1,084, p = .15) are in the expected direction, although they do not reach statistical significance. There is also some evidence that dual users from the landline sample hold more conservative political views than their counterparts in the cellular sample. Dual users from the cellular sample are more likely to self-identify as Democrats (t = 2.45, df = 1,084, p = .01) and less likely to identify as Independent/Other (t = 2.32, df = 1,086, p = .02).
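The article reports t statistics with degrees of freedom but does not say exactly which test produced them. One common choice for comparing two independent groups on a 0/1 item is an unequal-variance (Welch) t test. The Python sketch below assumes that choice, uses a normal approximation for the p-value, and is run on hypothetical group sizes, so it will not reproduce the published values exactly.

```python
import math
from typing import Tuple


def welch_t_for_proportions(p1: float, n1: int,
                            p2: float, n2: int) -> Tuple[float, float, float]:
    """Compare two sample proportions as means of 0/1 variables.

    Returns (t, Welch-Satterthwaite df, two-sided p). The p-value uses a
    normal approximation, which is close to the t distribution when df is
    in the hundreds or more. Requires 0 < p < 1 in at least one group.
    """
    # Unbiased sample variance of a 0/1 variable with mean p in n cases.
    var1 = p1 * (1 - p1) * n1 / (n1 - 1)
    var2 = p2 * (1 - p2) * n2 / (n2 - 1)
    se2_1, se2_2 = var1 / n1, var2 / n2  # squared standard errors
    t = (p1 - p2) / math.sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    p_value = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approx.
    return t, df, p_value


if __name__ == "__main__":
    # Hypothetical group sizes; the article does not report them per item.
    print(welch_t_for_proportions(0.18, 550, 0.08, 530))
```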

Not surprisingly, there is a strong relationship between how often dual use respondents use their landline relative to their cellular phone and the frame in which they are interviewed. A majority of dual users from the cellular sample use their cellular phone most of the time, while a plurality of dual users from the landline sample primarily use their landline phone (t = 5.68, df = 1,084, p <.001). Thus, the cellular sample of dual users over-represents heavy cellular phone users (who tend to be young) relative to the landline sample.

The first two columns in table 1 provide some evidence that dual users interviewed in the landline and cellular RDD frames differ with respect to variables important to many social science surveys. This leads to the question of which group is more representative of all dual users in the country. To help answer this, weighted estimates based only on dual users interviewed in the 2005 National Health Interview Survey (NHIS) are presented in the far right column in table 1. The NHIS is based on a national area probability sample and achieved a final response rate of 69 percent in 2005. Consequently, NHIS estimates are not nearly as susceptible to the coverage and nonresponse problems plaguing telephone RDD surveys. That said, the NHIS is an imperfect benchmark because the measurement procedures differed from those used in the Pew/AP/AOL study. In the most recent year for which an NHIS public use file is available, information about telephone usage was collected at the family level. By contrast, information about telephone usage was collected at the individual level in the Pew/AP/AOL study. This potential measurement error confound should be kept in mind when assessing the differences shown in table 1.

Twenty benchmark point estimates are compared for seven demographic variables. In light of the large sample size for the NHIS figures (N = 13,783), the more stringent 0.01 significance threshold is used for the benchmark comparisons. This was done to focus attention on the largest and likely most robust disparities (across repeated RDD samples). Using the 0.01 threshold, fewer than one significant difference (20 × 0.01 = 0.2) would be expected from sampling variability alone for each set of dual users. There are, however, 10 estimates that are significantly different from the NHIS benchmark for each group of dual users. In both samples, homeowners and persons with an annual income of $75,000 or higher are underrepresented. Approximately one-third of dual users in the landline and cellular samples report an annual income in this range (33 percent and 32 percent, respectively) compared with over half (51 percent) of the dual users in the NHIS. It should be kept in mind that the Pew/AP/AOL data in this section of the analysis are unweighted. Nevertheless, these differences are quite striking and suggest the potential for substantial nonresponse bias on measures related to wealth in RDD telephone surveys using either type of sample.
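The expected number of chance rejections is simple arithmetic: 20 comparisons at a 0.01 threshold give 0.2. Treating the 20 comparisons as independent Bernoulli trials is an assumption that does not strictly hold, since categories of the same variable must sum to 100 percent. Under that assumption, a binomial tail probability shows how implausible 10 chance rejections would be. A minimal Python sketch (names are illustrative):

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


N_COMPARISONS = 20  # benchmark point estimates per set of dual users
ALPHA = 0.01        # significance threshold used in the text

expected_false_positives = N_COMPARISONS * ALPHA  # 0.2
# Probability of 10 or more chance rejections if all comparisons were
# independent and every null hypothesis were true (an idealization).
prob_ten_or_more = binomial_tail(10, N_COMPARISONS, ALPHA)

if __name__ == "__main__":
    print(expected_false_positives)
    print(f"{prob_ten_or_more:.1e}")
```

Even allowing for the dependence among the comparisons, a result this far from 0.2 points to genuine differences rather than sampling noise.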

On other demographic dimensions, neither group is consistently more representative of dual users in the US population as measured by the NHIS. The landline sample dual users more closely resemble their counterparts in the NHIS with respect to gender and race, but the landline group also tends to be older. The cellular sample dual users include a disproportionate share of non-white respondents but essentially the same proportion of Hispanics as the NHIS estimate for dual users.

It is worth noting that, in addition to possible confounding of the benchmark comparisons, measurement error may also be compromising the internal comparison between the two sets of dual users from the Pew/AP/AOL study. This is less of a concern for demographic questions. There is no obvious theoretical reason to expect the measurement error properties of demographic items to differ for landline versus cellular phone interviews. On attitudinal items, however, measurement error may be more problematic. Without an experimental design that randomly assigns dual user respondents to be interviewed on only one type of phone, the confound between nonresponse and measurement error cannot be fully disentangled.

Developing Weighted Estimates to Evaluate the Effect of Screening
Screening for cell-only persons in the cellular frame potentially affects both the bias and the variance of estimates from dual frame RDD surveys. In order to gauge these effects, it is necessary to compute weighted survey estimates under both the screener and no screener conditions. This involves specifying a weighting method that (1) combines the landline and cellular samples in a way that properly reflects probabilities of selection and (2) adjusts for deviations from known population parameters correlated with the key survey measures.

Unfortunately, because dual frame RDD surveys are a recent development, researchers are still working to identify the best way to carry out such a weighting method. The analysis presented here builds upon the work by Brick et al. (2006), who describe an approach for combining the samples and present several poststratification methods. I proceed by briefly reviewing several of the weighting methods presented in that article2 and then introducing an additional method that attempts to account for nonresponse due to differential (landline/cellular) usage among dual users.

Following Brick et al. (2006), dual frame estimates were computed using a simple composite estimator (Hartley 1962):

$$y_{comp} = y_{LLO} + y_{CPO} + \lambda\, y_{Dual,\,Landline\ RDD} + (1 - \lambda)\, y_{Dual,\,Cell\ RDD}$$

where $y_{LLO}$, $y_{CPO}$, and $y_{Dual}$ are the totals for the landline-only, cell-only, and dual user populations, respectively. The estimates for the dual user population based on the landline and cellular samples ($y_{Dual,\,Landline\ RDD}$ and $y_{Dual,\,Cell\ RDD}$) are combined with the "mixing parameter" $\lambda$ ($0 \le \lambda \le 1$), which is set to 0.5 in this analysis.3
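The composite estimator is a one-line computation once the component totals are estimated. The minimal Python sketch below takes those estimated totals as inputs (the argument names are illustrative) and enforces the stated range for the mixing parameter.

```python
def composite_total(y_llo: float, y_cpo: float,
                    y_dual_landline: float, y_dual_cell: float,
                    lam: float = 0.5) -> float:
    """Hartley-style composite estimate of a population total.

    y_llo, y_cpo: estimated totals for landline-only and cell-only adults.
    y_dual_landline, y_dual_cell: the two estimates of the dual-user total,
    from the landline and cellular samples respectively.
    lam: mixing parameter, 0 <= lam <= 1 (0.5 in the analysis described).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return y_llo + y_cpo + lam * y_dual_landline + (1 - lam) * y_dual_cell


if __name__ == "__main__":
    print(composite_total(10, 20, 60, 70))           # 95.0
    print(composite_total(10, 20, 60, 70, lam=1.0))  # 90.0
```

Setting lam to 1.0 gives no weight to dual users from the cellular sample, which, at the level of this estimator, mirrors the simulated screener design in which those cases are dropped.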

Several factors related to the sample design and respondent characteristics must be taken into account before composite estimates can be calculated. The initial step is to create base weights. First, each sample is inflated to the size of the frame by multiplying by the ratio of the size of the frame to the sample size. Next, a nonresponse adjustment is calculated by sorting cases into weighting classes. The adjustment is the reciprocal of the response rate within the weighting class. The weighting classes are defined by region in the landline sample and by the cross-classification of region and number of calls to first contact in the cellular sample. It is also necessary to account for the fact that many individuals have multiple phone numbers and, thus, multiple opportunities for selection into the sample. This is accomplished by multiplying the nonresponse-adjusted weight by the reciprocal of the number of phones. The denominator was truncated at 3 for landline sample cases and 4 for cellular sample cases because very small percentages of the population fall beyond these bounds.4 This composite weight can then be expressed as

$$w_{comp} = w_{base} \times \lambda^{I_{Dual}}$$

for the landline sample cases and as

$$w_{comp} = w_{base} \times (1 - \lambda)^{I_{Dual}}$$

for the cellular sample cases, where $w_{base}$ is the frame-inflated, nonresponse-adjusted, and multiplicity-adjusted base weight described above, $\lambda$ is the mixing parameter of the composite estimator, and $I_{Dual}$ is an indicator variable with value 1 if the respondent is a dual user and value 0 otherwise.
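The base-weight and composite steps described in this paragraph can be chained in a short function. This Python sketch covers only the steps listed in the text and assumes that frame size, sample size, the weighting-class response rate, and the respondent's number of phones of the sampled type are already available; the record layout, field names, and example values are hypothetical, while the caps of 3 and 4 phones are the truncation points given in the text.

```python
from dataclasses import dataclass

# Truncation points for the number of phones, as described in the text.
PHONE_CAP = {"landline": 3, "cell": 4}


@dataclass
class Case:
    sample: str                 # "landline" or "cell"
    is_dual: bool               # has both landline and cellular service
    n_phones: int               # phones of the sampled type (at least 1)
    class_response_rate: float  # response rate in the case's weighting class


def composite_weight(case: Case, frame_size: float, sample_size: int,
                     lam: float = 0.5) -> float:
    # 1. Inflate the sample to the size of its frame.
    w = frame_size / sample_size
    # 2. Weighting-class nonresponse adjustment.
    w /= case.class_response_rate
    # 3. Multiplicity adjustment, with the number of phones truncated.
    w /= min(case.n_phones, PHONE_CAP[case.sample])
    # 4. Split dual users between the two frames with the mixing parameter.
    if case.is_dual:
        w *= lam if case.sample == "landline" else (1 - lam)
    return w


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    c = Case(sample="landline", is_dual=True, n_phones=5, class_response_rate=0.5)
    print(composite_weight(c, frame_size=1000, sample_size=100))
```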

After the two samples are combined using the composite weight, it is usually still necessary to adjust for nonresponse. Two options proposed by Brick et al. are (1) to rake the combined sample to demographic variables or (2) to rake the combined sample to demographic variables and an estimated parameter for telephone service (landline-only, dual, or cell-only). Following their nomenclature, these methods are referred to as "raked" and "service," respectively.

One potential shortcoming of the weighting procedures proposed by Brick and colleagues is that they do not account for variation in telephone usage within the population of dual users. It is well known that some people with both a landline and a cellular phone use the cellular phone only rarely, such as in an emergency (Tucker, Brick, and Meekins 2007). Other dual users do just the reverse; they make and receive nearly all their calls on their cellular phone and rarely use their landline phone. Importantly, at both extremes, some dual users do not answer any incoming calls on one of their two types of phones.

One technique to help correct this residual nonresponse error is to poststratify (rake) to an estimated national parameter for relative telephone usage. In theory, this ensures that each usage group identified in the survey (e.g., heavy cell users, heavy landline users) is weighted in proportion to its prevalence in the population. This method, superficially at least, seems to reflect and account for people's true response propensities in each frame more accurately. Thus, the most complicated weighting method evaluated here calls for raking to demographic variables and an estimated parameter for relative telephone usage (landline-only; mostly landline; mostly cell; cell-only).

The estimated parameter for relative usage comes from the 2004 supplement to the Current Population Survey. The question was worded:

Of all the calls that you or any other member of your household receive, about how many are received on a cell phone? All or almost all calls; more than half; less than half; or very few or none?

A highly similar, although not identical, pair of questions was asked in the Pew/AP/AOL survey:

Thinking about all the phone calls that you make, do you make more calls with your cell phone or more calls with your regular home phone?

Would that be a LOT MORE or just a FEW more with your (cell phone/regular home phone)?

In order for poststratification to a relative usage parameter to be effective, several conditions must be satisfied. The measurement process should be nearly identical in the benchmark study and the study at hand. As illustrated here, this equivalence can be difficult to achieve in practice.5 The CPS item shown above was the most viable option for use in the Pew/AP/AOL study despite the fact that it was somewhat outdated,6 uses a different question, and asks about household-level telephone usage rather than the respondent's usage. The reliability of responses to both the CPS and the Pew/AP/AOL questions is uncertain, particularly due to the use of vague quantifiers such as "more than half" and "a lot more." Thus, it is likely that measurement error will, to some degree, undercut the potential benefits in nonresponse reduction from raking to this variable.

Another necessary condition for the success of this weighting adjustment concerns the relationship between the relative usage variable and outcome measures of interest. As with any parameter, adjusting for relative usage will only reduce bias if it has a substantial covariance with the key survey measures and that covariance is not already explained by other variables in the weighting method. This is an empirical question that is specific to each survey measure. If the relative usage method is a success, it will reduce bias relative to the service method, which also rakes to demographics and telephone service, but not to relative telephone usage. Conversely, if the relative usage method fails, this will manifest as higher design effects (due to variability in the weights) without any perceptible reductions in bias.

In total, four weighting methods are evaluated in this study. The baseline comparison is the set of estimates based only on the landline sample poststratified to demographic variables. The simplest set of dual frame estimates is based on poststratifying to the same set of demographic variables. The other two weighting adjustments call for poststratifying to the demographic variables as well as increasingly nuanced measures of telephone usage. The methods can be summarized as follows.

(1) Landline Sample Only: Drop the entire cellular sample. Rake the landline sample to six demographic variables.
(2) Raked: Combine the landline and cellular samples using the composite weight. Then rake to six demographic variables.
(3) Service: Combine the landline and cellular samples using the composite weight. Then rake to six demographic variables and telephone service (landline-only; dual; cell-only).
(4) Relative Usage: Combine the landline and cellular samples using the composite weight. Then rake to six demographic variables and relative telephone usage (landline-only; mostly landline; mostly cell; cell-only).
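Raking (iterative proportional fitting) is the adjustment shared by all four methods; they differ in which sample is raked and which margins are used. The Python sketch below is a generic raking routine, not the software used in the study. The names in DEMOGRAPHICS are hypothetical labels for the six cross-classifications described in the text, and the default of 17 passes echoes the iteration count the article reports for its raking, with one pass assumed here to mean one cycle through all margins.

```python
from typing import Dict, List, Sequence

# Hypothetical labels for the six demographic raking variables.
DEMOGRAPHICS = ["sex_age", "sex_educ", "educ_age", "race_eth", "region", "density"]
METHOD_MARGINS = {
    "landline_only": DEMOGRAPHICS,  # applied to the landline sample alone
    "raked": DEMOGRAPHICS,
    "service": DEMOGRAPHICS + ["service"],
    "relative_usage": DEMOGRAPHICS + ["relative_usage"],
}


def _weighted_totals(w: Sequence[float], cats: Sequence[str]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for wi, c in zip(w, cats):
        totals[c] = totals.get(c, 0.0) + wi
    return totals


def rake(weights: Sequence[float],
         categories: Dict[str, List[str]],
         targets: Dict[str, Dict[str, float]],
         n_iter: int = 17) -> List[float]:
    """Cycle through the raking variables, rescaling weights so each
    variable's weighted category totals match its control totals.
    categories[var][i] is case i's category; targets[var][cat] is the
    control total. Every category must contain a case with positive weight."""
    w = list(weights)
    for _ in range(n_iter):
        for var, cats in categories.items():
            current = _weighted_totals(w, cats)
            w = [wi * targets[var][c] / current[c] for wi, c in zip(w, cats)]
    return w


def max_margin_gap(w: Sequence[float], categories: Dict[str, List[str]],
                   targets: Dict[str, Dict[str, float]]) -> float:
    """Largest absolute gap between a weighted margin and its control total."""
    gap = 0.0
    for var, cats in categories.items():
        current = _weighted_totals(w, cats)
        for c, t in targets[var].items():
            gap = max(gap, abs(current.get(c, 0.0) - t))
    return gap
```

For the "service" method, for example, one would pass the columns named in METHOD_MARGINS["service"] as categories. Because raking stops after a fixed number of passes, max_margin_gap can be used to check how close the final weights come to every control total; small residual gaps of this kind are what remain when convergence is nearly but not fully achieved.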

There are two important and related differences between the weighting methods described by Brick et al. and those used for this study. Brick et al. perform household-level estimation rather than the adult-level estimation undertaken here. This has consequences for the parameters that are suitable for use in the raking adjustment. Brick et al. raked to three demographic parameters: (1) reference person is of Hispanic origin (yes/no), (2) the number of adults in the household and their marital status, and (3) whether the home was owned or rented. By contrast, the Pew/AP/AOL survey was designed for adult-level estimation, and so a different and arguably richer set of demographic variables was available for adjustment. The six demographic variables used in this analysis are cross-classifications of sex/age, sex/education, education/age, and race/ethnicity, as well as region and county population density (quintiles).7

The Effect of Screening Under Several Weighting Methods
The comparison of the performance of the four weighting methods is presented in table 2. The far left column reports the difference, for each response category, between the weighted landline sample estimate and a benchmark estimate from one of three large area probability surveys.8 The other three weighting methods utilize both the landline and cellular samples, and so two sets of benchmark differences are presented – differences for estimates based on all the cases (no screener) and differences for estimates based on a simulated design using a cell-only screener.


Table 2. Differences Between Survey Estimates with and without a Screener and Benchmarks by Poststratification Method

 
Across the weighting methods, persons with low income, renters, and those not married are overrepresented, although the landline sample-only estimates tend to be more accurate. These errors persist even though each of the weighting methods aligned the sample to population parameters for age, gender, race, education, and so forth. Differences between the weighting methods on the telephone service question at the top of the table conform to expectations. By construction, there is essentially no difference between the survey estimates and the benchmarks when the weights adjusting for telephone service are used. The slight deviations observed for the relative usage estimates reflect the fact that raking nearly, but not completely, achieved convergence to the control totals (17 iterations were performed for each raking variable).

On the central question of the effect of screening, there is some evidence that screening leads to greater bias if no adjustment is made for telephone service. Under the weighting method raking only to demographics (raked), the absolute values of the differences between the survey estimates and the benchmarks tend to be greater under the screener scenario than under the no screener scenario. When an adjustment is made for telephone service or relative usage, however, this effect disappears, and estimates based on a screener appear to perform at least as well as those based on a design with no screener. Under each of the two methods making a telephone adjustment (service and relative usage), the median absolute difference from the benchmarks was approximately 4.0 percentage points under the screener scenario and 4.2 percentage points under the no screener scenario.

The direction of the screener effects observed with the raked weighting method (second and third columns from the left) is informative and in some ways unexpected. Under the cell-only screener scenario, dual users from the cellular sample are excluded. Thus, on items for which the dual users from the cellular sample are less representative than their landline sample counterparts (as indicated in table 1), we might expect the screener estimates to be more accurate than the no screener estimates. By this logic, the benchmark deviations in table 2 should be smaller under the screener scenario than under the no screener scenario for estimates such as the proportion of adults with income under $20,000, the proportion owning their home, and the proportion who are married. We observe, however, the exact opposite result: deviations on these estimates are somewhat larger under the screener scenario. For example, the screener estimate under the simple demographic weighting (raked) for the proportion of adults owning their home is 14.6 percentage points less than the benchmark estimate from the CPS. By contrast, the no screener estimate under this weighting method is 9.9 percentage points less than the benchmark.

One explanation is that the most consequential effect of screening is on the contribution of cell-only respondents to sample estimates, rather than on the composition of the pool of dual users interviewed. In the Pew/AP/AOL study, the effect of screening on the contribution of cell-only respondents is most dramatic within the youngest age group. When no screening is performed, the distribution of telephone service among those ages 18–24 (n = 204) is 30 percent cell-only, 65 percent dual user, and 5 percent landline-only. When a screener is used, this distribution among the youngest age group becomes 55 percent cell-only, 35 percent dual user, and 10 percent landline-only. Clearly, even after an age adjustment is made in weighting (as it is under all methods reported here), there remains a sizable effect of screening on the composition of this subgroup.
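Simulating a cell-only screener on data collected without one amounts to discarding the dual users reached through the cellular sample while retaining every landline-sample case. The Python sketch below shows that filtering step and a weighted distribution of telephone service. It assumes hypothetical record fields `frame`, `service`, and `weight`, and the records are made up for illustration, not study data. A full simulation would presumably also recompute the composite and raking weights on the retained cases, which this sketch does not do.

```python
from collections import defaultdict

def apply_cell_only_screener(records):
    """Keep all landline-sample cases; keep cellular-sample cases only if cell-only."""
    return [
        r for r in records
        if r["frame"] == "landline" or r["service"] == "cell-only"
    ]

def service_distribution(records):
    """Weighted share of each telephone service category."""
    totals = defaultdict(float)
    for r in records:
        totals[r["service"]] += r["weight"]
    grand_total = sum(totals.values())
    return {service: t / grand_total for service, t in totals.items()}

# Hypothetical records for illustration only (not study data).
records = [
    {"frame": "landline", "service": "landline-only", "weight": 1.0},
    {"frame": "landline", "service": "dual", "weight": 1.0},
    {"frame": "cell", "service": "dual", "weight": 1.0},
    {"frame": "cell", "service": "cell-only", "weight": 1.0},
]
print(service_distribution(records))                            # dual share 0.5
print(service_distribution(apply_cell_only_screener(records)))  # dual share about 0.333
```

Even in this toy example, dropping the cellular-sample dual user raises the relative weight of the cell-only case, which mirrors the compositional shift described above for the youngest age group.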

Research based on area-probability surveys indicates that cell-only status is associated with low income (Blumberg and Luke 2007), renting (Blumberg and Luke 2007; Tucker, Brick, and Meekins 2007), and being unmarried (Tucker, Brick, and Meekins 2007). In the Pew/AP/AOL study, under the raked weighting method, screening increases the error on each of these three estimates, in the direction of overrepresenting cell-only respondents. This effect of screening is attenuated, if not eliminated, when the weighting method includes a telephone service adjustment.

On political items, the effect of screening is quite mixed. If anything, the results suggest that within weighting classes, cell-only individuals are fairly similar on these measures to those who are dual users or landline-only. Variation in the magnitude of the effect of screening observed in table 2 shows that the impact of this design decision varies across survey statistics. As mentioned above, however, the central finding is that screening has no perceptible, consistent effect on bias so long as a correction for telephone service is made in the post-survey adjustment.

Although the use of a cell-only screener may not affect bias, there is some evidence that it affects precision. The bottom two rows of table 2 report the design effects due to weighting. When the weighting methods raking to telephone service or relative usage (in addition to demographics) are used, the increase in variance is somewhat lower for the screener design than for the no screener design. When the dual frame weight adjusting only for demographics is used, the difference in design effects between the screener and no screener designs is negligible (1.43 versus 1.42). The approximate design effect was computed using the standard formula of 1.00 plus the squared coefficient of variation of the weights. Actual design effects were computed using the Taylor series approximation in SUDAAN V.9 (Research Triangle Institute 2004) to account for the composite and poststratification weights. The approximate and actual design effects are highly similar.
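The approximate design effect described above is the familiar formula for variance inflation due to unequal weighting (commonly attributed to Kish): 1 + CV², where CV is the coefficient of variation of the weights. When CV is computed with the population (divide-by-n) variance, this equals n·Σw²/(Σw)². The Python sketch below computes both forms; the function names are hypothetical, and the sketch does not reproduce the Taylor series design effects produced by SUDAAN.

```python
import numpy as np

def approx_design_effect(weights):
    """Design effect due to weighting: 1 + squared coefficient of variation.

    Uses the population (ddof=0) standard deviation, which makes the result
    identical to n * sum(w**2) / sum(w)**2.
    """
    w = np.asarray(weights, dtype=float)
    cv = w.std(ddof=0) / w.mean()
    return 1.0 + cv ** 2

def approx_design_effect_alt(weights):
    """Equivalent form: n * sum of squared weights / squared sum of weights."""
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

w = [1.0, 1.0, 2.0, 2.0]
print(approx_design_effect(w))      # about 1.111
print(approx_design_effect_alt(w))  # same value
```

Equal weights give a design effect of exactly 1.0; the more variable the weights, the larger the inflation, which is why raking to additional margins (such as relative usage) can raise the design effect.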

Setting aside the issue of screening briefly, we can consider the performance of the weighting methods under a no screener design. There is no consistent evidence, based on the composite estimates, that adjusting for telephone usage reduced bias on the estimates evaluated in this study. The median deviation from the benchmark figures (excluding telephone service) was the same (4.2 percentage points) whether or not an adjustment for telephone service or relative usage was made. These results corroborate the findings in Brick et al. (2006).

There is also no evidence that raking to the CPS relative telephone usage measure reduced bias relative to raking to telephone service. Deviations from the benchmarks tend to be just as large with the relative usage weight as they are with the service weight. The design effects are fairly comparable across the weighting methods, although there is some indication that the increase in variance is slightly greater under the relative usage method. Looking at weights for the composite estimates (no screener), the mean design effect was 1.77 under the relative telephone usage method, 1.52 for the telephone service method, 1.42 for the demographic adjustment, and 1.38 for the landline sample-only weights.

Replicating another finding from Brick et al. (2006), there is no consistent evidence that conducting the cellular sample was beneficial for this study. The landline sample-only design tended to yield less biased and more precise estimates than the designs using the cellular sample.

The Effect of Screening on the Total Error of Survey Estimates
The analysis above presents a mixed picture of the effect of screening on the quality of dual frame RDD survey estimates. Weighted estimates from a no screener design are sometimes closer to, and other times farther from, benchmark figures than estimates produced when a cell-only screener is simulated. An alternative approach for evaluating the effect of screening is to consider the total (or mean square) error on estimates produced with and without a screener. Mean square error reflects both the bias and the variance of a given survey estimate, and considering these two factors simultaneously may provide greater insight into the consequences of screening.

The mean square error was computed for the modal response category of each of the five measures that were not used in any weighting adjustment but for which a benchmark was available. The squared bias was calculated as the square of the difference between the weighted estimate and the benchmark figure, and the variance as the squared standard error from SUDAAN. Root mean square errors (RMSE) are reported in this analysis because, for proportions, squared errors are deceptively small values that seem trivial even when the underlying errors are substantial. For example, an error of 5 percentage points on an estimate (0.05) becomes 0.0025 when squared.
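The decomposition described above is MSE = bias² + variance, with the benchmark treated as the true value and the variance taken as the squared standard error. A minimal Python sketch, with hypothetical function and argument names, is shown below; the example inputs are illustrative, not estimates from this study.

```python
import math

def root_mean_square_error(estimate, benchmark, standard_error):
    """RMSE of a proportion, treating the benchmark as the true value."""
    bias = estimate - benchmark
    mse = bias ** 2 + standard_error ** 2  # squared bias + variance
    return math.sqrt(mse)

# Squared errors on proportions look tiny: a 5-point error is 0.0025 when squared.
print(round(0.05 ** 2, 6))  # 0.0025
# Illustrative inputs: a 4-point bias and a standard error of 0.03.
print(round(root_mean_square_error(0.54, 0.50, 0.03), 4))  # 0.05
```

Taking the square root returns the error to the percentage-point scale of the estimates, which is what makes the RMSE easier to interpret than the MSE itself.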

The selection of the modal categories for this analysis yielded a fairly representative subset of the results that would be observed were we to consider the RMSE for every response option. As noted above, the effect of screening varies across response categories, so checks were performed to verify that this subset did not yield misleading results. In 27 of the 39 nonmodal estimate pairs, the screener versus no screener comparison mirrored that of the modal response. For example, when the raked weight (adjusting only for demographics) is used to estimate income, the no screener estimates are closer to the benchmark for the modal response ($40,000–$74,999) and for each of the nonmodal responses.

Figure 1 presents the root mean square errors observed for each of the five modal response estimates under both the cell-only screener and no screener scenarios. The dark-shaded segment of each RMSE bar represents the amount of error due to bias, and the lighter-shaded segment represents the error due to variance. The results are shown separately using the raked weighting method that adjusts only for demographics (left side) and the service weighting method that adjusts for telephone service in addition to demographics (right side). The results under the relative usage weight are not presented because, on balance, it was inferior to the service weight for this study.


Figure 1
Figure 1. Root mean square errors on estimates for modal response categories with and without a screener by poststratification method.

 
The results in figure 1 suggest that bias is a larger source of error than variance. To illustrate, consider an item with an average amount of bias: the proportion who identify as Democrats or lean toward the Democratic Party. Looking across the bottom five questions in table 2, the median discrepancy between the weighted survey estimate and the benchmark figure is roughly 4 percentage points, which is approximately the size of the discrepancies observed for estimates of Democrats. In the middle row of figure 1 (% Democrat), roughly 80–90 percent of the error on this estimate comes from bias rather than variance. This holds for both designs (screener/no screener) and both weighting methods (raked/service).

The primacy of bias is consequential because it suggests that researchers considering a cell-only screener should be more concerned with nonresponse bias than with design effects due to weighting or modest influences on sampling error. Indeed, compared with the estimated bias, the estimated variance is relatively stable across estimates: across the two weighting schemes, the standard errors on these five estimates ranged from 0.013 to 0.020.
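The 80 to 90 percent figure for % Democrat can be roughly bracketed from numbers given in the text, assuming the bias share is measured as squared bias divided by MSE (the text does not state exactly how the segments in figure 1 are apportioned). The sketch pairs a bias of about 4 percentage points with the smallest and largest standard errors reported above (0.013 and 0.020); the function name is hypothetical.

```python
def bias_share_of_mse(bias, standard_error):
    """Fraction of mean square error contributed by squared bias."""
    squared_bias = bias ** 2
    return squared_bias / (squared_bias + standard_error ** 2)

for se in (0.013, 0.020):
    print(f"SE = {se:.3f}: bias share = {bias_share_of_mse(0.04, se):.2f}")
# SE = 0.013: bias share = 0.90
# SE = 0.020: bias share = 0.80
```

Under this assumption the bracket matches the range read from figure 1, and it also shows why modest changes in standard errors matter little once bias is several percentage points.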

While it is clear from figure 1 that bias is generally a larger source of error than variance, it is not clear which design (screener or no screener) should be implemented to reduce bias. As indicated in table 2, whether bias is smaller when there is a cell-only screener or when there is no such screener varies from estimate to estimate.


    Discussion
How researchers design and weight samples from the cellular RDD frame has important consequences for survey error. In particular, screening out adults with landline telephones from the cellular sample has the potential to increase bias unless an adjustment based on telephone service is used. Without such an adjustment, there is some evidence that cell-only respondents are overly influential in estimation, although this effect varies across statistics and likely depends on the relative sizes of the landline and cellular samples. When adjustments are made for telephone service as well as traditional demographics (gender, age, education, race/ethnicity, etc.), the estimates produced with and without a cell-only screener appear to yield comparable levels of bias. The increase in variance due to weighting appears to be somewhat lower under designs using a cell-only screener. Variance, however, comprises a much smaller proportion of total error than bias for the demographic, lifestyle, and political measures evaluated in this study.

A useful next step in evaluating cell-only screening would be to take into account the relative costs of landline interviews, cellular interviews, and screen-outs in the cellular sample. In the Pew/AP/AOL study, the total costs of conducting the cellular sample were approximately 2.4 times the costs of the landline sample, when expenses associated with the $10 incentive for cellular sample respondents are included. The applicability of this cost differential to future surveys may be limited, however, because of the newness of the data collection procedures, among other factors. The results of this analysis do not indicate that dual frame RDD designs with or without a screener yield more accurate or reliable estimates than landline sample-only designs, at least for the estimates in this study. Cost permitting, however, researchers may still want to err on the side of conducting a cellular sample to guard against coverage error as telephone behaviors continue to evolve.

Future research on dual frame RDD design should continue to explore different statistical adjustments for combining landline and cellular samples. The results presented here demonstrate the challenges associated with identifying an appropriate telephone usage parameter for adjustment purposes. The weighting method that raked to a relative usage item from the CPS (landline-only, mostly landline, mostly cell, cell-only) performed no better than the comparable method that raked to telephone service (landline-only, dual, cell-only). In fact, the relative usage method inflated the variance without reducing bias on the estimates evaluated.

The failure of the more detailed telephone usage variable has several potential explanations. One is measurement error due to differences in question wording and other aspects of the response process. Another is change over time in telephone usage itself. The 2004 CPS measure indicated that two-thirds of dual phone households received most of their calls on a landline phone, while one-third received most calls on a cellular phone. These figures may have shifted to a nonnegligible extent between the time the CPS data were gathered and when the Pew/AP/AOL study was conducted in early 2006. It may also be that relative telephone usage is simply not correlated with many survey measures, at least not after adjustments are made for other variables. A related explanation is that the proposed weighting methods are sound, but telephone behaviors have not yet evolved far enough for them to prove useful. For example, in 10 years, adjusting for relative usage may reduce bias, and dual frame designs may in fact outperform stand-alone landline samples. The regular public release of results for a fixed set of telephone usage questions, such as those added to the NHIS in 2007, should greatly aid researchers in improving weighting techniques and, by extension, evaluating the effects of design decisions in dual frame surveys.


    Footnotes
 
COURTNEY KENNEDY is with the University of Michigan Program in Survey Methodology, 426 Thompson Street, Room 4050, Ann Arbor, MI 48104, USA. I am grateful to the Pew Research Center for the use of these data and, in particular, to Scott Keeter for engaging me in the topic. I would also like to thank Roger Tourangeau, Robert M. Groves and J. Michael Brick for their invaluable comments on earlier versions.

1 While no authoritative estimates exist, the proportion of landline numbers that have been ported to a cellular phone is believed to be trivial, at least at present. Less than 1 percent of the landline respondents in the Pew/AP/AOL study described below were interviewed on a cellular phone.

2 The weighting method that performed the worst in the Brick et al. study is not discussed here. That method, the "separate composite" weighting, calls for raking the two samples separately and then combining them. The method that involves no raking (simple composite) is also not discussed because it does not reflect the fact that many national RDD surveys have some form of post-survey weighting adjustment. The simple composite method was used in the Brick et al. study as essentially a baseline measure.

3 In Brick et al. (2006) and in the current study, roughly the same number of dual users were interviewed in each sample. Consequently, 0.5 serves as a reasonable approximation to the optimal value for λ.
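The compositing factor λ splits the weight of the overlapping dual-user domain between the two samples so that dual users are not double counted. The Python sketch below assumes the convention of Hartley-type composite estimators in which dual users from the landline sample have their weights multiplied by λ and dual users from the cellular sample by 1 − λ; the function and argument names are hypothetical, and the multiplicity correction discussed in footnote 4 is omitted.

```python
def composite_weight(base_weight, frame, service, lam=0.5):
    """Apply a composite factor so dual users are not double counted.

    Assumed convention: landline-sample dual users get lam,
    cellular-sample dual users get 1 - lam; other cases are unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if service != "dual":
        return base_weight
    if frame == "landline":
        return lam * base_weight
    if frame == "cell":
        return (1.0 - lam) * base_weight
    raise ValueError(f"unknown frame: {frame!r}")

# With lam = 0.5, a dual user's weight is halved in whichever sample reached them.
print(composite_weight(200.0, "landline", "dual"))   # 100.0
print(composite_weight(200.0, "cell", "cell-only"))  # 200.0
```

Setting λ = 0.5 treats the two samples' dual users symmetrically, which is reasonable when, as here, roughly equal numbers of dual users were interviewed in each sample.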

4 This multiplicity correction could only be computed for the cellular sample cases. Data were not collected on the number of landlines used by respondents in the landline sample. This information was not collected for two reasons: (1) internal studies conducted by Pew have consistently found that adjusting for the number of landlines in the household has little or no perceptible effect on survey statistics, and (2) questionnaire space in this study was extremely limited due to a desire to keep the interview length at roughly 10 minutes. To evaluate the effect of excluding the multiplicity correction, the composite weight was calculated and estimates were produced with and without the correction for the cellular sample cases. The differences in the resulting point estimates and in the variance of the weights were negligible, so the multiplicity correction was dropped from the analysis.

5 Data from the first area probability study to routinely ask a fixed set of questions about relative telephone usage (the NHIS) have yet to be released to the public.

6 An alternative weighting procedure would be to use the potentially outdated variable as a "preweight" and not include it as a raking margin. That is, the composite weights could be ratio-adjusted, so that the weighted counts sum to the relevant out-of-date totals from the CPS. Then, the final raking would be performed, but the sample would not be forced to match the totals for the out-of-date variable. Preweighting was not carried out in this analysis so as to demonstrate the full effect of using the variable in adjustment (imperfect though it may be), but this may be a fruitful approach in future work.
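The preweighting alternative described in footnote 6 replaces a raking margin with a one-time ratio adjustment. A minimal Python sketch, assuming integer category codes and a hypothetical `ratio_adjust` helper, scales the weights within each category so that the weighted counts equal the supplied totals; the subsequent raking step would then omit that variable, so its margins are free to drift.

```python
import numpy as np

def ratio_adjust(weights, codes, totals):
    """Scale weights within each category so weighted counts hit the given totals."""
    w = np.asarray(weights, dtype=float).copy()
    codes = np.asarray(codes)
    for category, target in totals.items():
        in_category = codes == category
        current = w[in_category].sum()
        if current > 0:  # skip empty categories rather than divide by zero
            w[in_category] *= target / current
    return w

# Hypothetical totals for illustration: weights become 15, 15, 35, 35.
print(ratio_adjust([1, 1, 1, 1], [0, 0, 1, 1], {0: 30.0, 1: 70.0}))
```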

7 Consideration was also given to including home ownership status (own/rent/other) in the adjustment because findings from Blumberg and Luke (2007) indicate that this variable is strongly associated with telephone status. Ultimately, the variable was excluded because one analytic goal is to compare estimates using what was formerly Pew's standard landline sample design with estimates based on a dual frame design. Traditionally, Pew has not included home ownership in post-survey adjustment. Furthermore, in this analysis there is substantial utility in having home ownership as a benchmark comparison, which would not be possible if it were included in the weighting.

8 Benchmark estimates come from the National Health Interview Survey, the Current Population Survey, and the General Social Survey. See the footnote in table 2 for an item-by-item listing.


    References
Bankier Michael D. "Estimators Based on Several Stratified Samples with Applications to Multiple Frames Surveys." Journal of the American Statistical Association (1986) 81:1074–1079.

Blumberg Stephen J., Luke Julian V. "Wireless Substitution: Early Release of Estimates Based on Data from the National Health Interview Survey, July–December 2006." (2007) Available at http://www.cdc.gov/nchs/nhis.htm.

Brick J. Michael, Brick Pat D., Dipko Sarah, Presser Stanley, Tucker Clyde, Yuan YangYang. "Cell Phone Survey Feasibility in the U.S.: Sampling and Calling Cell Numbers versus Landline Numbers." Public Opinion Quarterly (2007) 71(1):23–39.

Brick J. Michael, Dipko Sarah, Presser Stanley, Tucker Clyde, Yuan YangYang. "Nonresponse Bias in a Dual Frame Sample of Cell and Landline Numbers." Public Opinion Quarterly (2006) 70(5):780–793.

Fleeman Anna. "Merging Cellular and Landline RDD Sample Frames: A Series of Three Cell Phone Studies." (2006) Paper presented at the Second International Conference on Telephone Survey Methodology, Miami, FL.

Fuller Wayne A., Burmeister Leon F. "Estimators for Selected Samples from Two Overlapping Frames." Proceedings of the Social Statistics Section, American Statistical Association (1972) 245–249.

Hartley Herman O. "Multiple Frame Surveys." Proceedings of the Social Statistics Section, American Statistical Association (1962) 203–206.

Hartley Herman O. "Multiple Frame Methodology and Selected Applications." Sankhya (1974) 36(Series C):99–118.

Keeter Scott, Kennedy Courtney, Tompson Trevor, Mokrzycki Mike, Clark April. "What's Missing from National RDD Surveys? The Impact of the Growing Cell-Only Population." (2007) Paper presented at the Annual Meeting of the American Association for Public Opinion Research, Anaheim, CA.

Lavrakas Paul J., Shuttles Charles. "Cell Phone Sampling, RDD Surveys, and Marketing Research Implications." Alert! (2005) 43(6):4–5.

Link Michael W., Battaglia Michael P., Frankel Martin R., Osborn Larry, Mokdad Ali H. "Conducting Public Health Surveys over Cell Phones: The Behavioral Risk Factor Surveillance System Experience." (2007) Paper presented at the Annual Conference of the American Association for Public Opinion Research, Anaheim, CA.

Lund Richard E. "Estimators in Multiple Frame Surveys." Proceedings of the Social Statistics Section, American Statistical Association (1968) 282–288.

Research Triangle Institute. SUDAAN Example Manual, Release 9.0 (2004) Research Triangle Park, NC: Research Triangle Institute.

Steeh Charlotte. "A New Era for Telephone Surveys." (2004) Paper presented at the Annual Conference of the American Association for Public Opinion Research, Phoenix, AZ.

Tucker Clyde, Brick J. Michael, Meekins Brian. "Household Telephone Service and Usage Patterns in the United States in 2004: Implications for Telephone Samples." Public Opinion Quarterly (2007) 71(1):1–22.

