Example 1 Explanation

In this analysis, we are slicing and dicing:

  • person (pwgt1) count
  • household (houswgt, phouseholds) counts

by

  • household income (rhhinc)
  • number of automobiles that they have (autos)
  • citizenship (citizen)


The easiest way to read this document is to print the household level and person level sections of the analysis. We are going to refer back and forth between this document and the analysis frequently, and it's easier to have a hard copy.

We are doing this for the state of Pennsylvania. This is a very simple analysis, which we are using to introduce the features that are common to all of our analyses. There is only one real derived variable (phouseholds, which is present in nearly all analyses), and there are no simulations. The points to notice, which are common to all of our analyses, are:

To minimize confusion, the columns are named consistently with the field names chosen by the census bureau. For example, the census bureau collects several kinds of household income, but only one is called "rhhinc". We have placed this analysis in a table for readability, but usually we give our customers a comma-delimited ASCII file, meant to be fed into a spreadsheet or other data management tool.

The first label that appears (found on line number 1 of the household section) looks like this: '-1-. This particular label is an open-ended range, which means "negative one or less". All ranges are prefixed with a single quote, because the results from the analyses are often fed into spreadsheets. If we don't quote the range, the spreadsheet evaluates it as an expression, thus turning (e.g.) "500-1000" into "-500". Prefixing it with a single quote prevents the evaluation. The "-" after the quote means that the number is negative; in this case it's negative one. The "-" after the one means that the range includes all numbers less than or equal to "-1". Similarly, "500+" would include all numbers greater than or equal to 500. Note that "rhhinc" is a continuous variable, meaning that the census bureau records the exact figure, rather than a range. This allows customers to define ranges any way they want to.
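
As a rough illustration of the labeling convention described above, here is a minimal Python sketch. The function names (format_range_label, in_range) are hypothetical and are not part of our tools, and treating both ends of a closed range such as "500-1000" as inclusive is an assumption made for the example.

```python
def format_range_label(low=None, high=None):
    """Build a spreadsheet-safe range label.

    low=None  -> open-ended below ("high or less"), e.g. '-1-
    high=None -> open-ended above ("low or more"),  e.g. '500+
    """
    if low is None and high is None:
        raise ValueError("at least one bound is required")
    if low is None:
        body = f"{high}-"
    elif high is None:
        body = f"{low}+"
    else:
        body = f"{low}-{high}"
    # The leading single quote keeps a spreadsheet from evaluating
    # the label as an arithmetic expression.
    return "'" + body


def in_range(value, low=None, high=None):
    """Return True if value falls in the range (bounds assumed inclusive)."""
    return (low is None or value >= low) and (high is None or value <= high)


print(format_range_label(high=-1))     # '-1-
print(format_range_label(low=500))     # '500+
print(format_range_label(500, 1000))   # '500-1000
```

Because "rhhinc" is recorded as an exact figure, the same records can be re-bucketed with any set of bounds simply by calling in_range with different arguments.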

The last column in the household level section is labeled "Houswgt". This is the weighted count of households. Remember that this is long form data, and not everyone fills out the long form. So the census bureau takes 5% of the country and weights each person and household in the 5% to simulate 100%. The figure is therefore, in theory, the number of households in the entire state of PA that have these characteristics. For an in-depth statistical discussion of the validity of the sampling for your particular application, see this site. But be advised, the file is 2266 KB, and it will take some effort to find what you want.
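
To make the idea of a weighted count concrete, the following Python sketch sums the household weight of each sampled record instead of counting records. The sample records and the function name weighted_household_count are invented for illustration; only the field names rhhinc, autos and houswgt come from the analysis.

```python
from collections import defaultdict

# Hypothetical sample records: (rhhinc, autos, houswgt).
# Each sampled household "stands for" houswgt households in the state.
sample_households = [
    (-2500, 1, 20),
    (-100, 1, 18),
    (12000, 2, 25),
    (30000, 2, 22),
]


def weighted_household_count(records):
    """Sum houswgt for each value of autos."""
    totals = defaultdict(int)
    for _rhhinc, autos, houswgt in records:
        totals[autos] += houswgt
    return dict(totals)


print(weighted_household_count(sample_households))  # {1: 38, 2: 47}
```

Four sampled records here represent an estimated 85 households, which is why a cell built from only a handful of records should be treated with caution.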

There are two different sections to the analysis, a household level section, and a person level section. That is because most analyses are really conducted at both levels. The household level section has only household level dimensions and volumes. The person level section has all household level dimensions, person level dimensions, and person level volumes. (Volumes are things that are accumulated, like person count, household count and dollars.)

There are two different kinds of housing counts. At the household level, there is "houswgt", which is the weighted count provided by the census bureau. At the person level, there is "phouseholds", for "person level households" (which is also weighted). This field is derived by our tools at analysis time. It tells you how many households the persons on any particular line of the analysis occupy. It is important to note that in nearly every case, "phouseholds" cannot be tallied to "houswgt", because they are taken at two different levels of analysis. As an example, look at line number 2 of the household section. There are 585 households in PA which have one vehicle and had a negative income in 1989. But if we go to the person level section (see lines 2-3 of the person level section), we see that for households which have one vehicle and lost money, there are 585 households which have people born in the US and 7 households which have non-US citizens. The 7 non-citizens live in houses with citizens, which is one reason that households at the person level are not usually additive. Click here for a more complete explanation of differences in housing counts at different levels of analysis. Another item to note here is that in line 5 of the person level section there appear to be 7 people living in 10 households. This is an artifact of the weighting of the data, and when cells have only 7 people in them, they aren't valid for decision making. If this is a concern, the aforementioned site will help with an understanding of validity for your particular application.
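
The non-additivity described above can be seen in a small Python sketch. This is not how our tools are implemented; it assumes, for illustration only, that "phouseholds" for a line is the sum of the weights of the distinct households occupied by at least one person on that line. The data and the function name phouseholds_by are hypothetical.

```python
from collections import defaultdict

# Hypothetical households: id -> houswgt
houswgt = {"A": 20, "B": 15}

# Hypothetical persons: (household id, citizen category, pwgt1)
persons = [
    ("A", "born in US", 20),
    ("A", "born in US", 21),
    ("A", "not a citizen", 19),  # shares household A with citizens
    ("B", "born in US", 15),
]


def phouseholds_by(persons, houswgt):
    """For each citizen category, sum the weights of the distinct
    households occupied by at least one person in that category."""
    occupied = defaultdict(set)
    for hh_id, citizen, _pwgt1 in persons:
        occupied[citizen].add(hh_id)
    return {cat: sum(houswgt[h] for h in ids) for cat, ids in occupied.items()}


by_citizen = phouseholds_by(persons, houswgt)
print(by_citizen)                # {'born in US': 35, 'not a citizen': 20}
print(sum(by_citizen.values()))  # 55
print(sum(houswgt.values()))     # 35
```

Household A is counted once under each citizenship category, so the person level lines add up to 55 while the household level total is only 35.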

Lines 9-13 of the person level section seem to reflect a large number of people living in zero households.  The household categorizations for these lines are income between $0 and $25,000 and N/A for autos, because the households are vacant or group quarters.  If the households are vacant, there shouldn't be any entries in the "person level" section.  Also, we have counts in the Pwgt1 (weighted person count) column.  If we look at the household level section for this category, we find 441,637 households on line 7.  How can we have people living in 0 households in the "person level" section and 441,637 households in the same category in the household section?  This is an example of a situation where an analyst might get into a lot of trouble with a lesser tool than ours.  We can simply add more dimensions to the analysis until we understand what is happening.  We know that if the households are vacant, they should not appear in the person level section.  So there must be people in them.  But why does the household count come up zero in the person level section? 

To answer the question, we perform the same analysis, but this time add a dimension (Gqinst) to indicate whether the households are group quarters or not. Here are the household level and person level sections of the second analysis. Line 7 of the household level section of the new analysis shows us that the 441,637 households in the household level section are housing units - they aren't group quarters. Lines 9-18 of the person level section of the new analysis show us that the people in this category live in group quarters, not housing units. What's happening is that there are vacant households showing up in the household section but not the person section (because there are no people in them), and group quarters showing up in the person level section but not in the household level section (because the census bureau weights them as zero). They are two completely different sets of data that appeared to be the same, until an additional dimension was added.
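
The effect of adding a dimension can be reproduced on toy data. The Python sketch below uses invented records and hypothetical function names (household_section, person_section); the unit type values stand in for the Gqinst dimension, and the phouseholds calculation is the same illustrative assumption (sum of the weights of distinct occupied households), not a description of our tools.

```python
from collections import defaultdict

# Hypothetical households: (id, unit type, houswgt).
# Vacant housing units carry a weight but have no persons;
# group quarters have persons but a household weight of zero.
households = [
    ("V1", "housing unit", 30),   # vacant
    ("V2", "housing unit", 25),   # vacant
    ("G1", "group quarters", 0),
]
# Hypothetical persons: (household id, pwgt1)
persons = [("G1", 40), ("G1", 38)]

unit_type = {hh_id: kind for hh_id, kind, _w in households}
weight = {hh_id: w for hh_id, _kind, w in households}


def household_section(by_type):
    """Total houswgt, optionally broken out by unit type."""
    totals = defaultdict(int)
    for _hh_id, kind, w in households:
        totals[kind if by_type else "all"] += w
    return dict(totals)


def person_section(by_type):
    """Return {key: (pwgt1 total, phouseholds)}."""
    people = defaultdict(int)
    occupied = defaultdict(set)
    for hh_id, pwgt1 in persons:
        key = unit_type[hh_id] if by_type else "all"
        people[key] += pwgt1
        occupied[key].add(hh_id)
    return {k: (people[k], sum(weight[h] for h in occupied[k])) for k in people}


print(household_section(False))  # {'all': 55}
print(person_section(False))     # {'all': (78, 0)}
print(household_section(True))   # {'housing unit': 55, 'group quarters': 0}
print(person_section(True))      # {'group quarters': (78, 0)}
```

Without the extra dimension, 78 people appear to live in 0 households while 55 households appear in the same category; with it, the two groups separate cleanly.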

