Notes on Industry, Occupation (Occup), Ancestry (Ancstry), Race, Language (Lang2) and Place of Birth (Pob)

The Census Bureau documents the six fields above differently from the others, presumably because each has so many possible code values. It publishes the encodings for these fields in separate appendices.

In some cases the encodings have multiple levels. In Occupation, for example, codes 000-202 are "Managerial and Professional"; that range is broken down into sublevels, and the sublevels are in turn broken down into the lowest-level categories. This means we must choose one level as the default encoding, although our customers may recode the field however they like. We have used the lowest-level categories as our default encoding.
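A customer who prefers a higher-level encoding can roll our lowest-level codes up by range. The sketch below is a minimal illustration, assuming a hand-built table of (low, high, label) ranges sorted by the low end; the name `roll_up` and the table are hypothetical, and only the 000-202 Occupation range is taken from the appendix as described above.

```python
from bisect import bisect_right

# Hypothetical rollup table of (low, high, label), sorted by low.
# Only the 000-202 Occupation range comes from these notes; a real
# table would list every higher-level range in the appendix.
OCCUPATION_GROUPS = [
    (0, 202, "Managerial and Professional"),
]


def roll_up(code, table=OCCUPATION_GROUPS):
    """Map a lowest-level code to the label of the range containing it.

    Returns None when the code falls outside every range in the table.
    """
    lows = [low for low, _, _ in table]
    i = bisect_right(lows, code) - 1  # last range starting at or below code
    if i >= 0:
        low, high, label = table[i]
        if low <= code <= high:
            return label
    return None
```

Because the table is sorted, each lookup is a binary search rather than a scan, which matters little for one table but keeps recoding of large extracts cheap.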

In some cases a lowest-level encoding is a range that is not broken down further; for example, 049-052 corresponds to nuclear engineers. In such cases we wondered whether there was any difference between 49, 50, 51, and 52, so we ran some analyses to find out (below).
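The question "is more than one number in the range used?" reduces to a simple check against a code-level tabulation. A sketch, assuming the counts are already in a dict keyed by numeric code and the ranges are (low, high, label) tuples; both function names are hypothetical:

```python
def codes_used(counts, ranges):
    """For each lowest-level range, list the codes that actually occur.

    counts: dict mapping numeric code -> person count from a tabulation.
    ranges: list of (low, high, label) tuples from the appendix.
    Returns {label: [codes with a nonzero count]}.
    """
    report = {}
    for low, high, label in ranges:
        report[label] = [c for c in range(low, high + 1) if counts.get(c, 0) > 0]
    return report


def ranges_with_meaning(counts, ranges):
    """Labels whose range has more than one code in use."""
    return [label for label, used in codes_used(counts, ranges).items() if len(used) > 1]
```

An empty result from `ranges_with_meaning` means every range collapses to a single code in practice.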

In other cases the opposite is true: several labels are given to the same code. For example, in Ancestry, code 005 is given to BASQUE, Euskalduna, and Euzkadi. In such cases we had to choose one label for the code. Except where obvious errors were found (more on that below), we chose the first label given for each code. Customers can, of course, relabel codes as they wish, just as they can recode them.
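The first-label rule is easy to make mechanical, and keeping the set-aside labels makes later relabeling straightforward. A minimal sketch, assuming the appendix has already been parsed into (code, label) pairs in document order; the function name is hypothetical:

```python
def first_labels(entries):
    """Pick one label per code: the first one listed in the appendix.

    entries: (code, label) pairs in the order they appear in the document.
    Returns (chosen, alternates): chosen maps code -> first label, and
    alternates maps code -> the other labels that were set aside.
    """
    chosen, alternates = {}, {}
    for code, label in entries:
        if code not in chosen:
            chosen[code] = label
        else:
            alternates.setdefault(code, []).append(label)
    return chosen, alternates


chosen, alternates = first_labels([(5, "BASQUE"), (5, "Euskalduna"), (5, "Euzkadi")])
```

Here `chosen` keeps BASQUE for code 5, while Euskalduna and Euzkadi remain available in `alternates` for anyone who prefers them.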

Wherever codes for apprentices are given (in Occupation), the documentation specifies a higher-level range for the profession and assigns a portion of that range to apprentices, but it does not say what the rest of the range is for. The unspecified portion would seem to belong to the professionals themselves, and the specified portion to the apprentices. But we'd rather not make assumptions, especially when we have tools that can provide answers.
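One way to approach the apprentice question is to total the tabulated counts inside the apprentice sub-range and in the rest of the profession's range; a nonzero total in the unspecified portion would suggest that it is in use, presumably by the professionals. A sketch, assuming counts keyed by numeric code and range boundaries supplied by the caller (the name `apprentice_split` is hypothetical):

```python
def apprentice_split(counts, profession, apprentice):
    """Compare the apprentice sub-range with the rest of a profession's range.

    counts: dict mapping numeric code -> (weighted) person count.
    profession, apprentice: inclusive (low, high) tuples; the apprentice
    range is assumed to lie inside the profession range.
    Returns (apprentice_total, rest_total).
    """
    p_low, p_high = profession
    a_low, a_high = apprentice
    apprentice_total = rest_total = 0
    for code, n in counts.items():
        if a_low <= code <= a_high:
            apprentice_total += n
        elif p_low <= code <= p_high:
            rest_total += n
    return apprentice_total, rest_total
```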

In order to address our concerns (Are the ranges at the lowest level meaningful? Did we introduce any errors when we picked out the lowest-level encodings? Are there errors in the documentation? What is going on with the apprentices?), and also to show our chosen default encodings to the customer, we have run nationwide tabulations for each of the six fields above.

For each field, we accumulated the weighted person count (Pwgt1), the unweighted person count (People), and the weighted household count at the person level (PHouseholds). We dimensionalized each tabulation by both the numeric codes and the labels.
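The three accumulators can be sketched as a single pass over person records. This is a minimal illustration under stated assumptions: the record keys other than the Pwgt1 weight are hypothetical, and PHouseholds is read here as each household's weight counted once per (code, label) cell, which is only one plausible reading of "weighted household count at the person level".

```python
from collections import defaultdict


def tabulate(persons):
    """Accumulate Pwgt1, People and PHouseholds for each (code, label) cell.

    persons: iterable of dicts with hypothetical keys
      'code', 'label'  - the field being tabulated
      'pwgt1'          - person weight
      'serial'         - household identifier
      'hwgt'           - household weight
    Each household's weight is added at most once per cell (an assumption).
    """
    cells = defaultdict(lambda: {"Pwgt1": 0, "People": 0, "PHouseholds": 0})
    seen = defaultdict(set)  # (code, label) -> household serials already counted
    for p in persons:
        key = (p["code"], p["label"])
        cell = cells[key]
        cell["Pwgt1"] += p["pwgt1"]
        cell["People"] += 1
        if p["serial"] not in seen[key]:
            seen[key].add(p["serial"])
            cell["PHouseholds"] += p["hwgt"]
    return dict(cells)
```

Keying each cell on the (code, label) pair, rather than the code alone, is what lets the tabulation expose any code that ends up under two labels.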

The results allow us to:

  1. Verify that our labels match the proper codes
  2. See if ranges within a low level label have meaning (i.e. Is more than one number in the range used?)
  3. Explore potential errors in the Census Bureau documentation
  4. Look into the "apprentice situation".
  5. Have useful tabulations for future reference.
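Because each tabulation is dimensioned by both code and label, the first goal amounts to checking that codes and labels stay in one-to-one correspondence. A minimal sketch, assuming the tabulation's cells are available as (code, label) keys; the function name is hypothetical:

```python
from collections import defaultdict


def mismatches(cells):
    """Find codes tabulated under more than one label, and vice versa.

    cells: iterable of (code, label) keys from a tabulation.
    Returns (codes_with_many_labels, labels_with_many_codes), each a dict
    mapping the shared value to a sorted list of its partners.
    """
    labels_by_code = defaultdict(set)
    codes_by_label = defaultdict(set)
    for code, label in cells:
        labels_by_code[code].add(label)
        codes_by_label[label].add(code)
    return (
        {code: sorted(labs) for code, labs in labels_by_code.items() if len(labs) > 1},
        {lab: sorted(codes) for lab, codes in codes_by_label.items() if len(codes) > 1},
    )
```

Two empty dictionaries mean every code carries exactly one label and every label exactly one code.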

Here is what we've found:

  1. Our labels match properly.
  2. The ranges (at the lowest level) do not have meaning.  Only one number in each range is actually used.
  3. There are a few errors in the Census Bureau documentation.

Errors:

  1. The Industry documentation specifies that the range 0-10 refers to "Agricultural production, crops". Nearly 100,000,000 people are coded as '0', and nearly 2,000,000 are coded as '10'. We suspect that the people coded as '0' have no industry associated with them. Since this is well over 1/3 of the nation's population, we suspect that they are mostly children and retired people. But just to check, we will run an analysis of the income of these '0'-coded people. The documentation should state that people coded as '10' are the ones involved in agricultural crop production.
  2. The Ancestry encodings were corrupted when we received them, so we turned to another source. That source also contained a few errors: "BELOURUSSIAN" was coded as '02' when it should have been '102'; "Webel Druze" (under Syrian) was coded as '329' when it should have been '429'; and both "West German" and "GREEK" were coded as '45'. Some of the encodings are also out of numeric order, which is inconsistent with all of the other Census Bureau files. As a general rule, we have found errors from the Census Bureau to be few and far between, but we are not comfortable with this file. So we ran the regular tabulation (labels and numeric codes) to check our own work, and then ran a second tabulation that cross-references the data by Ancestry and Place of Birth. We expected to see a high incidence of people born in a country that reflects their ancestry. This settles the GREEK/West German question (45 is Greek), and it also provides a resource for identifying any other hidden errors (one was found; it is described here).
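Both checks in the Errors list come down to small weighted tabulations. The sketch below is a minimal illustration under stated assumptions, not necessarily how our own tabulations were produced: person records are Python dicts with hypothetical keys (`industry`, `income`, `ancstry`, `pob`, `pwgt1`), and a code list is a sequence of (code, label) pairs in file order. `no_income_share` is one possible form of the planned income check for industry code '0'; `label_file_problems` screens a code list for out-of-order and shared codes; `modal_birthplace` returns the most common birthplace for each ancestry code from the Ancestry by Place of Birth cross-tabulation. All three names are hypothetical.

```python
from collections import Counter, defaultdict


def no_income_share(persons, industry_code=0):
    """Weighted fraction of persons with industry_code whose income is <= 0."""
    w_all = w_none = 0.0
    for p in persons:
        if p["industry"] == industry_code:
            w_all += p["pwgt1"]
            if p["income"] <= 0:
                w_none += p["pwgt1"]
    return w_none / w_all if w_all else 0.0


def label_file_problems(entries):
    """Screen a code list for the kinds of errors found in the Ancestry source.

    Returns (positions where a code is lower than the one before it,
             codes carrying more than one label). The second result also
    catches legitimate synonyms, so it is a list for manual review.
    """
    out_of_order = [i for i in range(1, len(entries)) if entries[i][0] < entries[i - 1][0]]
    labels = defaultdict(list)
    for code, label in entries:
        labels[code].append(label)
    shared = {code: labs for code, labs in labels.items() if len(labs) > 1}
    return out_of_order, shared


def modal_birthplace(persons):
    """Weighted Ancestry x Place of Birth cross-tab; most common POB per ancestry."""
    xtab = defaultdict(Counter)
    for p in persons:
        xtab[p["ancstry"]][p["pob"]] += p["pwgt1"]
    return {anc: row.most_common(1)[0][0] for anc, row in xtab.items()}
```

A code whose modal birthplace points to a country unrelated to its label is a candidate for a mislabeled or miscoded entry, which is the kind of hidden error the cross-tabulation is meant to surface.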

 

 



Copyright © Innovative Computing, Inc. 2002
