MAST, OLAP, and DATA MINING
The purpose of this document is to compare and contrast MAST (Multi-dimensional Analytical and Simulation Tool, which is the tool used by Innovative Computing and SliceAndDiceData.com) and OLAP (the category of multidimensional tools commonly used).
Because many of the issues in OLAP also apply to Data Mining tools, many of the points in this document hold equally for OLAP and Data Mining tools when compared to MAST. A short discussion of MAST and Data Mining tools is found at the end of this document.
Similarities:
- Both MAST and OLAP are centered on multidimensional analysis. Both allow slicing and dicing of transactions, which is usually an aggregation of variables based upon combinations of dimensions.
For the purposes of this document, we’ll use the terms ‘variable’ and ‘dimension’ from Thomsen’s book OLAP SOLUTIONS. A variable is the number that is collected or aggregated. A dimension is one of the attributes that is used to uniquely identify one instance of the variable. For example, if you were analyzing dollars spent in a grocery store, dollars would be the variable, and dimensions would be things like customer, product, store, time of day, etc.
- Both support hierarchical dimensions (recodes).
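To make the terms concrete, here is a minimal sketch of aggregating a variable over combinations of dimensions, using the grocery store example. It is written in Python purely for illustration and is not MAST or OLAP syntax; the transaction layout, the field names, and the `aggregate` helper are all assumptions.

```python
from collections import defaultdict

# Hypothetical grocery transactions; every field name here is an assumption.
transactions = [
    {"customer": "C1", "product": "milk", "store": "S1", "dollars": 3.50},
    {"customer": "C1", "product": "bread", "store": "S1", "dollars": 2.25},
    {"customer": "C2", "product": "milk", "store": "S2", "dollars": 3.75},
    {"customer": "C2", "product": "milk", "store": "S1", "dollars": 3.50},
]


def aggregate(rows, dimensions, variable):
    """Sum `variable` for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[variable]
    return dict(totals)


# The same variable (dollars) sliced by one dimension, then by two.
by_product = aggregate(transactions, ["product"], "dollars")
by_store_product = aggregate(transactions, ["store", "product"], "dollars")
```

Changing the list of dimensions changes the slice; the variable being summed stays the same.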
Differences:
- MAST allows a great deal of flexibility in the creation of new dimensions and variables. OLAP is quite limited in its ability to create variables, and has no ability to create dimensions (in other words, the user is restricted to the dimensions that were planned beforehand).
Because MAST works with atomic transactions (instead of OLAP style hypercubes), there are no omnipresent dimensions or variables. The user decides what dimensions and variables are desirable, and how they should be represented. If a new dimension or variable is needed that was never needed before, it is often a simple task to create. If it was used before, it has a name, and the user need only reference that name in the request. Also, if certain variables and dimensions are not wanted for a particular analysis, MAST does not process them, so they do not slow down the run.
MAST allows the creation of new variables and dimensions in a variety of ways, without altering the input data. Created variables and dimensions become ‘virtual data’. They may be referenced as if they were stored in the input file, while MAST actually creates them on the fly.
As a very simple example of virtual data, suppose you’re analyzing products sold in a department store, and you have a product code stored in the input file. You want to analyze based on color, which is not stored in the input file. Assuming that color can be derived from the product code, the MAST administrator can create a ‘color’ dimension in a few minutes, which will always be available for future use, and it will appear to you that ‘color’ is now stored in the input file. Virtual data derivations can get extremely complex, depending on the business needs, and while there’s a work-around in some OLAP tools for this simple ‘color’ example (provided that it was pre-planned), OLAP cannot handle the more complex situations.
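The ‘color’ example can be sketched as follows. This is Python used for illustration only, not MAST syntax. The rule that the last two characters of the product code encode the color, the `COLOR_CODES` table, and the field names are all assumptions made for the sketch.

```python
# Assumption: the last two characters of the product code encode the color.
COLOR_CODES = {"RD": "red", "BL": "blue", "GN": "green"}


def color(transaction):
    """Virtual 'color' dimension, derived on the fly from the product code."""
    return COLOR_CODES.get(transaction["product_code"][-2:], "unknown")


# Hypothetical input records; note that no color field is stored.
sales = [
    {"product_code": "SHIRT-RD", "dollars": 20.0},
    {"product_code": "SHIRT-BL", "dollars": 18.0},
    {"product_code": "SCARF-RD", "dollars": 12.0},
]

# Aggregate dollars by the derived dimension without altering the input.
dollars_by_color = {}
for sale in sales:
    key = color(sale)
    dollars_by_color[key] = dollars_by_color.get(key, 0.0) + sale["dollars"]
```

The input records are never modified; the derived dimension exists only while the analysis runs.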
Even simpler in MAST (because the administrator is not involved) - but impossible in OLAP or Data Mining tools - is the creation of a high level variable (household or account level) based on collections of low level variables (person or transaction level). For example, the MAST user can easily create a variable that categorizes a household based upon the number of female wage earners over 32 years old. Or MAST can just as easily categorize an account based upon the volume of transactions over $35 apiece that were executed in Denver.
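The household example can be sketched in the same way. Again, this is illustrative Python rather than MAST syntax, and the person-level record layout and the function name are assumptions.

```python
from collections import Counter

# Hypothetical person-level records; field names are assumptions.
persons = [
    {"household": "H1", "sex": "F", "age": 40, "wage_earner": True},
    {"household": "H1", "sex": "F", "age": 35, "wage_earner": True},
    {"household": "H1", "sex": "M", "age": 45, "wage_earner": True},
    {"household": "H2", "sex": "F", "age": 30, "wage_earner": True},
    {"household": "H3", "sex": "F", "age": 50, "wage_earner": False},
]


def female_earners_over_32(rows):
    """Household-level variable built from person-level records."""
    counts = {row["household"]: 0 for row in rows}
    for row in rows:
        if row["sex"] == "F" and row["wage_earner"] and row["age"] > 32:
            counts[row["household"]] += 1
    return counts


per_household = female_earners_over_32(persons)

# Use the new variable as a dimension: how many households fall in each category?
households_by_category = Counter(per_household.values())
```

The low-level records roll up into one value per household, and that value can then serve as a dimension in a later aggregation.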
- MAST works with very large data volumes - potentially billions of transactions. OLAP works with relatively small data volumes.
The OLAP tools that work with the largest data volumes are known as ROLAP tools, because they extract their data from relational databases. One reason MAST was developed is that relational databases were unable to handle the volumes of data being analyzed. Therefore it seems safe to say that even the largest-capacity ROLAP tool could not handle the data volumes that MAST can handle.
- MAST has extensive capabilities to create low-level output files. OLAP appears to have no such ability. Given that OLAP tools work from aggregates, significant low level output file creation would be impossible.
For example, suppose that you would like a list of all customers who live in Connecticut, spend over $50 per month on their phone bill, and at least $15 per month calling England between the hours of 6 pm and 12 midnight on weekdays. MAST can do that easily. Or, suppose that you had the following crisis: you created a special promotion with a complex pricing scheme. It’s almost time to print the bills, but the billing software can’t handle the promotion yet. You can run the calling data through MAST, perform a simulation which reflects the rules of the promotion, and have MAST write the transactions out with the result of the simulation on each transaction. Then run that file through your bill printing program.
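A minimal sketch of the customer list request follows, in Python for illustration only (not MAST syntax). It assumes the input holds one month of call records and that each record carries the customer's state; the field names and the `select_customers` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical call records for a single month; field names are assumptions.
calls = [
    {"customer": "A", "state": "CT", "dest": "England",
     "when": datetime(2004, 3, 1, 19, 0), "dollars": 16.0},   # Monday evening
    {"customer": "A", "state": "CT", "dest": "Boston",
     "when": datetime(2004, 3, 2, 10, 0), "dollars": 40.0},
    {"customer": "B", "state": "CT", "dest": "England",
     "when": datetime(2004, 3, 6, 20, 0), "dollars": 20.0},   # Saturday evening
    {"customer": "B", "state": "CT", "dest": "Boston",
     "when": datetime(2004, 3, 2, 10, 0), "dollars": 45.0},
    {"customer": "C", "state": "NY", "dest": "England",
     "when": datetime(2004, 3, 1, 21, 0), "dollars": 60.0},
    {"customer": "D", "state": "CT", "dest": "England",
     "when": datetime(2004, 3, 1, 22, 0), "dollars": 15.0},
]


def select_customers(rows):
    """Connecticut customers with a bill over $50 and at least $15 of
    weekday calls to England between 6 pm and midnight."""
    state, total, england_evening = {}, {}, {}
    for row in rows:
        cust = row["customer"]
        state[cust] = row["state"]
        total[cust] = total.get(cust, 0.0) + row["dollars"]
        when = row["when"]
        # weekday() is 0-4 for Monday-Friday; hour >= 18 covers 6 pm to midnight.
        if row["dest"] == "England" and when.weekday() < 5 and when.hour >= 18:
            england_evening[cust] = england_evening.get(cust, 0.0) + row["dollars"]
    return sorted(
        cust for cust in total
        if state[cust] == "CT"
        and total[cust] > 50
        and england_evening.get(cust, 0.0) >= 15
    )


selected = select_customers(calls)
```

The result is a low-level list of customers rather than an aggregate, which is the kind of output the paragraph above describes.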
- MAST has considerable simulation capabilities. While OLAP has some ability, it is not extensive.
With MAST, you can apply rates to every transaction, based upon any combination of dimensions and variables. The dimensions may be customer or detail level. For example, you might want to give a person who spent over $10 a different set of rates than one who spent less than $10. You may also create simulations based upon other simulations, or any virtual or real data. When those abilities are combined with MAST’s extensive ability to create dimensions and variables, it is difficult to imagine simulations beyond MAST’s capability.
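The rate example might look like the sketch below. It is Python for illustration only, not MAST syntax. The two per-minute rates, the use of integer cents, and the field names are assumptions; only the $10 threshold comes from the text.

```python
# Hypothetical rates in cents per minute; only the $10 threshold is from the text.
RATE_OVER_THRESHOLD = 5
RATE_STANDARD = 8
THRESHOLD_CENTS = 1000  # $10

# Hypothetical transactions, with amounts kept in integer cents.
transactions = [
    {"customer": "A", "minutes": 100, "cents": 800},
    {"customer": "A", "minutes": 50, "cents": 400},
    {"customer": "B", "minutes": 60, "cents": 600},
]


def simulate(rows):
    """Apply a rate to every transaction, chosen by a customer-level variable."""
    # Customer-level variable: total spend per customer.
    spent = {}
    for row in rows:
        spent[row["customer"]] = spent.get(row["customer"], 0) + row["cents"]

    # Write each transaction back out with the simulated charge attached.
    result = []
    for row in rows:
        over = spent[row["customer"]] > THRESHOLD_CENTS
        rate = RATE_OVER_THRESHOLD if over else RATE_STANDARD
        result.append({**row, "simulated_cents": row["minutes"] * rate})
    return result


simulated = simulate(transactions)
```

Each output record carries the simulated charge alongside the original fields, so it could feed a further simulation or be written to a file.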
- MAST has considerable ability to perform iterative analyses in a single run. OLAP has limited ability to perform iterative analyses.
Once MAST has derived a variable or a dimension, that data item becomes a piece of virtual data, which can be used as if it always existed in the data. It may then be used to perform an analysis or simulation, the result of which becomes another piece of virtual data, which in turn can be used to create more virtual data. This allows the user to repeatedly build upon their ideas in a single run. Many derived variables and dimensions may never be displayed, but are created by the user as stepping stones to get to their ultimate destination. While some OLAP tools have some capability to perform iterative analyses, it is our understanding that their ability is fairly limited. Because MAST performs much of its iterative analysis based upon virtual data that the OLAP tools couldn’t create anyway, this may be a moot point.
Conclusion (OLAP & MAST)
OLAP has been designed to allow interactive viewing of data. MAST has been designed to convert a mountain of data into useful information. Both tools use multidimensional analysis to solve their respective problems. It seems that MAST is the better tool for preparing data, and OLAP is the better tool for viewing data. Historically, aggregates from MAST have often been fed into a spreadsheet program. It seems that OLAP tools might be best suited for viewing multidimensional aggregates that are built by MAST.
MAST and Data Mining
It has been said that 70-90 percent of the time required for data mining is dedicated to file preparation. This is because there are often derived data items that are required in order to mine the data. For example, suppose that a user wants to mine the data based upon whether or not people call San Francisco between the hours of 5 pm and 10 pm. If so, a programmer has to write a program to derive this information for each account, then create new transactions that contain this newly derived information, then write those transactions out. Then the data mining tool must be reconfigured to accept the newly designed transaction, and mine based on the new data item. This is all very time consuming and expensive, and as a result, data mining probably isn't used as often as it should be.
Much of this document discusses MAST's ability to create new data items 'on the fly', whether they resemble the data mining example above, are simulations, or are some other type of easily created virtual data item. MAST can not only create these virtual data items on the fly, but can also format a new transaction and write the new data items out to a low-level file. That file could then be fed into the data mining tool, thus nearly eliminating the 70-90% file preparation time. Not only that, in many cases the data miner may find that MAST itself can tell them what they are looking for, and in those cases there would be no need to use the data mining tool at all. In either case, MAST gives greater and more flexible insight into the data than was ever possible before.
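The San Francisco example can be sketched as a derive-then-write step. This is illustrative Python, not MAST syntax. The CSV layout, the field names, and both helper functions are assumptions, and the output goes to an in-memory buffer as a stand-in for the low-level file.

```python
import csv
import io
from datetime import datetime

# Hypothetical call records; field names are assumptions.
calls = [
    {"account": "1001", "dest": "San Francisco", "when": datetime(2004, 3, 1, 18, 30)},
    {"account": "1001", "dest": "Denver", "when": datetime(2004, 3, 2, 9, 0)},
    {"account": "1002", "dest": "San Francisco", "when": datetime(2004, 3, 1, 23, 0)},
]


def calls_sf_evening(rows):
    """Per account: did it call San Francisco between 5 pm and 10 pm?"""
    flags = {}
    for row in rows:
        hit = row["dest"] == "San Francisco" and 17 <= row["when"].hour < 22
        flags[row["account"]] = flags.get(row["account"], False) or hit
    return flags


def write_flags(flags, out):
    """Write one record per account with the derived item as a 0/1 column."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["account", "calls_sf_evening"])
    for account in sorted(flags):
        writer.writerow([account, int(flags[account])])


buffer = io.StringIO()  # stands in for the low-level output file
write_flags(calls_sf_evening(calls), buffer)
```

A data mining tool could then read the resulting file with the derived item already in place.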
Copyright © Innovative Computing, Inc. 1998, 2000, 2001, 2003, 2004