There are two types of hierarchical data encountered in market research: respondent-based hierarchies and data-based hierarchies. In practice, they are analysed in similar ways. More importantly, both need software that is capable of analysing hierarchically structured data.
This blog article explains what hierarchical data is, what software you need to analyse such data and, finally, some solutions to the task.
A good example of a respondent-based hierarchy would be a doctor/patient survey where you are surveying a doctor and some of his patients. In such a case, there would be two levels of data. There would be data relating to the doctor and a variable number of data records relating to each patient. For example, doctor data might include the type of practice, the region in which the doctor worked, attitudes to new techniques etc. The patient data might include the person’s age, gender and the length of time he/she had been visiting the practice, the frequency of visiting the practice etc.
A good example of a data-based hierarchy might be activities that someone does. If you are conducting a survey of someone’s eating out behaviour, you are likely to have respondent data and, perhaps, occasion-based data. For example, the respondent data would contain details of the respondent’s age, gender, income etc. There would then be occasion-based data for each eating out occasion.
There are occasions where both types are present, such as a three-level hierarchy of doctors, patients and drugs prescribed. The doctor/patient data would be a standard respondent-based hierarchy, but each patient might have any number of drugs prescribed, each with a different dosage, regimen and frequency, for example.
In practice, both types of hierarchy behave in the same way. Data from the higher level is applicable to the lower level in the hierarchy, yet the reverse is not true. Each patient for a specific doctor will gain the attributes of that doctor: the region the doctor works in, his specialty, his attitudes to techniques etc. The same is true for data-based hierarchies. For each eating out occasion for a specific respondent, the data relating to that respondent will be applicable. On the other hand, each eating out occasion is independent of other eating out occasions. The difference may be that respondent-based hierarchical data may be stored as a series of records or in two or more data files, whereas data-based hierarchical data may be embedded in a single record, though this is not always true.
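As a rough illustration of how higher-level data applies to the lower level, here is a minimal Python sketch. The field names (`region`, `practice_type`, `doctor_id`) and the helper `inherit` are invented for this example and do not come from any particular package.

```python
# Hypothetical two-level data: one record per doctor, many records per patient.
doctors = {
    "D1": {"region": "North", "practice_type": "Group"},
    "D2": {"region": "South", "practice_type": "Solo"},
}
patients = [
    {"patient_id": "P1", "doctor_id": "D1", "age": 34},
    {"patient_id": "P2", "doctor_id": "D1", "age": 61},
    {"patient_id": "P3", "doctor_id": "D2", "age": 47},
]


def inherit(children, parents, key):
    """Return child records with the parent's attributes copied down."""
    return [{**parents[child[key]], **child} for child in children]


# Each patient record now carries the doctor's region and practice type.
patient_level = inherit(patients, doctors, "doctor_id")

# A patient-based table that uses a doctor-level variable.
patients_by_region = {}
for row in patient_level:
    region = row["region"]
    patients_by_region[region] = patients_by_region.get(region, 0) + 1
```

Note that nothing flows the other way: a patient's age says nothing about the doctor, which is why the copy is only ever made downwards.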
There are three main options: using Microsoft Office products, using research software that mainly handles one-respondent-per-record data, and using software that has full functionality for processing and tabulating hierarchical data.
Most survey analysis packages do not allow you to analyse hierarchical data. They work on the principle that there is one record for each respondent. Whilst Excel cannot help you to analyse hierarchical data unless you program Excel using VBA or recode data to multiple worksheets, Microsoft Access does understand data hierarchies. Access refers to hierarchical data as one-to-many relationships. Whilst it can manage the data, it will have limited capabilities to perform tabular analysis, particularly if it is complex. Again, VBA or recoding make this possible albeit cumbersome. Hierarchies mainly exist in Access to manage reports rather than tables.
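To show what a one-to-many relationship looks like in a relational database, here is a small sketch using Python's built-in `sqlite3` module rather than Access itself. The table and column names are assumptions made for the example.

```python
import sqlite3

# In-memory database with one respondent table and one occasion table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE respondent (resp_id INTEGER PRIMARY KEY, age_group TEXT);
    CREATE TABLE occasion (
        occ_id INTEGER PRIMARY KEY,
        resp_id INTEGER REFERENCES respondent(resp_id),
        venue TEXT
    );
    INSERT INTO respondent VALUES (1, '18-34'), (2, '35-54');
    INSERT INTO occasion (resp_id, venue) VALUES
        (1, 'fast food'), (1, 'cafe'), (1, 'fast food'), (2, 'restaurant');
""")

# An occasion-based count broken down by a respondent-level variable.
rows = conn.execute("""
    SELECT r.age_group, o.venue, COUNT(*) AS occasions
    FROM occasion AS o
    JOIN respondent AS r ON r.resp_id = o.resp_id
    GROUP BY r.age_group, o.venue
    ORDER BY r.age_group, o.venue
""").fetchall()
conn.close()
```

A single query like this is manageable, but a full set of cross-tabulations with bases, percentages and filters quickly needs far more than a join and a `GROUP BY`.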
Some software packages have the capability to produce tables based on occasions, for example. However, it may be a laborious task. If there are, for example, up to 10 eating out occasions, you may need to add data from 10 variables together to produce the one table that you want based on all eating out occasions. If this principle needs to be applied to many tables, this can then become a lengthy process. Snap and QPSMR are similar in their capabilities in this area and have tools to manage smaller or simpler hierarchies.
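The "adding variables together" step can be pictured with a short Python sketch. It assumes the occasions are stored as repeated fields in a single respondent record (`venue_1`, `venue_2` and so on); these names are hypothetical, and three slots are used instead of ten to keep the example short.

```python
from collections import Counter

MAX_OCCASIONS = 3  # the example above has up to 10; 3 keeps the data readable

# One record per respondent, with one venue field per eating out occasion.
respondents = [
    {"resp_id": 1, "venue_1": "fast food", "venue_2": "cafe", "venue_3": "fast food"},
    {"resp_id": 2, "venue_1": "restaurant", "venue_2": None, "venue_3": None},
]


def occasion_table(records, prefix, max_slots):
    """Count values across all repeated fields, so the base is occasions."""
    counts = Counter()
    for rec in records:
        for slot in range(1, max_slots + 1):
            value = rec.get(f"{prefix}_{slot}")
            if value is not None:  # skip unused occasion slots
                counts[value] += 1
    return counts


venue_counts = occasion_table(respondents, "venue", MAX_OCCASIONS)
```

The loop over slots is the part that a package without hierarchy support leaves you to spell out by hand for every table.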
There may still be a problem though if you wish to process data and apply calculations to a higher level in the data. What does that mean? For example, let’s say you want to find out what percentage of the eating occasions for each respondent were in a fast food restaurant. This would mean that you need to sum the total occasions in a fast food restaurant and divide it by the number of occasions in total. The number of occasions would vary from respondent to respondent, so a calculation would have to be performed. At this point, many software tools struggle. It may be possible to output data to Excel, for example, make calculations and paste or import the data back to the main data file. However, this starts to become time consuming especially where there are a lot of variables as well as being prone to error and generally cumbersome.
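The fast food calculation described above can be sketched in a few lines of Python. The data and the function name `fast_food_share` are made up for illustration; the point is only that the divisor changes from respondent to respondent.

```python
# Hypothetical occasion-level records: one row per eating out occasion.
occasions = [
    {"resp_id": 1, "venue": "fast food"},
    {"resp_id": 1, "venue": "cafe"},
    {"resp_id": 1, "venue": "restaurant"},
    {"resp_id": 1, "venue": "cafe"},
    {"resp_id": 2, "venue": "fast food"},
    {"resp_id": 2, "venue": "restaurant"},
    {"resp_id": 3, "venue": "cafe"},
]


def fast_food_share(rows):
    """Return the percentage of each respondent's occasions at fast food venues."""
    totals = {}
    fast_food = {}
    for row in rows:
        resp = row["resp_id"]
        totals[resp] = totals.get(resp, 0) + 1
        if row["venue"] == "fast food":
            fast_food[resp] = fast_food.get(resp, 0) + 1
    # Only respondents with at least one occasion appear, so no division by zero.
    return {resp: 100.0 * fast_food.get(resp, 0) / total for resp, total in totals.items()}


share = fast_food_share(occasions)
```

The result is a respondent-level variable that can then be tabulated like any other, for example against age or income.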
You are left with very few software products that can process hierarchical data efficiently. Efficiently here means that the software can read repeated records or blocks of data without the specification having to be repeated. Specifically, processing, say, up to 20 eating out occasions per respondent should not mean approximately 20 times as much work to produce an occasion-based table as a standard table. Similarly, if you want to calculate information by reading the hierarchical data as a set, this should be a simple process and not require recoding, data exports and imports or other complexities. For instance, getting the total cost of all eating out occasions, or calculating the percentage of eating out occasions that are at fast food restaurants, should be a simple task.
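As a sketch of what "reading the hierarchical data as a set" might look like, the Python below keeps each respondent's occasions as a nested list, so one line of specification covers however many occasions there are. The structure and field names are assumptions for the example, not the format of any specific product.

```python
# Hypothetical nested data: each respondent holds a list of occasions.
respondents = [
    {"resp_id": 1, "occasions": [
        {"venue": "fast food", "cost": 8.0},
        {"venue": "cafe", "cost": 12.5},
    ]},
    {"resp_id": 2, "occasions": [
        {"venue": "restaurant", "cost": 40.0},
    ]},
    {"resp_id": 3, "occasions": []},  # no eating out occasions recorded
]

# One specification, applied to the whole set of occasions for each respondent.
for respondent in respondents:
    respondent["total_cost"] = sum(occ["cost"] for occ in respondent["occasions"])

total_costs = [respondent["total_cost"] for respondent in respondents]
```

The same line would work unchanged whether a respondent had 2 occasions or 20, which is the property to look for in a package.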
MRDCL is not the only solution, but it is one of very few packages that can handle this type of task well. It tends to be the more established products like MRDCL, Quantum and Merlin that are needed for such tasks, at least if they are to be handled efficiently.
MRDCL offers a unique solution for allowing researchers and analysts to handle tabulations easily. The skilled part of the process is managing the data and it requires a more advanced product like MRDCL, Quantum or Merlin. However, MRDCL allows you to process data and then provide the data for analysis in Reflect, which is a free software product that understands hierarchies and allows you to produce tables.
This means that you can buy services from MRDC Software or any other user of MRDCL and then produce tabulations yourself using easy-to-use interactive tabulation software. You are splitting the skilled parts of the data processing from the less skilled parts, so you can easily produce as many tables as you wish.
If you want information or advice, please contact me. I will be pleased to advise and help.