How to use the Append tool

The Append tool joins data sets that originate from two different questionnaires. The input data sets typically include common respondents, but contain different questions.

It can be useful for bringing data together from multiple waves of tracking studies and other types of studies that target the same respondents more than once.

For example, consider a readership study that consists of a recruitment study followed by a follow-up study about a magazine.

The data sets compiled from these two questionnaires look like this:

Recruitment questionnaire

Wave 2 Data Fields

 

Using the Append tool, you can append the data sets from these two waves into one. You need a data element that can be used to match corresponding respondents in both data sets. This can be a numeric question, the Alternate ID used to start the interview, or the MI Pro Respondent ID (a unique GUID assigned to each interview).

You can also append data from a corresponding sample, and you can use the “lookup” version of the append to add, for example, geographical data.

How to append data sets

  1. Create an Append tool in your workflow by selecting the “Append” button under “Insert new Tool”. Give it an appropriate name and description. You may alter this information later if desired.
  2. Click Edit in the Tool icon (on the left). A new dialogue opens.
  3. Under Input, specify the two datasets you want to append (labeled Dataset 1 and 2).


    Using the option buttons, you can specify two pre-defined datasets in Research Studio, or you can specify one or both of the datasets formed from the results of the previous tool in the workflow, or select corresponding sample datasets.

    To open the Select dataset dialog click … next to the dataset that you wish to specify. If the dataset you want is not stored in the current project, select the project from the project drop-down list (grey drop down list). Click OK to select the dataset.


    You may also apply a pre-filter to either data set in order to exclude respondents from the append. To do this, click the … button next to the Filter text box to open the Filter Builder dialog.

  4. Under Output, specify where to send the results of the append operation. To save the results as a new dataset available for later use (optional), select the Save result to database checkbox. In the top text box, enter a name for the output data set. In the bottom text box, enter an optional description for reference purposes. Click OK to close the Edit tool properties dialog.
  5. Under the “specification” tab, select a Type corresponding to the type of append you want to perform.

    | Append Type | Description |
    | --- | --- |
    | Match | Results include only respondents that are common to both datasets. |
    | Merge | Results include all respondents from both datasets. |
    | Master | Results include only respondents that are present in the dataset specified as number 1 in the Input and output tab. |
    | Lookup | Results include only respondents that are present in the dataset specified as number 1 in the Input and output tab. Duplicate data from dataset number 2 is applied to respondents in dataset number 1 based on a one-to-many correspondence between dataset number 2 and dataset number 1. |

  6. Under Match, select the method by which the tool determines respondent matches between the datasets.

    | Match Type | Description |
    | --- | --- |
    | Alternative ID | Uses the alternative respondent ID field specified in both datasets to match respondents. |
    | Numeric question | Uses a unique numeric field common to both data sets, such as a telephone number, to match respondents. Select this field from each dataset under the Dataset 1 question and Dataset 2 question drop-down lists. |

  7. You may also specify what should happen if there are duplicates in either of the two data sets.
  8. Save as draft and click the green “Run” workflow button. After the tool executes in the workflow, the output data set is listed under the Research Studio Analysis tab, and is available as input in any subsequent tool you create. Each time the tool executes it overwrites the previous output.
  9. When the tool has finished, a Success message is displayed.


  10. To verify that the append operation was successful, you can go directly to the topline and view the results.
  11. If you return to the data sets tab and refresh the list, you will see that the stored dataset is there. You may use the new data set as an input to any workflow tools that follow, regardless of whether you saved the result of the tool or not.

Best practice: After a successful run, save the workflow as a new version (and exit).
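The Match, Merge and Master types from step 5 behave much like the inner, full outer and left joins known from other data tools. The following is only a rough sketch of that idea in plain Python, not Research Studio code: the `append` helper, the field names (`resp_id`, `age`, `reads_magazine`) and the sample records are all made up for illustration, and the sketch assumes the match key is unique within dataset 2.

```python
def append(dataset1, dataset2, key1, key2, append_type):
    """Illustrative join of two lists of dict records (hypothetical helper)."""
    if append_type not in ("match", "merge", "master"):
        raise ValueError(f"unsupported append type: {append_type}")

    # Index dataset 2 by its match key (assumes the key is unique there).
    index2 = {row[key2]: row for row in dataset2}
    keys1 = {row[key1] for row in dataset1}

    result = []
    for row in dataset1:
        other = index2.get(row[key1])
        if other is not None:
            # Respondent found in both datasets: combine the fields.
            result.append({**row, **other})
        elif append_type in ("merge", "master"):
            # Keep dataset 1 respondents even without a partner record.
            result.append(dict(row))

    if append_type == "merge":
        # Merge also keeps respondents that exist only in dataset 2.
        result.extend(dict(row) for row in dataset2 if row[key2] not in keys1)
    return result


# Made-up sample data: respondents 2 and 3 answered both waves.
wave1 = [
    {"resp_id": 1, "age": 34},
    {"resp_id": 2, "age": 51},
    {"resp_id": 3, "age": 27},
]
wave2 = [
    {"resp_id": 2, "reads_magazine": "yes"},
    {"resp_id": 3, "reads_magazine": "no"},
    {"resp_id": 4, "reads_magazine": "yes"},
]

for append_type in ("match", "merge", "master"):
    rows = append(wave1, wave2, "resp_id", "resp_id", append_type)
    print(append_type, [row["resp_id"] for row in rows])
# match  -> respondents 2, 3
# merge  -> respondents 1, 2, 3, 4
# master -> respondents 1, 2, 3
```

How the real tool treats duplicate keys depends on the duplicate setting chosen in step 7, which this sketch does not model.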

Two append examples

Regular append (Match)

The previous example is a typical match case:

  1. The respondent ID is stored as an alternative ID in the first data set
  2. The respondent ID is stored in a numeric variable in the second data set

The task is to match only the respondents present in both data sets. 

The selection should consequently be:

  1. Type of match = Match
  2. Respondent ID in data set 1 = alternative ID
  3. Respondent ID in data set 2 = numeric question RespID

Note: Apart from the alternative ID, Research Studio can only match on numeric variables. Best practice is therefore to always have a unique numeric variable in each data set that is simple to match on. An altID can be non-numeric (e.g. an e-mail address), so if both data sets have the same altIDs you can still match the data sets on non-numeric values.
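As a rough illustration of this selection, the sketch below matches an alternative ID in data set 1 against a numeric RespID question in data set 2. It is plain Python, not Research Studio code. The `match_append` helper, the `alt_id` field name and all sample values are hypothetical, and the sketch assumes the alternative ID is held as text containing digits, so it is converted to a number before comparison.

```python
# Made-up recruitment data: the respondent ID is stored as an alternative ID (text).
recruitment = [
    {"alt_id": "1001", "zip_code": "0150"},
    {"alt_id": "1002", "zip_code": "5003"},
    {"alt_id": "1003", "zip_code": "0150"},
]

# Made-up wave 2 data: the respondent ID is stored in the numeric question RespID.
wave2 = [
    {"RespID": 1002, "reads_magazine": "yes"},
    {"RespID": 1003, "reads_magazine": "no"},
    {"RespID": 1004, "reads_magazine": "yes"},
]


def match_append(dataset1, dataset2):
    """Keep only respondents present in both data sets (type Match)."""
    by_resp_id = {row["RespID"]: row for row in dataset2}
    matched = []
    for row in dataset1:
        # Assumption: alternative IDs are digit strings, so compare as integers.
        other = by_resp_id.get(int(row["alt_id"]))
        if other is not None:
            matched.append({**row, **other})
    return matched


for row in match_append(recruitment, wave2):
    print(row)
```

Only respondents 1002 and 1003 appear in the result, because 1001 is missing from wave 2 and 1004 is missing from the recruitment data.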

Lookup append (hierarchical match)

Sometimes you will want to append data in a hierarchical way, i.e. data in the second data set can be applied to several cases in the first data set.

Let’s have a look at the same data sets again:

In the second data set, each record is identified by a post code rather than by a respondent. The post code corresponds to the Zip code in the first data set.

Best practice is therefore to maintain a standard geographical data set with standard categorizations for analysis.

Readership data (with demo)

Wave 2 Geographical data set

 

As in the previous append, the selection should consequently be:

  1. Type of match = Lookup
  2. Respondent ID in data set 1 = Zip code
  3. Respondent ID in data set 2 = Postcode

The second data set will be appended to the first using a one-to-many correspondence.
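The one-to-many behaviour can be sketched in plain Python as follows. This is not Research Studio code: the `lookup_append` helper, the field names and the region values are invented for illustration. The sketch also assumes that a respondent whose Zip code is missing from the geographical data set is kept without the extra fields; the actual tool may treat that case differently.

```python
# Made-up readership data: several respondents can share one Zip code.
readership = [
    {"resp_id": 1, "zip_code": 1010},
    {"resp_id": 2, "zip_code": 2020},
    {"resp_id": 3, "zip_code": 1010},
    {"resp_id": 4, "zip_code": 9999},
]

# Made-up geographical data set: one record per post code.
geography = [
    {"postcode": 1010, "region": "North"},
    {"postcode": 2020, "region": "South"},
]


def lookup_append(dataset1, dataset2, key1, key2):
    """Copy the matching dataset 2 record onto every dataset 1 respondent."""
    lookup = {row[key2]: row for row in dataset2}
    # Assumption: respondents without a matching post code are kept as they are.
    return [{**row, **lookup.get(row[key1], {})} for row in dataset1]


for row in lookup_append(readership, geography, "zip_code", "postcode"):
    print(row)
```

Respondents 1 and 3 both receive the record for post code 1010, which is the one-to-many correspondence described above.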