How to use the Match tool
If you have two or more data sets, you will probably want to merge them. These might be results from different waves of a tracker, or data sets from different countries in the same wave.
Merging by hand is tedious and prone to unintended errors. Research Studio gives you a tool that does this efficiently.
The Match tool enables you to join two data sets in two ways:
- Automatically
- Manually
How to match data sets
- Create a Match tool in your workflow
- Under the Match tool name in the Workflow window, click Edit. The Tool details window opens on the Input and output tab.
- Select “Edit” in the tool icon on the left. A new dialog opens where you can specify the two data sets you want to match. You can specify two pre-defined datasets in Research Studio, or you can take one or both datasets from the results of the previous tool in the workflow.
- If you choose the top option button for either dataset 1 or 2, click … to open the Select dataset dialog. If the dataset you want is not stored in the current project, select the project from the Current project drop-down list. Then, click OK.
- Under Output, specify where you wish the results of the match operation to go. You may save the results to a database or use them as the input to another tool. To send the output to the next tool in your workflow click OK. To save the results as a new dataset available for later use, click the Save output to database checkbox (optional). In the top textbox, enter a name for the output. In the bottom textbox, enter a description for reference purposes (optional). After the tool executes in the workflow, the output data set is listed under the Data sets tab, and is available as input in any subsequent tool you create. Each time the tool executes it overwrites the previous output.
- Select OK. The match tool dialog will open.
- The questionnaire corresponding to dataset 1 opens on the left side of the window, and the questionnaire corresponding to dataset 2 opens on the right. In the middle are icons indicating the extent to which a match already exists between the corresponding elements.
- For the merge to succeed, the two datasets must be a reasonable match. The following is an explanation of the match icons:
The structure of the corresponding elements matches between datasets. Note: A structural match between questionnaire elements does not necessarily indicate that the elements match identically. Review for inconsistencies.
The number of rows, columns or subquestions in the corresponding question does not match between datasets. Although the merge tool can accept this discrepancy, you may want to examine the rows to see what is missing, then correct the discrepancy using the Filter & Create tool.
The structure of the corresponding elements does not match between datasets.
- Resolve any match discrepancies uncovered in the previous step. Place checkmarks in the checkboxes beside the elements of either database that you want to manipulate. Use the expand (+) icons to view sub-elements. Then use the following icons to perform any of the following operations on elements and sub-elements:
Toggles between enabling and disabling the selected element. Disabled elements do not appear in the output.
Moves the selected elements up or down one row.
Moves the selected elements to the row specified in the Move to textbox. If more than one element is selected, the top-most element that is selected is moved to the specified row, the second-from-top-most element is moved to the row directly underneath, and so on.
Moves the selected element, and all rows underneath it, as specified.
Rearranges the position of slave elements to match the master elements based on corresponding IDs or question labels and texts.
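The Move to behaviour for several selected elements can be pictured with a short Python sketch. This is only an illustration of the rule described above, not Research Studio's own code: the function name `move_to` and the list-of-labels representation are assumptions, and the sketch only covers elements moved up from further down the list.

```python
def move_to(rows, selected, destination):
    """Move the selected rows so the top-most one lands on line `destination` (1-based).

    Hypothetical helper: the other selected rows follow directly underneath,
    in their original top-to-bottom order.
    """
    moved = [r for r in rows if r in selected]
    remaining = [r for r in rows if r not in selected]
    index = min(destination - 1, len(remaining))
    return remaining[:index] + moved + remaining[index:]


questions = ["Q1", "Q2", "Q3", "Q4", "Q5"]
reordered = move_to(questions, {"Q4", "Q5"}, 2)
print(reordered)
```

Here Q4 lands on line 2, Q5 directly underneath, and Q2, which used to sit on line 2, follows the moved rows.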
Automatic match
The easiest way to match two data sets is the automatic match. It can be used if one of the following applies:
- The data sets have more or less identical structures, with only minor discrepancies
- One of the data sets was created based on the other, i.e.
- It has the same Unique identifiers
- It has the same question texts / question labels
If one of these conditions is met, follow these simple steps:
- Specify the two data sets
- Select which automatic rearrange you want to use:
- Click “Go”
The best possible match between the two data sets is now shown in the user interface.
The green icons show that you have a full match.
The yellow icons show matching questions where Research Studio has made adaptations to get the best match. In the example, the column count check failed for question 5. This indicates a discrepancy in the structure of the question, which should always be examined more closely and resolved.
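To make the idea of an automatic match concrete, here is a minimal Python sketch that pairs questions by a shared unique identifier and reports a status for each pair. It is not how Research Studio is implemented: the `Question` class, the `align_by_id` function and the status strings are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str       # unique identifier shared between the data sets (assumed)
    label: str
    columns: int   # number of answer columns


def align_by_id(master, slave):
    """Pair each master question with the slave question that has the same ID."""
    slave_by_id = {q.qid: q for q in slave}
    rows = []
    for m in master:
        s = slave_by_id.get(m.qid)
        if s is None:
            status = "no match"
        elif s.columns == m.columns:
            status = "full match"            # corresponds to a green icon
        else:
            status = "column count differs"  # corresponds to a yellow icon
        rows.append((m.qid, status))
    return rows


wave1 = [Question("q1", "Age", 1), Question("q5", "Brands", 8)]
wave2 = [Question("q5", "Brands", 4), Question("q1", "Age", 1)]
result = align_by_id(wave1, wave2)
print(result)
```

The order of the questions in the second data set does not matter in this sketch, because the pairing is done on the identifier rather than on the position.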
To resolve column count issues:
- Select the row where the warning occurs
- Select the Columns tab to see the discrepancies. In this case we can see the following:
- The number of columns in question 5 in data set 1 is 8
- The number of columns in question 5 in data set 2 is 4
- Research Studio has made adjustments and has inserted “blank” columns for the missing categories.
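The adjustment described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the tool's actual algorithm: it assumes columns are identified by category codes, and the function name `insert_blank_columns` and the brand labels are made up for the example.

```python
def insert_blank_columns(master_codes, slave_columns, blank=None):
    """Line up the slave columns with the master codes.

    Wherever the slave data set has no column for a code, `blank` is used.
    """
    return [slave_columns.get(code, blank) for code in master_codes]


# Question 5: data set 1 has 8 columns, data set 2 has only 4 of them.
codes_set1 = [1, 2, 3, 4, 5, 6, 7, 8]
columns_set2 = {1: "Brand A", 2: "Brand B", 5: "Brand E", 8: "Brand H"}

aligned = insert_blank_columns(codes_set1, columns_set2)
print(aligned)
```

After the adjustment both sides have 8 columns, and the 4 positions missing from data set 2 are blank.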
When Research Studio has finished and you have inspected the result, save the tool as a draft and click the green “run workflow” button.
You can then inspect the merged data set directly using “view topline / frequencies”.
Manual match
If it is difficult (or impossible) to get a sensible result from an automatic match, you can use the match tool to do the matching manually.
To do this:
- Select the question(s) on either side and move them to the appropriate destination line for this question. You may choose to use the up/down arrows, the Move To box or the Rearrange tool. In the example above, we have used the Move To box to move Label 1 to line 1. The result will then be as follows:
- Label 1 has now reached its final destination. In a similar fashion, you can move questions on both sides until the questionnaire is matched correctly. The icons will turn green as you go along, or if questions have been altered, the yellow icon will appear. Always check that the question labels are the same.
- To move several questions at once, such as a series of questions that belong together, select them all and move them as described above. The result is as follows:
- The first question will be at the destination line
- The subsequent questions will come after
- The question originally placed in the destination row will now be moved AFTER the newly moved questions
- The last option is to do a “full move”. In the example shown below, question 5 should go to line 5, question 6 to line 6 and so on. In this case use the Full Move option:
- Check ONLY the first question in the sequence
- Check the “Full move” check box
- Click “Go”
- Label 5 and all questions below it are now moved down together, so that Label 5 sits on line 5.
Note that the Research Studio match tool will place a “blank” question to maintain the data structure where the new question has been placed.
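A full move can be pictured as shifting a question and everything below it down the list, with blank entries filling the gap. The Python sketch below is a simplified illustration under that assumption. The name `full_move` and the use of `None` for a blank question are hypothetical, and the sketch only handles moves down the list.

```python
BLANK = None  # stands in for the "blank" question the Match tool inserts


def full_move(rows, first, destination):
    """Shift `first` and all rows underneath it so `first` lands on line `destination` (1-based)."""
    start = rows.index(first)
    shift = (destination - 1) - start
    if shift < 0:
        raise ValueError("this sketch only covers moves further down the list")
    return rows[:start] + [BLANK] * shift + rows[start:]


side = ["Label 1", "Label 2", "Label 5", "Label 6"]
moved = full_move(side, "Label 5", 5)
print(moved)
```

Label 5 ends up on line 5 and Label 6 on line 6, while lines 3 and 4 hold blanks so that the rows above keep their positions.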
Manual insertion of sub-questions, rows or columns
The same techniques used to move questions can also be used on other question items. In the case of question 5 above, this should be done as follows:
- Check the first row to be moved.
- Check “Full move”
- Click “Go”
- Code 2 is moved to line 3
- Repeat the “Full moves” until all the lines are in the correct location.
- When all manual matching has been done, save as draft, then click the green “run workflow” button.
- “Save as new version” and “exit”
Creating new questions
Creating new questions or variables in the data set is a two-step process:
- Create the empty “shell” with labels and structure
- Clean data into the “empty shell”.
It is possible to use the Clean and define tool to perform both steps. For most situations, a better way is to use the questionnaire designer to create the empty “shells” and then use the Clean and define tool to fill in the variables. This is the method described here.
Create empty questions
In our “Readership study” example, we have a numeric question in which respondents gave their three most preferred articles and the three articles they preferred least. Our goal is to create a new multiple-choice question using the same labels as in question S1.A, and then fill it with the values given in questions S2.A and S2.B.
- Create the new questions in the questionnaire editor. Best practice is to create a new questionnaire based on the data set. This way you may then reuse all labels in the question.
- When you’re finished with the new questionnaire, save it and publish it as a normal study.
- Create a new match tool
- Select the dataset, in our example the readership data set, as the master questionnaire
- Select the EMPTY data set created from the publishing of the new variables as the second data set
- Give the new data set an appropriate name and description
- In the match tool rearrange the questions automatically – or move the questions from data set 2 to the bottom of the list as shown above
- Save and run the tool.
- A new data set is created and this can be cleaned with new data using the Clean and define tool.
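The second step, filling the empty shell, can be illustrated with a small Python sketch. It does not use the Clean and define tool's own syntax. It only shows the idea of turning numeric mentions into multiple-choice flags, and the article labels, the answer codes and the function name `fill_shell` are invented for the example.

```python
# Labels reused from question S1.A (hypothetical values).
article_labels = {1: "Article A", 2: "Article B", 3: "Article C", 4: "Article D"}


def fill_shell(mentioned_codes, labels):
    """Return one 0/1 flag per label: 1 if the article code was mentioned."""
    mentioned = set(mentioned_codes)
    return {labels[code]: int(code in mentioned) for code in labels}


# One respondent's three most preferred articles, as given in S2.A.
respondent = {"S2.A": [1, 3, 4]}
most_preferred = fill_shell(respondent["S2.A"], article_labels)
print(most_preferred)
```

The same function could be applied to S2.B to fill a second multiple-choice question for the least preferred articles.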
Re-label a questionnaire using the questionnaire matcher
This procedure can also be very useful if you want to re-label a data set.
- Create a new questionnaire based on a data set
- Publish the new, altered questionnaire normally
- Match the two data sets using the new data set as primary data set
- Run. The labels from the empty primary data set will now be applied to the secondary data set and stored under a new name.
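The effect of the re-labelling procedure can be sketched as follows in Python. This is a conceptual illustration only: the dictionary layout, the question IDs and the function name `relabel` are assumptions and do not reflect how Research Studio stores data sets.

```python
def relabel(primary, secondary):
    """Copy labels from the primary (empty) data set onto the secondary data set.

    Questions are paired by ID. The secondary data is kept, and the result is
    returned as a new dictionary so the original stays untouched.
    """
    relabelled = {}
    for qid, question in secondary.items():
        new_label = primary.get(qid, {}).get("label", question["label"])
        relabelled[qid] = {**question, "label": new_label}
    return relabelled


primary = {"q1": {"label": "Age group"}, "q2": {"label": "Region"}}
secondary = {
    "q1": {"label": "AGE", "data": [1, 2, 2]},
    "q2": {"label": "REG", "data": [3, 1, 2]},
}
output = relabel(primary, secondary)
print(output["q1"])
```

Returning a new dictionary mirrors the fact that the result is stored under a new name rather than overwriting the original data set.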