This exercise demonstrates how to use some of the most frequently used tools in Alteryx. It is highly recommended that all users become familiar with each of these tools.
Step 1 - The first step when building a module is to bring in the data you want to analyze. In this module we will continue to process the sample address data. Choose the Input tool and drag and drop it into the module.
- Input File: Browse to the CSVGeocode_example.csv located in (\Program Files\Alteryx\2012 \Samples\SampleData).
Step 2 - Since there is no unique ID for the data we will generate a record ID dynamically. Drag and drop a RecordID tool directly below the Input tool. The tools should connect together automatically. Please review the setting for this tool. We will use the default setting for this example.
Step 3 - The next task is to check for any duplicates in the data. Drag and drop a Unique tool directly below the RecordID tool. The tools should connect together automatically.
Step 4 - In the 'properties' dialogue box for the Unique tool, check the boxes for 'CONAME' and 'ADDR'. This means that we will group all of the records that have the same values in both checked fields. The first record in the grouped data will be returned out of the unique side: all the others will be returned out of the duplicate side.
Step 5 - Place a Browse tool under both sides of the unique tool and run the module.
Step 6 - Click on the Browse tool connected to the duplicate, 'D', side of the Unique tool. There are nine records in the data set that are duplicates.
Step 7 - The next task will be to get some quantitative information about the data. One thing that we can quantify easily is the number of unique business records by ZIP Code. To do this drop a Summarize tool directly on the line below the unique, 'U', output arrow that connects to the Browse tool.
- In the properties highlight the field for 'ZIP', click 'Add' and select 'Group By'.
- Click 'Add' again and select 'Count'.
- This will return the count of records in each ZIP Code. The 'Actions' should look like it does below.
Step 8 - Run the module and select the Browse tool connected to the Summarize tool.
Step 9 - The next process is going to allow us to connect the real geography to our .csv data so that we can see where the data in our list is located at the ZIP Code level.
- Drop an Allocate Input tool, next to the Summarize tool. On the configuration for the Allocate, select the 'Pick Geography' tab and check the box under the 'ZIP Codes by County' for the state of Massachusetts.
Step 10 - Drop a Browse tool under the Allocate Input tool and run the module. Compare the data coming out of each tool. The important fields are the 'Key' field on the Allocate input and the 'ZIP' field on the .csv file.
- To best compare the results of two Browse tools click 'Options' in the properties and choose 'New Window'.
- Repeat step a above with the Browse tool attached to the Summarize tool.
- You will notice that the data in the 'Key' and the 'ZIP' field are slightly different even though both fields represent ZIP Codes. This is because the ZIP field is missing leading zeros.
Step 11 - Add a Formula tool below the Summarize tool. This tool will allow the ability to alter the values of the 'ZIP' field as shown in Exercise 3, part 2.
- From the 'Output Field' select 'ZIP' from the drop down.
- On the 'Functions' tab expand the 'String' section.
- Double click on the function 'PADLEFT(String, len, char)'
- Highlight the word 'String' and click on the variables tab. Drop down the existing variables selection and double click on 'ZIP'. Replace 'len' in the function with the digit '5', and 'char' with '"0"'
Step 12 - Now that the data is prepared we can join these data sets together and view the ZIP Codes our customers are located in on a map. Drop a Join tool below the two data process streams. Configure the properties for the Join tool.
- Choose the Join by Specific Fields radio button (selected by default).
- Select 'ZIP' from the field drop down for the Left input.
- Select 'Key' from the field drop down for the Right input.
- Uncheck the box for the left 'ZIP'.
- Place a Browse tool under the center of the Join tool. Run the module and view the data in the Browse tool.
- Notice the field with the header 'SpatialObj'. This field contains data necessary to place the information on a map
Step 13 - You can also view summarized ZIP Codes that joined to the list from the .csv data file on a map, choose the map tab in the Browse tool to view the geography associated with the data. Notice that the unmatched ZIP Codes didn't show up in the map.
Step 14 - In order fill in the gaps to make a more complete map, the data that didn't join needs to union back to the data that had matches to join to.
- Drag and drop a Browse tool and connect it to the right side of the join. Run the module and compare the outputs on each browse window.
- Draw a box around the two Browse tools and press delete.
- Select a Union tool and drop in place of both Browse tools. Make sure that the center and right sides of the Join tool are attached to the Union tool.
- Drop a new Browse under the union and run the module.
Step 15 - Now that we have used the spatial object, we no longer need the spatial data downstream. To remove the data field, drop a Select tool below the center of the Join tool. Set the properties:
- Uncheck the box next to the value for 'SpatialObj' in the 'Field' column.
- In the 'Rename' column for the 'Key' field to 'ZIP Code'.
- Place a Browse tool under the Select tool and run the module to see how the data fields were changed.
Step 16 - Use a Sort tool to rank all the ZIP Codes by the number of businesses. Drop the tool on the connection between the Browse tool and the Select tool.
- On the properties select 'Count' from the 'Name' drop down and 'Descending' from the 'Order' drop down.
- Run the module to see how the data fields have changed.
Step 17 - To view only the top ten ZIP Codes in terms of number of business, drop a Sample tool between the Sort tool and the Browse tool.
- In the 'Properties' windows select first radial button, and in the 'N=' box type 10.
- Run the module to see how the data fields were changed.
Step 18 - To calculate the total for the number of business records by zip code you would use the Running Total tool. Drop a Running Total on the connection line between the Sort tool and the Sample tool.
- With the Running Total tool still selected click the F1 key and read about how it works to get a better understanding of the tools function.
- On the configurations don't select any grouping. Check the box for Count in the 'Create Running Total' box.
- Run the module to see how the new field was calculated.
Step 19 - Save the module to the desktop for use in a later exercise. Click 'File' and select 'Save As'. Browse to the desktop and save the module using the name 'BasicTools' for easy reference later.