A scattergraph uses dots to represent the values of two numeric variables. The position of each dot along the horizontal and vertical axes gives the two values for that data point. Scattergraphs are used to examine relationships between two variables. For instance, in a scattergraph of tree diameter against tree height, diameter would go on the x-axis and height on the y-axis, with each point representing a single tree. If the two are positively correlated, the graph shows that the larger a tree’s diameter, the taller the tree tends to be. Outliers that are short for their diameter may need additional investigation.
When to Use a Scattergraph
The primary purpose of a scattergraph is to detect and highlight relationships between two variables. The dots report the values of individual data points and also reveal patterns when the data is viewed as a whole. Identifying correlations is the standard use. In these cases, you want to know what a good prediction would be for the vertical value given a horizontal value. Typically the horizontal variable is treated as the independent variable and the vertical variable as the dependent variable. These relationships can be strong or weak, positive or negative, and linear or nonlinear.
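The strength and direction of a linear relationship can be summarized with a single number. The sketch below computes the Pearson correlation coefficient by hand; the function name pearson_r and the tree measurements are hypothetical, chosen only for illustration. Pearson's r captures linear relationships only, so a strong nonlinear pattern can still produce a value near zero.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical tree measurements (illustrative values, not real data).
diameters_cm = [10, 15, 20, 25, 30]   # horizontal, independent variable
heights_m = [5.0, 7.2, 9.1, 11.5, 13.0]  # vertical, dependent variable

r = pearson_r(diameters_cm, heights_m)
print(f"r = {r:.3f}")
```

A value close to +1 suggests a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.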
A scattergraph can also help identify other patterns in a data set. You can divide data points into groups based on how they cluster. Scattergraphs also highlight unexpected gaps in the data and outlier points, which is useful when segmenting the data. To build a scattergraph, you select two columns from a data table, each column representing one dimension of the plot. Each row of the table corresponds to a single dot, whose position is set by that row's values in the two columns.
When using a scattergraph to show a correlational or predictive relationship between two variables, adding a trend line shows the best mathematical fit to the data. This gives an additional signal of how strong the relationship is and draws attention to unusual points that affect the trend line calculation.
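One common way to compute a trend line is an ordinary least-squares straight-line fit, although other fits are possible. The sketch below assumes that method; the function names fit_trend_line and residuals and the sample values are hypothetical. The residual of each point (its vertical distance from the line) is one simple way to spot the unusual points mentioned above.

```python
def fit_trend_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the trend line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data: the last tree is short for its diameter.
diameters_cm = [10, 15, 20, 25, 30, 35]
heights_m = [5.0, 7.0, 9.0, 11.0, 13.0, 8.0]

slope, intercept = fit_trend_line(diameters_cm, heights_m)
res = residuals(diameters_cm, heights_m, slope, intercept)

# Index of the point that sits farthest from the line.
most_unusual = max(range(len(res)), key=lambda i: abs(res[i]))
print(f"height = {slope:.2f} * diameter + {intercept:.2f}")
print("most unusual point:", (diameters_cm[most_unusual], heights_m[most_unusual]))
```

Note that the unusual point also pulls the fitted line toward itself, which is why such points are worth checking before trusting the trend.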
Numeric Third Variable
A numeric third variable can be shown by changing the point size. A scattergraph whose point sizes are based on a third variable is known as a bubble chart. Larger points represent higher values. Color can also be used to indicate numeric values. Instead of distinct colors for the dots, use a continuous sequence of colors, for example so that darker colors represent higher values.
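The two encodings described above can be sketched as simple mappings from a numeric value to a point area and to a color. Everything here is an assumption made for illustration: the function names, the area range, the light and dark endpoint colors, and the sample ages. Scaling the area of the point, rather than its radius, is a common convention, but plotting tools differ in which one they expect.

```python
def scale_to_unit(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: place every point in the middle of the range.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bubble_areas(values, min_area=20.0, max_area=400.0):
    """Map each value to a point area; larger values get larger points."""
    return [min_area + t * (max_area - min_area) for t in scale_to_unit(values)]


def sequential_colors(values, light=(222, 235, 247), dark=(8, 48, 107)):
    """Map each value to a hex color; higher values get darker colors."""
    colors = []
    for t in scale_to_unit(values):
        # Interpolate each RGB channel between the light and dark endpoints.
        rgb = tuple(round(l + t * (d - l)) for l, d in zip(light, dark))
        colors.append("#%02x%02x%02x" % rgb)
    return colors


# Hypothetical third variable: tree age in years.
ages = [5, 20, 35]
areas = bubble_areas(ages)
colors = sequential_colors(ages)
print(areas)
print(colors)
```

The resulting lists can be passed to whatever size and color arguments your plotting tool provides for individual points.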
A typical modification when creating a scattergraph is to add a third variable that changes how the points are drawn. A third variable can also hold categorical values, in which case the most common way to distinguish groups is by color. Assigning each group a different color shows which group every point belongs to. Another option for a categorical third variable is to change the point shape. Different shapes can have different surface areas and apparent sizes, which can affect how groups are perceived and how important they seem. Shape is also a good option where color cannot be used to distinguish the third variable.
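For a categorical third variable, the mapping is from each group to a fixed color and shape. A minimal sketch, assuming a hypothetical assign_styles helper, an arbitrary color list, and single-letter shape codes (many plotting tools use similar codes, but check your own tool's conventions):

```python
from itertools import cycle


def assign_styles(categories,
                  colors=("#1b9e77", "#d95f02", "#7570b3"),
                  markers=("o", "s", "^")):
    """Give every category one (color, marker) pair, in order of first appearance.

    Returns the style for each point and the category-to-style legend.
    Styles repeat if there are more categories than colors or markers.
    """
    legend = {}
    color_cycle, marker_cycle = cycle(colors), cycle(markers)
    for category in categories:
        if category not in legend:
            legend[category] = (next(color_cycle), next(marker_cycle))
    per_point = [legend[category] for category in categories]
    return per_point, legend


# Hypothetical third variable: species of each tree.
species = ["oak", "pine", "oak", "birch"]
per_point, legend = assign_styles(species)
for name, (color, marker) in legend.items():
    print(f"{name}: color={color}, marker={marker}")
```

Using both color and shape for the same variable is redundant by design: the groups stay distinguishable when the chart is printed in grayscale or viewed by readers who cannot tell the colors apart.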
Scattergraphs are a basic chart type that can be created in any visualization tool. Coloring points according to a third, categorical variable and computing a basic trend line are common options. Even without these additions, a scattergraph is a valuable tool when you must investigate the relationship between variables within your data. If you need to present this information to senior management, a scattergraph is a much easier way to convey it than just running through the raw data.