
Principles of Data Visualization

Data Visualization Design

Data visualizations rely on effective, informed design choices to avoid being unintentionally misleading or confusing. Selecting the right chart type, including a title and thoughtful annotations, and making appropriate use of color all help to make charts that communicate clearly and accurately.

Data Visualization Axes

Data visualizations need appropriate axes to be truthful and legible. This means avoiding decontextualized breaks and setting the right number of axis ticks – neither too few (numbers are hard to interpret) nor too many (axes are cluttered).

Two bar charts side by side. Both are titled "Event Attendance", with "Number of Attendees" on the y-axis and "Event Date" on the x-axis. Each chart has three bars (yellow, blue, orange) that increase in height from left to right. The charts differ only in their y-axis scale. On the left-hand chart, the scale runs from 0 to 150 in intervals of 50. The bars show 100, 105, and 110, so they all cluster near the 100 line and look similar in height. On the right-hand chart, the scale starts at 0 but has a break (shown by a zigzag in the axis). The numbers resume at 100 and increase by 5s, so the axis ticks are 0, 100, 105, and 110. As a result, the bars representing those values (100, 105, 110) stretch over the whole vertical space of the graph, and their heights look far more different than they do in the left-hand chart.
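The effect of a truncated axis can be put into numbers. The following is a minimal sketch, not part of the original lesson: the function name `drawn_height_ratio` and the cut-off value of 95 are hypothetical, chosen only to illustrate how moving the baseline changes the apparent difference between bars of 100 and 110.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """Return how many times taller the larger bar is drawn than the smaller one.

    `baseline` is the value at which the visible y-axis starts.
    """
    if smaller <= baseline:
        raise ValueError("baseline must sit below both values")
    return (larger - baseline) / (smaller - baseline)


# Attendance values taken from the chart description: 100 and 110.
true_ratio = drawn_height_ratio(100, 110)                    # axis starts at 0
truncated_ratio = drawn_height_ratio(100, 110, baseline=95)  # hypothetical cut-off

print(f"zero baseline: {true_ratio:.2f}x, truncated at 95: {truncated_ratio:.2f}x")
# prints "zero baseline: 1.10x, truncated at 95: 3.00x"
```

A real difference of 10% is drawn as a threefold difference once the axis starts at 95, which is why an axis break needs to be clearly marked and explained.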

Data Visualization Scaling

Data visualizations need appropriate scaling to be truthful and legible. A linear scale (where numbers proceed by constant intervals) is almost always the best choice. Logarithmic scales (where each interval multiplies the value by a constant factor, such as 10) often cause confusion and should only be used with audiences who are very familiar with reading them.

Recall the example of Purdue Pharma, the pharmaceutical company that used a misleading logarithmic scale to downplay the addiction risk of opioid painkillers.

Two line graphs side-by-side. The lefthand graph is titled "Painkiller prescribing information, Linear y-axis." The x-axis shows "Hours from dosing" from 0 to 12. The y-axis shows "Concentration of painkiller in bloodstream" from 0 to 140, at evenly-spaced intervals of 20. There are 5 lines representing different doses of the drug. The three lowest doses (10, 20, and 40 mg) are relatively similar and never reach above a concentration of 40 in the bloodstream. The two highest doses (80 and 160 mg) show significant spikes in concentration, reaching up to concentrations of 90 and 120 (2 or 3 times more than the lower doses). The righthand graph is titled "Painkiller prescribing information, Log y-axis." The x-axis shows "Hours from dosing" from 0 to 12. The y-axis shows "Concentration of painkiller in bloodstream" from 0 to 100 on a log scale. This means the axis runs from 0 to 100, with 10 about halfway between those two numbers. As such, all the lines appear more or less flattened out, and all of them are clustered nearer to the center of the graph. It's still obvious that higher doses result in a higher concentration in the bloodstream, but the big spike that is clear in the linear graph is completely invisible on the log scale.
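To see why a log axis flattens the spike, it can help to compute where a value lands along the axis under each scale. This is a rough sketch under stated assumptions: `axis_position` is a hypothetical helper, the peak concentrations of 30 and 90 are illustrative numbers rather than data from the graphs, and the log axis is assumed to start at 1 because a log axis cannot include 0.

```python
import math


def axis_position(value, low, high, log=False):
    """Fraction of the axis length (0 = bottom, 1 = top) at which `value` is drawn."""
    if log:
        if value <= 0 or low <= 0:
            raise ValueError("a log axis cannot show zero or negative values")
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    return (value - low) / (high - low)


# Hypothetical peak concentrations for a lower and a higher dose.
low_dose_peak, high_dose_peak = 30, 90

# Vertical distance between the two peaks, as a fraction of the axis height.
linear_gap = axis_position(high_dose_peak, 0, 100) - axis_position(low_dose_peak, 0, 100)
log_gap = (axis_position(high_dose_peak, 1, 100, log=True)
           - axis_position(low_dose_peak, 1, 100, log=True))

print(f"linear gap: {linear_gap:.2f} of the axis, log gap: {log_gap:.2f} of the axis")
```

On the linear axis the two peaks sit more than half the axis apart, while on the log axis the same values are separated by less than a quarter of it. The same function also shows that 10 falls exactly halfway up a log axis running from 1 to 100.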

Color Associations

In data visualizations, color associations draw on both helpful prior knowledge and harmful stereotypes. We tend to view darker colors as “more” and lighter colors as “less.” Color associations can also be culturally specific (for instance, red means “bad” or “stop” vs. red means “lucky” or “prosperous”), or influenced by the norms for a particular field (red means “negative financial balance”).

Color Palettes

When creating data visualizations, it’s essential to choose the right color palettes to ensure truthfulness, legibility, and accessibility. This involves correctly implementing sequential, diverging, or categorical color palettes and ensuring that there is proper color contrast in your visualizations.

Sequential color scale: light blue, medium blue, dark blue. Diverging color scale: orange, light warm gray, medium blue. Categorical color scale: orange, deep purple, light green.
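The three palette types can be sketched in a few lines of code. The example below is an illustration only: the helper names (`ramp`, `diverging`) and every hex value are assumptions picked to resemble the colors described above, and plain RGB interpolation is used for simplicity even though it is not perceptually uniform.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def ramp(start, end, steps):
    """Sequential palette: evenly spaced colors from `start` to `end`."""
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for i in range(steps):
        t = i / (steps - 1)
        colors.append(rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b))))
    return colors


def diverging(low, mid, high, steps_per_side):
    """Diverging palette: two ramps that meet at a neutral midpoint."""
    return ramp(low, mid, steps_per_side) + ramp(mid, high, steps_per_side)[1:]


# Hypothetical hex values, for illustration only.
sequential = ramp("#deebf7", "#08306b", 3)                       # light to dark blue
diverging_palette = diverging("#e66101", "#d9d9d9", "#2166ac", 3)  # orange, gray, blue
categorical = ["#e69f00", "#542788", "#a6d96a"]                  # unordered, distinct hues

print(sequential, diverging_palette, categorical, sep="\n")
```

Sequential palettes suit ordered values that run from low to high, diverging palettes suit values with a meaningful midpoint, and categorical palettes suit groups with no inherent order.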

Data Viz Labels

Titles, labels, and annotations are essential for clear and accessible data visualizations. They provide context, making it easier for viewers to understand the chart’s contents and purpose.

Bias in Data Visualizations

Misleading charts often arise from conscious or unconscious bias. Following sound design principles in data visualization reduces the potential for bias. Clear labeling and unbiased data representation are key to maintaining integrity. A well-designed chart not only informs but also builds trust with the audience.

Data Visualization

Data visualization is a powerful technique for conveying data insights visually. Using graphs simplifies the understanding of complex data sets by highlighting trends, patterns, and anomalies. It makes the data more accessible to audiences without requiring them to analyze raw numbers.

Visualizing Data Types

Different chart types, such as bar, line, and pie charts, offer unique ways to visualize relationships in datasets. Selecting the right type illuminates specific patterns or comparisons in the data: for example, a line chart is a great way to show change over time, since the continuous line mirrors the continuity of time (as it is conventionally experienced and understood).

Data Visualizations

Bivariate and multivariate data visualizations represent more than one variable of interest. Bivariate visualizations compare two variables, while multivariate visualizations handle three or more. For example, scatter plots and single line charts are bivariate charts, while bubble charts, multi-line charts, and stacked or grouped bar charts are all multivariate charts.

Univariate Data Charts

Univariate data visualizations depict a single variable and show characteristics like distribution, central tendency, or variability. Common examples include histograms, which display frequency distributions, and boxplots, which visualize the spread of the data and identify outliers.
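The numbers behind these two chart types can be computed directly. The sketch below uses a made-up sample and two hypothetical helpers, `boxplot_summary` and `histogram_counts`. It assumes the common 1.5 × IQR rule for flagging outliers and the "inclusive" quartile method from Python's `statistics` module. Plotting libraries may use slightly different quartile conventions.

```python
import statistics
from collections import Counter


def boxplot_summary(values):
    """Quartiles plus outliers flagged by the 1.5 * IQR rule (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


def histogram_counts(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)


# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

summary = boxplot_summary(sample)
counts = histogram_counts(sample, bin_width=10)

print(summary)
print(dict(counts))
```

The boxplot summary singles out 50 as an outlier, and the histogram counts show the same story as one isolated bin far from the rest.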

Data Visualization Insights

Data visualizations reveal relationships in data through visual properties like position, shape, size, and color. Effective use of these properties makes it faster to identify correlations or data patterns in visualizations. For example, color is commonly used to differentiate categories, size might indicate magnitude, and position can show patterns or trends.

Information Redundancy

Information redundancy is a technique that uses multiple visual cues to convey the same information. This enhances readability and makes visualizations accessible to more viewers. By communicating data through several forms, such as text, color, and shape, we make it more likely that the audience will interpret the information correctly and easily. Universal design principles underpin this practice, promoting better organization and prioritization of data.
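One way to apply redundancy in practice is to give every category a color, a marker shape, and a text label at the same time, so that no single cue carries the information alone. The following is a small sketch under assumptions: `redundant_encoding` is a hypothetical helper, the category names are invented, and the color and marker values are placeholders that a plotting library would need to interpret.

```python
# Placeholder visual cues; any distinct colors and shapes would do.
COLORS = ["#e69f00", "#56b4e9", "#009e73"]
MARKERS = ["circle", "square", "triangle"]


def redundant_encoding(categories):
    """Assign each category a unique color, a unique marker, and a text label."""
    if len(categories) > min(len(COLORS), len(MARKERS)):
        raise ValueError("not enough distinct colors or markers for these categories")
    return {
        category: {"color": color, "marker": marker, "label": str(category)}
        for category, color, marker in zip(categories, COLORS, MARKERS)
    }


# Hypothetical categories.
encoding = redundant_encoding(["Online", "In person", "Hybrid"])

for category, cues in encoding.items():
    print(category, cues)
```

A viewer who cannot distinguish two of the colors can still tell the categories apart by marker shape or by reading the label.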

Data Visualization Levels

Adjusting data visualizations to the intended audience enhances understanding. Tailoring the complexity to fit the audience can be helpful, since general audiences often benefit from straightforward graphics, and experts may appreciate detailed charts.

Vision Accessibility

When designing data visualizations, it’s crucial to consider vision accessibility. This involves using colorblind-friendly palettes, ensuring readable fonts, and providing alt text for non-decorative visual elements. Color palettes must be checked for contrast and sufficient value difference. Recommended font practices include using sans-serif types and maintaining a minimum font size of 12-14pt for increased readability.
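Contrast can be checked numerically. The lesson does not name a specific standard, so the sketch below borrows the contrast-ratio formula from the Web Content Accessibility Guidelines (WCAG) as an assumption. The function names are hypothetical, and the default threshold of 4.5 is the WCAG AA minimum for normal-size text.

```python
def _linear_channel(channel_8bit):
    """Convert one 0-255 sRGB channel to its linear-light value (WCAG formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _linear_channel(r)
            + 0.7152 * _linear_channel(g)
            + 0.0722 * _linear_channel(b))


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def has_enough_contrast(foreground, background, minimum=4.5):
    """Check a color pair against a minimum ratio (4.5 assumed, per WCAG AA text)."""
    return contrast_ratio(foreground, background) >= minimum


print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(has_enough_contrast("#777777", "#ffffff"))
```

Because the ratio depends on luminance rather than hue, it also serves as a check on value difference: two colors with different hues but similar lightness will score close to 1.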

Annotations in Data Visualization

Annotations on data visualizations enhance the viewer’s understanding by providing context. They may highlight trends, indicate outliers, or explain interesting or unusual points in the data. Because they guide the viewer’s interpretation of the data, they should be used thoughtfully.
