Data – a collection of facts (numbers, words, measurements, observations, etc.) that has been translated into a form that computers can process.
Irrespective of your industry or interests, you will almost certainly have come across a story about how “data” is changing the face of our world. Data might be part of a study that helps cure a disease, boosts a company’s revenue, or makes a building more efficient, or it might be responsible for those targeted ads you keep seeing.
In general, data is simply another word for information. But in computing and business (most of what you read about in the news when it comes to data – especially if it’s about Big Data), data refers to information that is machine-readable as opposed to human-readable.
What is Data Visualization?
Data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and pull insights from. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. The term is often used interchangeably with others, including information graphics, information visualization, and statistical graphics.
Data visualization is one of the steps of the Data Science process: after data has been collected, processed, and modeled, it must be visualized so that conclusions can be drawn. Data visualization is also an element of the broader Data Presentation Architecture (DPA) discipline, which aims to identify, locate, manipulate, format, and deliver data in the most efficient way possible.
Data visualization is important for almost every career. It can be used by teachers to display student test results, by computer scientists exploring advancements in Artificial Intelligence (AI), or by executives looking to share information with stakeholders. It also plays an important role in Big Data projects. As businesses accumulated massive collections of data during the early years of the big data trend, they needed a way to quickly and easily get an overview of their data. Visualization tools were a natural fit.
Visualization is central to advanced analytics for similar reasons. When a data scientist is writing advanced predictive analytics or machine learning (ML) algorithms, it becomes important to visualize the outputs to monitor results and ensure that models are performing as intended. This is because visualizations of complex algorithms are generally easier to interpret than numerical outputs.
Advantages of Data Visualization
When it comes to business strategies and goals, data visualization helps decision-makers gain better insight from their data in several ways. Let’s explore seven major benefits in detail:
- Better analysis: Data visualization helps business stakeholders analyze reports regarding sales, marketing strategies, and product interest. Based on the analysis, they can focus on the areas that require attention to increase profits, which in turn makes the business more productive.
- Quick action: As mentioned previously, the human brain grasps visuals more easily than tabular reports. Data visualizations let decision-makers spot new insights quickly and take the actions needed for business growth.
- Identifying patterns: Large amounts of complicated data can provide many opportunities for insights when we visualize them. Visualization allows business users to recognize relationships between the data, providing greater meaning to it. Exploring these patterns helps users focus on specific areas that require attention in the data so that they can identify the significance of those areas to drive their business forward.
- Finding errors: Visualizing your data helps quickly identify any errors in the data. If the data tends to suggest the wrong actions, visualizations help identify erroneous data sooner so that it can be removed from the analysis.
- Understanding the story: Storytelling is the purpose of your dashboard. By designing your visuals in a meaningful way, you help the target audience grasp the story at a single glance. Always convey the story as simply as possible, without overly complicated visuals.
- Exploring business insights: In today’s competitive business environment, finding data correlations through visual representations is key to identifying business insights. Exploring these insights helps business users and executives set the right path toward the business’s goals.
- Grasping the latest trends: Using data visualization, you can discover the latest trends in your business, which helps you provide quality products and identify problems early. By staying on top of trends, you can focus your efforts on increasing profits.
Different Types of Data Visualization
Some of the most common forms of data visualization are:
1. Area Chart
An area chart combines the line chart and bar chart to show how one or more groups’ numeric values change over the progression of a second variable, typically that of time. An area chart is distinguished from a line chart by the addition of shading between lines and a baseline, like in a bar chart.
An area chart is typically used with multiple lines to make a comparison between groups (or series) or to show how a whole is divided into parts. This leads to two different types of area charts, one for each use case.
2. Bar Graph
A bar graph (or bar chart) is a type of graph in which each bar (plotted either vertically or horizontally) represents one category of a categorical variable. (A categorical variable is a variable that has two or more categories with no intrinsic ordering to the categories. For example, gender is a categorical variable with two categories: male and female.) A bar graph is used to compare the frequency of one category or characteristic with that of another. The bar height (if vertical) or length (if horizontal) shows the frequency for each category or characteristic.
Bar charts look similar to histograms. However, bar charts are used for categorical or qualitative data, while histograms are used for quantitative data. Also, in histograms the classes (or bars) are usually of equal width and touch each other, while in bar charts the bars do not touch.
3. Bubble Chart
A Bubble Chart is a multi-variable graph that is a cross between a Scatterplot and a Proportional Area Chart. Like a Scatterplot, a Bubble Chart uses a Cartesian coordinate system to plot points along a grid where the X and Y axes represent separate variables. However, unlike a Scatterplot, each point is assigned a label or category (either displayed alongside or on a legend). Each plotted point then represents a third variable by the area of its circle.
Colours can also be used to distinguish between categories or to represent an additional data variable. Time can be shown either by using it as the variable on one of the axes or by animating the data variables as they change over time.
Bubble Charts are typically used to compare and show the relationships between categorized circles through positioning and proportions. The overall picture of a Bubble Chart can be analyzed for patterns and correlations.
Too many bubbles can make the chart hard to read, so Bubble Charts have a limited data size capacity. Interactivity can partly remedy this: clicking or hovering over bubbles to display hidden information, or offering options to reorganize or filter out grouped categories.
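Because the third variable is encoded by a bubble's area rather than its radius, the radius has to grow with the square root of the value. The Python sketch below shows that scaling; the function name `bubble_radius` and the choice of a `max_radius` for the largest bubble are illustrative assumptions, not part of any particular charting library.

```python
import math

def bubble_radius(value, max_value, max_radius):
    """Return a radius whose circle *area* is proportional to `value`.

    Area = pi * r**2, so r scales with sqrt(value). The bubble for
    `max_value` gets `max_radius`; values are assumed to be non-negative.
    """
    return max_radius * math.sqrt(value / max_value)

# Hypothetical third-variable values for four bubbles.
values = [100, 50, 25, 4]
radii = [bubble_radius(v, max(values), max_radius=20) for v in values]
```

A value of 25 gets half the radius of a value of 100 but a quarter of the area, which keeps the visual comparison honest. Scaling the radius linearly would exaggerate large values.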
4. Bullet Chart
A Bullet Chart is a variation of a Bar Chart designed to compare a single, primary measure (for example, current year-to-date revenue) to one or more other measures to enrich its meaning (for example, compared to a target), and displays it in the context of qualitative ranges of performance, such as poor, satisfactory, and good. The qualitative ranges are displayed as blocks of one hue but with varying intensity, making them discernible by those who are colour blind and restricting the use of colours on the dashboard to a minimum.
A bullet chart always uses only one data series, but a dashboard may contain several bullet charts at the same time. This kind of chart can be very helpful because it presents the data clearly while using little space.
5. Box-and-Whisker Plots
A boxplot, also called a box-and-whisker plot, is a way to show the spread and centre of a data set. Measures of spread include the interquartile range and the range of the data set. Measures of the centre include the mean (average) and the median (the middle of the data set).
The box-and-whisker chart shows you how your data is spread out. Five pieces of information (the “five-number summary”) are generally included in the chart:
- The minimum (the smallest number in the data set), shown at the far left of the chart, at the end of the left “whisker.”
- The first quartile, Q1, shown at the far left of the box (at the far right of the left whisker).
- The median, shown as a line in the centre of the box.
- The third quartile, Q3, shown at the far right of the box (at the far left of the right whisker).
- The maximum (the largest number in the data set), shown at the far right of the chart, at the end of the right “whisker.”
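The five-number summary can be computed directly. This is a minimal Python sketch; quartile conventions differ between tools, and this one uses the "median of halves" rule (the overall median is excluded from both halves when the count is odd), so other software may report slightly different Q1 and Q3 values. Many plotting tools also draw points more than 1.5 × IQR beyond the box as separate outliers, in which case the whiskers stop short of the minimum and maximum.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # values above it (skips the median when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```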
6. Cartogram
A cartogram is a map in which the geometry of regions is distorted in order to convey the information of an alternate variable. The region area will be inflated or deflated according to its numeric value.
Most of the time, a cartogram is also a choropleth map where regions are colored according to a numeric variable (not necessarily the one used to build the cartogram).
A cartogram aims to correct a bias that can be observed in choropleth maps: when a variable is aggregated per region, a region with very few data points will look as important as a region with many data points.
For instance, imagine you display the average salary per region on your choropleth map. A region with 3 inhabitants with a huge area will have more importance on your map than a small one with 3,000 inhabitants, which induces a strong bias. The cartogram aims to reduce this bias.
7. Column Chart
Column charts use vertical bars to show comparisons between categories or things. One axis displays the categories being compared, the other, the data values. They are effective for showing the situation at a point in time. If something can be counted, it can be represented in a column chart, for example, the number of products sold or hits on a website.
Column charts are effective for emphasizing the difference between values. Used to compare change between groups over time, column charts are easiest to read when the differences in data value are relatively large.
If you have many categories to compare, a line graph might be easier to read.
Column charts can be created in standard, stacked, or percentage formats. Each format displays the same data in a different way, making it easier to make comparisons or see patterns emerging in the data.
8. Circle View
CircleView is a new approach for visualizing multidimensional, time-referenced data sets. Because continuous data changes its characteristics over time, an appropriate visualization method is essential.
The technique combines hierarchical visualization techniques, such as treemaps, with circular layout techniques, such as Pie Charts and Circle Segments.
For users analyzing this kind of data, it is very important to identify patterns, exceptions, and similarities, and this is the main purpose for which CircleView was developed. CircleView visualizes how the characteristics of the data change over time, so that users can observe those changes. It provides an intuitive, easy-to-understand visualization interface that gives the user the information needed for analysis. The application also supports user interaction, such as different options for ordering the data, filtering methods, and the ability to define the time period shown and the animation speed, so that both real-time and historical data can be displayed.
Furthermore, users can explore correlations and exceptions in the data, which is made possible by similarity and ordering algorithms. In this way, CircleView provides the tools needed for effective exploratory data analysis.
9. Funnel Chart
A funnel chart helps you visualize a linear process that has sequential connected stages. For example, a sales funnel that tracks customers through stages: Lead > Qualified Lead > Prospect > Contract > Close. At a glance, the shape of the funnel conveys the health of the process you’re tracking.
Each funnel stage represents a percentage of the total. So, in most cases, a funnel chart is shaped like a funnel — with the first stage being the largest, and each subsequent stage smaller than its predecessor. A pear-shaped funnel is also useful — it can identify a problem in the process. But typically, the first stage, the “intake” stage, is the largest.
Funnel charts are a great choice:
- when the data is sequential and moves through at least 4 stages.
- when the number of “items” in the first stage is expected to be greater than the number in the final stage.
- to calculate potential (revenue/sales/deals/etc.) by stages.
- to calculate and track conversion and retention rates.
- to reveal bottlenecks in a linear process.
- to track a shopping cart workflow.
- to track the progress and success of click-through advertising/marketing campaigns.
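Conversion rates can be read straight off the stage counts. The Python sketch below computes, for each stage, its share of the intake stage and the conversion rate from the previous stage; the stage counts are hypothetical numbers for the Lead to Close sales funnel, and `funnel_rates` is an illustrative name.

```python
def funnel_rates(stages):
    """Return (name, share_of_intake, conversion_from_previous) per stage.

    `stages` is a list of (name, count) pairs in funnel order, and the first
    (intake) count is assumed to be non-zero.
    """
    intake = stages[0][1]
    rows = []
    previous = intake
    for name, count in stages:
        step = count / previous if previous else 0.0  # guard against empty stages
        rows.append((name, count / intake, step))
        previous = count
    return rows

# Hypothetical counts for a Lead > Qualified Lead > Prospect > Contract > Close funnel.
sales = [("Lead", 1000), ("Qualified Lead", 400), ("Prospect", 200),
         ("Contract", 60), ("Close", 40)]
for name, of_intake, step in funnel_rates(sales):
    print(f"{name:<15} {of_intake:6.1%} of leads  {step:6.1%} from previous stage")
```

The stage with the lowest conversion from its predecessor (here, Contract at 30%) is the first place to look for a bottleneck.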
10. Dot Distribution Map
A dot distribution map (or a dot density map or simply a dot map) is a type of thematic map that uses a point symbol to visualize the geographic distribution of a large number of related phenomena. Dot maps are a type of unit visualization that relies on visual scatter to show spatial patterns, especially variances in density.
The dots may represent the actual locations of individual phenomena, or be randomly placed in aggregation districts to represent a number of individuals. Although these two procedures, and their underlying models, are very different, the general effect is the same.
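The second procedure, placing dots randomly inside aggregation districts, can be sketched in a few lines of Python. This sketch assumes each district is a simple rectangle (real district boundaries are polygons) and that one dot stands for a fixed number of individuals; `dot_positions` and the example districts are hypothetical.

```python
import random

def dot_positions(districts, people_per_dot, seed=0):
    """Scatter one dot per `people_per_dot` individuals inside each district.

    `districts` maps a district name to (population, (xmin, ymin, xmax, ymax)).
    Rectangles stand in for real district polygons to keep the sketch short.
    Dot counts use round(), which rounds exact halves to the nearest even number.
    """
    rng = random.Random(seed)  # fixed seed makes the map reproducible
    dots = []
    for name, (population, (xmin, ymin, xmax, ymax)) in districts.items():
        for _ in range(round(population / people_per_dot)):
            dots.append((name, rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)))
    return dots

# Hypothetical districts: one dot represents 100 people.
districts = {"North": (5000, (0, 0, 10, 10)), "South": (1200, (10, 0, 20, 5))}
dots = dot_positions(districts, people_per_dot=100)
```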
11. Dual Axis Chart
A dual-axis chart (also called a multiple axes chart) uses two axes to illustrate the relationship between two variables with different magnitudes and scales of measurement, which makes it easier to see whether the two variables are correlated.
A dual-axis chart illustrates plenty of information using limited space, so you can discover trends you may have otherwise missed.
12. Gantt Chart
A Gantt chart is a type of bar chart that illustrates a project schedule. This chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The length of each horizontal bar shows the duration of the corresponding activity.
Gantt charts illustrate the start and finish dates of the terminal elements and summary elements of a project. Terminal elements and summary elements constitute the work breakdown structure of the project. Modern Gantt charts also show the dependency (i.e., precedence network) relationships between activities. Gantt charts can be used to show current schedule status using percent-complete shadings and a vertical “TODAY” line.
Gantt charts are usually created initially using an early start time approach, where each task is scheduled to start immediately when its prerequisites are complete. This method maximizes the float time available for all tasks.
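The early-start approach is a forward pass over the task dependency graph: each task starts at the latest finish time among its prerequisites. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to visit tasks in dependency order; the task names and durations are made up for illustration.

```python
from graphlib import TopologicalSorter

def early_start_schedule(durations, depends_on):
    """Schedule every task to start as soon as all its prerequisites finish.

    `durations` maps task -> duration (in any time unit); `depends_on` maps
    task -> set of prerequisite tasks. Returns task -> (start, finish).
    """
    sorter = TopologicalSorter(depends_on)
    for task in durations:
        sorter.add(task)  # make sure tasks without prerequisites are included
    schedule = {}
    for task in sorter.static_order():  # prerequisites always come first
        start = max((schedule[p][1] for p in depends_on.get(task, ())), default=0)
        schedule[task] = (start, start + durations[task])
    return schedule

# Hypothetical project: durations in days.
durations = {"design": 3, "build": 5, "docs": 2, "test": 2}
depends_on = {"build": {"design"}, "docs": {"design"}, "test": {"build"}}
for task, (start, finish) in early_start_schedule(durations, depends_on).items():
    print(f"{task:<7} day {start} to day {finish}")
```

In this example, `docs` finishes on day 5 while the project runs to day 10, so it has five days of float: it could slip that much without delaying the finish.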
13. Heat Map
A heatmap is a graphical representation of data that uses a system of colour-coding to represent different values. Heatmaps are used in various forms of analytics but are most commonly used to show user behaviour on specific web pages or webpage templates. Heatmaps can be used to show where users have clicked on a page, how far they have scrolled down a page, or used to display the results of eye-tracking tests.
Heatmaps can give a more comprehensive overview of how users are really behaving. Heatmaps are also a lot more visual than standard analytics reports, which can make them easier to analyze at a glance. This makes them more accessible, particularly to people who are not accustomed to analyzing large amounts of data.
14. Highlight Table
A highlight table does exactly what its name suggests: it adds coloured highlights so that the user can read the table more intuitively and effectively. It compares categorical data using colour. Put plainly, it is essentially a spreadsheet with coloured cells. We could use a text table with coloured text instead, but varying the colour and opacity of the font makes the text harder to read.
Visualizing with a highlight table lets us identify patterns or correlations much more quickly than looking at the raw data of the table. It also scales well, displaying plenty of data in a single chart. On the other hand, a highlight table limits the number of dimensions, and small differences between values are hard to distinguish.
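Heat maps and highlight tables both rest on the same colour-coding step: mapping each value onto a colour scale. A minimal Python sketch, assuming a simple linear blend between two RGB colours (white for the lowest value, red for the highest; both are arbitrary choices), might look like this:

```python
def value_to_hex(value, lo, hi, low_rgb=(255, 255, 255), high_rgb=(200, 0, 0)):
    """Map `value` in [lo, hi] to a hex colour by linear RGB interpolation.

    Values outside the range are clamped; if lo == hi every value gets low_rgb.
    """
    t = 0.0 if hi == lo else (min(max(value, lo), hi) - lo) / (hi - lo)
    r, g, b = (round(c0 + (c1 - c0) * t) for c0, c1 in zip(low_rgb, high_rgb))
    return f"#{r:02x}{g:02x}{b:02x}"

# Hypothetical click counts for a row of page regions (or table cells).
clicks = [0, 12, 40, 7]
colours = [value_to_hex(c, min(clicks), max(clicks)) for c in clicks]
```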
15. Histogram
A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. An example of a histogram, and the raw data it was constructed from, is shown below:
To construct a histogram from a continuous variable you first need to split the data into intervals, called bins. In the example above, age has been split into bins, with each bin representing a 10-year period starting at 20 years. Each bin contains the number of occurrences of scores in the data set that are contained within that bin.
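Binning can be sketched in Python as follows. The ages are hypothetical (they are not the data behind the example histogram), the bins are half-open, so a value of exactly 30 falls into the 30 to 39 bin, and empty bins are simply omitted from the result.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values per half-open bin [start + k*width, start + (k+1)*width).

    Bins are labelled by their lower edge; values below `start` are ignored
    and empty bins do not appear in the result.
    """
    counts = Counter()
    for v in values:
        if v < start:
            continue
        k = int((v - start) // width)  # index of the bin this value falls into
        counts[start + k * width] += 1
    return dict(sorted(counts.items()))

ages = [23, 27, 31, 35, 38, 44, 45, 52, 58, 61]  # hypothetical ages
print(bin_counts(ages, start=20, width=10))  # {20: 2, 30: 3, 40: 2, 50: 2, 60: 1}
```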
16. Line Graph
Line graphs (or line charts) are best when you want to show how the value of something changes over time, or compare how several things change over time relative to each other. Whenever you hear that key phrase “over time,” that’s your clue to consider using a line graph for your data.
Line graphs are common and effective charts because they are simple, easy to understand, and efficient. Line charts are great for:
- Comparing lots of data all at once
- Showing changes and trends over time
- Including important context and annotation
- Displaying forecast data and uncertainty
- Highlighting anomalies within and across data series
17. Matrix Diagram
The Matrix Diagram shows the relationship between items. At each intersection, a relationship is either absent or present. The diagram can then give information about the relationship, such as its strength or the roles played by various individuals or measurements. It can be shaped differently depending on how many groups are compared.
18. Mekko Chart
A Mekko chart (sometimes also called Marimekko chart) is a two-dimensional stacked chart. In addition to the varying segment heights of a regular stacked chart, a Mekko chart also has varying column widths.
Column widths are scaled such that the total width matches the desired chart width. To preserve the visual relationship between widths of different columns, there are no gaps between columns in a Mekko chart.
In fact, the baseline of a Mekko chart is a fully-fledged value axis. In charting tools that support it, you can add tick marks, tick mark labels, and an axis title to the baseline, and switch it between absolute and percentage values.
A Mekko chart can also be decorated with additional scales, axes, arrows, and value labels, and its labels can display absolute values, percentages, or both.
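The width rule can be made concrete with a short Python sketch. It assumes a percentage-style Mekko chart, in which each segment's height is its share of its column; `mekko_layout` and the example data are illustrative.

```python
def mekko_layout(columns, chart_width):
    """Return {label: (x, width, segment_shares)} for a percentage Mekko chart.

    `columns` maps a column label to its segment values. Column widths are
    proportional to column totals and scaled to sum to `chart_width`; each
    segment's height is its share (0..1) of its own column.
    """
    grand_total = sum(sum(segments) for segments in columns.values())
    layout = {}
    x = 0.0
    for label, segments in columns.items():
        total = sum(segments)
        width = chart_width * total / grand_total
        layout[label] = (x, width, [s / total for s in segments])
        x += width  # no gap: the next column starts where this one ends
    return layout

# Hypothetical revenue by region (columns) and product (segments).
layout = mekko_layout({"North": [30, 10], "South": [20, 40]}, chart_width=200)
```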
19. Network Graph
Network Visualization (also called a Network Graph) is often used to visualize complex relationships between a large number of elements. A network visualization can display both undirected and directed graph structures.
This type of visualization illuminates relationships between entities. Entities are displayed as round nodes and lines show the relationships between them. The vivid display of network nodes can highlight non-trivial data discrepancies that may otherwise be overlooked.
20. Pie Chart
A Pie Chart is a type of graph that displays data in a circular form. The size of each slice is proportional to that category’s fraction of the whole. The entire “pie” represents 100 percent of a whole, while the pie “slices” represent portions of the whole.
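Converting values to slice angles is a single proportion: a slice's angle is its fraction of the total times 360 degrees. A minimal Python sketch (the function name is illustrative) that also tracks where each slice starts:

```python
def pie_slices(values):
    """Return (start_deg, end_deg) for each slice, starting at 0 degrees.

    Each slice sweeps (value / total) * 360 degrees; values must be non-negative.
    """
    total = sum(values)
    slices = []
    start = 0.0
    for v in values:
        sweep = 360.0 * v / total
        slices.append((start, start + sweep))
        start += sweep
    return slices

print(pie_slices([50, 25, 25]))  # [(0.0, 180.0), (180.0, 270.0), (270.0, 360.0)]
```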
21. Polar Area
The Polar Area chart is similar to a usual pie chart, except that the sectors have equal angles and differ instead in how far each sector extends from the centre of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month).
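Here every sector gets the same angle, so only the radius varies. The Python sketch below assumes each sector's area should be proportional to its value, as in Florence Nightingale's well-known diagrams; because a sector's area grows with the square of its radius, the radius is scaled by the square root of the value. Some charting tools scale the radius linearly instead, which exaggerates large values.

```python
import math

def polar_area_sectors(values, max_radius):
    """Return (start_deg, end_deg, radius) for equal-angle sectors.

    Sector area is proportional to r**2, so taking r proportional to
    sqrt(value) makes each sector's *area* proportional to its value.
    """
    sweep = 360.0 / len(values)
    peak = max(values)
    return [(i * sweep, (i + 1) * sweep, max_radius * math.sqrt(v / peak))
            for i, v in enumerate(values)]

# Hypothetical counts for four periods of a cycle.
sectors = polar_area_sectors([100, 25, 64, 0], max_radius=10)
```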
22. Radial Tree
A radial tree is a tree layout algorithm that can be applied to any type of diagram.
This layout algorithm arranges the diagram features hierarchically and places them in a radial tree according to the specified radius parameters. It works from a root junction that it uses as the circle centre to arrange the subtrees starting from this root in concentric circles, each circle corresponding to one hierarchical level.
Root flags can be set up on diagram junctions before executing the Radial Tree layout.
If no root junction is specified, the algorithm identifies the diagram junction associated with the smallest network topology index and uses this junction as the root junction.
If a diagram junction is specified as a root junction, the radial tree uses this root junction as the centre of the concentric circles.
When several root junctions are specified in the diagram, those root junctions are placed around a first concentric circle with a fictitious centre.
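The core idea of concentric levels can be sketched independently of any specific diagramming product. The following Python sketch is a generic radial tree layout, not the exact algorithm described above: it takes a single explicit root, puts each depth level on its own circle, and gives each subtree an angular wedge proportional to its number of leaves. The function name and the `ring_gap` parameter are hypothetical.

```python
import math
from functools import lru_cache

def radial_tree_layout(children, root, ring_gap=1.0):
    """Place `root` at the centre and each depth level on its own circle.

    `children` maps a node to a list of its children. Each subtree receives
    an angular wedge proportional to its number of leaves, and each node sits
    in the middle of its wedge. Returns {node: (x, y)}.
    """
    @lru_cache(maxsize=None)
    def leaf_count(node):
        kids = children.get(node, [])
        return 1 if not kids else sum(leaf_count(k) for k in kids)

    positions = {}

    def place(node, depth, start, end):
        angle = (start + end) / 2
        radius = depth * ring_gap  # depth 0 (the root) lands on the centre
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        cursor = start
        for kid in children.get(node, []):
            share = (end - start) * leaf_count(kid) / leaf_count(node)
            place(kid, depth + 1, cursor, cursor + share)
            cursor += share

    place(root, 0, 0.0, 2 * math.pi)
    return positions

tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
pos = radial_tree_layout(tree, "root")
```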
23. Scatter Plot Chart
A scatter plot (also called a scatter chart or scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point. Scatter plots are used to observe relationships between variables.
The example scatter plot above shows the diameters and heights for a sample of fictional trees. Each dot represents a single tree; each point’s horizontal position indicates that tree’s diameter (in centimeters) and the vertical position indicates that tree’s height (in meters). From the plot, we can see a generally tight positive correlation between a tree’s diameter and its height. We can also observe an outlier point, a tree that has a much larger diameter than the others.
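The strength of a correlation suggested by a scatter plot can be checked numerically with the Pearson correlation coefficient, which ranges from -1 to 1. A plain Python sketch follows (Python 3.10+ also provides `statistics.correlation` for the same calculation); the sample values are hypothetical, not the tree data from the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical diameters (cm) and heights (m) for five trees.
diameters = [10, 15, 20, 25, 30]
heights = [4.0, 5.5, 7.1, 8.4, 10.2]
print(round(pearson_r(diameters, heights), 3))
```

Pearson's r measures linear association only and is sensitive to outliers, such as a tree with a much larger diameter than the rest.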
24. Stacked Bar Graph
A stacked bar graph (or stacked bar chart) is a chart that uses bars to show comparisons between categories of data, but with the ability to break down and compare parts of a whole. Each bar in the chart represents a whole, and segments in the bar represent different parts or categories of that whole.
Stacked bars do a good job of featuring the total and also providing a hint as to how the total for each category value is divided into parts. The bars can be either horizontal or vertical.
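Drawing a stacked bar comes down to cumulative sums: each segment starts where the previous one ended, and the top of the last segment is the bar's total. A minimal Python sketch with illustrative names and data:

```python
def stack_segments(bars):
    """Convert each bar's part values into (bottom, top) segment extents.

    `bars` maps a bar label to its part values (assumed non-negative); the
    top of the last segment equals the bar's total.
    """
    stacked = {}
    for label, parts in bars.items():
        bottom = 0
        extents = []
        for value in parts:
            extents.append((bottom, bottom + value))
            bottom += value  # the next segment starts where this one ends
        stacked[label] = extents
    return stacked

# Hypothetical sales per quarter, split into three product lines.
print(stack_segments({"Q1": [3, 2, 5], "Q2": [4, 4, 1]}))
# {'Q1': [(0, 3), (3, 5), (5, 10)], 'Q2': [(0, 4), (4, 8), (8, 9)]}
```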
25. Streamgraph
A streamgraph, or stream graph, is a type of stacked area graph which is displaced around a central axis, resulting in a flowing, organic shape. Unlike a traditional stacked area graph in which the layers are stacked on top of an axis, in a streamgraph, the layers are positioned to minimize their “wiggle”. More formally, the layers are displaced to minimize the sum of the squared slopes of each layer, weighted by the area of the layer. Streamgraphs display data with only positive values, and are not able to represent both negative and positive values.
Streamgraphs and their use were popularized by Amanda Cox in a February 2008 New York Times article on movie box office revenues. Cox got the idea from then-undergraduate Lee Byron, who had used a similar method for visualizing his music listening history.
A related graph, sometimes conflated with streamgraphs, is the ThemeRiver, in which the “silhouette” of the graph is symmetrically arranged around the central axis.
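The symmetric ThemeRiver arrangement is the easiest baseline to compute: at each time step, the stack starts at minus half of the column total. The Python sketch below shows only that silhouette baseline; the wiggle-minimizing baseline used by streamgraphs involves an additional optimization that is not shown here. The function name and data are illustrative.

```python
def silhouette_layers(series):
    """Stack layers symmetrically around zero (ThemeRiver-style silhouette).

    `series` is a list of layers, each a list of non-negative values, one per
    time step. At every step the baseline is minus half the column total.
    Returns one list of (lower, upper) extents per layer.
    """
    steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(steps):
        bottom = -sum(layer[t] for layer in series) / 2
        for i, layer in enumerate(series):
            layers[i].append((bottom, bottom + layer[t]))
            bottom += layer[t]
    return layers

print(silhouette_layers([[2, 4], [2, 0]]))
# [[(-2.0, 0.0), (-2.0, 2.0)], [(0.0, 2.0), (2.0, 2.0)]]
```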
26. Treemap
Treemaps are visualizations for hierarchical data. They are made of a series of nested rectangles of sizes proportional to the corresponding data value. A large rectangle represents a branch of a data tree, and it is subdivided into smaller rectangles that represent the size of each node within that branch.
Treemaps are commonly found on data dashboards, where designers often choose them to add visual variety to a dense layout. However, treemaps are a complex visualization and present many obstacles to quick comprehension (which is the main requirement for any information displayed on a dashboard).
Treemaps are often used for sales data, as they capture relative sizes of data categories, allowing for the quick perception of the items that are large contributors to each category. Colour can identify items that are underperforming (or overperforming) compared to their siblings from the same category.
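One of the simplest ways to produce the nested rectangles is the slice-and-dice layout, which splits each rectangle along alternating axes in proportion to the size of each child. Many tools use other layouts (such as squarified treemaps) that produce less elongated rectangles, so treat this Python sketch as an illustration of the principle only; the `(name, value)` / `(name, [children])` tree format is an assumption.

```python
def size(node):
    """Total value of a (name, value) leaf or a (name, [children]) branch."""
    _, content = node
    return sum(size(child) for child in content) if isinstance(content, list) else content

def slice_and_dice(node, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for every node in the tree.

    Children split their parent's rectangle in proportion to their size,
    left to right at even depths and top to bottom at odd depths.
    """
    out = [] if out is None else out
    name, content = node
    out.append((name, x, y, w, h))
    if isinstance(content, list):
        total = size(node)
        offset = 0.0
        for child in content:
            frac = size(child) / total
            if depth % 2 == 0:
                slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:
                slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
    return out

# Hypothetical sales tree: "east" is a branch, "west" is a single leaf.
tree = ("sales", [("east", [("a", 30), ("b", 10)]), ("west", 60)])
rects = {name: rect for name, *rect in slice_and_dice(tree)}
```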
27. Waterfall Chart
A waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values. These intermediate values can either be time-based or category-based. The waterfall chart is also known as a flying bricks chart or Mario chart due to the apparent suspension of columns (bricks) in mid-air. Often in finance, it will be referred to as a bridge.
Waterfall charts were popularized by the strategic consulting firm McKinsey & Company in its presentations to clients.
Complexity can be added to waterfall charts with multiple total columns and values that cross the axis. Increments and decrements that are sufficiently extreme can cause the cumulative total to fall above and below the axis at various points. Intermediate subtotals, depicted with whole columns, can be added to the graph between floating columns.
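Each floating column spans from the running total before a change to the running total after it. The Python sketch below computes those spans; the labels and amounts are hypothetical, and a decrease shows up as a span whose second value is lower than its first. If the running total goes negative, the corresponding columns cross the axis.

```python
def waterfall_columns(start, changes):
    """Return (label, from_value, to_value) for each column of a waterfall chart.

    The first and last columns are whole columns from zero; each change in
    between floats from the running total before it to the total after it.
    """
    columns = [("Start", 0, start)]
    running = start
    for label, delta in changes:
        columns.append((label, running, running + delta))
        running += delta
    columns.append(("End", 0, running))
    return columns

# Hypothetical profit bridge: opening value followed by signed changes.
for column in waterfall_columns(100, [("Sales", 50), ("Costs", -80), ("Tax", -10)]):
    print(column)
```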
28. Word Cloud
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more often a specific word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it appears in the word cloud.
A word cloud is a collection, or cluster, of words depicted in different sizes: the bigger and bolder a word appears, the more often it is mentioned within a given text.
Word clouds are a quick way to pull out the most prominent terms in textual data, from blog posts to databases. They can also help business users compare and contrast two different pieces of text to find wording similarities between them.
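The sizing rule can be sketched in Python: count word occurrences, keep the most frequent words, and scale font sizes linearly between a minimum and a maximum. The size range, the simple regular-expression tokenizer, and the optional stopword set are assumptions; real word-cloud tools also handle layout and usually remove common words such as "the".

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=48, top=20, stopwords=frozenset()):
    """Map the `top` most frequent words to font sizes scaled linearly by count."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    counts = Counter(words).most_common(top)
    if not counts:
        return {}
    most, least = counts[0][1], counts[-1][1]
    span = (most - least) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (max_size - min_size) * (c - least) / span for w, c in counts}

print(word_sizes("data data data chart chart map", min_size=10, max_size=40))
# {'data': 40.0, 'chart': 25.0, 'map': 10.0}
```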
Perhaps you’re already leveraging advanced data visualization techniques to turn your important analytics into charts, graphs, and infographics. This is an excellent first step, since visual information is often easier for people to take in than raw numbers.