Getting Started with Data Visualization

Visualization is mostly about knowing your data set in detail. Better data visualization comes with domain expertise. If you have not digested the relations and boundaries of the data set and decided what you want to show, knowing every data visualization tool won’t help you.

On the other hand, if you are a domain expert, even the simplest tool can produce great data visuals for you. So in this article, let’s go through the must-haves for better data visualization. Before that, if you are not familiar with what data visualization is, please read this post about it first.

1- Learn The Domain

It is important to know what your data contains. For example, if your data is a collection of student grades, you probably want to see the most successful students. To do that, you have to know the grading scale, because whether a grade is a letter or a number determines how you can sort, compare, and chart it.

Use Case: Show Top 3 Students

If you do not know how the grading scale works, you may think that a simple query showing three students with an A+ grade is enough, which would probably be wrong. Let’s think about it a little more.

Our sample school has 10K students, and let’s say 100 of them have an A+ grade. Now you list the Top 3 Students with a simple select query and use a table view to visualize the result.

Jane Doe A+
John Doe A+
Stuart Doe A+

This list is the result of the most common ordering style: since the grades are the same, it is ordered by name. On a common US grading scale, A+ means 97–100%. So the problem with this data visualization is that we do not know whether these students are at 100% or not, and the other 97 students with an A+ are left out for no meaningful reason.
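To make the problem concrete, here is a minimal sketch in Python. The records, the extra names, and every percentage are made up for illustration, and the helper functions `top_by_letter` and `top_by_percentage` are hypothetical. It assumes each student record carries both a letter grade and an underlying percentage, which may not be true of a real data set.

```python
# Hypothetical records: (name, letter grade, percentage). All values are invented.
students = [
    ("Jane Doe", "A+", 97.2),
    ("John Doe", "A+", 98.5),
    ("Stuart Doe", "A+", 97.9),
    ("Tom Roe", "A+", 99.6),
    ("Zoe Roe", "A+", 99.1),
]


def top_by_letter(records, n=3):
    """Keep A+ students and break the tie by name, as a naive query might."""
    a_plus = [r for r in records if r[1] == "A+"]
    return sorted(a_plus, key=lambda r: r[0])[:n]


def top_by_percentage(records, n=3):
    """Rank by the underlying percentage, highest first."""
    return sorted(records, key=lambda r: r[2], reverse=True)[:n]


for name, letter, pct in top_by_letter(students):
    print("by letter:    ", name, letter)

for name, letter, pct in top_by_percentage(students):
    print("by percentage:", name, pct)
```

With this sample data, the two rankings share only one student. The letter-based list is decided by the alphabet, not by performance.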

Let’s ask some questions to dig into the details.

  • What are the basics of this grading system?
  • Is it some kind of aggregation of letters or numbers?
  • If numbers, what is the scale? 0-100, 0-10, 0-5?
  • Why do you select these students from the whole list? Isn’t there any difference between a freshman and a senior?

See, even our grouping might be wrong if we don’t know the answers. So, you have to learn the details of the domain to provide useful and correct data visuals.

2- Learn Data Visualization Tools

OK, now we have some domain expertise. What’s next?

Of course, we need to know what to show and how to show it. In this section, we will go through the “how to show” part.

There are lots of data visualization tools, and most of them are easy-to-use components if you pick one that matches your background. For example, data visualization tools for Python are likely to seem easier if you have experience coding in Python. On the other hand, many tools support several programming languages and environments in some way.

Instead of memorizing data visualization libraries, focus on what they provide and how you can customize them for your needs. Of course, you have to learn the basics of the data visualization tools first. With that in mind, let’s go through two well-known charts: the Bar Chart and the Pie Chart.

We generally use bar charts to show and compare numbers, frequencies, etc. Similarly, pie charts are handy for showing proportional comparison. See the examples below.

default-bar-pie-chart
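Charts like these can be produced with many libraries. The sketch below assumes matplotlib is installed and uses made-up grade counts, so it is not the code behind the figure above. The function names `to_shares` and `draw_bar_and_pie` and the output file name are hypothetical.

```python
def to_shares(values):
    """Convert raw values to percentage shares of the total (what a pie encodes)."""
    total = sum(values)
    return [100 * v / total for v in values]


def draw_bar_and_pie(labels, values, path="bar-vs-pie.png"):
    """Draw the same data as a bar chart and a pie chart, side by side."""
    # Imported here so that to_shares() works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 4))

    # Bar chart: bar heights are the raw values, easy to compare directly.
    ax_bar.bar(labels, values)
    ax_bar.set_title("Bar chart: compare values")

    # Pie chart: slice angles are shares of the total.
    ax_pie.pie(values, labels=labels, autopct="%1.1f%%")
    ax_pie.set_title("Pie chart: compare shares")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical data: number of students per letter grade.
grade_labels = ["A", "B", "C"]
grade_counts = [30, 45, 25]
print(to_shares(grade_counts))
```

Calling `draw_bar_and_pie(grade_labels, grade_counts)` would write both charts to a single image file. The bar chart keeps the raw counts, while the pie chart shows only each grade's share of the whole.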

3- Practice What You Have Learned

So far, we have talked about learning the domain and the data visualization tools. Now we have to combine that knowledge with practice. Both charts are capable of visualizing our student data. Let’s put them side by side and see what happens in reality.

student-bar-pie-chart

As you can see, the two visuals tell us different things about the same data. The bar chart shows a slight difference that reveals the winner. The pie chart, on the other hand, suggests that the students are almost the same. But our original task was to show the Top 3 Students, which also implies showing their order.
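A little arithmetic shows why the pie chart flattens the picture. The sketch below uses invented percentages for three students, not the data behind the figure. It computes the share each student would get in a pie chart and compares that with an explicit ranking. The helpers `pie_shares` and `ranked` are hypothetical names.

```python
# Hypothetical top students with made-up percentages.
top3 = [("Jane Doe", 97.2), ("John Doe", 98.5), ("Stuart Doe", 97.9)]


def pie_shares(records):
    """Share of the pie each student would get, in percent, rounded to 0.1."""
    total = sum(score for _, score in records)
    return {name: round(100 * score / total, 1) for name, score in records}


def ranked(records):
    """Explicit ordering, highest score first, with a 1-based rank."""
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


print(pie_shares(top3))  # every slice is close to one third
for rank, name, score in ranked(top3):
    print(rank, name, score)
```

When the values are this close, every slice is roughly a third of the pie, so the chart cannot convey the order. A ranked table or a bar chart keeps it visible.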

As a result, the pie chart won’t help you with this task. You have to dig into the details for a better understanding.