Day 2: Visual & Statistical Thinking

We have several required readings for this day. Please have a look at the Readings tab.

FOR ASSIGNMENTS: Please go to the ‘Assignments’ tab in the menu. This is a daily summary.

In our second session, we looked at how our minds process visual information, explored different data types, and began working with real data in Orange.

Pre-attentive Processing

We explored why data visualization is so powerful: our visual system can process certain attributes almost instantly (typically in under 250 milliseconds), before conscious attention kicks in. These “pre-attentive” attributes include:

Position (most effectively perceived)
Length
Angle
Direction
Shape
Area & Volume
Color (usually least effective for encoding quantitative data)

This hierarchy of visual encoding effectiveness explains why position-based charts (like scatter plots) are often more effective for quantitative comparisons than charts that rely on angle and area (like pie charts) or on color.

Marks & Channels

We learned about the fundamental components of visualizations:

Marks: The basic geometric elements that represent our data (points, lines, areas)
Channels: The visual properties we assign to these marks (position, size, color, etc.)
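
One way to make the marks-and-channels idea concrete is to write a chart down as a small specification: one mark type plus a mapping from channels to data columns. The sketch below is only an illustration; the `Encoding` class and the column names are hypothetical and not part of Orange or any other library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Encoding:
    """A chart described as one mark type plus channel-to-column mappings."""

    mark: str  # e.g. "point", "line", "area"
    channels: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        """Return a one-line, legend-style description of the encoding."""
        parts = [
            f"{channel} shows '{column}'"
            for channel, column in self.channels.items()
        ]
        return f"{self.mark} mark: " + ", ".join(parts)


# Hypothetical scatter plot: points positioned by two columns, colored by a third.
scatter = Encoding(
    mark="point",
    channels={"x": "age", "y": "hours_per_week", "color": "education"},
)
print(scatter.describe())
```

Reading the description back works much like reading a legend: each channel tells the viewer which column it stands for.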

Dear Data

We explored the “Dear Data” project by Giorgia Lupi and Stefanie Posavec, where they documented aspects of their daily lives through hand-drawn data visualizations on postcards.

For our in-class activity, we created our own “Dear Data” style visualizations based on the survey responses collected yesterday. This exercise helped us think about:

Creative ways to encode multiple dimensions of data
How to create clear legends that allow others to decode our visualizations
Incorporating qualitative nuances into data representations

Types of Data: Understanding Our Ingredients

Quantitative Data:
- Discrete: Countable, whole numbers (e.g., number of emails)
- Continuous: Measurable, can take any value (e.g., temperature, time)
Qualitative Data:
- Nominal: Categories without order (e.g., colors, gender)
- Ordinal: Categories with order (e.g., satisfaction ratings)

We also learned about the key distinction between:

Interval data: Equal distances, no true zero (e.g., temperature in °C)
Ratio data: Equal distances with true zero (e.g., height, weight)
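
To see why the missing true zero matters, here is a minimal Python sketch with made-up temperature readings. Differences between Celsius values are meaningful, but ratios are not: 20 °C is not “twice as hot” as 10 °C. Converting to Kelvin, which does have a true zero, shows how small the real ratio is.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


# Illustrative values only; not taken from the class data.
cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c

# The naive ratio suggests "twice as hot", which is misleading...
naive_ratio = warm_c / cool_c

# ...because on a scale with a true zero the ratio is close to 1.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {difference_c} degrees C")
print(f"Naive Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```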

Understanding these types is crucial because they determine what operations and visualizations are appropriate.
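
One common way to make this concrete is a lookup of which summary statistics are usually considered appropriate at each level of measurement. The sketch below follows the standard nominal, ordinal, interval, ratio ordering, where each level allows everything the previous levels allow. The function and dictionary names are our own for illustration, not part of any library.

```python
from __future__ import annotations

# Levels of measurement, from least to most structure.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each level.
NEW_STATISTICS = {
    "nominal": ["mode", "counts"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratios", "coefficient of variation"],
}


def allowed_statistics(level: str) -> list[str]:
    """Return every statistic that is meaningful at the given level."""
    if level not in LEVELS:
        raise ValueError(f"Unknown level of measurement: {level!r}")
    allowed: list[str] = []
    # A level inherits the statistics of all levels that come before it.
    for name in LEVELS[: LEVELS.index(level) + 1]:
        allowed.extend(NEW_STATISTICS[name])
    return allowed


for level in LEVELS:
    print(f"{level}: {', '.join(allowed_statistics(level))}")
```

For example, satisfaction ratings (ordinal) support a median but not, strictly speaking, a mean, because the distances between categories are not known to be equal.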

Measures of Central Tendency

We explored three ways to find the “middle” of our data:

Mean: The arithmetic average (sum divided by count)
Median: The middle value when data is arranged in order
Mode: The most frequently occurring value

Each has strengths and weaknesses depending on your data’s distribution. Like Brad Pitt.
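
A small sketch with Python's built-in `statistics` module shows how the three measures can disagree. The numbers are invented (a hypothetical count of emails per day, with one extreme day) purely to illustrate how a single outlier pulls the mean while the median and mode stay put.

```python
import statistics

# Hypothetical emails received per day over one week; the last day is an outlier.
emails_per_day = [3, 5, 5, 6, 7, 8, 120]

mean_value = statistics.mean(emails_per_day)      # sum divided by count
median_value = statistics.median(emails_per_day)  # middle value once sorted
mode_value = statistics.mode(emails_per_day)      # most frequent value

print(f"Mean:   {mean_value:.1f}")
print(f"Median: {median_value}")
print(f"Mode:   {mode_value}")
```

Here the mean lands far above what a typical day looks like, which is why the median is often preferred for skewed data.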

Data Analysis with Orange

We got our hands dirty with Orange, a visual programming tool for data analysis:

We explored the Adult Census Income dataset
We learned how to calculate statistics within groups using GROUP BY
We created our first visualizations:
- Bar charts for comparing values across categories
- Histograms for viewing distributions of continuous data
- Mosaic plots for examining relationships between categorical variables
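
Orange does the group-by step through its visual interface, so there is no code to show from class. As a rough equivalent, the idea of calculating a statistic within groups can be sketched in plain Python. The rows and column names below are made up for illustration and are not the actual fields or values of the Adult Census Income dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; the column names are placeholders, not the real dataset's fields.
rows = [
    {"education": "Bachelors", "hours_per_week": 40},
    {"education": "Bachelors", "hours_per_week": 50},
    {"education": "HS-grad", "hours_per_week": 38},
    {"education": "HS-grad", "hours_per_week": 42},
    {"education": "Masters", "hours_per_week": 45},
]


def group_mean(records, key, value):
    """Group records by `key` and return the mean of `value` in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: mean(values) for group, values in groups.items()}


result = group_mean(rows, "education", "hours_per_week")
for group, average in result.items():
    print(f"{group}: {average}")
```

The result is one number per category, which is exactly the shape of data a bar chart needs.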

With the Titanic dataset, we investigated survival patterns across different passenger groups, seeing how visualization can quickly reveal patterns that would be difficult to spot in tables.

Links from today

Dear Data Project

Tools

Orange Data Mining

The Datasaurus Dozen

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics

Congratulations, you are now an analyst. See you tomorrow!