Day 2: Visual & Statistical Thinking
We have several required readings for this day. Please have a look at the Readings tab.
FOR ASSIGNMENTS: Please go to the ‘Assignments’ tab in the menu. This is a daily summary.
In our second session, we looked at how our minds process visual information, explored different data types, and began working with real data in Orange.
Pre-attentive Processing
We explored why data visualization is so powerful: our visual system can process certain attributes almost instantly (typically within about 200 to 250 milliseconds), before conscious attention kicks in. These “pre-attentive” attributes include:
- Position (most effectively perceived)
- Length
- Angle
- Direction
- Shape
- Area & Volume
- Color (usually least effective for encoding quantitative data)
This hierarchy of visual encoding effectiveness explains why position-based charts (like scatter plots) are often more effective than area-based charts (like pie charts) or color-based visualizations for quantitative comparisons.
Marks & Channels
We learned about the fundamental components of visualizations:
- Marks: The basic geometric elements that represent our data (points, lines, areas)
- Channels: The visual properties we assign to these marks (position, size, color, etc.)
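One way to make the marks/channels split concrete is to write a chart down as data. The sketch below is only an illustration: `ChartSpec`, `describe`, and the column names (`age`, `hours_per_week`, `education`) are hypothetical and do not come from any particular library or from Orange.

```python
from dataclasses import dataclass, field


@dataclass
class ChartSpec:
    """Hypothetical description of a chart: one mark type plus channel assignments."""
    mark: str                                     # geometric element: "point", "line", "area"
    channels: dict = field(default_factory=dict)  # channel name -> data column


# A scatter plot: point marks, with three data columns mapped to three channels.
scatter = ChartSpec(
    mark="point",
    channels={"x": "age", "y": "hours_per_week", "color": "education"},
)


def describe(spec):
    """Spell out which data column each channel encodes."""
    parts = [f"{channel} shows {column}" for channel, column in spec.channels.items()]
    return f"{spec.mark} marks: " + ", ".join(parts)


print(describe(scatter))
```

Swapping `mark` to `"line"` while keeping the same channels would describe a different chart of the same data, which is the point of treating the two ideas separately.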
Dear Data
We explored the “Dear Data” project by Giorgia Lupi and Stefanie Posavec, where they documented aspects of their daily lives through hand-drawn data visualizations on postcards.
For our in-class activity, we created our own “Dear Data” style visualizations based on the survey responses collected yesterday. This exercise helped us think about:
- Creative ways to encode multiple dimensions of data
- How to create clear legends that allow others to decode our visualizations
- Incorporating qualitative nuances into data representations
Types of Data: Understanding Our Ingredients
- Quantitative Data
  - Discrete: Countable, whole numbers (e.g., number of emails)
  - Continuous: Measurable, can take any value (e.g., temperature, time)
- Qualitative Data
  - Nominal: Categories without order (e.g., colors, gender)
  - Ordinal: Categories with order (e.g., satisfaction ratings)
We also learned about the key distinction between:
- Interval data: Equal distances, no true zero (e.g., temperature in °C)
- Ratio data: Equal distances with true zero (e.g., height, weight)
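A quick worked example of why the true zero matters, using made-up temperatures. Differences agree whether we measure in °C or in kelvin, but ratios only make sense on the kelvin scale, which has a true zero. The helper name `celsius_to_kelvin` is ours.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin (a ratio scale with a true zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0  # made-up readings

# Differences are meaningful on an interval scale: both scales agree on 10 degrees.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20 °C looks "twice" 10 °C, but in kelvin it is only about 3.5% more.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, round(difference_k, 2))
print(ratio_c, round(ratio_k, 3))
```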
Understanding these types is crucial because they determine what operations and visualizations are appropriate.
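As a rough sketch, the rule "the type decides the operations" can be written as a lookup table. The operation names and the `is_meaningful` helper below are our own shorthand for a common textbook rule of thumb, not an official classification.

```python
# Operations that are meaningful at each level of measurement.
# Each level keeps everything from the one before it and adds one more.
MEANINGFUL_OPERATIONS = {
    "nominal": {"count", "equality"},
    "ordinal": {"count", "equality", "ordering"},
    "interval": {"count", "equality", "ordering", "difference"},
    "ratio": {"count", "equality", "ordering", "difference", "ratio"},
}


def is_meaningful(operation, level):
    """Return True if the operation makes sense for data at this level."""
    return operation in MEANINGFUL_OPERATIONS[level]


print(is_meaningful("ordering", "nominal"))     # sorting colors: not meaningful
print(is_meaningful("difference", "interval"))  # 20 °C minus 10 °C: fine
print(is_meaningful("ratio", "interval"))       # "twice as hot" in °C: not meaningful
```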
Measures of Central Tendency
We explored three ways to find the “middle” of our data:
- Mean: The arithmetic average (sum divided by count)
- Median: The middle value when data is arranged in order
- Mode: The most frequently occurring value
Each has strengths and weaknesses depending on your data’s distribution. Like Brad Pitt.
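To see how the three measures react to a skewed distribution, here is a small sketch using Python's built-in `statistics` module. The incomes are made up for illustration; the only thing that matters is that one value is far larger than the rest.

```python
import statistics

# Made-up yearly incomes (in thousands) for five people in a room.
incomes = [30, 35, 35, 40, 45]

# The same room after one extremely high earner walks in (also a made-up value).
incomes_with_outlier = incomes + [20000]

for label, data in [("before", incomes), ("after", incomes_with_outlier)]:
    print(
        label,
        "mean:", round(statistics.mean(data), 1),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

The mean jumps from 37 to over 3,000, while the median only moves from 35 to 37.5 and the mode stays at 35. That is why the median is usually the safer summary for skewed data such as income.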
Data Analysis with Orange
We got our hands dirty with Orange, a visual programming tool for data analysis:
- Explored the Adult Census Income dataset
- Learned how to calculate statistics within groups using GROUP BY
- Created our first visualizations:
  - Bar charts for comparing values across categories
  - Histograms for viewing distributions of continuous data
  - Mosaic plots for examining relationships between categorical variables
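In class we did the grouping visually in Orange. For anyone curious what a GROUP BY does under the hood, here is a minimal plain-Python sketch. The rows and column names are made up to look loosely like census records; they are not taken from the real dataset, and `group_mean` is our own helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows loosely shaped like census records; not the real dataset.
rows = [
    {"education": "Bachelors", "hours_per_week": 40},
    {"education": "Bachelors", "hours_per_week": 50},
    {"education": "HS-grad", "hours_per_week": 38},
    {"education": "HS-grad", "hours_per_week": 42},
    {"education": "Masters", "hours_per_week": 45},
]


def group_mean(records, key, value):
    """Split records by `key`, then average `value` within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: mean(values) for group, values in groups.items()}


print(group_mean(rows, "education", "hours_per_week"))
```

The two steps (split into groups, then summarize each group) are the same ones a bar chart of group averages relies on.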
With the Titanic dataset, we investigated survival patterns across different passenger groups, seeing how visualization can quickly reveal patterns that would be difficult to spot in tables.
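The kind of comparison we made can be sketched as a survival rate per group. The passenger records below are invented for illustration and are not the real Titanic counts; `survival_rates` is a hypothetical helper.

```python
from collections import Counter

# Made-up passenger records (class, survived); not the real Titanic data.
passengers = [
    ("first", True), ("first", True), ("first", False),
    ("third", True), ("third", False), ("third", False), ("third", False),
]


def survival_rates(records):
    """Share of survivors within each passenger group."""
    totals = Counter(group for group, _ in records)
    survivors = Counter(group for group, survived in records if survived)
    return {group: survivors[group] / totals[group] for group in totals}


rates = survival_rates(passengers)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%}")
```

A mosaic plot shows the same proportions as tile sizes, which is why the pattern is visible at a glance instead of requiring a row-by-row read of the table.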
Links from today
Dear Data Project
Tools
The Datasaurus Dozen
Congratulations, you are now an analyst. See you tomorrow!