Exploratory Analysis in R – 3 – [Plotting system in R – Base, Lattice & ggplot2]

Exploratory Analysis

The three major plotting systems in R are:
- Base plotting system: “artist’s palette” model
- Lattice plotting system: entire plot specified by one function; conditioning
- ggplot2 system: mixes elements of Base and Lattice

Drawbacks of the base plotting system:
- Can’t go back once the plot has started (e.g. to adjust margins); need to plan in advance
- Difficult to “translate” to others once a new plot has been created (no graphical “language”)

Example of a simple base plot:
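As a minimal sketch of the “artist’s palette” model, assuming the built-in cars dataset stands in for whatever data the post plotted, a base plot is started with plot() and then built up with further annotation calls:

```r
library(datasets)  # ships with R; provides the cars data frame (an assumed stand-in dataset)

# Start the plot: one call draws the axes, labels, and points.
with(cars, plot(speed, dist,
                xlab = "Speed (mph)",
                ylab = "Stopping distance (ft)"))

# Annotate afterwards, piece by piece, on top of the existing plot.
title(main = "Speed vs. stopping distance")
fit <- lm(dist ~ speed, data = cars)  # simple linear fit, used only for the overlay
abline(fit, col = "blue")
```

Each call after plot() adds to what is already drawn, which is why margins and layout have to be decided before the first call.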

Advantages of the lattice system: Plots are created with a single…
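A hedged sketch of the lattice approach, assuming the built-in airquality dataset (not data from the post): the whole plot, including the conditioning variable, is described in one xyplot() call.

```r
library(lattice)   # recommended package bundled with standard R installations
library(datasets)  # provides airquality (an assumed stand-in dataset)

# One call specifies the entire plot: Ozone against Wind,
# conditioned on Month so that each month gets its own panel.
p <- xyplot(Ozone ~ Wind | factor(Month), data = airquality,
            layout = c(5, 1),
            xlab = "Wind (mph)", ylab = "Ozone (ppb)")

print(p)  # xyplot() returns a "trellis" object; printing it draws the panels
```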

Exploratory Analysis in R – 2 – [Plotting system in R – Bar, Histogram & Scatterplots]

Exploratory Analysis

Below are the kinds of plots used to summarize data in two dimensions and in more than two dimensions.

Simple Summaries of Data: Two dimensions
- Multiple/overlaid 1-D plots (Lattice/ggplot2)
- Scatterplots
- Smooth scatterplots

Simple Summaries of Data: More than two dimensions
- Overlaid/multiple 2-D plots; coplots
- Use color, size, shape to add dimensions
- Spinning plots
- Actual 3-D plots (not that useful)

Multiple boxplots, e.g.:

This boxplot shows the pm2.5 variable in two dimensions, split by the categories east and west. Note that the east region has a higher average than…
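A grouped boxplot of this kind can be sketched with the formula interface of boxplot(). The post's pollution data is not available here, so this sketch assumes the built-in airquality dataset, with Month playing the role of the grouping category:

```r
library(datasets)  # provides airquality (an assumed stand-in for the post's data)

# Formula interface: numeric variable on the left, grouping variable on the right.
bp <- boxplot(Ozone ~ Month, data = airquality,
              col = "lightblue",
              xlab = "Month", ylab = "Ozone (ppb)")

# boxplot() invisibly returns the statistics it drew;
# row 3 of $stats holds the median of each group.
bp$stats[3, ]
```

Putting the boxes side by side on a shared axis is what adds the second dimension: the grouping variable on one axis, the measured variable on the other.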

Exploratory Analysis in R – 1

Exploratory Analysis

Exploratory graphs:
- To understand data properties
- To find patterns in data
- To suggest modeling strategies
- To “debug” analyses
- To communicate results

Characteristics of exploratory graphs:
- They are made quickly
- A large number are made
- The goal is personal understanding
- Axes/legends are generally cleaned up (later)
- Color/size are primarily used for information

Simple Summaries of Data: One dimension

Five-number summary: a summary of particular aspects of a given variable, e.g. summary(pollution$pm25). The output is actually a six-number summary, because the mean is included:

## Min. 1st Qu.…
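The difference between a five-number and a six-number summary can be checked directly. This is a minimal sketch that assumes the built-in cars$speed vector as a stand-in, since the post's pollution data is not available here:

```r
library(datasets)  # provides cars (an assumed stand-in dataset)

x <- cars$speed      # numeric vector with no missing values

five <- fivenum(x)   # Tukey's five numbers: min, lower hinge, median, upper hinge, max
six  <- summary(x)   # quartile-based summary that also reports the mean

length(five)         # 5 values
names(six)           # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```

Note that fivenum() reports hinges, which can differ slightly from the quartiles that summary() prints, and that summary() adds an extra NA count when the vector contains missing values.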

Principles of Analytics Graphics

Analytics Graphics

Principle 1: Show comparisons
- Evidence for a hypothesis is always relative to another competing hypothesis.
- Always ask “Compared to what?”

Principle 2: Show causality, mechanism, explanation, systematic structure
- Show how you believe the system is operating.

Principle 3: Show multivariate data
- Multivariate = more than 2 variables; show as much data as you can.

Principle 4: Integration of evidence
- Completely integrate words, numbers, images, diagrams.
- Data graphics should make use of many modes of data presentation.
- Don’t let the tool drive the analysis.

Principle 5: Describe and document the evidence with…

Simpson’s paradox

Source: Wikipedia

Simpson’s paradox, or the Yule–Simpson effect, is a paradox in probability and statistics in which a trend appears in different groups of data but disappears or reverses when these groups are combined. It is sometimes given the impersonal title reversal paradox or amalgamation paradox. This result is often encountered in social-science and medical-science statistics, and is particularly confounding when frequency data is unduly given causal interpretations. The paradoxical elements disappear when causal relations are brought into consideration. Many statisticians believe that the mainstream public should be informed of the…
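The reversal is easier to see with numbers. The sketch below uses illustrative counts (chosen for this example, not taken from the post) for two treatments, A and B, applied to a "small" and a "large" group; the data frame and column names are hypothetical:

```r
# Illustrative counts: successes and number of cases per group and treatment.
trials <- data.frame(
  group     = c("small", "small", "large", "large"),
  treatment = c("A", "B", "A", "B"),
  success   = c(81, 234, 192, 55),
  total     = c(87, 270, 263, 80)
)

# Success rate within each group: A is ahead of B in both groups.
trials$rate <- trials$success / trials$total
print(trials)

# Success rate after pooling the groups: now B is ahead of A.
pooled <- aggregate(cbind(success, total) ~ treatment, data = trials, FUN = sum)
pooled$rate <- pooled$success / pooled$total
print(pooled)
```

The reversal arises because the two treatments are spread very unevenly across the groups: A is applied mostly to the group with the lower success rate, which drags its pooled rate down.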