Response Variable and Explanatory Variable. Example One: To explore these concepts, we will examine a few examples. For the first example, suppose that a researcher is interested in studying the mood and attitudes of a group of first-year college students. All first-year students are given a series of questions designed to assess their degree of homesickness. Students also indicate on the survey how far their college is from home. One researcher who examines this data may just be interested in the types…

# Category: Data Science

## Response Variable and Explanatory Variable.

## Algorithms – Every Data Scientist Should Know

Below is a snapshot of the algorithms every data scientist should know.

## Statistical Modeling, Data Analysis and Machine Learning Libraries

## Linear Algebra – Post 2 – Linear combination & Span , Linear Dependence & Linear Independence

Linear Combination and Span: Given a set of vectors, what other vectors can you create by adding and/or subtracting scalar multiples of those vectors? The set of vectors that you can create through these linear combinations of the original set is called the “span” of the set. More generally, a linear combination of $n$ vectors $v_1, v_2, \dots, v_n$ is any vector of the form $a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$, where $a_1, a_2, \dots, a_n$ are scalars. The set of…
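As a rough illustration of the definitions above, the sketch below builds one linear combination in R and then checks whether a vector lies in the span of a set. The vectors, the scalars, and the helper name `in_span` are illustrative assumptions, not taken from the post. The check uses a rank comparison: appending a vector that is already in the span does not increase the rank of the matrix.

```r
# Two vectors in R^3 (illustrative values)
v1 <- c(1, 0, 1)
v2 <- c(0, 1, 1)

# A linear combination a1*v1 + a2*v2 with scalars a1 = 2, a2 = 3
a <- c(2, 3)
w <- a[1] * v1 + a[2] * v2

# Hypothetical helper: a target is in the span of the columns of `vectors`
# exactly when appending it as a column leaves the matrix rank unchanged
in_span <- function(vectors, target) {
  qr(vectors)$rank == qr(cbind(vectors, target))$rank
}

V <- cbind(v1, v2)
in_span(V, w)            # w was built from v1 and v2
in_span(V, c(0, 0, 1))   # no scalars a1, a2 give (a1, a2, a1 + a2) = (0, 0, 1)
```

The rank test relies on the default numerical tolerance of `qr()`, so it is best treated as a sketch for small, well-scaled examples.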

## Apache Hadoop Series – Post 2 – [ What is Big Data ]

To understand the need for Hadoop, we first need to understand what big data is. Big data came into existence because the amount of data being collected is increasing at an exponential rate. For example, NASA is collecting ~1.73 GB of data per second, which is a huge volume. Different types of data are collected on a daily basis: video, audio, text, pictures, etc. So we need to handle a high volume of data that has a high variety and needs to be processed at a high velocity. For example: Google needs…

## Apache Hadoop Series – Post -1 – [ Introduction on Apache Hadoop ]

Organizations all around the world have been storing data for years. Now, organizations want to use this data to understand and analyze existing problems, to seize new opportunities, and to create more revenue, attain more profitability and cut down losses. The study and analysis of these vast volumes of data has given birth to the term Big Data. The first file system introduced to store and process large volumes of data was invented by Google and is called the Google File System, otherwise known…

## Linear Algebra Series – Post – 1- [Vectors & Scalars]

Vectors & Scalars: The mathematical quantities that are used to describe the motion of objects can be divided into two categories: each quantity is either a vector or a scalar. A vector represents both magnitude and direction. A scalar represents only the magnitude, or numeric value. For example: the forces which operate on a flying aircraft (the weight, thrust, and aerodynamic forces) are all vector quantities. The resulting motion of the aircraft in terms of displacement, velocity, and acceleration are also vector quantities. These quantities can be…
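To make the distinction concrete, here is a minimal R sketch that splits a vector into its scalar magnitude and its direction. The velocity values and variable names are illustrative assumptions, not taken from the post.

```r
# A velocity vector in two dimensions (illustrative values)
velocity <- c(3, 4)

# The scalar part: the magnitude (speed) is the Euclidean norm of the vector
speed <- sqrt(sum(velocity^2))

# The directional part: a unit vector, plus the heading angle in degrees
direction <- velocity / speed
heading_deg <- atan2(velocity[2], velocity[1]) * 180 / pi
```

Here `speed` is a scalar, while `velocity` and `direction` are vectors: multiplying `direction` by `speed` recovers the original velocity.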

## Exploratory Analysis in R – 3 – [ Plotting system in R – Base , Lattice & ggplot2 ]

The three major plotting systems in R are:

- Base plotting system
- Lattice plotting system
- ggplot2 system

How they differ:

- Base: “artist’s palette” model
- Lattice: Entire plot specified by one function; conditioning
- ggplot2: Mixes elements of Base and Lattice

Drawbacks of the base plotting system:

- Can’t go back once the plot has started (i.e. to adjust margins); need to plan in advance
- Difficult to “translate” to others once a new plot has been created (no graphical “language”)

Example of a simple base plot:

```r
library(datasets)
data(cars)
# Scatterplot of stopping distance against speed
with(cars, plot(speed, dist))
```

Advantages of the lattice system: Plots are created with a single…
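For comparison with the base example, the same `cars` scatterplot can be sketched in lattice, where the whole plot is specified in one function call. This is a minimal sketch: the axis labels and the variable name `p` are assumptions added for illustration, and it assumes the `lattice` package is installed (it normally ships with R as a recommended package).

```r
library(lattice)
library(datasets)

# The entire plot is described in a single call; the formula reads
# "dist plotted against speed"
p <- xyplot(dist ~ speed, data = cars,
            xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# lattice functions return a "trellis" object; printing it draws the plot
print(p)
```

Because the plot is an object rather than a sequence of drawing commands, it can be stored, modified with `update()`, and printed later.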

## Exploratory Analysis in R – 2 – [Plotting system in R – Bar, Histogram & Scatterplots]

Below are the kinds of plots used to summarize data in two dimensions and in more than two dimensions.

Simple summaries of data:

Two dimensions:

- Multiple/overlaid 1-D plots (Lattice/ggplot2)
- Scatterplots
- Smooth scatterplots

More than two dimensions:

- Overlaid/multiple 2-D plots; coplots
- Use color, size, shape to add dimensions
- Spinning plots
- Actual 3-D plots (not that useful)

Multiple boxplots, e.g.:

```r
boxplot(pm25 ~ region, data = pollution, col = "red")
```

This boxplot shows the pm2.5 variable in two dimensions, split by the categories east and west. Note that the east region has a higher average than…
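The `pollution` data frame is not included in this excerpt, so the sketch below substitutes a simulated stand-in to make the boxplot call runnable. The column values, the seed, and the group means are assumptions chosen only for illustration; they are not real measurements. It also pulls out the per-region medians (the line drawn inside each box) and the per-region means, since the two can differ.

```r
# Hypothetical stand-in for the `pollution` data frame; values are simulated
set.seed(1)
pollution <- data.frame(
  pm25   = c(rnorm(50, mean = 12, sd = 2), rnorm(50, mean = 10, sd = 2)),
  region = factor(rep(c("east", "west"), each = 50))
)

# One box per region; boxplot() invisibly returns the statistics it drew
b <- boxplot(pm25 ~ region, data = pollution, col = "red")

# Row 3 of b$stats holds the median of each group
medians <- setNames(b$stats[3, ], b$names)

# Group means, for comparison with the medians shown in the plot
means <- tapply(pollution$pm25, pollution$region, mean)
```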