---
title: "Lesson 3"
runtime: shiny
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Lesson 3
========================================================
***
### What to Do First?
Set the working directory.
```{r setup 2}
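# knitr restores the working directory after each chunk, so setwd() would not
# carry over to later chunks; the root.dir option applies to the chunks that follow
knitr::opts_knit$set(root.dir = '/Users/olgabelitskaya/version-control/reflections-ud651')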
```
***
### Pseudo-Facebook User Data
Read the tab-separated (TSV) file into the `pf` data frame and list its column names.
```{r Pseudo-Facebook User Data}
pf <- read.csv('pseudo_facebook.tsv', sep='\t')
names(pf)
```
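This is a minimal sketch of what the call above does. It uses a tiny hypothetical two-column table passed through `text =` instead of the real file. `read.csv()` with `sep = '\t'` and base R's `read.delim()` should produce the same data frame.

```r
# Hypothetical two-column, tab-separated text standing in for pseudo_facebook.tsv
tsv_text <- "userid\tage\n1\t14\n2\t23"

# read.csv() with a tab separator, as in the chunk above
df_csv <- read.csv(text = tsv_text, sep = "\t")

# read.delim() is base R's reader with sep = "\t" as its default
df_delim <- read.delim(text = tsv_text)

str(df_csv)
identical(df_csv, df_delim)
```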
***
### Histogram of Users' Birthdays
Load the ggplot2 library for plotting.
```{r Histogram of Users Birthdays}
library(ggplot2)
qplot(x=dob_day, data=pf, colour = I("blue"))
```
***
#### Help for qplot
```{r ?qplot}
?qplot
```
***
### Other libraries
```{r Library 1}
library(knitr)
```
```{r Library 2}
library(ggthemes)
theme_set(theme_minimal(7))
```
```{r Library 3}
library(gridExtra)
```
***
### Estimating Your Audience Size
My last Facebook post was about my studies, so I think it had no audience.
***
### Some useful links for plotting
#### http://statistics.ats.ucla.edu/stat/r/modules/factor_variables.htm
#### http://hci.stanford.edu/publications/2013/invisibleaudience/invisibleaudience.pdf
#### http://docs.ggplot2.org/current/
#### https://en.wikipedia.org/wiki/Web_colors
***
### Faceting
```{r Faceting 1}
ggplot(aes(x = dob_day), data = pf) + geom_histogram(binwidth = 1) +
scale_x_continuous(breaks = 1:31)
```
#### By month
```{r Faceting 2}
ggplot(aes(x = dob_day), data = pf) + geom_histogram(binwidth = 1) +
scale_x_continuous(breaks = 1:31) + facet_wrap(~dob_month)
```
***
### Information about faceting methods
```{r Faceting 3}
?facet_wrap
?facet_grid
```
***
### Moira's Outlier
#### Which case do you think applies to Moira’s outlier?
Response:
Bad data about an extreme case.
***
### Friend Count
```{r Friend Count 1}
summary(pf$friend_count)
```
#### Plotting the friend count distribution
```{r Friend Count 2}
qplot(x = friend_count, data = pf)
```
***
### Limiting the Axes and exploring with Bin Width
```{r Limiting the Axes and Exploring with Bin Width}
qplot(x = friend_count, data = pf, binwidth = 10) +
scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50))
```
***
### Statistics 'by' Gender
```{r Statistics by Gender}
table(pf$gender)
by(pf$friend_count, pf$gender, summary)
```
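`by()` returns one `summary()` per gender. The sketch below uses a hypothetical toy data frame (not `pf`). Base R's `tapply()` computes a single statistic per group and returns it as a named vector. Rows whose gender is `NA` fall out of both calls.

```r
# Hypothetical toy data standing in for pf
toy <- data.frame(
  gender = c("female", "male", "female", "male", NA),
  friend_count = c(120, 40, 300, 80, 10)
)

# One summary() per gender, like by(pf$friend_count, pf$gender, summary)
by(toy$friend_count, toy$gender, summary)

# One median per gender as a named vector; the NA-gender row is dropped
medians <- tapply(toy$friend_count, toy$gender, median)
medians
```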
***
### Plotting Friend Count by gender
```{r Plotting Friend Count by gender 1}
qplot(x = friend_count, data = pf) + facet_grid(gender ~ .)
```
***
```{r Plotting Friend Count by gender 2}
qplot(x = friend_count, data = pf, binwidth = 10) +
scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
facet_wrap(~gender)
```
### Omitting NA Values
```{r Omitting NA Values}
ggplot(aes(x = friend_count), data = subset(pf, !is.na(gender))) +
geom_histogram(binwidth = 30, color = 'red', fill = '#099DD9') +
scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
facet_wrap(~gender)
```
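This sketch shows what `subset(pf, !is.na(gender))` removes, using a hypothetical toy data frame. `table()` hides missing values unless you ask for them with `useNA`. `subset()` keeps only the rows where the condition is `TRUE`.

```r
# Hypothetical toy data; two users have no recorded gender
toy <- data.frame(
  gender = c("female", NA, "male", NA),
  friend_count = c(10, 20, 30, 40)
)

# table() silently skips NA unless useNA is set
table(toy$gender)
table(toy$gender, useNA = "ifany")

# Keep only rows with a known gender, as in the plot above
with_gender <- subset(toy, !is.na(gender))
nrow(with_gender)
```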
***
### Tenure
Exploring with colors.
```{r Tenure}
ggplot(aes(x = tenure), data = pf) +
geom_histogram(binwidth = 30, color = 'green', fill = '#099DD9')
```
***
#### How would you create a histogram of tenure by year?
```{r Tenure Histogram by Year}
ggplot(aes(x = tenure/365), data = pf) +
geom_histogram(binwidth = .1, color = 'purple', fill = '#00FFFF')
```
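`tenure` is measured in days, so dividing by 365 gives approximate years (ignoring leap years). A `binwidth` of `.1` years therefore corresponds to bins of 36.5 days.

The sketch below applies the same idea with half-year bins to hypothetical tenure values. It uses base R's `hist(plot = FALSE)` to show that binning in years gives the same counts as binning in days with the breaks scaled by 365.

```r
# Hypothetical tenures in days (not taken from pf)
tenure_days <- c(30, 365, 400, 730, 1095)
tenure_years <- tenure_days / 365

# Half-year bins expressed in years, and the same bins expressed in days
breaks_years <- seq(0, 3.5, by = 0.5)
counts_years <- hist(tenure_years, breaks = breaks_years, plot = FALSE)$counts
counts_days  <- hist(tenure_days, breaks = breaks_years * 365, plot = FALSE)$counts

counts_years
identical(counts_years, counts_days)
```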
***
### Labeling Plots
```{r Labeling Plots}
ggplot(aes(x = tenure / 365), data = pf) +
geom_histogram(binwidth = .1, color = 'brown', fill = '#F79420') +
scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) +
xlab('Number of years using Facebook') +
ylab('Number of users in sample')
```
***
### User Ages
```{r User Ages}
ggplot(aes(x = age), data = pf) +
geom_histogram(binwidth = 1, color = 'red', fill = '#5760AB') +
scale_x_continuous(breaks = seq(0, 113, 5))
```
***
### Transforming Data
```{r Transforming Data 1}
summary(pf$friend_count)
summary(log10(pf$friend_count + 1))
summary(sqrt(pf$friend_count))
```
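The `+ 1` inside `log10()` matters because some users have zero friends, and `log10(0)` is `-Inf`. Here is a quick check on a hypothetical vector of counts:

```r
# Hypothetical friend counts, including a user with no friends
friend_count <- c(0, 9, 99, 999)

log10(friend_count)      # the zero becomes -Inf
log10(friend_count + 1)  # shifting by 1 keeps zeros finite
sqrt(friend_count)       # sqrt(0) is simply 0
```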
***
### Transforming Data (plotting raw counts)
```{r Transforming Data 2}
ggplot(aes(x = friend_count), data = pf) +
geom_histogram(binwidth = 30, color = 'green', fill = '#099DD9')
```
***
### Transforming Data (plotting log10)
```{r Transforming Data 3}
ggplot(aes(x = log10(friend_count + 1)), data = pf) +
geom_histogram(binwidth = 0.1, color = 'purple', fill = '#099DD9')
```
***
### Transforming Data (plotting square root)
```{r Transforming Data 4}
ggplot(aes(x = sqrt(friend_count)), data = pf) +
geom_histogram(binwidth = 1, color = 'red', fill = '#099DD9')
```
***
```{r Transforming Data 5}
?scale_x_log10
```
***
```{r Transforming Data 6}
# Compare the raw, log10 and square-root distributions stacked in one column
p1 <- qplot(x = friend_count, data = pf)
p2 <- qplot(x = log10(friend_count + 1), data = pf)
p3 <- qplot(x = sqrt(friend_count), data = pf)
grid.arrange(p1, p2, p3, ncol = 1)
```
***
### Add a Scaling Layer
```{r Add a Scaling Layer}
p4 <- ggplot(aes(x = friend_count), data = pf) + geom_histogram() + scale_x_log10()
grid.arrange(p2, p4, ncol=2)
```
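`p2` plots `log10(friend_count + 1)`, but `scale_x_log10()` transforms the raw values instead. Users with zero friends therefore map to `-Inf`, and ggplot2 drops them from `p4` with a warning. The sketch below counts how many hypothetical values such a scale would lose:

```r
# Hypothetical friend counts; two users have zero friends
friend_count <- c(0, 0, 5, 50, 500)

# scale_x_log10() effectively applies log10() to the raw values
on_log_scale <- log10(friend_count)
n_dropped <- sum(!is.finite(on_log_scale))
n_dropped
```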
***
### Frequency Polygons
```{r Frequency Polygons}
# after_stat() replaces the older ..count.. notation (ggplot2 >= 3.3.0);
# values are proportions between 0 and 1
ggplot(aes(x = friend_count, y = after_stat(count / sum(count))),
data = subset(pf, !is.na(gender))) +
geom_freqpoly(aes(color = gender), binwidth = 10) +
scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
xlab('Friend Count') +
ylab('Proportion of users with that friend count')
```
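The y mapping above divides each bin's count by the total count, which turns counts into proportions that sum to 1. The sketch below does the same arithmetic in base R on hypothetical counts binned with width 10. The bin edges from `hist()` will not exactly match those of `geom_freqpoly()`.

```r
# Hypothetical friend counts for a single group
friend_count <- c(5, 12, 18, 25, 27, 33, 41, 48)

# Bin with width 10, like binwidth = 10 in geom_freqpoly()
counts <- hist(friend_count, breaks = seq(0, 50, by = 10), plot = FALSE)$counts

# Divide by the total so the bins sum to 1
proportions <- counts / sum(counts)
proportions
```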
***
### Likes on the Web
```{r Likes on the Web 1}
ggplot(aes(x = www_likes), data = pf) + geom_histogram(color = 'red', fill = '#099DD9')
```
```{r Likes on the Web 2}
qplot(x = www_likes, data = subset(pf, !is.na(gender)),
geom = 'freqpoly', color = gender)
```
```{r Likes on the Web 3}
ggplot(aes(x = www_likes), data = subset(pf, !is.na(gender))) +
geom_freqpoly(aes(color = gender)) + scale_x_log10()
```
```{r Likes on the Web 4}
summary(pf$www_likes)
by(pf$www_likes, pf$gender, sum)
```
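`by(..., sum)` gives each gender's total web likes. This minimal sketch uses hypothetical toy data. It shows how `prop.table()` turns per-group totals into each group's share of all likes.

```r
# Hypothetical toy data standing in for pf
toy <- data.frame(
  gender = c("female", "male", "female", "male"),
  www_likes = c(30, 10, 50, 10)
)

# Total likes per gender, like by(pf$www_likes, pf$gender, sum)
totals <- tapply(toy$www_likes, toy$gender, sum)

# Each gender's share of all likes
shares <- prop.table(totals)
shares
```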
***
### Box Plots
```{r Friend Count by Gender}
qplot(x = friend_count, data = subset(pf, !is.na(gender)),
binwidth=25, color = gender) +
scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
facet_wrap(~gender)
```
#### http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/
```{r Box Plots}
qplot(x = gender, y = friend_count, data = subset(pf, !is.na(gender)), geom = 'boxplot', color = gender)
```
***
#### Adjust the code to focus on users who have friend counts between 0 and 1000.
```{r 1}
qplot(x = gender, y = friend_count, data = subset(pf, !is.na(gender)), geom = 'boxplot', color = gender) + scale_y_continuous(limits = c(0, 1000))
```
***
```{r 2}
qplot(x = gender, y = friend_count, data = subset(pf, !is.na(gender)), geom = 'boxplot', color = gender, ylim = c(0, 1000))
```
***
```{r 3}
qplot(x = gender, y = friend_count, data = subset(pf, !is.na(gender)), geom = 'boxplot', color = gender) + coord_cartesian(ylim = c(0, 1000))
```
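The three versions are not equivalent:

- `scale_y_continuous(limits = ...)` and `ylim` remove observations outside the range before the box plot statistics are computed.
- `coord_cartesian()` only zooms in, so its medians and quartiles still describe all users.

This base R sketch with hypothetical counts shows how far dropping values can move a median:

```r
# Hypothetical friend counts with a few users above 1000
friend_count <- c(10, 50, 200, 400, 900, 1500, 3000, 4900)

# coord_cartesian(): statistics use every value
median(friend_count)

# scale limits / ylim: values above 1000 are removed first
median(friend_count[friend_count <= 1000])
```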
***
### Box Plots, Quartiles, and Friendships
```{r Box Plots Quartiles and Friendships}
by(pf$friend_count, pf$gender, sum)
by(pf$friend_count, pf$gender, summary)
```
***
#### Write about some ways that you can verify your answer.
```{r Friend Requests by Gender}
by(pf$friendships_initiated, pf$gender, summary)
```
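One way to verify a grouped summary is to recompute the statistic directly on each subset and compare the two results. Here is a sketch on hypothetical toy data (not `pf`):

```r
# Hypothetical toy data standing in for pf
toy <- data.frame(
  gender = c("female", "female", "male", "male"),
  friendships_initiated = c(100, 60, 50, 30)
)

# Grouped means in one call
grouped <- tapply(toy$friendships_initiated, toy$gender, mean)

# The same means computed by hand on each subset
by_hand <- c(
  female = mean(toy$friendships_initiated[toy$gender == "female"]),
  male   = mean(toy$friendships_initiated[toy$gender == "male"])
)

all.equal(as.vector(grouped), as.vector(by_hand))
```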
***
### Getting Logical
```{r Getting Logical 1}
summary(pf$mobile_likes)
summary(pf$mobile_likes > 0)
# Flag users with at least one mobile like, then store the flag as a factor
pf$mobile_check_in <- ifelse(pf$mobile_likes > 0, 1, 0)
pf$mobile_check_in <- factor(pf$mobile_check_in)
summary(pf$mobile_check_in)
```
```{r Getting Logical 2}
sum(pf$mobile_check_in == 1) / length(pf$mobile_check_in)
```
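`TRUE` counts as 1 and `FALSE` as 0. The share of users with mobile likes can therefore also be computed as the mean of the logical test, without building a factor. Here is a sketch on a hypothetical vector:

```r
# Hypothetical mobile like counts for five users
mobile_likes <- c(0, 3, 0, 12, 7)

# Factor approach used above
check_in <- factor(ifelse(mobile_likes > 0, 1, 0))
share_factor <- sum(check_in == 1) / length(check_in)

# Logical approach: mean() of a TRUE/FALSE vector is the proportion of TRUEs
share_logical <- mean(mobile_likes > 0)

c(share_factor, share_logical)
```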
***
### Analyzing One Variable
Reflection:
R is an amazing tool for analyzing and representing data.
For now I have some basic skills: reading CSV and TSV files, analyzing a single variable, plotting, faceting, transforming data, etc.
Let's move on to new knowledge.