Babyboom

Data Read

To read the “txt” file, I use the R function read.table().

For dataset babyboom, variable descriptions are as follows:

  1. V1: Time of birth recorded on the 24-hour clock
  2. V2: Sex of the child (1 = girl, 2 = boy)
  3. V3: Birth weight in grams
  4. V4: Number of minutes after midnight of each birth
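A minimal sketch of the read step, assuming a whitespace-separated file with no header row. The real file name and path are not given in this report, so the sketch writes a temporary stand-in file holding six rows copied from the baby_boys table further down.

```r
# Stand-in file: six rows copied from the baby_boys table in this report.
# The real babyboom file path is not known here, so a temporary file is used.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "118 2 3554 78",
  "155 2 3838 115",
  "257 2 3625 177",
  "422 2 2846 262",
  "431 2 3166 271",
  "708 2 3520 428"
), path)

# With header = FALSE (the default), read.table names the columns V1..V4.
babyboom <- read.table(path, header = FALSE)
str(babyboom)
```

Because no header is read, the default names V1 to V4 line up with the variable list above.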

Separate Data

I use the function subset to divide the babyboom data into two subsets, one for girls (baby_girls) and one for boys (baby_boys).

The table below shows the first rows of baby_boys.

    V1 V2   V3  V4
3  118  2 3554  78
4  155  2 3838 115
5  257  2 3625 177
8  422  2 2846 262
9  431  2 3166 271
10 708  2 3520 428
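The split can be sketched as follows. The toy data frame is an assumption made so the example runs on its own: its first four rows are copied from the baby_boys table, while the two girl rows are hypothetical values added only for illustration.

```r
# Toy data: the first four rows come from the baby_boys table above;
# the last two (girl) rows are hypothetical, for illustration only.
babyboom <- data.frame(
  V1 = c(118, 155, 257, 422, 100, 200),
  V2 = c(2, 2, 2, 2, 1, 1),
  V3 = c(3554, 3838, 3625, 2846, 3000, 3200),
  V4 = c(78, 115, 177, 262, 60, 120)
)

# subset() keeps the rows for which the condition is TRUE.
baby_girls <- subset(babyboom, V2 == 1)  # 1 = girl
baby_boys  <- subset(babyboom, V2 == 2)  # 2 = boy

head(baby_boys)
```

subset() preserves the original row names, which is why the row labels in the baby_boys table are not consecutive.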

Summary of data

I use the R function summary to summarize each variable of a data frame such as babyboom. The results below show the summaries of baby_boys (first block) and baby_girls (second block).

##        V1               V2          V3             V4        
##  Min.   : 118.0   Min.   :2   Min.   :2121   Min.   :  78.0  
##  1st Qu.: 754.2   1st Qu.:2   1st Qu.:3198   1st Qu.: 464.2  
##  Median :1409.5   Median :2   Median :3404   Median : 849.5  
##  Mean   :1311.9   Mean   :2   Mean   :3375   Mean   : 799.6  
##  3rd Qu.:1937.5   3rd Qu.:2   3rd Qu.:3629   3rd Qu.:1177.5  
##  Max.   :2123.0   Max.   :2   Max.   :4162   Max.   :1283.0
##        V1               V2          V3             V4        
##  Min.   :   5.0   Min.   :1   Min.   :1745   Min.   :   5.0  
##  1st Qu.: 837.8   1st Qu.:1   1st Qu.:2711   1st Qu.: 507.8  
##  Median :1406.5   Median :1   Median :3381   Median : 846.5  
##  Mean   :1273.0   Mean   :1   Mean   :3132   Mean   : 773.0  
##  3rd Qu.:1804.2   3rd Qu.:1   3rd Qu.:3517   3rd Qu.:1094.2  
##  Max.   :2355.0   Max.   :1   Max.   :3866   Max.   :1435.0
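As a small sketch of how such output is produced, summary can be called on a whole data frame or on a single column. The data frame below holds only the six rows listed in the baby_boys table, so its statistics will differ from the full-data results above.

```r
# Six rows copied from the baby_boys table; not the full set of boys.
baby_boys <- data.frame(
  V1 = c(118, 155, 257, 422, 431, 708),
  V2 = c(2, 2, 2, 2, 2, 2),
  V3 = c(3554, 3838, 3625, 2846, 3166, 3520),
  V4 = c(78, 115, 177, 262, 271, 428)
)

# One column of statistics per variable.
summary(baby_boys)

# Summary of a single variable (birth weight in grams).
weight_summary <- summary(baby_boys$V3)
weight_summary
```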

Figures of data

The figure below is the histogram of the number of births (boys) per hour after midnight.

The next figure is the histogram of the number of births (girls) per hour after midnight.

The next figure is the histogram of the number of births (boys and girls together) per hour after midnight.

The figures below show the histograms of birth weight (boys and girls respectively).
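One way to get an hourly histogram from V4 is to convert minutes to hours and use one bin per hour. This is a sketch under that assumption, not necessarily the binning used for the figures here; the six values are the V4 entries from the baby_boys table.

```r
# Minutes after midnight for the six boys listed in the baby_boys table.
minutes <- c(78, 115, 177, 262, 271, 428)

# Convert to hours and use one bin per hour of the day (0 to 24).
hours <- minutes / 60
h <- hist(hours, breaks = 0:24,
          main = "Births per hour (boys, six listed rows)",
          xlab = "Hours after midnight", ylab = "Number of births")

# Counts per hourly bin, as computed by hist().
h$counts
```

The same call with the weight column (V3) and default breaks gives a weight histogram.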

We can use a box plot to detect outliers. The box plots of the babies' birth weights are shown below, with the boys on the right and the girls on the left.
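A minimal sketch of outlier detection with a box plot. The weights are the six V3 values from the baby_boys table plus the boys' minimum (2121) from the summary output, so this is a small illustrative sample rather than the full data; boxplot.stats returns the points that fall beyond 1.5 times the box length from the hinges.

```r
# Six weights from the baby_boys table plus the boys' minimum from summary().
# This is an illustrative sample, not the complete set of boys.
weights <- c(3554, 3838, 3625, 2846, 3166, 3520, 2121)

# Draw the box plot; points beyond the whiskers are drawn individually.
boxplot(weights, main = "Birth weight (boys, sample)", ylab = "Weight (g)")

# The same rule applied numerically: values outside the whiskers.
outliers <- boxplot.stats(weights)$out
outliers
```

With the full data frame, a call such as boxplot(V3 ~ V2, data = babyboom) would draw one box per sex side by side.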

Conclusion of Data - babyboom

According to the figure above, one boy's weight clearly lies below the lower whisker of the box plot. I therefore conclude that this value is an outlier, and that this boy may have been born prematurely. The median weight of the boys (3404 g) is higher than that of the girls (3381 g). Finally, the interquartile range of the girls' weights is larger than that of the boys', so the girls' weights are more dispersed than the boys'.
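These conclusions can be checked roughly against the quartiles printed by summary above. The sketch assumes the usual 1.5 × IQR rule; because summary rounds its output and the box plot uses hinges rather than these quartiles, the fences drawn in the figure may differ slightly.

```r
# Quartiles and minimum of V3 (birth weight, g) as printed by summary() above.
boys  <- c(q1 = 3198, q3 = 3629, min = 2121)
girls <- c(q1 = 2711, q3 = 3517, min = 1745)

# Interquartile ranges.
iqr_boys  <- boys[["q3"]] - boys[["q1"]]
iqr_girls <- girls[["q3"]] - girls[["q1"]]

# Lower fence of the 1.5 * IQR rule.
fence_boys  <- boys[["q1"]] - 1.5 * iqr_boys
fence_girls <- girls[["q1"]] - 1.5 * iqr_girls

# TRUE when the smallest weight lies below the fence.
c(boys = boys[["min"]] < fence_boys, girls = girls[["min"]] < fence_girls)
```

Under these assumptions the boys' IQR is 431 g with a lower fence of 2551.5 g, and the girls' IQR is 806 g with a lower fence of 1502 g, so only the boys' minimum falls below its fence.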

Airport

Data Read

This dataset contains strings as well as numbers, so I use the read.csv() function to read it.

For dataset airport, variable descriptions are as follows:

  1. V1: Airport
  2. V2: City
  3. V3: Scheduled departures
  4. V4: Performed departures
  5. V5: Enplaned passengers
  6. V6: Enplaned revenue tons of freight
  7. V7: Enplaned revenue tons of mail

I then select the numerical variables (V3 to V7).
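One possible way to do this selection is to keep the columns that were parsed as numeric. The two-row CSV below is a hypothetical placeholder with the same seven-column layout; its names and numbers are not taken from the real airport file, and header = FALSE is an assumption based on the V1 to V7 names.

```r
# Hypothetical two-row CSV with the seven-column layout described above;
# all names and numbers are placeholders, not real airport data.
csv_text <- "Airport A,City A,100,110,5000,12.5,3.0
Airport B,City B,200,190,8000,20.0,4.5"

airport <- read.csv(text = csv_text, header = FALSE)

# Keep only the columns that read.csv parsed as numeric.
is_num <- sapply(airport, is.numeric)
airport_num <- airport[, is_num]

names(airport_num)
```

Selecting by position, as in airport[, 3:7], would give the same columns when the layout is fixed.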

Descriptive Statistics

I use the psych library to compute descriptive statistics.

##    vars   n       mean         sd     median    trimmed        mad     min
## V3    1 135   45702.44   56406.43   23519.00   34575.61   23729.01 1188.00
## V4    2 135   46453.73   57525.97   23906.00   35039.38   24117.45 1253.00
## V5    3 135 3139235.24 4587564.20 1254846.00 2172878.05 1428058.11    0.00
## V6    4 135   33640.65   80828.32    6192.36   13056.98    7865.55    7.95
## V7    5 135   11410.20   20510.77    2928.32    6680.76    3935.07    0.00
##           max      range skew kurtosis        se
## V3   322430.0   321242.0 2.38     6.82   4854.69
## V4   332338.0   331085.0 2.40     6.93   4951.05
## V5 25636383.0 25636383.0 2.62     7.91 394834.66
## V6   614223.6   614215.7 4.22    21.62   6956.59
## V7   140359.4   140359.4 3.30    13.50   1765.29
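Two of the columns above can be reproduced by hand, which helps when reading the table: se is the standard deviation divided by the square root of n, and range is max minus min. The sketch uses the V3 values from the table; the commented call assumes the table came from describe() in psych, with airport_num as a hypothetical name for the numeric columns.

```r
# Values for V3 (scheduled departures) copied from the table above.
n      <- 135
sd_v3  <- 56406.43
min_v3 <- 1188
max_v3 <- 322430

# Standard error of the mean: sd / sqrt(n).
se_v3 <- sd_v3 / sqrt(n)

# Range: max - min.
range_v3 <- max_v3 - min_v3

round(se_v3, 2)
range_v3

# With the psych package installed, a table of this kind is produced by:
# psych::describe(airport_num)   # airport_num is a hypothetical name
```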

Euroweight

Read Data

To read the “txt” file, I use the R function read.table().

For dataset euroweight, variable descriptions are as follows:

  1. V1: ID - this is the case number
  2. V2: weight - weight of the euro coin in grams
  3. V3: batch - number of the package

Summary of Data

##        V1               V2              V3      
##  Min.   :   1.0   Min.   :7.201   Min.   :1.00  
##  1st Qu.: 500.8   1st Qu.:7.498   1st Qu.:2.75  
##  Median :1000.5   Median :7.520   Median :4.50  
##  Mean   :1000.5   Mean   :7.521   Mean   :4.50  
##  3rd Qu.:1500.2   3rd Qu.:7.544   3rd Qu.:6.25  
##  Max.   :2000.0   Max.   :7.752   Max.   :8.00

Describe Data

##    vars    n    mean     sd  median trimmed    mad min     max   range  skew
## V1    1 2000 1000.50 577.49 1000.50 1000.50 741.30 1.0 2000.00 1999.00  0.00
## V2    2 2000    7.52   0.03    7.52    7.52   0.03 7.2    7.75    0.55 -0.19
## V3    3 2000    4.50   2.29    4.50    4.50   2.97 1.0    8.00    7.00  0.00
##    kurtosis    se
## V1    -1.20 12.91
## V2     4.42  0.00
## V3    -1.24  0.05

Acknowledgements

Thanks to knitr, designed by Yihui Xie (Xie 2015).

References

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.