My 2 cents about Lab1: Q5

Darth Knight
Because of the outliers, the boxplots are squeezed
and become hard to compare with each other.
This can be resolved by adding the "outline=FALSE" argument to the "boxplot" command.
It hides the outlier points (the data themselves are unchanged) so that you can easily compare the two boxplots.
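
For instance, here is a minimal sketch with made-up data. The vectors "a" and "b" are hypothetical stand-ins, not the Lab 1 data, so treat it only as an illustration of the argument:

```r
# Hypothetical data: two groups, each with a few extreme values
set.seed(1)
a <- c(rnorm(50, mean = 10), 60, 75)
b <- c(rnorm(50, mean = 12), 90)

# Default (outline = TRUE): the outlier points are drawn,
# so the y-axis stretches and the boxes get squeezed
boxplot(a, b, names = c("a", "b"))

# outline = FALSE: the outlier points are not drawn and the
# y-axis is scaled to the boxes and whiskers only
res <- boxplot(a, b, names = c("a", "b"), outline = FALSE)

# The data are untouched: the outliers are still reported in res$out
res$out
```
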
Let me know if this works okay.

Re: My 2 cents about Lab1: Q5

brice
This helps very much, but I don't understand exactly what the added argument tells boxplot to do or how it affects the plot.

Re: My 2 cents about Lab1: Q5

Darth Knight
It is just a named argument that decides whether
the outliers are drawn on the boxplot or not.
The default is "TRUE", which means they are shown; so, we can set it to "FALSE" to hide them.

In general, "?command" will show us the help document for that command.
For example, by using "?boxplot", we can find the explanation
of the argument as follows:

"outline if outline is not true, the outliers are not drawn"

In many cases, the help documents can save us a lot of googling time!
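
As a small sketch of how one might check this from the console (the help call is wrapped in "interactive()" only so that the snippet also runs as a script):

```r
# Open the help page; "?boxplot" is shorthand for help("boxplot")
if (interactive()) help("boxplot")

# Show the argument list of the default boxplot method
args(boxplot.default)

# Look up the default value of one argument
formals(boxplot.default)$outline
```

The last line returns the default value of "outline", which is TRUE.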

Re: My 2 cents about Lab1: Q5

MegDawg97
In reply to this post by Darth Knight
For histograms, the outline=FALSE argument doesn't get rid of the outliers. How would you do that for hist?

Re: My 2 cents about Lab1: Q5

Darth Knight
Right, there is no such luxury in the default histogram command.

If we really want to get rid of the outliers in histograms,
we need to either find a package that offers an option for the outliers
or remove the outliers from our data first and then plot it using the default histogram.

Since I cannot recall a specific package, let me introduce a function that replaces the outliers in your data with NA:

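remove_outliers <- function(x, na.rm = TRUE, ...) {
  # First and third quartiles (extra arguments are passed on to quantile())
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  # Allowed distance beyond the quartiles: 1.5 times the interquartile range
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  # Set values outside [Q1 - H, Q3 + H] to NA; the length of x is preserved
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}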


If you run this entire code in RStudio and then call "out <- remove_outliers(YOUR DATA)",
"hist(out)" will give you the desired histogram with no outliers.
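
Here is a rough end-to-end sketch on made-up data. The vector "x" is hypothetical, and the function is repeated only so that the snippet runs on its own:

```r
# Same function as above, repeated so this snippet is self-contained
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical data: 100 typical values plus two extreme ones
set.seed(42)
x <- c(rnorm(100), 15, -20)

hist(x)                    # the two extreme values stretch the x-axis

out <- remove_outliers(x)  # outliers become NA, length stays the same
hist(out)                  # hist() drops the NA values before binning
```

Because the outliers are replaced with NA rather than deleted, "out" still lines up element by element with the original data.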

Lastly, the outlier detection rule used in this code is called the "1.5*IQR rule".
You can find an explanation here: https://www.youtube.com/watch?v=FRlTh5HQORA
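
To make the rule concrete, here is a small worked example on a made-up vector (the numbers are hypothetical, chosen so the arithmetic is easy to follow, and the quartiles use the default method of "quantile"):

```r
# Hypothetical data with one suspicious value
x <- c(2, 4, 5, 6, 7, 8, 9, 30)

q <- quantile(x, probs = c(.25, .75))  # Q1 = 4.75, Q3 = 8.25
H <- 1.5 * IQR(x)                      # 1.5 * 3.5 = 5.25

lower <- unname(q[1] - H)              # 4.75 - 5.25 = -0.5
upper <- unname(q[2] + H)              # 8.25 + 5.25 = 13.5

# Values outside [lower, upper] are flagged as outliers
x[x < lower | x > upper]               # 30
```

Anything below -0.5 or above 13.5 counts as an outlier under this rule, so only 30 is flagged here.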