People often ask me, "How can I learn Data Analytics?", and I often stumble upon the question "How to become a Data Analyst?" on Quora too. The answer is readily available all over the internet. The real issue is not how to become a data analyst, but whether we are ready to become one.
This post aims to take a newbie into the world of Data Analytics with a simple, freely available public dataset and R (the open-source champ of Data Science).
Understanding the dataset is the first thing any analyst should do. We can use str() or summary() to explore our dataset: str() shows its structure, the column types and a few sample values, while summary() gives basic summary statistics for each column.
We can clearly see that there are 1037 observations (rows/entries) and 3 variables (columns), along with their data types: two numeric columns (one is just a serial number and the other is Score) and Name, which is of character type.
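If you don't have the leaderboard file at hand, the same first look can be tried on a tiny stand-in. This is only a sketch: the five rows below are made up for illustration, and the serial-number column name SNo is an assumption (only Name and Score come from the dataset described above).

```r
# Toy stand-in for the leaderboard data frame.
# These rows are invented for illustration; SNo is a hypothetical column name.
mind_lb <- data.frame(
  SNo   = 1:5,
  Name  = c("alice", "bob@gmail.com", "carol", "dave@yahoo.com", "erin"),
  Score = c(98.5, 97.2, 95.0, 94.1, 90.3),
  stringsAsFactors = FALSE  # keep Name as character, not factor
)

str(mind_lb)      # number of rows and columns, column types, first few values
summary(mind_lb)  # min/median/mean/max for numeric columns, length/class for character
```

On the real data, str() is where the row count, the column count and the column types described above come from.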
When we scroll through our dataset (the input file we read), we can see some users explicitly using their email ID as their username. Can we find out how many such users have '@' in their username?
Let's use grepl() (pattern matching with regular expressions) to match the names containing the @ symbol.
Using grepl('@',mind_lb$Name) would just return TRUE/FALSE for each observation, but what we actually need is a count. So let's use the table() function in R to find it.
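Here is a small sketch of that step. The usernames are made up for illustration (they are not from the real leaderboard), and fixed = TRUE is an optional extra that treats '@' as a literal character rather than a regular expression.

```r
# Made-up usernames, standing in for mind_lb$Name.
toy_names <- c("alice", "bob@gmail.com", "carol", "dave@yahoo.com", "erin")

# One TRUE/FALSE per username: does it contain '@'?
has_at <- grepl("@", toy_names, fixed = TRUE)

# Tabulate the logical vector to get counts of FALSE and TRUE.
at_counts <- table(has_at)
print(at_counts)

# sum() on a logical vector counts the TRUEs directly.
n_with_at <- sum(has_at)
```

table() gives both sides of the split at once, while sum() is handy when only the number of matches is needed.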
table(grepl('@',mind_lb$Name)) returns the actual counts (absolute figures) of usernames with and without @. But wouldn't it be better to express them as percentages?
prop.table(), applied to the table() of the values from grepl('@',mind_lb$Name), gives us the proportions as decimals, which multiplied by 100 give the actual percentage of usernames with and without @.
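The chain of functions reads more easily as code. This sketch again uses made-up usernames rather than the real leaderboard, and the rounding to one decimal is just a presentation choice.

```r
# Made-up usernames, standing in for mind_lb$Name.
toy_names <- c("alice", "bob@gmail.com", "carol", "dave@yahoo.com", "erin")

# Counts of usernames without and with '@'.
at_counts <- table(grepl("@", toy_names, fixed = TRUE))

# prop.table() turns counts into proportions that sum to 1;
# multiplying by 100 converts them to percentages.
at_pct <- round(prop.table(at_counts) * 100, 1)
print(at_pct)
```

The same pattern applies unchanged to mind_lb$Name; only the input vector differs.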
Now we know that almost 6.5% of usernames contain @. Gmail is probably contributing the largest part of it, but can we find out if there's something else apart from Gmail?
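One way to check this, sketched here on made-up usernames, is to keep only the names containing @, strip everything up to the last @, and count what remains. This is just one possible approach, and it assumes the part after @ is the email domain.

```r
# Made-up usernames, standing in for mind_lb$Name.
toy_names <- c("alice", "bob@gmail.com", "carol",
               "dave@yahoo.com", "frank@Gmail.com")

# Keep only usernames that look like email addresses.
emails <- toy_names[grepl("@", toy_names, fixed = TRUE)]

# Drop everything up to and including the last '@', then lower-case
# so that "Gmail.com" and "gmail.com" are counted together.
domains <- tolower(sub("^.*@", "", emails))

# Count usernames per domain, most common first.
domain_counts <- sort(table(domains), decreasing = TRUE)
print(domain_counts)
```

If Gmail really dominates, it will show up at the top of this table, with any other providers listed below it.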
Doesn't it seem easy to find some valuable insights in a dataset? Data Analytics is in fact easier than it looks. All you need is an open mind to see through the data; the tool and syntax you select will come in handy once you start.
This is not a tutorial post, just a glimpse of how easy R and Data Analytics can be.
Are you ready to dive into the world of Data Analytics? If so, download R and RStudio and start today. Also, create a GitHub account, share your code and visualizations, and post the link in the comments here.