Hypothesis testing is a way to decide whether to reject, or fail to reject, a claim about a population on the basis of a sample we have collected. In statistical terms, this is called making inferences about the population using sample data collected at random from it. It starts by stating two hypotheses: the null hypothesis and the alternative hypothesis. Under the null hypothesis, we assume that there is no change in the population.
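
To make the idea concrete, here is a minimal sketch in Python of an exact two-sided binomial test. It is only an illustration, not the method used in the full post: the coin-flip scenario, the numbers and the function name `binomial_two_sided_p` are all assumptions. The null hypothesis is that the coin is fair (no change from p = 0.5), and the p-value is the probability, under that null, of any outcome at most as likely as the one observed.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a binomial test (hypothetical helper).

    Sums the null probabilities of every outcome that is no more likely
    than the observed number of successes.
    """
    def pmf(k: int) -> float:
        # Probability of exactly k successes if the null hypothesis is true.
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # The small tolerance keeps symmetric outcomes from being dropped
    # because of floating-point rounding.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical sample: 60 heads in 100 flips of a coin we suspect is biased.
p_value = binomial_two_sided_p(successes=60, trials=100)
alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

A small p-value means the sample would be surprising if the null hypothesis were true, so we reject the null. Otherwise we fail to reject it, which is not the same as proving it.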

Recently, I have been getting excited about text analysis. A few days ago Julia Silge introduced a new library called tidylo, which added more excitement to the task. In this post, I’m going to show how we can isolate certain words from a set of documents. I will show 3 ways to extract words from 2 sets of documents. As for the dataset, I’m using 2 great works by 2 of my favourite scientists.

Suppose someone in the street asks me what is meant by Machine Learning. My quick, off-the-top-of-my-head answer would be: it’s the art of gathering and extracting valuable information from past data, using statistical tools to come up with a model or understanding that predicts the future. It could be something like studying data on old market trends to come up with a way to predict future trends.

In this post I’m doing some topic modelling. Topic modelling is a way of finding abstract topics in a collection of documents. I’m using the Sherlock Holmes stories and trying to find out how much each word contributes to telling what a story is about. One way to think about it is that certain words play an important role in defining what a story is about, and the frequency of those words plays a vital role in our task.

In this post I’m going to show some cool features of purrr, an R package for functional programming. I have always been fascinated by functional programming. I first heard about it while I was learning Scala. This approach makes our code not only more succinct but also more expressive. There are other ways to achieve the same results, using loops or functions like sapply and lapply, but let’s not go in that direction.

Recently I have been watching the TidyTuesday screencasts from David Robinson. In each screencast he picks a dataset he has never seen before and analyses it using R. Today I’m going to follow in his footsteps. In this blog post I’m going to analyse a dataset and visualize the results using ggplot2. The dataset was collected from medium.com. I’m going to break down the titles of all the articles into individual words and see which word is used the most across them.
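
The word-splitting step can be sketched in a few lines. This is a rough illustration in Python rather than the R and ggplot2 workflow of the full post, and the three titles, the tokenising rule and the function name `word_counts` are assumptions made up for the example, not the real medium.com data.

```python
import re
from collections import Counter


def word_counts(titles):
    """Count how often each lowercase word appears across all titles."""
    counts = Counter()
    for title in titles:
        # Assumed tokenising rule: runs of letters and straight apostrophes.
        counts.update(re.findall(r"[a-z']+", title.lower()))
    return counts


# Hypothetical article titles standing in for the real dataset.
titles = [
    "Why Data Science Matters",
    "Data Visualization with R",
    "Getting Started with Data Analysis",
]

counts = word_counts(titles)
# The most frequent words are the ones a bar chart would show first.
for word, n in counts.most_common(3):
    print(word, n)
```

In a real analysis it usually helps to drop very common words such as "with" or "the" before counting, otherwise they crowd out the more informative ones.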

Nabin Dangol


Data Analyst

London