When you take your machine learning models to production, especially in an enterprise setting, you will need them to give you fast and reliable responses. This is where Spark comes into the picture. Spark offers a reliable distributed (cluster) computing framework that can run on top of the Hadoop framework and if
If you are on this page, chances are you have heard of the incredible capability of XGBoost. Not only does it “boast” higher accuracy than similar boosted tree algorithms like GBM (Gradient Boosting Machine), thanks to a more regularized model formalization that controls over-fitting, it has also helped many Kaggle Masters win Kaggle competitions.
There are tons of Python-based visualisation tools out there, but my favourite one has to be Seaborn. Some would say using Seaborn is a form of cheating. Well, after all, Seaborn is just a wrapper around matplotlib, and instead of framing it as Seaborn vs. matplotlib, we should look at it as an upgraded, flashy version of the old
pandas’ read_table or read_csv is probably the number 1 method that comes to mind when you need to read rows of data into a DataFrame. After all, you could do that in just 2 lines: Neat, huh? These 2 simple lines work well in many cases. But, guess what happens if you
You stumble upon an intriguing cancer patient dataset that seems to be the last remaining piece of the puzzle in humanity's war against cancer, one that will make this world a better place for everyone, and you excitedly download it. Your data analysis usually goes through these standard processes: 1) Load data 2) Do some pre-processing of
This is the start of a series, “Just another Crystal Report bug”. Anyone who has been using Crystal Reports for at least a couple of months has probably stumbled upon a bug at least once or twice, and as someone who has been actively using Crystal Reports on a daily basis for over a year, I
Data science projects live on data. Without a huge amount of unbiased data to explore and play with, your Seaborn graphs could be skewed, your prediction models could be unreliable, and your company might even make the wrong business decision. I have a couple of toy data science projects that derive their data from websites and