First published on http://tugrul.dbsdataprojects.com on 30th of March, 2017.
While I was exploring different data sets on kaggle.com, I came across this data set about Video Game Sales with Ratings. Umesh from Kaggle created a great kernel which explores this data set and creates different graphs, such as revenue by game and revenue by seller, in different regions.
This data includes sales figures from different regions: Japan, US/North America, Europe, Other Sales, and Global Sales. I will use the code from Kaggle to create the same graphs in ggplot and will discuss the trends. As Wii Sports was bundled with the Wii console by default, I have excluded this game from the results.
My first graph shows the top 10 games in different regions, ranked by overall units sold.
Grand Theft Auto V is the best-selling game in overall sales figures around the globe. Although Grand Theft Auto V has the top spot with 56.57 million units sold overall, that figure includes all the different consoles. The second game, Super Mario Bros., has sold 45.31 million units, and it also covers several Nintendo consoles such as Wii, DS, and 3DS, so we can argue that it is one of the best-selling games.
There are a few trends that look interesting in this table. Although GTA V is the best-selling game overall, it is not ranked among the top 5 sellers in Japan, with 1.42 million units sold there. Pokemon Red/Blue has sold over 10 million copies in Japan, making it the best performer there among the top 10. Also, the best-selling game in North America among the top 10 is Super Mario Bros. instead of GTA V. Tetris is also very popular in North America, as 73% of its copies were sold in this region.
Continue reading “Top 10 Video Games Around The World”
First published on http://tugrul.dbsdataprojects.com on 25th of March 2017.
SQL stands for Structured Query Language and is used to query and manipulate relational databases. Most Relational Database Management Systems use SQL as their standard database language. I will be using MS SQL in these examples and throughout the learning process.
Dr Edgar F. Codd is known as the “Codd Father” of relational databases. He described a relational model for databases in 1970. SQL first appeared in 1974, while IBM was working to develop Codd's ideas in its System R research prototype. In 1986, SQL was standardized by ANSI.
Capabilities of SELECT statements
SELECT statements have three capabilities. First, they can give us a projection: we can get a subset of the columns. Second, they can filter the rows that are returned. Third, they can join different tables by primary and foreign keys, which allows us to get data from different tables and show it as one table.
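The three capabilities above can be sketched with a small example. This is only an illustration under stated assumptions: it uses SQLite through Python's built-in sqlite3 module as a stand-in for MS SQL, and the `employees` and `departments` tables and their rows are hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for MS SQL here; the tables and rows are
# hypothetical and exist only to illustrate the three capabilities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        salary  INTEGER,
        dept_id INTEGER REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employees VALUES
        (1, 'Ada', 5000, 2),
        (2, 'Ben', 3000, 1),
        (3, 'Cem', 4000, 2);
""")

# Projection: choose a subset of the columns.
projection = conn.execute(
    "SELECT name, salary FROM employees ORDER BY emp_id"
).fetchall()

# Selection: filter the rows with a WHERE clause.
selection = conn.execute(
    "SELECT name FROM employees WHERE salary > 3500 ORDER BY emp_id"
).fetchall()

# Join: combine two tables through the primary key / foreign key pair.
joined = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(projection)
print(selection)
print(joined)
```

The SELECT, WHERE, and JOIN statements themselves are standard SQL, so they should carry over to MS SQL with little or no change; only the Python wrapper is specific to this sketch.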
A basic SELECT statement identifies the columns to be displayed, and you also need to add FROM to tell which tables the data comes from.
Continue reading “Learning SQL – Part 1”
First published on http://tugrul.dbsdataprojects.com on 14th of November 2016.
I tried to learn R through Coursera before this module, but I wasn’t able to continue the course after the second week as I found it a bit hard. Although one of my favorite characters, Homer Simpson, would say “You tried your best and failed miserably. The lesson is, never try”, with the Data Management and Analytics module I have started using and learning R again.
I started my re-learning process with CodeSchool’s Try R online course. It was a good reminder of different features of R, and I learnt how to create different graphs, use factors, etc. during those eight chapters of R adventure.
After completing those eight chapters I was ready to get real-life data and conquer the world with my beautiful data stories. Obviously, it hasn’t happened, yet! I have joined a few DBS Analytics Society meetings on Saturdays and started to analyse different data sets with R. Although I could have done most of those analyses in Excel in a short time, this time I am willing to learn R, so I am still wrestling with it.
While I was looking for interesting data sets to analyze, I found that Reddit and Kaggle.com were really useful places to find different data sets. fivethirtyeight.com also provides a lot of different data sets in its GitHub account, but they are very good at getting everything out of a data set, so there is not much you could add to the story they tell.
For my first attempt to analyze data with R, I decided to go with the Simpsons data from kaggle.com, and I can easily say that reading this article by Todd Schneider motivated me too.
Although there are many different findings in that article, I decided to try something different and wanted to check how many times the Simpson family characters have been used in the titles of episodes. Then I will try to compare how many people watched those episodes and what the IMDb ratings of those episodes are.
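The title count described above could be sketched as follows. This is a rough illustration in Python rather than R, and it rests on assumptions: the `count_family_mentions` helper is hypothetical, and the titles are a small hand-typed sample instead of the full Kaggle file.

```python
import re
from collections import Counter

FAMILY = ["Homer", "Marge", "Bart", "Lisa", "Maggie"]

def count_family_mentions(titles):
    """Count, per family member, how many titles mention the name."""
    counts = Counter({name: 0 for name in FAMILY})
    for title in titles:
        for name in FAMILY:
            # \b keeps "Bart" from matching inside words like "Bartender".
            if re.search(rf"\b{name}\b", title, flags=re.IGNORECASE):
                counts[name] += 1
    return counts

# Hand-typed sample of episode titles (assumption: the real analysis
# would read every title from the data set instead).
sample_titles = [
    "Bart the Genius",
    "Homer's Odyssey",
    "Moaning Lisa",
    "Bart Gets an F",
    "Lisa's Substitute",
    "Marge vs. the Monorail",
    "Treehouse of Horror",
]

mentions = count_family_mentions(sample_titles)
for name in FAMILY:
    print(name, mentions[name])
```

Each title is counted at most once per character, so a title naming both Homer and Bart would add one to each count.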
Continue reading “Learning R with Simpsons”
First published on http://tugrul.dbsdataprojects.com on 4th of November 2016.
So after a few weeks of the Data Management and Analytics class and of working with R, I attended the DBS Analytics Society meeting on 22nd of October.
Thanks to Darren, we had some pastries for breakfast, and eating them while drinking a double-shot coffee woke me up on a Saturday morning.
Darren prepared four different quizzes for us. Although I could finish only two of them in two hours, it was a very helpful meeting for practicing R with different data sets.
The first quiz was about basic R commands and how to use them. It was easier than the second quiz. I have uploaded my code to my GitHub account along with the questions. I made one mistake on my first try: the first question asked for the sum of the output, but I gave the output itself as the answer.
Continue reading “Secret of the Name “AshleyMadison.com””
First published on http://tugrul.dbsdataprojects.com on 24th of October 2016.
Being from a country where we have had five elections and referendums in the last five years, I have seen many different maps of election results, and I always wondered how those maps were created. I saw my first map based on Fusion Tables when the Panama Papers hit the news. This map shows all the addresses in Ireland which were mentioned in the Panama Papers, and I remember admiring how quickly Gavin Sheridan created that map, literally 10 minutes after his first tweet about the addresses.
Although I was impressed by how efficient a tool Fusion Tables is, I had not used it until this year. My first attempt to create a map from a Fusion Table was during our Application of Cloud Technologies module, just a week before the Data Management & Analytics class. It was good preparation for that class and for this assignment. During the class, we created a US Population Density Map and were asked to make a similar map for Ireland.
To create the Irish Population Density map, we were given two different data sets and asked to turn them into information. The first data set was from the Central Statistics Office (CSO) website. The 2011 Census population data was enough to get the population by county and gender. I did a bit of cleansing by removing the city breakdown for the big cities so that every county is on one line. The populations of Dublin, Cork, Galway, Waterford, and Limerick were given both by county and by City and County, so I removed those breakdown lines. I also combined the South Tipperary and North Tipperary data into one line. After uploading my KML file, I realized that Laoighis was spelled differently in the KML data, so I changed it to Laois in my Fusion Table.
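The cleansing steps above could be sketched like this. It is a minimal illustration in Python, not the steps actually carried out for the Fusion Table: the row labels, the " City" and " County" suffixes, the merged name "Tipperary", and the population numbers are all hypothetical placeholders, not CSO figures.

```python
from collections import defaultdict

# Assumed spellings and labels; the real CSO and KML labels may differ.
RENAME = {"Laoighis": "Laois"}
MERGE = {"North Tipperary": "Tipperary", "South Tipperary": "Tipperary"}
BREAKDOWN_SUFFIXES = (" City", " County")

def clean_counties(rows):
    """Return one population total per county from (label, population) rows."""
    totals = defaultdict(int)
    for label, population in rows:
        # Drop the city/county breakdown lines; the plain county row
        # already carries the total (assumption about the layout).
        if label.endswith(BREAKDOWN_SUFFIXES):
            continue
        label = MERGE.get(label, label)    # fold the two Tipperary rows
        label = RENAME.get(label, label)   # match the KML spelling
        totals[label] += population
    return dict(totals)

# Placeholder numbers, chosen only to make the arithmetic easy to follow.
raw_rows = [
    ("Cork", 500),
    ("Cork City", 100),
    ("Cork County", 400),
    ("North Tipperary", 70),
    ("South Tipperary", 90),
    ("Laoighis", 80),
]

cleaned = clean_counties(raw_rows)
print(cleaned)
```

Keeping the rename and merge rules in small dictionaries makes it easy to add another mismatch if the KML file turns out to spell a further county differently.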
Continue reading “People, People Everywhere”