Secret of the Name

First published on 4 November 2016.

So, after a few weeks of the Data Management and Analytics class and of working with R, I attended the DBS Analytics Society meeting on 22 October.

Thanks to Darren, we had pastries for breakfast, and eating them with a double-shot coffee woke me up on a Saturday morning.

Darren prepared four different quizzes for us. Although I managed to finish only two of them in two hours, the meeting was very helpful for practicing R with different data sets.

The first quiz covered basic R commands and how to use them, and it was easier than the second one. I have uploaded my code, together with the questions, to my GitHub account. I made one mistake on my first attempt: the first question asked for the sum of the output, and I gave the output itself as the answer.

The second quiz was more challenging: it was about analyzing US baby names. We had two data sets, Baby Names US 1900 and US Baby Names 2000. The first few questions were about reading the data into R, calculating how many babies were born in a given year, and finding the 10 most popular names by year and gender. The most difficult question for me was the 12th, which asked for the 10 most changed male baby names. Here is how I solved it, with a bit of help from Darren.
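As a rough illustration of those first steps, here is a minimal Python sketch (the quiz itself was done in R). It assumes each data set is a list of (name, gender, count) rows, mirroring the three columns described below; the rows are hypothetical toy values, not the real 1900 data, and `total_births` and `top_names` are illustrative helper names.

```python
from collections import Counter

# Hypothetical toy rows in (name, gender, count) form; not the real 1900 data.
rows_1900 = [
    ("Mary", "F", 50),
    ("Helen", "F", 20),
    ("John", "M", 40),
    ("William", "M", 35),
    ("James", "M", 30),
]


def total_births(rows):
    """Total number of babies recorded in one year's data set."""
    return sum(count for _name, _gender, count in rows)


def top_names(rows, gender, n=10):
    """The n most popular names for one gender, most popular first."""
    counts = Counter()
    for name, g, count in rows:
        if g == gender:
            counts[name] += count
    return [name for name, _count in counts.most_common(n)]


print(total_births(rows_1900))         # 175
print(top_names(rows_1900, "M", n=2))  # ['John', 'William']
```

`Counter.most_common(n)` already returns entries in decreasing order of count, so no separate sort is needed for the "top 10" part.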

Which are the 10 most changed male baby names?

Before merging, I renamed the third column of each data set to n1900 and n2000 respectively, so that the merged table would not have two columns with the same name. I then merged the two data sets and replaced NA values with 0. To find the change for each name between 1900 and 2000, I subtracted the number of births in 2000 from the number of births in 1900, and applied the abs function so that every difference is an absolute (non-negative) value.
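A minimal Python sketch of the same merge-and-difference step, assuming each year's data is a dictionary keyed by (name, gender) with the birth count as the value, standing in for the n1900 and n2000 columns. Taking the union of keys plays the role of an outer merge, and `dict.get(key, 0)` stands in for replacing NA with 0. The counts are hypothetical toy values and `merge_with_change` is an illustrative name.

```python
# Hypothetical toy counts keyed by (name, gender); not the real data.
n1900 = {("John", "M"): 40, ("Mary", "F"): 50, ("Helen", "F"): 20}
n2000 = {("John", "M"): 25, ("Mary", "F"): 10, ("Jacob", "M"): 60}


def merge_with_change(a, b):
    """Outer-merge two count tables and add the absolute change per name.

    A name missing from one year gets 0 for that year (the NA -> 0 step).
    """
    merged = {}
    for key in a.keys() | b.keys():  # union of keys = outer merge
        c1900 = a.get(key, 0)
        c2000 = b.get(key, 0)
        merged[key] = (c1900, c2000, abs(c1900 - c2000))
    return merged


table = merge_with_change(n1900, n2000)
print(table[("Jacob", "M")])  # (0, 60, 60)
print(table[("Helen", "F")])  # (20, 0, 20)
```

Because of `abs`, the order of the subtraction does not matter: a name that disappeared and a name that appeared both count as large changes.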

After that, I ordered the data by difference, largest first, and created a new variable holding the subset of male names. Here are the top 10 most changed male names between 1900 and 2000:
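The ordering and subsetting can be sketched the same way. This minimal Python version assumes merged rows of (name, gender, change), where change is the absolute difference from the merge step; the rows are hypothetical, and `top_changed` is an illustrative helper, not part of the original R code.

```python
# Hypothetical merged rows: (name, gender, absolute change 1900 -> 2000).
merged = [
    ("Jacob", "M", 60),
    ("Mary", "F", 40),
    ("John", "M", 15),
    ("Helen", "F", 20),
    ("Tyler", "M", 45),
]


def top_changed(rows, gender, n=10):
    """Order all rows by decreasing change, then keep the first n of one gender."""
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)  # biggest change first
    subset = [row for row in ordered if row[1] == gender]         # e.g. male names only
    return [name for name, _gender, _change in subset[:n]]


print(top_changed(merged, "M"))  # ['Jacob', 'Tyler', 'John']
print(top_changed(merged, "F"))  # ['Mary', 'Helen']
```

Sorting first and subsetting second, as described above, gives the same result as subsetting first, because filtering keeps the relative order of the remaining rows.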

“Jacob, Michael, Matthew, Joshua, Christopher, Nicholas, Andrew, Daniel, Tyler, Brandon”

The next question asked for the same for female names. Here are the top 10 most changed female names between 1900 and 2000:

“Emily, Hannah, Madison, Ashley, Alexis, Samantha, Jessica, Sarah, Taylor, Lauren”

Do you think it is a coincidence that the biggest affair website uses two of the top four most changed female baby names in its name? I don't know, but let's find out.

There were no female babies named Ashley or Madison in 1900, whereas there were 17,993 Ashleys and 19,966 Madisons in 2000. So, do you think Darren Morgenstern checked any data when he chose the name while establishing the platform in 2001?
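Plugging these two counts into the absolute-difference calculation from the merge step is a quick consistency check: Madison's change is larger than Ashley's, which matches Madison appearing just before Ashley in the female list. A minimal Python sketch using only the numbers quoted in this post:

```python
# Female birth counts quoted above: no Ashley or Madison in 1900.
counts = {
    "Ashley": (0, 17_993),   # (1900, 2000)
    "Madison": (0, 19_966),
}

# Absolute change between 1900 and 2000, as in the merge step.
change = {name: abs(c1900 - c2000) for name, (c1900, c2000) in counts.items()}

# Order by decreasing change.
ranking = sorted(change, key=change.get, reverse=True)

print(change)   # {'Ashley': 17993, 'Madison': 19966}
print(ranking)  # ['Madison', 'Ashley']
```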

Also, here are two graphs that compare names between 1900 and 2000 for each gender.

Female 1900 v 2000
Male 1900 v 2000

Although I have done only two of the four quizzes so far, I plan to finish the rest and will post my challenges to the blog when I do.
