Last earnings: "Twitter Inc. soared the most since its market debut in 2013 after it posted the first revenue growth in four quarters, driven by improvements to its app and added video content that are persuading advertisers to boost spending on the social network." (Bloomberg)
Twitter has one of the biggest data sets in the world. It differs from Facebook in that Twitter data is real time. Twitter data sets are rich troves of information, and working on one to produce useful insights can make a good portfolio project to showcase. One can get Twitter data here.
The interview process usually starts with a phone interview with the hiring manager. On-site interviews consist of meetings with engineers and data scientists. The questions are usually algorithmic in nature and include some machine learning questions, math and application-based questions, and one system design question about building a distributed system to deliver machine learning at scale.
Given a 2-column file with user codes and counts, retrieve the top-k users based on a score that is a function of the number of times they appear in the file and these counts.
Given a list of all followers in the format 123, 345;234, 678;345, 123;... where the first column contains the ID of the follower and the second one is the ID of the user being followed, find all mutual follows (the pair 123, 345 in the example above). Do the same for the case where the list does not fit into memory.
Design a system to find the top 10 Twitter hashtags in the most recent 1 min, 10 min, 1 hr...
Given Twitter user data, how would you measure engagement?
How can you illustrate a tree-based system with a SQL query?
How would you combine two datasets?
What features would you use to build a recommendation algorithm for users?
What would you change in the Twitter app?
How would you test whether the proposed change is effective? (related to the previous question)
Find the median of a large dataset.
If you got the job at Twitter and had access to all of its data, what kind of data analysis would you like to perform?
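For the mutual-follows question above, one possible approach is to remember every (follower, followed) edge in a set and report a pair as soon as its reverse has been seen. The sketch below is only an illustration, not a reference answer: the function names parse_follows, mutual_follows and bucket_of are hypothetical, and it assumes the input is a single string in the "follower, followed;..." format from the question.

```python
def parse_follows(raw):
    """Parse 'a, b;c, d;...' into a list of (follower, followed) int pairs."""
    pairs = []
    for record in raw.split(";"):
        record = record.strip()
        if not record:
            continue  # skip empty records, e.g. after a trailing ';'
        follower, followed = (int(x) for x in record.split(","))
        pairs.append((follower, followed))
    return pairs


def mutual_follows(pairs):
    """Return mutual pairs, each reported once as (smaller_id, larger_id)."""
    seen = set()
    mutual = set()
    for follower, followed in pairs:
        if follower == followed:
            continue  # assumption: self-follows are not counted as mutual
        if (followed, follower) in seen:
            mutual.add((min(follower, followed), max(follower, followed)))
        seen.add((follower, followed))
    return mutual


def bucket_of(follower, followed, num_buckets):
    """Bucket index that is the same for an edge and its reverse.

    Hypothetical helper for the out-of-memory case: write each edge to the
    file for its bucket, then run mutual_follows on one bucket at a time.
    """
    lo, hi = sorted((follower, followed))
    return hash((lo, hi)) % num_buckets


print(mutual_follows(parse_follows("123, 345;234, 678;345, 123")))
```

When the list does not fit into memory, a common option is to partition the edges so that an edge and its reverse always land in the same partition, which is what bucket_of is meant to suggest. Each partition can then be processed independently, provided the number of buckets is chosen large enough that every bucket fits in memory.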
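For the top-hashtags question above, a single-machine starting point is a sliding-window counter: keep the events of the current window in a queue, keep running counts per hashtag, and drop events once they fall outside the window. The sketch below assumes timestamps are in seconds and arrive in non-decreasing order. The class name WindowedHashtagCounter is hypothetical, and a real design would also have to cover sharding, approximate counting and late events.

```python
from collections import Counter, deque


class WindowedHashtagCounter:
    """Exact hashtag counts over the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()     # (timestamp, hashtag) in arrival order
        self.counts = Counter()   # hashtag -> count inside the window

    def add(self, timestamp, hashtag):
        """Record one hashtag occurrence at `timestamp`."""
        self.events.append((timestamp, hashtag))
        self.counts[hashtag] += 1

    def _expire(self, now):
        """Drop events at or before the window cutoff."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, now, k=10):
        """Return up to k (hashtag, count) pairs, most frequent first."""
        self._expire(now)
        return self.counts.most_common(k)


counter = WindowedHashtagCounter(window_seconds=60)
for ts, tag in [(0, "a"), (5, "a"), (30, "b"), (40, "b"), (50, "b"), (70, "a")]:
    counter.add(ts, tag)
print(counter.top(now=75, k=2))
```

One instance per window length (60, 600, 3600 seconds) would cover the 1 min, 10 min and 1 hr cases. Memory grows with the number of events in the longest window, which is the main reason larger systems tend to use coarser time buckets or approximate sketches instead.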
Reflecting on the Questions
Twitter's interviews include a list of complex coding questions posed from a data science perspective. The Twitter Data Blog has a collection of great use cases and GitHub repos that are useful for hands-on work on the platform. Working through them will help you learn more about the platform and answer some of the Twitter-specific questions. I would strongly encourage checking those out.