### Ultimate Python Quickstart Guide For Data Science

1. Install Anaconda
2. Open Jupyter Notebook
3. Start New Notebook
4. Try Math Calculations
5. Import Data Science Libraries
6. Import Your Dataset
7. Explore Your Data
8. Clean Your Dataset
9. Engineer Features
10. Train a Simple Model
11. Next Steps

- First, we imported Python's math module, which provides convenient functions (e.g. math.sqrt()) and math constants (e.g. math.pi).
- Second, `2*2*2*2`, or "two to the fourth," is written as `2**4`. If you write `2^4`, you'll get a very different output: in Python, `^` is the bitwise XOR operator, so `2^4` evaluates to 6.
- Finally, any text following a "hashtag" (#) is called a comment. As the name implies, comments are notes for the reader and are not run as code.
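The code cell these bullets describe is not reproduced here. As a minimal sketch of the same ideas, using only the standard library (the specific numbers are illustrative, not taken from the guide):

```python
import math  # standard-library module with math functions and constants

# A function and a constant from the math module
root = math.sqrt(16)            # square root of 16 -> 4.0
circle_area = math.pi * 3 ** 2  # area of a circle with radius 3

# Exponentiation uses **, not ^
power = 2 ** 4  # two to the fourth -> 16
xor = 2 ^ 4     # bitwise XOR: 0b010 ^ 0b100 = 0b110 -> 6

print(root, power, xor)
```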

- First, we imported the Pandas library and gave it the alias pd. This means we can invoke the library with pd. You'll see this in action shortly.
- Next, we imported the pyplot module from the matplotlib library. Matplotlib is the main plotting library for Python. There's no need to bring in the entire library, so we just imported a single module. Again, we gave it an alias of plt.
- Oh yeah, and the %matplotlib inline command? That's specific to Jupyter Notebook. It simply tells the notebook to display our plots inside the notebook instead of in a separate window.
- Finally, we imported a basic linear regression algorithm from scikit-learn. Scikit-learn has a buffet of algorithms to choose from. At the end of this guide, we'll point you to a few resources for learning more about these algorithms.
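Here is a sketch of the import cell described above, assuming the aliases named in the text. The tiny dataframe and model at the end are hypothetical and only show the aliases in use. `%matplotlib inline` is a Jupyter magic command, not Python syntax, so it is left as a comment so that the block also runs as a plain script.

```python
# In a Jupyter notebook you would also run this magic command.
# It is commented out because it is not valid in a plain .py script.
# %matplotlib inline

import pandas as pd                                 # Pandas, aliased as "pd"
import matplotlib.pyplot as plt                     # only the pyplot module, aliased as "plt"
from sklearn.linear_model import LinearRegression   # a basic linear regression model

# Hypothetical usage: the aliases now stand in for the full names
df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
model = LinearRegression()
print(type(df).__name__, type(model).__name__)
```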

- df is where we stored the data. It's called a "dataframe," and it's also a Python object, like the variables from Step 4.
- .isnull() is called a method, which is just a fancy term for a function attached to an object. This method looks through our entire dataframe and labels any cell with a missing value as True. (Tip: Try running df.head().isnull() and see what you get!)
- Finally, .sum() is a method that adds up the True values in each column. Well... technically, it sums numbers, treating True as 1 and False as 0.
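To see `.isnull()` and `.sum()` working together, here is a minimal sketch on a small, made-up dataframe (the column names and values are hypothetical, not the guide's dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with a few missing values (NaN / None)
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Paris", "Lima", None, None],
})

# .isnull() returns a same-shaped dataframe of True/False values
print(df.isnull())

# .sum() adds up each column, counting True as 1 and False as 0
missing_per_column = df.isnull().sum()
print(missing_per_column)

# The same True-as-1 rule applies to plain Python booleans
print(sum([True, False, True]))  # -> 2
```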

- Numerical features are pretty self-explanatory. For example, "number of years of education" would be a numerical feature.
- Categorical features are those that have classes instead of numeric values. For example, "highest education level" would be a categorical feature, and the classes could be: ['high school', 'some college', 'college', 'some graduate', 'graduate'].
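One way to separate the two kinds of features in pandas is by column dtype. This sketch makes a simplifying assumption: every non-numeric column is treated as categorical. The column names are hypothetical and echo the education example above.

```python
import pandas as pd

# Hypothetical data with one numerical and one categorical feature
df = pd.DataFrame({
    "years_of_education": [12, 14, 16, 18],
    "highest_education_level": ["high school", "some college", "college", "graduate"],
})

# Numeric columns can be selected by dtype
numerical = df.select_dtypes(include="number").columns.tolist()
# Simplifying assumption: every remaining column is categorical
categorical = [col for col in df.columns if col not in numerical]

print(numerical)    # ['years_of_education']
print(categorical)  # ['highest_education_level']

# The classes of a categorical feature are its distinct values
print(df["highest_education_level"].unique())
```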

- The variables to drop (e.g. ['Y1', 'Y2'])
- Whether to drop rows from the index (axis=0) or columns (axis=1)
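Assuming this step uses pandas' `DataFrame.drop()`, which takes exactly these two pieces of information, a minimal sketch on a hypothetical dataframe looks like this:

```python
import pandas as pd

# Hypothetical dataframe: one input feature (X1) and two targets (Y1, Y2)
df = pd.DataFrame({
    "X1": [0.1, 0.2, 0.3],
    "Y1": [10, 20, 30],
    "Y2": [1, 2, 3],
})

# axis=1: drop the listed columns
features = df.drop(["Y1", "Y2"], axis=1)

# axis=0: drop rows by index label (here, the row labelled 0)
fewer_rows = df.drop([0], axis=0)

print(features.columns.tolist())  # ['X1']
print(fewer_rows.index.tolist())  # [1, 2]
```

Note that `drop()` returns a new dataframe and leaves `df` unchanged unless you assign the result back.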