### Is it Necessary to Know Big Data Before Data Analytics?

- Gathering data from different sources.
- Cleaning and pre-processing the data.
- Studying statistical properties of the data.
- Using machine learning techniques to forecast and to derive insights from the data.
- Communicating the results to decision makers in an easy-to-understand way.
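The steps above can be sketched end to end on a toy series. This is only an illustration using the Python standard library: the `raw` values are made up, the helper names `clean`, `fit_line` and `report` are hypothetical, and a least-squares line stands in for whatever forecasting technique a real project would use.

```python
from statistics import mean, stdev

# Step 1 (gathering): made-up raw readings, as they might arrive from mixed sources.
raw = ["10", "12", None, "15", " 17 ", "n/a", "21"]

def clean(values):
    """Step 2: keep only entries that parse as numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # drop missing or malformed entries
    return cleaned

def fit_line(ys):
    """Step 4: least-squares line through (0, y0), (1, y1), ...

    Assumes the cleaned values are equally spaced observations.
    """
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def report(ys):
    """Steps 3 and 5: summary statistics plus a one-step forecast, as plain text."""
    slope, intercept = fit_line(ys)
    forecast = intercept + slope * len(ys)
    return (f"n={len(ys)}, mean={mean(ys):.1f}, stdev={stdev(ys):.1f}, "
            f"next value is forecast at about {forecast:.1f}")

values = clean(raw)
print(report(values))
```

At realistic data volumes each of these steps would be handled by dedicated tooling, but the order of the stages stays the same.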

- HDFS: The Hadoop Distributed File System (HDFS) is the file system used by Hadoop. It presents a single directory structure to the user, while under the hood the file system is distributed.

- Map-Reduce: This is the distributed programming environment provided by Hadoop. Map-Reduce is used to implement the application logic that reads the data stored on HDFS and produces results. It is based on parallel computing, so a program written for a conventional serial system will not run on Map-Reduce as it stands; you have to convert your serial program into a parallel version.
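To make the serial-to-parallel conversion concrete, here is a minimal sketch of a word count written in the map, shuffle, reduce shape. It runs in a single Python process and does not use Hadoop at all; the function names are illustrative, and in a real job the framework would run many map and reduce calls in parallel and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, the job done between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line is mapped independently of the others, which is what
    # makes the map stage easy to spread over many machines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["big data big insights", "data pipelines"]))
```

The point of the restructuring is that no map call depends on another line and no reduce call depends on another key, so both stages can be distributed.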

- Hadoop is written in Java and therefore provides APIs for the Java language.
- For other languages there is a utility called Hadoop Streaming, through which programs written in those languages can talk to Hadoop.
- Hadoop mainly runs on Linux; support for Windows has been added more recently.
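As an illustration of the Hadoop Streaming route, the sketch below shows a word-count mapper and reducer in Python. Streaming passes records to each stage as lines of text on standard input and reads results from standard output, with key and value separated by a tab by default, and the reducer receives its input sorted by key. The `map`/`reduce` command-line argument is a hypothetical convention of this sketch, not part of Hadoop.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Yield 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each word; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: first argument selects the stage to run.
    stage = mapper if sys.argv[1] == "map" else reducer
    for output_line in stage(sys.stdin):
        print(output_line)
```

Because both stages only read and write text lines, the same pair can be tested locally by piping the mapper output through a sort into the reducer before it is submitted to a cluster.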

- Basic statistics: Summary statistics, Correlations, Stratified sampling, Hypothesis testing, Random data generation.
- Classification and regression: linear models (SVM, logistic regression, linear regression), naive Bayes, decision trees, ensembles of trees (Random Forests and Gradient-Boosted Trees), isotonic regression.
- Collaborative filtering: alternating least squares (ALS)
- Clustering: k-means, Gaussian mixture, power iteration clustering (PIC), latent Dirichlet allocation (LDA), streaming k-means
- Dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining: FP-growth
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS)

- Collaborative Filtering: User-Based Collaborative Filtering, Item-Based Collaborative Filtering, Matrix Factorization with ALS, Matrix Factorization with ALS on Implicit Feedback, Weighted Matrix Factorization, SVD++.
- Classification: Logistic Regression trained via SGD, Naive Bayes / Complementary Naive Bayes, Random Forest, Hidden Markov Models, Multilayer Perceptron.
- Clustering: Canopy Clustering, k-Means Clustering, Fuzzy k-Means, Streaming k-Means, Spectral Clustering.
- Dimensionality Reduction: Singular Value Decomposition, Lanczos Algorithm, Stochastic SVD, PCA (via Stochastic SVD), QR Decomposition.
- Topic Models: Latent Dirichlet Allocation
- Miscellaneous: RowSimilarityJob, ConcatMatrices, Collocations, Sparse TF-IDF Vectors from Text, XML Parsing, Email Archive Parsing, Lucene Integration, Evolutionary Processes.

- Converts Pig Latin statements to Map-Reduce under the hood.
- Allows user-defined functions, so you can write custom functions and use them while querying with Pig Latin.
- Easy to use. It requires far fewer lines of code than Map-Reduce for the same task.
- Mainly suitable for ETL jobs.
- Uses lazy evaluation, meaning that a part of the code is executed only when its result is needed.
- Supports creation of data pipelines in form of Directed Acyclic Graphs.
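The lazy-evaluation and pipeline ideas can be illustrated outside Pig with Python generators. This is an analogy only, not Pig Latin and not how Pig is implemented: the stage names and the `log` list are hypothetical, and the chain of stages forms a simple linear graph rather than a general DAG.

```python
def load(records, log):
    """Source stage: hand out records one at a time."""
    for record in records:
        log.append(f"load {record}")
        yield record

def filter_positive(rows, log):
    """Transform stage: keep only positive values."""
    for row in rows:
        log.append(f"filter {row}")
        if row > 0:
            yield row

def double(rows):
    """Transform stage: multiply every value by two."""
    for row in rows:
        yield row * 2

log = []
# Building the pipeline only wires the stages together.
pipeline = double(filter_positive(load([3, -1, 4], log), log))
steps_before = len(log)  # still 0: no stage has run yet

# Asking for the output is what finally triggers execution.
result = list(pipeline)
print(steps_before, result, len(log))
```

Deferring the work until the output is requested gives the engine the chance to look at the whole pipeline before running any of it.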