
Data Visualization

Content
  1. About The Project
  2. Languages and Tools
  3. Contact
  4. Acknowledgements
  5. Support

About The Project

Introduction to Data Science Final Project 2021 - Data Visualization

In this project we apply the tools we learned during the machine-learning course. In the first notebook, we were asked to improve last semester's notebook to the highest accuracy we could reach with everything we know today. In the second notebook, we work with the Fashion-MNIST dataset to build the best model for predicting the item label of each image. In the third notebook, we tackle the Cats vs Dogs dataset: from a large collection of photographs of dogs and cats, we do our best to build a model, with the best features we can choose, that predicts whether each picture shows a dog or a cat. In the last notebook, we classify hand movements into three states (spontaneous, alone, synchronous); the main challenge is knowing how to load and prepare the data, clean it, and visualize it.

In each notebook I used StandardScaler and PCA for dimensionality reduction (in every notebook I tried to reach the maximum accuracy with the minimum number of dimensions), and grid searching to find the best estimators for the highest accuracy.

Classification models: KNN, Random Forest, XGBoost, Voting, Bagging, Stacking, Logistic Regression, Gaussian Naive Bayes.
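The shared workflow can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the notebooks' actual code: the dataset (scikit-learn's small digits set), the parameter grid, and the component counts are all stand-ins.

```python
# Sketch of the shared workflow: scale, reduce dimensionality with PCA,
# then grid-search a classifier over PCA dimension and model parameters.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),    # zero mean, unit variance per feature
    ("pca", PCA(n_components=30)),  # dimensionality reduction (tuned below)
    ("knn", KNeighborsClassifier()),
])

# Tune the PCA dimension and the classifier's hyperparameters together
grid = GridSearchCV(
    pipe,
    {"pca__n_components": [20, 30], "knn__n_neighbors": [3, 5]},
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.score(X_test, y_test), 3))
```

Because the PCA step lives inside the pipeline, the grid search can trade off dimension against accuracy in a single run, which is the balance each notebook aims for.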

"The world is just a one Big Data problem 😄"


Improving NoteBook

Last semester we were given a classification project that tested our ability to reach high prediction accuracy using classification models. Back then we had very few tools in our arsenal; today we have many tools we can work with to reach and predict high accuracy.

In this notebook, my best model reaches 100% accuracy.

The notebook was also uploaded to Kaggle and earned a bronze medal! If you want to see it and learn more, check out https://www.kaggle.com/galkoaz/mushroom-classification-knn-randomforest-100

Jump to my Improving Notebook: click here


Fashion Mnist

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others."
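Before the scaling and PCA steps, each 28x28 image has to become a flat feature vector. A small sketch (synthetic data stands in for the real Fashion-MNIST download):

```python
import numpy as np

# Fashion-MNIST images arrive as 28x28 grayscale arrays; scikit-learn
# estimators expect one flat feature vector per sample, so each image is
# reshaped to 784 values and rescaled from [0, 255] into [0, 1].
images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

X = images.reshape(len(images), -1).astype(np.float32) / 255.0
print(X.shape)  # (100, 784)
```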

For more details click here

Final Score: 0.8872 with 84 components
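A component count like the 84 above can be picked by looking at cumulative explained variance. This is a generic heuristic, shown here on scikit-learn's small digits dataset rather than Fashion-MNIST:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit PCA with all components, then find the smallest count that
# already explains 90% of the variance in the scaled data.
X, _ = load_digits(return_X_y=True)
pca = PCA().fit(StandardScaler().fit_transform(X))
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_90 = int(np.argmax(cumvar >= 0.90)) + 1  # first index reaching 90%
print(n_90)
```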

Jump to my Fashion Mnist Notebook: click here


Cats vs Dogs

Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof). HIPs are used for many purposes, such as to reduce email and blog spam and prevent brute-force attacks on web site passwords.

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but studies have shown that people can accomplish it quickly and accurately. Many even think it's fun! Here is an example of the Asirra interface:

Asirra is unique because of its partnership with Petfinder.com, the world's largest site devoted to finding homes for homeless pets. They've provided Microsoft Research with over three million images of cats and dogs, manually classified by people at thousands of animal shelters across the United States. Kaggle is fortunate to offer a subset of this data for fun and research.

Final Score: 0.6786 with 349 components
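The cat/dog photos come in varying sizes, so before PCA they must be reduced to equal-length vectors. A hypothetical preprocessing sketch (block averaging stands in for a real image-resize routine; the function name and 32x32 grid are assumptions, not the notebook's code):

```python
import numpy as np

def to_feature_vector(img, size=32):
    """Downsample a 2-D grayscale array to size x size by block averaging,
    then flatten and rescale to [0, 1]."""
    h, w = img.shape
    img = img[: h - h % size, : w - w % size]        # trim to a multiple of size
    bh, bw = img.shape[0] // size, img.shape[1] // size
    blocks = img.reshape(size, bh, size, bw)
    return blocks.mean(axis=(1, 3)).ravel() / 255.0  # size*size features

# A 200x300 "photo" becomes a fixed-length 1024-feature vector
photo = np.random.randint(0, 256, size=(200, 300)).astype(float)
vec = to_feature_vector(photo)
print(vec.shape)  # (1024,)
```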

Jump to my Cats vs Dogs Notebook: click here


Hands Movement

The purpose of the project is to classify three different situations in the way people communicate with each other. The first is the spontaneous (autonomous) situation, in which two people move their hands freely in front of each other. The second is a synchronous movement, in which the two people move their hands together. The third is the alone condition, in which only one side moves their hands.

Explanation:

First, we want to see what our data frame looks like.

Second, we look for null values and for an imbalance between right and left hands, both in count and in type, i.e. (2, 1) and (Right, Left). If there is an imbalance, we use the functions we have already written to delete the surplus hands of the unwanted type, and the RefreshData function to reset the indexes and fix null values. We perform this process on each of the files and finally verify that our data is indeed clean. We then join the "RightHand" file to the "Alone" file using the MergeRightHand function.

Finally, with the help of the MergeR function, we merge all the files into one large data frame with which we can proceed to the next step.

Models: we tried to get the best accuracy with KNN, Random Forest, XGBoost, Voting, Bagging, Stacking, and Logistic Regression.

Final Score: 0.8912 with 25 components

Jump to my Hands Movement Notebook: click here


Languages and Tools

Acknowledgements

Contact

Gal - koazgal@gmail.com

Project Link: https://github.com/GalKoaz/Data-visualization

Support

Give a ⭐️ if this project helped you in some way (For The Good Karma 😇)!
