TOC
Use Cases
3D Recognition
Semantic Segmentation
Audio Recognition
Speech to Text
Data Augmentation
Design
Games
Gesture Recognition
Using wearable sensors (phones, watches etc.)
Apps
Code repositories
- https://github.com/droiddeveloper1/android-wear-gestures-recognition
- https://github.com/drejkim/AndroidWearMotionSensors
Hyperparameter Tuning
Image Recognition
- MobileNetV2: The Next Generation of On-Device Computer Vision Networks, 2018
- Large-Scale Evolution of Image Classifiers by Esteban Real et al, 2017
- Rethinking the Inception Architecture for Computer Vision by Christian Szegedy et al, 2015
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard et al, 2017
- Deep Residual Learning for Image Recognition by Kaiming He et al, 2015
- Going Deeper with Convolutions by C. Szegedy et al, 2014
- Xception: Deep Learning with Depthwise Separable Convolutions by François Chollet, 2017
- ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2012
Face Recognition
Food Recognition
Image Captioning
Person Detection
Semantic Segmentation
- What do we learn from region based object detectors (Faster R-CNN, R-FCN, FPN)? 2018
- What do we learn from single shot object detectors (SSD, YOLOv3), FPN & Focal loss (RetinaNet)? 2018
- Design choices, lessons learned and trends for object detections?
- Semantic Image Segmentation with DeepLab in Tensorflow, 2018
- model DeepLab-v3+ built on top of CNN
- https://github.com/tensorflow/models/tree/master/research/deeplab
- has Checkpoints and frozen inference graphs
- Deeplab demo on python
- support adopting MobileNetv2 for mobile devices and Xception for server-side deployment
- evaluates results in terms of mIOU (mean intersection-over-union)
- use PASCAL VOC 2012 and Cityscapes semantic segmentation benchmarks as an example in the code
- https://github.com/lankastersky/deeplab_background_segmentation (not working android app)
- Rethinking Atrous Convolution for Semantic Image Segmentation by Liang-Chieh Chen et al, 2017
- [MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features by Liang-Chieh Chen et al, 2017](https://arxiv.org/abs/1712.04837)
- present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction
- built on top of the Faster-RCNN object detector
- evaluated on the COCO instance segmentation benchmark, where it shows performance comparable to other state-of-the-art models
- Mask R-CNN by Kaiming He et al, 2017
- https://github.com/facebookresearch/Detectron
- see links to articles at the end of the page
- extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition
- simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps
- easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework
- outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners
- uses average precision (AP), the area under the precision-recall curve, as its metric
- A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN, 2017
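The mIOU metric mentioned in the DeepLab notes above can be sketched in a few lines. This is a minimal illustration in plain Python, not the DeepLab evaluation code: the function name `mean_iou` and the toy label maps are assumptions made for this example, and classes absent from both the labels and the predictions are skipped when averaging.

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in labels or predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both the label and the prediction are class c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Pixels where either the label or the prediction is class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both label and prediction
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label map and prediction with classes 0 (background) and 1 (object).
labels = [0, 0, 1, 1, 1, 0]
preds = [0, 1, 1, 1, 0, 0]
print(mean_iou(labels, preds, num_classes=2))
```

Each class here has an intersection of 2 pixels and a union of 4, so both per-class IoU values are 0.5 and so is their mean.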
Interpretability
Programming and ML
Predict defects
Searching code
Writing code
NLP
Chatbots
Crossword question answerers
Database queries
Named entity resolution
Also known as deduplication and record linkage (but not entity recognition which is picking up the names and classifying them in running text)
Reverse dictionaries
Also known as concept finders.
They return the name of a concept given a definition or description.
Sequence to sequence
- Smart Compose: Using Neural Networks to Help Write Emails, 2018
- Introducing Semantic Experiences with Talk to Books and Semantris by Ray Kurzweil et al, 2018
- Keras LSTM tutorial – How to easily build a powerful deep learning language model by Andy, 2018
- Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models by Louis Shao et al, 2017
- trained on a combined data set of over 2.3B conversation messages mined from the web
- the model: LSTM on TensorFlow
- Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features by Matteo Pagliardini et al, 2017
- the model: Sent2Vec based on word2vec
- Skip-Thought Vectors by Ryan Kiros et al, 2015
- based on RNN encoder-decoder models
- Sequence to Sequence Learning with Neural Networks by Ilya Sutskever et al, 2014
- the model: seq2seq based on LSTM
- Distributed Representations of Sentences and Documents by Quoc V. Le and Tomas Mikolov, 2014
- Distributed Representations of Words and Phrases and their Compositionality by Tomas Mikolov et al, 2013
- word2vec based on Mikolov’s Skip-gram model
- Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks by Richard Socher et al, 2010
- based on context-sensitive recursive neural networks (CRNN)
- see Reverse dictionaries
- How to calculate the sentence similarity using word2vec model
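One common way to compare sentences with word vectors is to average the vectors of the words in each sentence and take the cosine similarity of the two averages. The sketch below assumes that approach; the three-dimensional `EMBEDDINGS` table is a hypothetical stand-in for a trained word2vec model, and all function names are made up for this example.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional embeddings standing in for a trained word2vec model.
EMBEDDINGS: Dict[str, List[float]] = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sits": [0.1, 0.9, 0.1],
    "runs": [0.2, 0.8, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.3, 0.8],
}


def sentence_vector(sentence: str) -> List[float]:
    """Average the vectors of in-vocabulary words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no in-vocabulary words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_similarity(s1: str, s2: str) -> float:
    return cosine_similarity(sentence_vector(s1), sentence_vector(s2))


print(sentence_similarity("cat sits", "dog runs"))
print(sentence_similarity("cat sits", "stock falls"))
```

With a real model, the lookup table would be replaced by the model's word vectors. Averaging ignores word order, so this is a baseline rather than a substitute for the sentence-embedding models listed above.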
Semantic analysis
Spelling
Summarization
Text classification
Text to Image
Text to Speech
Personality recognition
- Mining Facebook Data for Predictive Personality Modeling (Dejan Markovikj, Sonja Gievska, Michal Kosinski, David Stillwell)
- Personality Traits Recognition on Social Network — Facebook (Firoj Alam, Evgeny A. Stepanov, Giuseppe Riccardi)
- The Relationship Between Dimensions of Love, Personality, and Relationship Length (Gorkan Ahmetoglu, Viren Swami, Tomas Chamorro-Premuzic)
Robotics
Search
Transfer Learning
Uber
Video recognition
Pose recognition
Object detection
Here are video-specific methods. See also Semantic Segmentation.
Scene Segmentation
Detects when one video (shot/scene/chapter) ends and another begins
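A classic baseline for this task compares intensity histograms of consecutive frames and reports a boundary when they differ sharply. The sketch below assumes that baseline, with frames given as flat lists of grayscale values in 0..255; the function names, the bin count, and the threshold of 0.5 are illustrative assumptions, not taken from any of the listed works.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a non-empty grayscale frame (values 0..255)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(
    frames: Sequence[Sequence[int]], threshold: float = 0.5, bins: int = 8
) -> List[int]:
    """Indices i where frame i starts a new shot (histogram changed sharply)."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = histogram(frames[i], bins)
        # Half the L1 distance lies in [0, 1]; 1 means the histograms do not overlap.
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Two dark frames followed by two bright frames: the cut is at index 2.
dark_a = [10, 20, 30, 15]
dark_b = [12, 22, 28, 18]
bright = [200, 220, 240, 210]
print(shot_boundaries([dark_a, dark_b, bright, bright]))
```

A hard cut produces a large jump in histogram distance, while gradual transitions such as fades usually need a more elaborate detector.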
Video Captioning
Video Classification
- Learnable pooling with Context Gating for video classification by Antoine Miech et al, 2018
- The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge, 2017
- Hierarchical Deep Recurrent Architecture for Video Understanding by Luming Tang et al, 2017
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? by Kensho Hara et al, 2017
- https://github.com/kenshohara/video-classification-3d-cnn-pytorch
- trained on the Kinetics dataset from scratch using only RGB input
- pretrained ResNeXt-101 achieved 94.5% on UCF-101 and 70.2% on HMDB-51
- Appearance-and-Relation Networks for Video Classification by Limin Wang et al, 2017
- https://github.com/wanglimin/ARTNet
- trained on the Kinetics dataset from scratch using only RGB input
- achieved 70.9% on HMDB51 and 94.3% on UCF101
- Five video classification methods implemented in Keras and TensorFlow by Matt Harvey, 2017
- https://github.com/harvitronix/five-video-classification-methods
- Video Understanding: From Video Classification to Captioning by Jiajun Sun et al, 2017
- Video Classification using Two Stream CNNs, 2016; code based on the articles below
- Two-Stream Convolutional Networks for Action Recognition in Videos
- Fusing Multi-Stream Deep Networks for Video Classification
- Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
- Towards Good Practices for Very Deep Two-Stream ConvNets
- Beyond Short Snippets: Deep Networks for Video Classification by Joe Yue-Hei Ng et al, 2015
- In order to learn a global description of the video while maintaining a low computational footprint, we propose processing only one frame per second
- Large-scale Video Classification with Convolutional Neural Networks by Andrej Karpathy et al, 2014
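The "one frame per second" idea quoted above from Beyond Short Snippets amounts to picking a sparse set of frame indices before feeding frames to a network. The helper below is a hypothetical sketch of that selection step only; the name `one_fps_indices` and the rounding choice are assumptions, not the authors' code.

```python
from typing import List


def one_fps_indices(num_frames: int, fps: float) -> List[int]:
    """Indices of the frames closest to t = 0 s, 1 s, 2 s, ... within the clip."""
    if num_frames <= 0 or fps <= 0:
        return []
    indices: List[int] = []
    second = 0
    while True:
        index = int(round(second * fps))
        if index >= num_frames:
            break
        # Guard against repeated indices when fps is below 1.
        if not indices or index != indices[-1]:
            indices.append(index)
        second += 1
    return indices


# A 3-second clip at 30 fps keeps 3 of its 90 frames.
print(one_fps_indices(num_frames=90, fps=30))
```

The selected indices can then be used to decode only those frames, which is where the saving in computation comes from.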
Visualization
Multiple Modalities
Open problems
- Recycled goods (not solved, no dataset)
- Safety symbols on cardboard boxes (not solved, no dataset)
- Distributed Training: You can’t choose the number of workers and parameter servers independently
- Job Startup Latency: Up to 5 minutes single node
- Hyperparameter Tuning: In preview, and only supports the built-in algorithms
- Batch Prediction: Not supported
- GPU readiness: Bring your own docker image with CUDA installed
- Auto-scale Online Serving: You need to specify the number of nodes
- Training Job Monitoring: No monitoring
- https://github.com/google-ar/arcore-android-sdk
- https://github.com/google-ar/sceneform-android-sdk
- Cloud Anchors android codelab
- https://github.com/google-ar/arcore-ios-sdk
iOS framework from Apple to integrate machine learning models into your app.
Apple framework used with familiar tools like Swift and macOS playgrounds to create and train custom machine learning models on your Mac.
Pros:
- lets users train their own custom machine learning algorithms from scratch, without having to write a single line of code
- uses Transfer Learning (the more data and customers, the better results)
- is fully integrated with other Google Cloud services (Google Cloud Storage to store data, use Cloud ML or Vision API to customize the model etc.)
Cons:
- limited to image recognition (2018-Q1)
- doesn’t allow downloading a trained model
- Powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks.
- Built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base.
- Enables analysis of your data on Google BigQuery, Cloud Machine Learning Engine, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions).
Intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless and works at any scale. Easy data preparation with clicks and no code.
- Samples & Tutorials
- Samples for usage
- Distributed Training: Specify number of nodes, types, (workers/PS), associated accelerators, and sizes
- Job Startup Latency: 90 seconds for single node
- Hyperparameter Tuning: Grid Search, Random Search, and Bayesian Optimisation
- Batch Prediction: You can submit a batch prediction job for high throughputs
- GPU readiness: Out-of-the box, either via scale-tier, or config file
- Auto-scale Online Serving: Scaled up to your specified maximum number of nodes, down to 0 nodes if no requests for 5 minutes
- Training Job Monitoring: Full monitoring to the cluster nodes (CPU, Memory, etc.)
- Automation of ML: AutoML - Vision, NLP, Speech, etc.
- Specialised Hardware: Tensor Processing Units (TPUs)
- SQL-supported ML: BQML
- entity recognition: extract information about people, places, events, and much more mentioned in text documents, news articles, or blog posts
- sentiment analysis: understand the overall sentiment expressed in a block of text
- multilingual support
- syntax analysis: extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence
- Detect Faces (finds facial landmarks such as the eyes, nose, and mouth; doesn’t identify a person)
- Scan barcodes
- Recognize Text
- speech recognition
- word hints: Can provide context hints for improved accuracy. Especially useful for device and app use cases.
- noise robustness: No need for signal processing or noise cancellation before calling API; can handle noisy audio from a variety of environments
- realtime results: can stream text results, returning partial recognition results as they become available. Can also be run on buffered or archived audio files.
- over 80 languages
- can also filter inappropriate content in text results
- Supports more than 100 languages and thousands of language pairs
- automatic language detection
- continuous updates: Translation API is learning from logs analysis and human translation examples. Existing language pairs improve and new language pairs come online at no additional cost
- Label Detection - Detect entities within the video, such as “dog”, “flower” or “car”
- Shot Change Detection - Detect scene changes within the video
- Explicit Content Detection - Detect adult content within a video
- Video Transcription - Automatically transcribes video content in English
- Object recognition: detect broad sets of categories within an image, ranging from modes of transportation to animals
- Facial sentiment and logos: Analyze facial features to detect emotions: joy, sorrow, anger; detect logos
- Extract text: detect and extract text within an image, with support of many languages and automatic language identification
- Detect inappropriate content: detect different types of inappropriate content, from adult to violent content
Experiments Frameworks
Tools to help you configure, organize, log and reproduce experiments
Jupyter Notebook
Lobe is an easy-to-use visual tool that lets you build custom deep learning models, quickly train them, and ship them directly in your app without writing any code.
- Annotate images for computer vision tasks using AI
- https://github.com/supervisely/supervisely
- Data visualization tool created by Tableau Software.
- Connects to files, relational and Big Data sources, allows transforming data into dashboards that look amazing and are also interactive.
- TensorFlow Hub
- https://github.com/tensorflow/models/tree/master/research
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
- TF Classify
- TF Detect
- TF Stylize
- TF Speech
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/examples
- TF Classify
- TF Detect
- TF Speech
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo
- TF classify using tflite model
- Freeze tensorflow model graph
- TensorFlow Estimator APIs Tutorials
Apple Python framework that simplifies the development of custom machine learning models. You don’t have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
Playgrounds
- Vision Kit - Do-it-yourself intelligent camera. Experiment with image recognition using neural networks on Raspberry Pi.
- Voice Kit - Do-it-yourself intelligent speaker. Experiment with voice recognition and the Google Assistant on Raspberry Pi.
IDEs
- https://colab.research.google.com
- Weka
Repositories
- https://github.com/bulutyazilim/awesome-datascience
Models
Decision Trees
Pros:
- can model nonlinearities
- are highly interpretable
- do not require extensive feature preprocessing
- do not require enormous data sets
Cons:
- tend to overfit
- fixed by building a decision forest with boosting
- unstable/nondeterministic (can generate different results when trained on the same data)
- fixed by using bootstrap aggregation/bagging (a bagged forest, such as a random forest)
- do mapping directly from the raw input to the label
- better use neural nets that can learn intermediate representations
Hyperparameters:
- tree depth
- maximum number of leaf nodes
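The bagging remedy and the tree depth hyperparameter listed above can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not a reference implementation: the tree uses Gini impurity with threshold splits, `max_depth` plays the role of the tree depth hyperparameter, the maximum number of leaf nodes is not modelled, and every function name is made up for this example.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, integer class label)


def majority(labels: List[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(data: List[Sample], max_depth: int):
    """Return a leaf (int label) or a node (feature, threshold, left, right)."""
    labels = [y for _, y in data]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in range(len(data[0][0])):
        for threshold in sorted({x[f] for x, _ in data}):
            left = [s for s in data if s[0][f] <= threshold]
            right = [s for s in data if s[0][f] > threshold]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children; lower is better.
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(data)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority(labels)
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree(left, max_depth - 1), build_tree(right, max_depth - 1))


def predict_tree(tree, x: List[float]) -> int:
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree


def bagged_forest(data: List[Sample], n_trees: int, max_depth: int, seed: int = 0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(data) for _ in data], max_depth)
            for _ in range(n_trees)]


def predict_forest(forest, x: List[float]) -> int:
    """Majority vote over the trees."""
    return majority([predict_tree(tree, x) for tree in forest])


data = [([1.0], 0), ([2.0], 0), ([3.0], 0), ([6.0], 1), ([7.0], 1), ([8.0], 1)]
forest = bagged_forest(data, n_trees=25, max_depth=2)
print(predict_forest(forest, [0.0]), predict_forest(forest, [9.0]))
```

Fixing the random seed makes training repeatable, and voting across bootstrap samples smooths out the variation between individual trees.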
Distillation
Embedding models
- https://github.com/Hironsan/awesome-embedding-models
- gensim’s word2vec (embedded words and phrases)
- gensim’s doc2vec
- https://github.com/jhlau/doc2vec
- see recursive autoencoders
- see bag-of-words models
Evolutionary Algorithms
Metrics of dataset quality
- Statistical metrics
- descriptive statistics: dimensionality, unique subject counts, systematic replicates counts, PDFs, CDFs (probability density and cumulative distribution functions)
- cohort design
- power analysis
- sensitivity analysis
- multiple testing correction analysis
- dynamic range sensitivity
- Numerical analysis metrics
- number of clusters
- PCA dimensions
- MDS space dimensions/distances/curves/surfaces
- variance between buckets/bags/trees/branches
- informative/discriminative indices (i.e. how much do the top 10 features differ from one another and from the group)
- feature engineering differentiators
Neural Networks
Approaches when our model doesn’t work:
- Fetch more data
- Add more layers to Neural Network
- Try some new approach in Neural Network
- Train longer (increase the number of iterations)
- Change batch size
- Try Regularisation
- Check Bias Variance trade-off to avoid under and overfitting
- Use more GPUs for faster computation
Back-propagation problems:
- it requires labeled training data, while almost all data is unlabeled
- the learning time does not scale well, which means it is very slow in networks with multiple hidden layers
- it can get stuck in poor local optima, so the solutions found for deep nets can be far from optimal
Capsule Networks
Convolutional Neural Networks
Deep Residual Networks
Distributed Neural Networks
Feed-Forward Neural Networks
Gated Recurrent Neural Networks
Generative Adversarial Networks
Long Short-Term Memory Networks
Recurrent Neural Networks
Symmetrically Connected Networks
Reinforcement Learning
Guidelines
- Stanford CS-230 cheatsheets
- The top concepts of Deep Learning, CNNs and RNNs summarized in 3 short pages
- AI Transformation Playbook by Andrew Ng, 2018
- Steps for transforming your enterprise with AI, which I will explain in this playbook:
- Execute pilot projects to gain momentum
- Build an in-house AI team
- Provide broad AI training
- Develop an AI strategy
- Develop internal and external communications
- AI at Google: our principles, 2018
- Rules of Machine Learning: Best Practices for ML Engineering by Martin Zinkevich, 2018
- Practical advice for analysis of large, complex data sets by Patrick Riley, 2016
- What’s your ML test score? A rubric for ML production systems by Eric Breck, 2016
- Machine Learning: The High Interest Credit Card of Technical Debt by D. Sculley et al, 2014
- Complex Models Erode Boundaries
- Entanglement
- Hidden Feedback Loops
- Undeclared Consumers
- Data Dependencies Cost More than Code Dependencies
- Unstable Data Dependencies
- Underutilized Data Dependencies
- Static Analysis of Data Dependencies
- Correction Cascades
- System-level Spaghetti
- Glue Code
- Pipeline Jungles
- Dead Experimental Codepaths
- Configuration Debt
- Dealing with Changes in the External World
- Fixed Thresholds in Dynamic Systems
- When Correlations No Longer Correlate
- Monitoring and Testing
- Principles of Research Code by Charles Sutton, 2012
- Patterns for Research in Machine Learning by Ali Eslami, 2012
- Lessons learned developing a practical large scale machine learning system by Simon Tong, 2010
- The Professional Data Science Manifesto
- Machine Learning Glossary
Deep learning
- Deep Learning: A Critical Appraisal by Gary Marcus, 2018
- Deep learning thus far is data hungry
- Deep learning thus far is shallow and has limited capacity for transfer
- Deep learning thus far has no natural way to deal with hierarchical structure
- Deep learning thus far has struggled with open-ended inference
- Deep learning thus far is not sufficiently transparent
- Deep learning thus far has not been well integrated with prior knowledge
- Deep learning thus far cannot inherently distinguish causation from correlation
- Deep learning presumes a largely stable world, in ways that may be problematic
- Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted
- Deep learning thus far is difficult to engineer with
- Software 2.0 by Andrej Karpathy, 2017
Interview preparation
MOOC
Google oriented courses
- https://developers.google.com/machine-learning/crash-course/
- for beginners, explains hard things with simple words
- from google gurus
- uses TensorFlow and codelabs
- https://www.coursera.org/specializations/gcp-data-machine-learning
- shows how to use GCP for machine learning
Books
NLP
Statistics
Datasets
- https://ai.google/tools/datasets/
- https://toolbox.google.com/datasetsearch
- Microsoft Research Open Data
- users can also copy datasets directly to an Azure based Data Science virtual machine
3D
- ScanNet - RGB-D video dataset annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations
- SceneNet - Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth
Audios
- The VU sound corpus - based on https://freesound.org/ database
- AudioSet - consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos
Images
Videos
Research Groups
Cartoons
The Browser of a Data Scientist