Will technology replace the human workforce some day?
At most tech events, experts are put through the wringer by the audience with a question like the one above. Breakthrough inventions such as the Internet of Things (IoT), artificial intelligence, AR/VR and image recognition are seen as brilliant technologies, yet technology can never replace the human brain. Executing, implementing and controlling these technologies is only possible when they are in the right hands, and that is what humans are for.
One such innovation was recently announced by Google for Android app development: the Google Cloud Vision API. It gives companies an advanced way to "see" images through Google's machine learning models.
'Vision' here literally means the ability to see: the API can comprehend the contents of an image, process its individual pieces separately, and return a unified result very quickly. This cloud intelligence API scans a whole image (or individual video frames) to perform analyses such as simple object identification, face detection, landmark detection, emotion analysis and more. If you are developing an Android app, or enterprise software, that requires facial recognition, this is an excellent API to integrate: you can get the functionality up and running faster than ever before.
One thing to note is that this REST-based Google API works just as well for iOS apps and for images in cloud storage. With SDK support for Java, Go, Node.js and Python, and a JSON request/response format, it is more accurate and lightweight than on-device image-recognition libraries such as OpenCV or OCR readers. Those libraries require a huge amount of data to be bundled before object matching can run, which inflates the APK size and can degrade accuracy. The Cloud Vision API, by contrast, needs no heavy project files; the fundamental dependencies google-api-client-android, google-http-client-gson and google-api-services-vision are sufficient for the project.
How to include the Cloud Vision API in your project
Android developers and Android app development companies alike will want to know how to enable the Google Cloud Vision API for Android. Follow the steps below:
- Either create a new project in Google Cloud Console or use an existing one
- If your account is new, you can start with a free trial. It may ask for your credit card information; don't shy away from entering the details, as you won't be charged during the trial. Then enable billing for the project
- Enable the Google Cloud Vision API from its page in the Cloud Console, or:
- Open the API Manager section from the navigation (hamburger) menu
- Select and Enable “Google Cloud Vision API”
- From the sidebar, go to the Credentials menu
- Click the Create credentials drop-down and select OAuth client ID
- Select Android as the application type
- Give the new client a suitable name, such as "Android app for Cloud Vision API"
- Enter your SHA-1 fingerprint in the required format (you can print it with `keytool -list -v` against your keystore)
- Enter your app's package name, as declared in the defaultConfig block of your build.gradle
- Click on Create
- And you’re through. It’s done
Before moving further, add the following dependencies to your build.gradle file.
build.gradle (app)
```groovy
dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    testCompile 'junit:junit:4.12'
    compile 'com.android.support:appcompat-v7:23.4.0'
    compile 'com.google.android.gms:play-services-base:9.0.2'
    compile 'com.google.android.gms:play-services-auth:9.0.2'
    compile 'com.google.apis:google-api-services-vision:v1-rev16-1.22.0'
    compile('com.google.api-client:google-api-client-android:1.22.0') {
        exclude module: 'httpclient'
    }
    compile('com.google.http-client:google-http-client-gson:1.20.0') {
        exclude module: 'httpclient'
    }
}
```
Note that when accessing the API on Android to perform image analysis, developers need the end user's consent (a storage runtime permission) to read pictures from the device.
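Under the hood, a request to the API is a plain JSON document POSTed to the `images:annotate` endpoint. The sketch below is our own illustration using only the JDK (not the client libraries above): the endpoint URL, the `requests`/`image`/`features` field names and the `FACE_DETECTION` feature type follow the public REST reference, while the helper class and method names are hypothetical.

```java
import java.util.Base64;

public class VisionRequestSketch {
    // Builds the JSON body for a POST to
    // https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
    // The image is sent inline as base64; featureType is one of the
    // documented types, e.g. "FACE_DETECTION" or "LABEL_DETECTION".
    static String buildRequestBody(byte[] imageBytes, String featureType, int maxResults) {
        String content = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + content + "\"},"
                + "\"features\":[{\"type\":\"" + featureType + "\","
                + "\"maxResults\":" + maxResults + "}]}]}";
    }

    public static void main(String[] args) {
        // Dummy bytes stand in for a real photo here.
        System.out.println(buildRequestBody(new byte[]{1, 2, 3}, "FACE_DETECTION", 5));
    }
}
```

In a real app the client library assembles this body for you; the point is that the wire format is simple JSON, which is why the API is usable from any platform with an HTTP client.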
Various image analysis techniques supported in Cloud Vision API
Built on deep machine-learning techniques, Google's Cloud Vision API can also help search individual video frames. It offers an entirely new way of seeking out and tagging key subjects within archived videos, which can bring monetization opportunities to old content, especially for media companies holding massive archives of media files.
Additionally, it can identify a bulk of entities from Google's Knowledge Graph and integrate them with metadata from Google Image Search, much like grouping several images into one packet.
A few of the most important roles the Vision API fits best:
Landmark detection
Scanning through the picture, the Vision API automatically identifies the landmark you are at and can add a caption to it. Even for images without embedded location data, such as those shot on DSLRs with no GPS capability, you can still get landmark names from the Cloud Vision API. The same landmark recognition powers the 'Search by Image' feature in Google Images.
Face detection
The Cloud Vision API can detect broad features or sets of objects in an image: a face, a flower, a bag, a table or anything else. It also counts the faces present in a photograph and identifies the placement of individual facial features.
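Each detected face is returned with a boundingPoly of (x, y) vertices marking where it sits in the image. As a rough illustration, a helper like the hypothetical one below could compute the box's centre so an app can place a tag or overlay on the face; the class and method are our own, not part of the API.

```java
public class FaceBoxSketch {
    // Given the bounding polygon's vertices as {x, y} pairs,
    // returns the centroid as {x, y}. The Vision API's FaceAnnotation
    // exposes these vertices; averaging them is our own simplification.
    static int[] center(int[][] vertices) {
        int sumX = 0, sumY = 0;
        for (int[] v : vertices) {
            sumX += v[0];
            sumY += v[1];
        }
        return new int[]{sumX / vertices.length, sumY / vertices.length};
    }
}
```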
Logo detection
Using the new Cloud Vision API, users can identify logos in images. This capability still needs improvement, and Google is working to make it as reliable as the other functionality.
Image Palette
The API also identifies the dominant colors in an image and reports them to the caller. Android already has the Palette library for this task, but the Vision result can serve as an add-on to it.
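The IMAGE_PROPERTIES feature reports each dominant color as separate red, green and blue components alongside a score. A small helper like the one below (our own illustration, not part of the API) turns one such color into a displayable hex code:

```java
public class ColorSketch {
    // Converts the red/green/blue components the API returns for a
    // dominant color into a CSS-style hex string for display.
    static String toHex(int red, int green, int blue) {
        return String.format("#%02X%02X%02X", red, green, blue);
    }
}
```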
Sentiment Analysis
With this feature, the API recognizes the emotional attributes of people in the image, such as joy, sorrow, anger and surprise. This can add a fun element to photos and could be integrated with social media apps to tag friends with different emotions.
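The API does not return a single "emotion" label. Each detected face carries likelihood fields (joyLikelihood, sorrowLikelihood, angerLikelihood, surpriseLikelihood) whose values are enum names such as VERY_LIKELY or UNLIKELY; those names are documented, but the scoring and the pickDominant helper below are our own illustrative sketch of how an app might choose one emotion to show:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EmotionSketch {
    // Maps the API's likelihood enum names to a comparable rank.
    static int score(String likelihood) {
        switch (likelihood) {
            case "VERY_LIKELY": return 4;
            case "LIKELY": return 3;
            case "POSSIBLE": return 2;
            case "UNLIKELY": return 1;
            default: return 0; // VERY_UNLIKELY or UNKNOWN
        }
    }

    // Given emotion -> likelihood pairs for one face, returns the
    // strongest emotion, or "neutral" if nothing scores above zero.
    static String pickDominant(Map<String, String> likelihoods) {
        String best = "neutral";
        int bestScore = 0;
        for (Map.Entry<String, String> e : likelihoods.entrySet()) {
            int s = score(e.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }
}
```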
Text detection
Optical character recognition (OCR) detects text within images, with automatic language identification spanning a diverse set of languages.
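TEXT_DETECTION requests can also carry language hints through the request's imageContext field to bias that language identification. A minimal sketch, assuming plain string assembly rather than the client library (the languageHints field is documented; the helper itself is hypothetical):

```java
import java.util.StringJoiner;

public class TextDetectionSketch {
    // Builds the feature + imageContext portion of an images:annotate
    // request for OCR, with optional ISO language codes as hints.
    static String textRequestFragment(String... languageHints) {
        StringJoiner hints = new StringJoiner(",", "[", "]");
        for (String h : languageHints) {
            hints.add("\"" + h + "\"");
        }
        return "\"features\":[{\"type\":\"TEXT_DETECTION\"}],"
                + "\"imageContext\":{\"languageHints\":" + hints + "}";
    }
}
```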