ML Kit is a collection of machine learning APIs for both Android and iOS. It lets developers integrate machine learning capabilities into their apps without extensive knowledge of machine learning algorithms and techniques. You can use it to perform tasks such as text recognition, image labeling, barcode scanning, text translation, and more.
In this example, we'll use the Text Recognition API to scan and extract text from images.
- Setting Up the Project
All ML Kit APIs require Android API level 19 or higher, so make sure your minSdkVersion is set to at least 19.
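For reference, the relevant part of a typical app/build.gradle would look something like this (everything besides the minSdkVersion line is elided):
android {
    defaultConfig {
        // ML Kit requires API level 19 (Android 4.4) or higher
        minSdkVersion 19
        ...
    }
}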
If you are using Gradle v7.6 or higher, modify your settings.gradle to include the following repositories:
pluginManagement {
    repositories {
        google()
        ...
    }
}
dependencyResolutionManagement {
    repositories {
        google()
        ...
    }
}
Now, in your app/build.gradle, add the following dependencies:
dependencies {
    // This is the dependency for text recognition
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition:18.0.2'
    // This is the dependency for object detection
    implementation 'com.google.mlkit:object-detection:17.0.0'
}
Next, sync your project with the Gradle files to download the libraries.
The machine learning models are downloaded dynamically via Google Play services on first use, but you can change this so that they are downloaded when the app is installed. To do so, add the following code to your AndroidManifest.xml file:
<application ...>
    <meta-data
        android:name="com.google.mlkit.vision.DEPENDENCIES"
        android:value="ocr" />
    <!-- This causes the model for text recognition (ocr) to be downloaded at install time -->
</application>
- Adding Text Recognition
The ML Kit Text Recognition API is designed to recognize and extract text from images or videos in a variety of languages and formats.
Using a File Picker
Create a variable to get the text recognition client.
private val recognizer by lazy {
    TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
}
If you want to set your own executor, you can pass custom options to it:
val options = TextRecognizerOptions.Builder()
    .setExecutor(executor)
    .build()
TextRecognition.getClient(options)
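Whichever way you create the client, it holds resources, so it's good practice to call close() on it once you no longer need it. If it lives in an Activity, a natural place for that is onDestroy:
override fun onDestroy() {
    super.onDestroy()
    // Free the resources held by the text recognition client
    recognizer.close()
}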
Here, I defined this variable in an Activity, so I used the lazy function to defer the creation of this object until it is actually needed. In your Activity or Fragment, register an activity result contract for picking the image:
private val pickImage = registerForActivityResult(
    ActivityResultContracts.OpenDocument()
) { uri ->
    if (uri != null) {
        processImage(uri) // We'll define this later
    } else {
        Toast.makeText(this, "Error picking image", Toast.LENGTH_SHORT).show()
    }
}
Now, decide where you'd like to launch the image picker and call pickImage.launch(arrayOf("image/*")). In my case, I'm launching it from a button:
pickImageButton.setOnClickListener { pickImage.launch(arrayOf("image/*")) }
Next, define the method that will use the recognizer we created earlier to process the image:
// uri of the file that was picked
private fun processImage(uri: Uri) {
    try {
        val image = InputImage.fromFilePath(this, uri)
        recognizer.process(image)
            .addOnSuccessListener { result -> extractResult(result) }
            .addOnFailureListener { Log.e("MainActivity", "Error processing image", it) }
    } catch (e: IOException) {
        // InputImage.fromFilePath throws an IOException if the file can't be read
        Log.e("MainActivity", "Error loading image", e)
    }
}
The TextRecognizer.process function returns a Task. You can observe its result by adding success and failure listeners. Now, let's create the method that receives the success result:
private fun extractResult(result: Text) {
    recognizedTextView.text = result.text
}
For now, I'm just getting the full extracted text from the result and displaying it in a TextView.
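The Text result is also structured: it contains text blocks, which contain lines, which contain elements (roughly words), each with its own bounding box. If you need more than the raw string, you can walk that hierarchy; here's a small sketch that just logs the pieces:
private fun logTextStructure(result: Text) {
    for (block in result.textBlocks) {
        Log.d("MainActivity", "Block '${block.text}' at ${block.boundingBox}")
        for (line in block.lines) {
            Log.d("MainActivity", "  Line '${line.text}'")
            for (element in line.elements) {
                Log.d("MainActivity", "    Element '${element.text}'")
            }
        }
    }
}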
Using the Camera
Start by adding the following code inside the <application> tag of your AndroidManifest.xml file:
<provider
    android:name="androidx.core.content.FileProvider"
    android:authorities="${applicationId}.provider"
    android:exported="false"
    android:grantUriPermissions="true">
    <meta-data
        android:name="android.support.FILE_PROVIDER_PATHS"
        android:resource="@xml/provider_paths" />
</provider>
Next, inside res/xml, create a file called provider_paths.xml and add the following content:
<?xml version="1.0" encoding="utf-8"?>
<paths>
    <cache-path
        name="cache_folder"
        path="." />
</paths>
Now, go back to the Activity or Fragment and define a URI for where the image will be stored. The temporary file is created in cacheDir, which matches the cache-path we declared above:
private val imageUri by lazy {
    // createTempFile already creates the file on disk, so no createNewFile() is needed
    val tempFile = File.createTempFile("ml_image", ".png", cacheDir).apply {
        deleteOnExit()
    }
    FileProvider.getUriForFile(
        applicationContext,
        "${BuildConfig.APPLICATION_ID}.provider",
        tempFile
    )
}
Let's also register the TakePicture contract. The benefit of using it is that you won't need to write your own camera-capture code:
private val takePicture = registerForActivityResult(
    ActivityResultContracts.TakePicture()
) { success ->
    if (success) {
        processImage(imageUri)
    } else {
        Toast.makeText(this, "Error taking picture", Toast.LENGTH_SHORT).show()
    }
}
You've seen processImage before; everything after that works the same as in the file picker flow. All that's left is to launch the contract, as shown below.
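You can launch the camera wherever it fits your UI, passing it the URI we defined earlier. Here I'm assuming a second button, takePhotoButton (the name is arbitrary):
takePhotoButton.setOnClickListener { takePicture.launch(imageUri) }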
By offering pre-trained models and a simple API, ML Kit makes it easier to incorporate machine learning into mobile applications. Since many of the intricacies are abstracted away, developers can concentrate on building their apps without needing an in-depth understanding of model training or machine learning techniques.