Gemini, the first multimodal artificial intelligence created by Google, can move freely between different types of information such as text, audio, video, images and code.
We will walk through a couple of examples using Gemini Pro Vision.
Installation and Configuration of Google Cloud CLI
- You need a Google Cloud account; if you don't have one, create it on the official website.
- Set up the gcloud CLI locally by following the Cloud SDK documentation. The Google Cloud CLI requires Python 3.8 to 3.12.
- At the last step of the installation you will be asked to log in: enter “y” and then, when prompted, enter the number of the project you want to use. Write down the project ID, which we will later insert into the Python code.
- If you are not logged in, try again with the gcloud auth application-default login command.
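Before moving on, it can help to confirm the local prerequisites from the steps above. The following is a minimal sketch and not part of the repository: the helper names python_supported and gcloud_available are hypothetical, and the 3.8 to 3.12 range is simply the one quoted above.

```python
import shutil
import sys

# Version range quoted above for the Google Cloud CLI (inclusive).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 12)


def python_supported(version=None):
    """Return True if the (major, minor) version falls inside the quoted range.

    `version` is any tuple starting with (major, minor); it defaults to the
    interpreter currently running this script.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


def gcloud_available():
    """Return True if a `gcloud` executable can be found on PATH."""
    return shutil.which("gcloud") is not None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("gcloud found on PATH:", gcloud_available())
```

If either check prints False, revisit the installation steps before continuing.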
Vertex AI SDK for Python
- Open your favorite editor (VS Code in our case), create a Python virtual environment, and then run this command from the terminal: pip install "google-cloud-aiplatform>=1.38" (the quotes stop the shell from treating > as an output redirection).
- Download the two files from our repository https://github.com/Impesud/generative-AI/tree/main/gemini and open them in your editor.
- For “Project ID”, enter the project ID you chose. Remember that you must first enable the Vertex AI API for that project on Google Cloud.
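As a rough illustration of where the project ID ends up, the sketch below initializes the Vertex AI SDK with vertexai.init. It is not the repository's code: PROJECT_ID is a placeholder, the us-central1 location is an assumption, and looks_like_project_id is a hypothetical helper that only performs a loose format check (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen).

```python
import re

# Loose format check only; it does not confirm that the project exists.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

PROJECT_ID = "my-project-id"  # placeholder: replace with your own project ID
LOCATION = "us-central1"      # assumption: any region where the model is offered


def looks_like_project_id(value):
    """Return True if `value` has the general shape of a project ID."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


def init_vertex(project_id=PROJECT_ID, location=LOCATION):
    """Initialize the Vertex AI SDK for the given project and location."""
    if not looks_like_project_id(project_id):
        raise ValueError(f"Unexpected project ID format: {project_id!r}")
    # Imported lazily so the format check can be used without the SDK installed.
    import vertexai

    vertexai.init(project=project_id, location=location)
```

Calling init_vertex() once at the start of a script is enough; later SDK calls reuse that project and location.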
Send Multimodal Prompt Requests
- We will send two multimodal prompt requests with images to the Gemini Pro Vision (gemini-pro-vision) model. The Gemini Pro Vision model supports prompts that include text, code, images, and video, and can output text and code.
- The code in gemini_image_from_uri.py sends an image via URI, while gemini_image_from_url.py sends an image via URL.
- Run each file from your terminal (for example: python filename).
- The first file returns a response like this:
    role: "model"
    parts {
      text: "The image shows a table with a white surface...
    }
The second file instead returns:
    role: "model"
    parts {
      text: "The Colosseum is an oval amphitheater in the center of the city of Rome...
    }
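For readers who want to see the general shape of such requests, here is a minimal sketch of both approaches. It is not the code from the repository: the function names, the default prompt and the MIME-type helper are hypothetical, and it assumes vertexai.init has already been called for your project. It uses Part.from_uri for an image referenced by URI (for example a Cloud Storage gs:// path) and Part.from_data for image bytes downloaded from a URL.

```python
import mimetypes
import urllib.request

MODEL_NAME = "gemini-pro-vision"
DEFAULT_PROMPT = "Describe this image."  # hypothetical prompt


def guess_image_mime_type(path_or_url, default="image/jpeg"):
    """Guess an image MIME type from the file extension, else use `default`."""
    mime_type, _ = mimetypes.guess_type(path_or_url)
    if mime_type and mime_type.startswith("image/"):
        return mime_type
    return default


def describe_image_from_uri(uri, prompt=DEFAULT_PROMPT):
    """Send an image referenced by URI together with a text prompt."""
    # Lazy import: requires google-cloud-aiplatform>=1.38 to be installed.
    from vertexai.preview.generative_models import GenerativeModel, Part

    image = Part.from_uri(uri, mime_type=guess_image_mime_type(uri))
    model = GenerativeModel(MODEL_NAME)
    response = model.generate_content([image, prompt])
    return response.text


def describe_image_from_url(url, prompt=DEFAULT_PROMPT):
    """Download an image over HTTP(S) and send its bytes with a text prompt."""
    from vertexai.preview.generative_models import GenerativeModel, Part

    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    image = Part.from_data(data=data, mime_type=guess_image_mime_type(url))
    model = GenerativeModel(MODEL_NAME)
    response = model.generate_content([image, prompt])
    return response.text
```

Both functions return only the generated text. The outputs shown above also include the role and parts fields, which suggests the scripts print the full response content rather than just the text.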
Once you are done with Gemini, I recommend disabling the APIs you enabled so that you do not incur extra charges.
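One way to do this from a script is to call the gcloud CLI. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the only API you enabled is Vertex AI (service name aiplatform.googleapis.com) and that gcloud is installed and authenticated.

```python
import subprocess

VERTEX_AI_SERVICE = "aiplatform.googleapis.com"


def disable_service_command(project_id, service=VERTEX_AI_SERVICE):
    """Build the gcloud command that disables `service` for `project_id`."""
    return ["gcloud", "services", "disable", service, "--project", project_id]


def disable_service(project_id, service=VERTEX_AI_SERVICE):
    """Run the command; raises CalledProcessError if gcloud reports a failure."""
    subprocess.run(disable_service_command(project_id, service), check=True)
```

For example, disable_service("my-project-id") runs gcloud services disable aiplatform.googleapis.com --project my-project-id, where my-project-id is a placeholder.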
Don’t miss the next articles dedicated to Generative AI!