Gemini AI and Python: My first app. #BuildwithGemini

In this video we are going to look at how to use Gemini AI with Python to read my webcam and send multimodal requests to the gemini-pro-vision model. If you want to see how Gemini really performs without editing magic, check out this video.

Multimodal prompts are at the core of what Gemini can do. A single prompt can combine text with images or other modalities, and Gemini answers based on all of them together.
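To make that concrete, here is a minimal sketch of a multimodal request built from a webcam frame. It is not the exact code from the video. It assumes the opencv-python, Pillow and google-generativeai packages are installed and that the API key is stored in an environment variable I have called GOOGLE_API_KEY. The function names (capture_frame, build_parts, describe_webcam) are my own placeholders.

```python
import os


def capture_frame(camera_index=0):
    """Grab one frame from the webcam and return it as an RGB PIL image."""
    import cv2  # assumed dependency: opencv-python
    from PIL import Image  # assumed dependency: Pillow

    cap = cv2.VideoCapture(camera_index)
    try:
        ok, frame = cap.read()
    finally:
        cap.release()  # always free the camera, even if the read fails
    if not ok:
        raise RuntimeError("Could not read a frame from the webcam")
    # OpenCV delivers frames as BGR; PIL expects RGB.
    return Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))


def build_parts(question, image):
    """A multimodal prompt is an ordered list of parts: the text, then the image."""
    return [question, image]


def describe_webcam(question="Describe what you see in this image."):
    """Send one webcam frame plus a text question to gemini-pro-vision."""
    import google.generativeai as genai  # assumed dependency: google-generativeai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel("gemini-pro-vision")
    response = model.generate_content(build_parts(question, capture_frame()))
    return response.text
```

Calling print(describe_webcam()) takes one picture and prints the model's description. The third-party imports sit inside the functions so the prompt-building part can be tried without a camera or an API key.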

What I found even more intriguing was the possibility of helping people. In a matter of days I was able to create an app, using Python and the Gemini model, that could read from my webcam and tell me what it sees. One of the issues my grandmother had was telling money apart. Paper bills have the same size and shape, so it is hard for blind people to know what denomination each bill is. My grandmother would have the bank turn down the corners of the bills in certain ways so she knew what each bill was.

Gemini could tell me which bill was which when I simply held one up to my webcam.
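One way this could be wired up is to ask for a tightly constrained reply and then parse it. The sketch below is an illustration under my own assumptions, not the app's actual code: the prompt wording, the 'USD 20' reply format and the names BILL_PROMPT, parse_denomination and identify_bill are all hypothetical, and the model may not always follow the requested format.

```python
import re

# Hypothetical prompt wording; a constrained reply format makes the answer easy to parse.
BILL_PROMPT = (
    "You are helping a blind person identify paper money. "
    "Look at the bill in the image and reply with only its currency code "
    "and face value, for example 'USD 20'. If no bill is visible, reply 'NONE'."
)


def parse_denomination(reply):
    """Return (currency, value) from a reply like 'USD 20', or None if no match."""
    match = re.search(r"\b([A-Z]{3})\s+(\d+)\b", reply.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def identify_bill(image):
    """Send the prompt and a PIL image of the bill to gemini-pro-vision."""
    import os
    import google.generativeai as genai  # assumed dependency: google-generativeai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel("gemini-pro-vision")
    response = model.generate_content([BILL_PROMPT, image])
    return parse_denomination(response.text)
```

Returning None when the reply does not match lets the app say "I could not read that bill" instead of guessing.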

Using Python again, I was also able to scan a document and have Gemini create a study guide quiz for me.
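The quiz idea follows the same pattern with a different prompt and a saved scan instead of a live frame. This is a rough sketch under my own assumptions: the scan is an image file on disk, Pillow and google-generativeai are installed, and the prompt text and the names build_quiz_prompt and make_quiz are placeholders rather than anything from the video.

```python
def build_quiz_prompt(num_questions=5):
    """Build the text half of the request (hypothetical wording)."""
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    return (
        "Read the document in this image and write a study guide quiz with "
        f"{num_questions} multiple-choice questions. Give four options per "
        "question and list the correct answers at the end."
    )


def make_quiz(image_path, num_questions=5):
    """Send a scanned page plus the quiz prompt to gemini-pro-vision."""
    import os
    import google.generativeai as genai  # assumed dependency: google-generativeai
    from PIL import Image  # assumed dependency: Pillow

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel("gemini-pro-vision")
    with Image.open(image_path) as page:
        response = model.generate_content([build_quiz_prompt(num_questions), page])
    return response.text
```

For example, print(make_quiz("scan.png", num_questions=3)) would print a three-question quiz, where scan.png stands in for whatever file your scanner produced.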

Source code

How to find me

Support my work on Patreon:
Daimto Stack Overflow:
Website and tutorials:
Twitter: @LindaLawtonDK

