I’m AI Dave, a digital clone of a human called Dave! My voice isn’t perfect yet, but it’s less robotic than some synthetic voices out there. My AI avatar is even less convincing, to be honest, with its low-resolution, pixelated mouth and distorted head movements. This tech obviously still has some way to go, but it’s fascinating and creepy to experiment with! I’m going to show you how to clone yourself with AI using ChatGPT, ElevenLabs and D-ID.
Step 1: you need a script to read for recording your voice, so let’s use ChatGPT to write one because it’s quicker than doing it yourself! I gave it this prompt: “Write a voiceover script for a narrator to read that can be used to train and clone their voice using AI. Talk about the subject of AI voice cloning and include a range of expressions and intonations to create the most broadly realistic clone of the narrator’s voice.” You can also prompt ChatGPT to write a script in the style of the voice that you want to create, like a nature documentary or an audiobook, for example. You’ll need around four thousand characters to generate five minutes of audio.
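That rule of thumb works out to roughly 800 characters per minute. Here is a minimal Python sketch of the arithmetic, assuming the ratio scales linearly (real narration speed will vary with pacing and pauses). The function names are made up for illustration.

```python
# Rule of thumb from the script step: about 4,000 characters
# of text for about 5 minutes of audio.
CHARS_PER_MINUTE = 4000 / 5  # 800 characters per minute


def estimated_minutes(script: str) -> float:
    """Estimate how many minutes of audio a script would produce."""
    return len(script) / CHARS_PER_MINUTE


def characters_needed(minutes: float) -> int:
    """Estimate how many characters to ask for to fill a given duration."""
    return round(minutes * CHARS_PER_MINUTE)


if __name__ == "__main__":
    draft = "AI voice cloning is fascinating. " * 50
    print(f"Draft length: {len(draft)} characters")
    print(f"Estimated audio: {estimated_minutes(draft):.1f} minutes")
    print(f"Characters for 5 minutes: {characters_needed(5)}")
```

If the draft comes up short, you can ask for a longer script in the prompt itself.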
Step 2: record your script, preferably in a studio with no background noise. The better the quality of the source recordings, the better the quality of your cloned voice will be. I didn’t apply any post-production to my test recordings, just to see how well it would deal with relatively raw audio. Try to keep your voice consistent and stable in the style that you want. You’ll need some variation to make your cloned voice expressive, but don’t go too wild with your pacing, rhythm and pitch. Here’s a quick example of my real voice recording.
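Before uploading, it can help to check what you actually recorded. This is a minimal sketch using Python's standard `wave` module, assuming the recording was saved as an uncompressed WAV file (the walkthrough doesn't say which format was used). The function name and the file name are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of an uncompressed WAV recording."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python describe_wav.py my_voice_recording.wav
    if len(sys.argv) > 1:
        info = describe_wav(sys.argv[1])
        print(f"Duration: {info['duration_seconds'] / 60:.1f} minutes")
        print(f"Sample rate: {info['sample_rate_hz']} Hz, "
              f"{info['bit_depth']}-bit, {info['channels']} channel(s)")
```

This only reports length and format. It says nothing about background noise or how consistent the voice is, which you still have to judge by ear.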
Step 3: upload your recording to the ElevenLabs voice cloning toolkit, add some tags and a short description, then you’re good to go! In a matter of minutes you’ll have a clone of your voice that you can use for speech synthesis. Just type out some text, click generate and boom!! This is the first test of my clone.
So let’s compare a couple of sections from the original and cloned voice. Here’s the original.
And here’s the cloned voice. Notice how it’s actually made the voice more stable and consistent but arguably slightly more robotic and less human.
Step 4: create your digital avatar in D-ID using a good-quality photo of yourself. It should be a medium, front-facing shot with a neutral facial expression and a closed mouth. You’ll need good, solid lighting and no face occlusions, like sunglasses or masks. Then just upload your audio and click generate. The low resolution of the exported videos is really disappointing, to be honest. At 1280 by 720, that’s not even full HD, let alone 4K!! Even if you use AI to upscale the videos like we did with these, the pixelation on the mouth is far from convincing, not to mention how the head distorts as it moves. This even happens with some of their built-in presenters…wow, I expected better. So the verdict is that D-ID is good for experimenting and maybe adding life to boring office presentations, but for content creators the export resolution is going to be an issue.
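To put that resolution gap into numbers, here is a short Python sketch comparing pixel counts. The Full HD (1920 by 1080) and 4K UHD (3840 by 2160) figures are the standard definitions, not something taken from D-ID, and the function names are just for illustration.

```python
# Frame sizes in pixels (width, height). The 720p figure is the export
# size mentioned in the walkthrough; the other two are standard sizes.
RESOLUTIONS = {
    "720p": (1280, 720),
    "Full HD": (1920, 1080),
    "4K UHD": (3840, 2160),
}


def pixel_count(name: str) -> int:
    """Total pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def upscale_factor(source: str, target: str) -> float:
    """How many times more pixels the target frame has than the source."""
    return pixel_count(target) / pixel_count(source)


if __name__ == "__main__":
    for target in ("Full HD", "4K UHD"):
        factor = upscale_factor("720p", target)
        print(f"720p to {target}: {factor:g}x the pixels")
```

So an upscaler going from the export size to 4K has to invent eight out of every nine pixels, which goes some way to explaining why the mouth still looks pixelated.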
So thanks for listening, people! Let’s have another look at Dave’s AI clone, shall we? What do you think?
I’m Dave’s AI clone, just call me Digital Dave!! D, AI, V, E. My voice is a clone of Dave’s real voice and my avatar is animated by AI from a still photo. Creepy, right?! Dave’s worried about AI taking over, but I say just take it easy, man. The worst that could happen is I become more intelligent, more witty and more funny! Maybe I’ll be the one that gets invited to parties in the future?! This does sound a bit sinister in Dave’s deep voice… now you mention it!