In Review: Learning 23 Tools in 9 Months

Lately, a friend and I have been texting about the little creative pursuits we do on weekends, the art and tools that bring us joy and give us the opportunities for growth and creativity that we don’t get to explore regularly in our 9-5 jobs. Most recently I shared my delight that I’d managed to install an AI application for face swapping that would give me a little more leeway in building consistent character profiles for my book and social media videos.

I was pretty proud of myself, because rather than risk an open source one-click install tool that I knew very little about – I committed to installing the package by command line. This is, coincidentally, why I talked myself into getting a MacBook a few years ago. At the time, I needed a new computer, and after 8 years of being on a Mac at work, I figured that if – lord forbid – I ever needed to run a local environment at home, the only way I was going to know how to do it was in Terminal. (Although, 2024 Klara is kicking herself for not buying a MacBook with an M1 chip. Urg! 2021 Klara simply wasn’t thinking of the AI processing she was going to need to be doing in the future.)

This doesn’t mean I am especially adept at command line. I have about 15 git commands at my disposal, a basic understanding of global and local packages, a slew of screenshots saved to Evernote from troubleshooting all my unexpected errors, and a tenacity that prevents me from giving up, even if all my PRs keep appending a mysterious •2 that no one can locate and the consensus is that I should just add it to every .gitignore file in the repo. Brew, node, python, yarn…I know of these packages. I’ve installed them, checked versions, run updates – all while patiently coached by the JavaScript developers on my team – but I’d be lying if I said I knew what they actually do.

I am pretty good with step-by-step instructions, though, so while watching a YouTube video guiding me through the Python and venv install process, I run into a bit of a hangup when the author starts typing in bash. I know it is possible to switch from zsh to bash, so I read Apple’s documentation and find the advanced user settings. Done! Only, the default bash theme sucks. But I’ve switched themes before – or rather, other people have when they’ve remoted into my laptop to help fix errors, been dismayed at my configuration, and adjusted the theme on the fly to match their preferences. But I wrote the documentation guiding other designers on how to replicate the process; that has to count for something! The only downside is that my documentation was written for Atom, and because the JS devs on my team have all since changed to VSCode, so have I. Another YouTube video and I have shell commands running in VSCode. What am I doing again? Right, venv. But all this to say, I was feeling pretty accomplished when I texted my friend.
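For anyone retracing this, the shell switch and the venv setup boil down to just a few commands. This is a minimal sketch – the tutorial I followed had more steps, and the Python path will vary by machine:

```shell
# Make bash the default login shell instead of zsh
# (interactive and prompts for a password, so run it once by hand):
#   chsh -s /bin/bash

# Create an isolated Python environment inside the project folder:
python3 -m venv .venv

# Activate it, so that `python` and `pip` now resolve inside .venv:
source .venv/bin/activate

# Confirm which interpreter is active:
python --version
```

Anything installed with `pip` after activation stays inside `.venv`, which is the whole point: project packages don’t pollute the global Python install.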

She texted back, “wow, you’re always trying something new. How many new things have you tried this year?”

Reflecting on that question, I realized that I had experimented with 23 new tools in roughly 9 months. Woah. I’ve been busy.

May 2023

Adobe Firefly

A colleague gets early access to Firefly (Beta) and shares a demo with the rest of us in design. Intrigued, I request access thinking that I can use it for my short-form social media videos. But it is only after I am granted access and read the fine print that I see all generations belong to Adobe and they don’t want you sharing the content anywhere. Boo.

Midjourney

I first see this mentioned on TikTok. It also seems like a way I might be able to generate image assets for social media videos. Book publishing TikTok recommends starting your social media strategy even before you have finished writing your book to build brand awareness and I know I need to get going on that. Unable to test out the free version of Midjourney because their servers are too busy, I bite the bullet and pay for a month-to-month subscription.

Adobe Photoshop (Beta)

I read the Midjourney documentation, but trying to generate character profiles on Midjourney is challenging. I’m having a hard time getting women with dyed hair or who aren’t wearing necklaces. I figure I’m going to need Photoshop to do image manipulation and I see that Adobe has AI integration in Photoshop Beta. Perhaps I can use this to help me get the people to look the way I want them to?

I buy a Creative Cloud license for Photoshop and quickly discover that the Photoshop AI is terrible. I have to add colored highlights and freckles the long way via Photoshop paint brushes. Of course, it’s been 6 years since I’ve really used Photoshop – my job is primarily Adobe XD and now Figma – so I have to relearn some of the basics.

June 2023

nightmareai/real-esrgan

At this point, Midjourney does not have a built-in upscaler, so I look elsewhere and find nightmareai/real-esrgan. I use this to upscale image resolution before bringing images into Photoshop.

July 2023

Soundraw

I know my videos are going to need music and the best way to avoid copyright violations on TikTok is to generate beats yourself; unfortunately, it turns out that I’ve completely forgotten how GarageBand works since the last time I used it in college. I find Soundraw. It’s not terribly complicated. Pick different themes, instruments, set length, listen to tracks, and do light manipulation. It’s $20/month, but they grant you a perpetual license to use the music even if you are no longer paying the subscription. I decide to pay for one month, generate as many tunes as I can, and then cancel. (I did discover they have a 30-day cap on downloads, probably to prevent people from doing what I was doing.)

September 2023

Recraft

I see Recraft image generator mentioned on TikTok. It’s promoted as a way to generate more illustrative images, but when I try it out, I can’t even get simple concepts for a logo rendered. It’s very possible that it’s grown more robust since then, but for me, since it didn’t provide immediate value or resonate in a way that I found inspiring, I gave up on it and didn’t look back.

Midjourney (Maven Class)

As I’m researching how to design for AI assistants, I encounter the platform Maven. I sign up for a waitlist on a class for designing AI integrations, but because Maven now has my email address, they start sending me suggestions for other AI-related classes. I see a couple related to image and video generation and after reviewing all the instructors’ social media accounts, I decide Nick St. Pierre has the style closest to what I want to be doing. I sign up for his class and this dramatically improves my ability to prompt within Midjourney. (Among many other things, I learn I ought to be using the --no parameter, e.g. --no necklaces.)
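For illustration, Midjourney parameters get appended to the end of a prompt after a double hyphen. The prompt text below is an invented example, not one from the class:

```
/imagine prompt: portrait of a woman with teal-dyed hair, soft natural light --no necklaces, jewelry
```

The --no parameter acts as a negative prompt, telling the model which concepts to leave out of the generation – far more reliable than writing “without a necklace” in the prompt itself, which the model tends to read as “necklace.”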

Relume.io

As a bonus session to Nick’s class, I am exposed to relume.io and this motivates me to try out a handful of other website building AI assistants, as these tools have the most overlap with my current employer’s product. I write more about this in an analysis here: AI Layout Assistants.

Vectorizer.ai

Vectorizer is introduced during the Maven class and works very similarly to Illustrator’s live trace function for converting raster images to vectors. It works best when the raster image already has a flat appearance and is limited in detail. Using Vectorizer permits a workflow where one might generate an image in Midjourney, convert it to vector, and then import it into Figma or Illustrator to change color elements.

Premiere Rush

Premiere Rush came free with my Photoshop subscription and initially, I have high hopes because I have used Premiere for work. Unfortunately, after only three hours, I’m so frustrated by the limitations with transitions that I go to Google and immediately begin searching for free video editors. I briefly try iMovie since it’s built into my laptop, but to my surprise, it doesn’t allow me to set a 9:16 aspect ratio. Next.

Movavi

Movavi is one of the video apps I found while researching free alternatives to Premiere Rush. Their app is easy to learn, includes basic transitions and captioning, and I get my first video created within one day. It is then that I discover it’s actually a freemium product and I need to pay to export. Oh well, they got me – at least there’s a discount for new members. I pay, but after a month of usage, I am really struggling with their captioning feature.

Accessibility is really important to me. Therefore, I always caption my videos. Movavi supports captions, but there is no automatic generation option and manual entry is buggy. I notice the timeline bar jumps to wherever the mouse is clicked, so adjusting the length of time a caption should be visible on screen is very painful. I really just want the timeline bar to act as an anchor so I can snap my caption length to it. Also, I have problems when copying and pasting the text I use as captions, because the caption input field will sometimes revert to the previous text and no amount of typing can fix it. The only way out is to cut and paste the physical text element on the video canvas.

Captioning starts to eat up my time, but as I have already paid Movavi an annual subscription rate, I stick it out. I try writing my own .SRT caption file, but time stamping captions in a text-editor is also tedious. I also attempt to use Wondershare to generate an .SRT file based on my audio file. While this does work, Movavi’s tool for importing .SRT files is not integrated into their video editor and despite the fact I have a license, the addon tool outputs a video file with a watermark. I am not pleased.
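For context, an .SRT file is just plain text: a sequence number, start and end timestamps separated by an arrow (with milliseconds after a comma), and the caption text, with a blank line between entries. A hand-written one looks like this – the timings and text here are invented for illustration:

```
1
00:00:00,000 --> 00:00:02,400
Welcome back to my channel.

2
00:00:02,400 --> 00:00:05,100
Today we're talking about magic systems.
```

Simple enough to author in any text editor, but lining those timestamps up against actual audio by hand is exactly the tedium described above.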

October 2023

Diffusion Bee

While I have Midjourney, it is a paid subscription. I am intrigued by the fact that Stable Diffusion is open source and can be run locally, but nearly all the tutorials I have found are for Windows or Linux. It appears that Stable Diffusion can be run on a Mac, but installing it seems to involve a lot of command line and this intimidates me. I find an app that can be installed, Diffusion Bee, which provides a GUI and I decide to install that instead. (Diffusion Bee isn’t recommended for my processor, but the FAQs imply that it will work, albeit slowly, so I take the gamble.) It installs, but at this point I don’t know much about Stable Diffusion models and I don’t like the image output. It’s too fantasy and unrealistic. I uninstall Diffusion Bee.

GetImg

The upside to my brief foray into Stable Diffusion is that I now understand some of its limitations. After connecting with another aspiring author, I dive into finding some free apps that might help her extend her Stable Diffusion generated images, as they all result in a 1:1 aspect ratio, which doesn’t look as clean on TikTok videos. GetImg offers inpainting and outpainting, which is not supported in the image generation tools she has available. Outpainting would allow her to generate additional canvas in any direction from a previously generated 1:1 image sourced from Stable Diffusion.

LeiaPix

While I am looking for outpainting tools, I find LeiaPix, which offers depth-of-frame animations. It’s mostly panning and zooming, but they have a decent free account, which also allows for 720p exports without using up any free credits. I typically use this tool when I need a little animation in a first-person perspective.

Genmo

Genmo markets itself as another text-to-video tool and while I try it for that, the feature that I like most within their product is actually their animation tool, which is legacy and requires an account to access. It’s simple, but it animates in a style very similar to the Deforum model in Stable Diffusion and while that particular look isn’t right for every video, it’s good for when you need a bit of motion to keep folks engaged in audio narrative.

Pika Labs

I find Pika while hunting for animation tools. After joining their Discord server, I promptly forget about them until I take a staycation and have time to review all my unread Discord notifications. The one I see from them is about a contest for creating a video using Pika for animation, along with a handful of other tools that I document, but haven’t yet had a chance to use. This motivates me to use their Pika bot to create the visuals for a ‘systems of magic’ video I am working on for my TikTok account. As this video is 2 minutes, I grossly underestimate the number of 3-second animation clips I’m going to need, so I take the time to actually learn Pika’s prompting structure to speed up my process. Thus far, Pika Labs is definitely my favorite tool for image-to-motion animation and with their most recent release, they now have inpainting.

DaVinci Resolve

This video editor came up during my “must get off Rush, what else is out there?” research. I download it, but when I crack it open, I find the learning curve to be too steep for my immediate needs. This one is definitely better for professionals.

Blender

As I begin to show an interest in AI, my TikTok algorithm updates to show me more videos of computer generated art. I encounter an animation of a buoy floating in the ocean, which the author describes as something they generated with Blender. Shortly after I encounter another video where a gentleman shows the process of using ChatGPT to write custom python scripts that can be used in Blender. I get it in my head that perhaps I can use this process to generate a car model. Haha. ChatGPT 3.5 quickly sets me straight. My request is too complex. I don’t give up. I find an open source Blender car model file and import it. Initially triumphant, I quickly realize the learning curve on Blender is also very intense. It’s not worth it for a single asset. I decide to try to generate the car through Midjourney and then use other AI animation tools to give it life.

Stable Diffusion (Web UI) + Deforum

While I may have been drawn in by the catchy name, I start to see the Deforum model trending on TikTok and I love it. I love every video I see using this style. I want to try it out. I watch several YouTube videos and realize I need to have Stable Diffusion (SD) running locally. Hmm. This is going to require the command line to set up. But having a goal gives me incentive to take the plunge. Getting Stable Diffusion itself is not so bad; the challenge lies in two other extensions I need: ControlNet and Deforum. I create a HuggingFace account with no clue what I’m doing and somehow manage to get ControlNet installed through the SD Web UI. However, I’m getting error messages, and ControlNet is a dependency of Deforum.

What I know now, that I didn’t know then, is that I had to have CUDA installed, but obtaining that package is a whole process in itself. The link in the tutorial I am following takes me to the NVIDIA website, where NVIDIA indicates they no longer support Mac tooling, but there is a package that can be downloaded. It’s just that the actual download requires authentication, which means I need an account. I bail. I might have eventually sorted out my missing dependencies, but in between troubleshooting, I watch a YouTube video by a gent who has a Mac with an M1 chip and he explains that it takes 17 minutes to generate 7 seconds of video using the Deforum model. I realize that my quest is over. My Intel Core i9 simply doesn’t have enough horsepower to generate a decent-length video for me to narrate over.

November 2023

Runway

I want to like Runway. I keep coming back to it, hoping that it’s improved since the last time I tried it. I see a lot of nifty demos of the software on LinkedIn, but every time I’ve tried text-to-video, even with image prompts, I have not gotten even close to my intended goal. It’s possible that my concepts are too complex, but I waste a lot of credits trying to get a silver marble rolling down a wood track, and every time the animation winds up being size or shape scaling rather than linear motion.

CapCut

Becoming more and more annoyed with Movavi, I once again go hunting for a video editor with built-in captioning. YouTube has this. Surely someone else must have the technology integrated? I even go as far as trying to do the caption editing through TikTok itself; however, this feature is only available when videos are uploaded from the phone app and since I am using desktop apps to create videos, moving the video files between devices is cumbersome. (I’m an oddball whose personal laptop is a Mac, but also still uses an Android phone.) As I’m trying to troubleshoot this process, I stumble upon the desktop version of CapCut, which proves to be a lifesaver. It’s perfect. Captioning is easy. I can generate captions and then all that I need to do is review and occasionally edit them. There are plenty of caption styles. It’s way easier to position them. I’m in love. I pay the annual subscription fee and cut over to their app, sending one last email to Movavi to have them cancel my subscription, since there is no way to update account plans from inside their app.

Headliner

After my several hours of animation through Pika Labs, I realize that animation for videos over one minute is a beast. This is what I get for starting with the audio first and trying to generate visuals second, but as a writer, it’s my instinct to start with the story then come up with the pictures. For my second longest audio track, I decide I would rather use a waveform generation than try to piece together a whole sequence of images. After a bunch of exploration, I find Headliner, which not only has waveform generations that are actually synced with audio – as opposed to CapCut, which uses a looping image – but it also does captions and allows you to set a custom background image. To my delight, I discover this background image can be a gif, so I use Genmo to animate one of my more whimsical, illustrative landscapes and then load the narrative in on top.

Affinity Designer

It’s two days after Thanksgiving and I’m browsing TikTok when I see a video by a gentleman who is challenging the design community to use tools other than Adobe and to stop giving them a stranglehold on the creative apps market. As it happens, I’m a little put off by Adobe’s pricing model. Having used Photoshop less since Midjourney released inpainting, I had attempted to cancel my month-to-month subscription only for Adobe to inform me that I would be billed half of the remaining months on my plan. That felt deceptive and not what I associate with month-to-month, so I am very interested in other options. This leads me to Affinity Designer, which is billed as an Illustrator replacement in the comments on the video, and Luminar Neo, which is billed as a Photoshop replacement. Both are running Black Friday sales, so I go ahead and invest. I get started working in Affinity Designer and aside from the tools having different icons and a few different keyboard commands, workflows are very similar to my Adobe experiences. I do wind up purchasing Affinity Photo, which is the Affinity Photoshop equivalent, because Luminar Neo did not meet my expectations.

December 2023

I need a floor layout for the fictional organization in my book. I’m detail oriented and I like to be consistent. If I’m writing about placement of doors, windows, and stairwells, all the better if I have actual assets for reference. Previously, I’ve used LucidChart, but I had made the mistake of doing it on my work computer and apparently LucidChart doesn’t provide a way to move files between accounts. After remaking the floor plans on my own personal LucidChart account, I realize I don’t want to pay a monthly fee for images that I only need to edit occasionally.

HomeStyler

This tool is recommended by folks in a writers’ group. I cannot for the life of me figure out how to draw walls.

RoomStyler

I figure out how to draw walls and it’s got the ability to decorate rooms with different furniture and materials in a 3D mode, but you only get two levels and I get stuck when trying to rearrange walls.

Room Sketcher

As I really only need floor plans and don’t really need materials or furniture, Room Sketcher turns out to be the winner. It’s a desktop app, but it doesn’t have any limits on how many objects or levels you can have with the free plan. Plus, it has a camera that allows you to take still shots of the 3D appearance if you so choose.

January 2024

Face Fusion

This is the package I mention at the beginning of this post. Installing it and running it requires command line, but at least the configuration controls are available through a web UI. It definitely helps create consistent characters, because where I cannot get Midjourney to duplicate a character in different poses or scenes, I can create similar likenesses and then switch faces. While this will help me create scenes for my book, I can also see the dangers a tool like this poses for images spread across social media. We are rapidly entering a world where it will be impossible to tell real images from fake (and generated) ones.

Topaz AI

I initially use the Topaz Labs AI sharpening tool to remove the blurriness from my face-swapped illustrations, but I find it cannot handle photos with lots of small details or texture.

Luminar Neo

I buy this as a Photoshop replacement because it’s on sale, but it isn’t until January that I try using it. I find it difficult. It has many lighting effects and it is decent at removing objects, but Affinity Photo feels like a better fit for image editing given I’m working with a lot of illustrative images. Luminar Neo is likely a better fit for actual photographs. But its AI is only marginally better than Adobe’s and its upscaler isn’t any better than Midjourney’s.

CodeFormer

During a live stream event, Inswapper is mentioned as another face replacement tool. It’s commonly used in conjunction with Midjourney when mapped to specific IDs. In the repo, Inswapper appears to use Face Fusion as a base and it also mentions CodeFormer as a dependency. I decide to see if I can simply run the latter without involving Midjourney. This leads me down a rabbit hole that starts with a web UI on HuggingFace where I can run the tool online and ends with me reinstalling Stable Diffusion and resolving my previously encountered errors by creating an NVIDIA account and properly installing CUDA. Under Extras in Stable Diffusion I enable CodeFormer, thereby giving me the same functionality as the HuggingFace instance, but as an action that I can run on my own machine. (I’m always a little paranoid that free tools will become paid tools if I don’t act fast enough.) The output quality of CodeFormer is also a vast improvement over Topaz AI, go figure.

And that brings us to the present, a little more than 3 months away from the one year mark of my starting this whole AI and book social media publishing journey.