Frame-by-frame video editing pipeline

Hi there! 👋

Welcome to my feed on video editing with Swift and AVFoundation. Today, we’ll delve into the frame-by-frame video editing pipeline.

Frame-by-frame processing is essential for implementing common video editing tasks. If you’re developing a video editor app with Swift on iOS or macOS, a working video pipeline is indispensable. It should be able to request frames for rendering, store them in files, or display them in previews.

Tasks

In general, when developing a video editor app, you’ll encounter the following common tasks: 

  • Creating a video from images
  • Applying overlays or watermarks to videos
  • Adding captions and subtitles to videos
  • Cropping and resizing videos
  • Combining multiple videos
  • Implementing animated video transitions
  • Adding animations
  • Applying video filters and effects

Solving these tasks requires setting up a frame-by-frame video processing pipeline.

Requirements

To build a frame-by-frame video processing pipeline, we need:

  1. An image container supporting transform and pixel change operations
  2. A delivery mechanism for source frames with timings and the ability to alter frames

Solutions

I believe CIImage could work well as an image container since it’s convertible to CVPixelBuffer and vice versa, which is quite convenient. Additionally, CIImage offers transformation capabilities, and CIFilters can be applied. 
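
For reference, here is a minimal sketch of that conversion, assuming the CVPixelBuffer comes from your own capture or reader code (the helper names are just for illustration):

import CoreImage
import CoreVideo

let ciContext = CIContext()

// Wrap an existing CVPixelBuffer in a CIImage
func makeImage(from pixelBuffer: CVPixelBuffer) -> CIImage {
    CIImage(cvPixelBuffer: pixelBuffer)
}

// Render a CIImage back into a CVPixelBuffer
func render(_ image: CIImage, into pixelBuffer: CVPixelBuffer) {
    ciContext.render(image, to: pixelBuffer)
}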

As for the delivery mechanism, iOS and other Apple platforms offer two approaches that meet our requirements:

  • AVVideoComposition with a CIFilter handler, which is invoked with an AVAsynchronousCIImageFilteringRequest for each frame
  • AVVideoComposition with a custom video compositor implementing AVVideoCompositing

In this post, we’ll focus on the AVAsynchronousCIImageFilteringRequest approach.

AVAsynchronousCIImageFilteringRequest

Let’s start with the simplest approach. You can modify the source code from this post or download it by clicking the button above. To set up frame-by-frame video processing using this approach, simply initialize AVVideoComposition with a request handler:

import AVFoundation

func buildComposition() async -> AVVideoComposition {
    let asset = AVURLAsset(url: videoURL) // `videoURL` is a placeholder for your source video's URL
    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage
        // TODO: edit the video frame here by changing the source image
        request.finish(with: sourceImage, context: nil)
    }

    return composition
}

This video composition essentially does nothing to the video; it merely passes the source video frame to the resulting video. Let’s add a red rectangle over the video. Upgrade your buildComposition function:

AVMutableVideoComposition(asset: asset) { request in
    let rect = CGRect(x: 0.0, y: 0.0, width: 300.0, height: 300.0)
    let image = CIImage.red.cropped(to: rect)
    let targetImage = image.composited(over: request.sourceImage)

    request.finish(with: targetImage, context: nil)
}

After running the code, you’ll get a video with a red rectangle added.
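
To actually see that result, assign the composition to the item you preview or export. Here is a minimal sketch, assuming the same asset and the buildComposition function from above; outputURL is a placeholder for your destination file:

let composition = await buildComposition()

// Preview in an AVPlayer
let playerItem = AVPlayerItem(asset: asset)
playerItem.videoComposition = composition
let player = AVPlayer(playerItem: playerItem)
player.play()

// Or export to a file
if let export = AVAssetExportSession(asset: asset, presetName: AVAssetExportPresetHighestQuality) {
    export.videoComposition = composition
    export.outputURL = outputURL // placeholder URL for the exported file
    export.outputFileType = .mp4
    export.exportAsynchronously {
        print("Export finished with status: \(export.status.rawValue)")
    }
}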

Time

To get the time of the frame being processed, access the compositionTime property of the request:

let time = request.compositionTime

With the CIImage representing the frame and CMTime representing the frame timing, all that’s left to do is to edit the CIImage depending on your task.
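
As a small illustration of combining the two, here is a sketch that shows the red rectangle from earlier only during the first two seconds of the video (the threshold is arbitrary):

AVMutableVideoComposition(asset: asset) { request in
    let time = request.compositionTime
    var targetImage = request.sourceImage

    // Overlay the rectangle only while the frame time is below 2 seconds
    if time.seconds < 2.0 {
        let rect = CGRect(x: 0.0, y: 0.0, width: 300.0, height: 300.0)
        let overlay = CIImage.red.cropped(to: rect)
        targetImage = overlay.composited(over: targetImage)
    }

    request.finish(with: targetImage, context: nil)
}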

Applying a Filter Effect to the Video Frame

For example, you can apply a CIFilter to the frame using the following code:

private func buildComposition() async -> AVVideoComposition {
    // CIFilter.hexagonalPixellate() requires `import CoreImage.CIFilterBuiltins`
    let filter = CIFilter.hexagonalPixellate()
    filter.scale = 50.0

    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage

        filter.inputImage = sourceImage

        let outputImage = filter.outputImage ?? sourceImage

        request.finish(with: outputImage, context: nil)
    }

    return composition
}

Animation

You can even animate the CIFilter using the following code:

private func buildComposition() async throws -> AVVideoComposition {
    let maxScale = 200.0
    let filter = CIFilter.hexagonalPixellate()
    let duration = try await asset.load(.duration)
    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage
        let time = request.compositionTime
        let ratio = time.seconds / duration.seconds

        filter.scale = Float(max(1.0, maxScale * ratio))
        filter.inputImage = sourceImage

        let outputImage = filter.outputImage ?? sourceImage

        request.finish(with: outputImage, context: nil)
    }

    return composition
}

The CIFilter has an input parameter, scale, which we animate. maxScale defines the highest possible value, and ratio grows from 0.0 to 1.0 as the video plays or is exported. So, frame by frame, the scale value increases.
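
The same ratio can drive any per-frame change, not just filter parameters. For example, here is a sketch (using the same asset and duration as above) that slides the red rectangle from the left edge to the right edge over the course of the video:

AVMutableVideoComposition(asset: asset) { request in
    let ratio = request.compositionTime.seconds / duration.seconds
    let extent = request.sourceImage.extent

    // Move the overlay horizontally as the ratio grows from 0.0 to 1.0
    let rect = CGRect(x: 0.0, y: 0.0, width: 300.0, height: 300.0)
    let offsetX = (extent.width - rect.width) * ratio
    let overlay = CIImage.red
        .cropped(to: rect)
        .transformed(by: CGAffineTransform(translationX: offsetX, y: 0.0))

    request.finish(with: overlay.composited(over: request.sourceImage), context: nil)
}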

This approach seems straightforward, doesn’t it? 

Yes, if your task is to edit a single video. It could involve cropping and resizing, adding captions and subtitles, or applying filters.
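
Cropping and resizing follow the same pattern. For example, here is a sketch that crops each frame to a centered square and scales it down by half:

let composition = AVMutableVideoComposition(asset: asset) { request in
    let source = request.sourceImage
    let extent = source.extent

    // Crop to a centered square
    let side = min(extent.width, extent.height)
    let cropRect = CGRect(x: (extent.width - side) / 2.0,
                          y: (extent.height - side) / 2.0,
                          width: side,
                          height: side)

    // Scale down by half, then move the result back to the origin
    let scaled = source
        .cropped(to: cropRect)
        .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))
    let origin = scaled.extent.origin
    let output = scaled.transformed(by: CGAffineTransform(translationX: -origin.x, y: -origin.y))

    request.finish(with: output, context: nil)
}

Keep in mind that the composition's renderSize should be set to the new output dimensions; otherwise the frames are rendered into the original video's size.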

The main disadvantage of this approach is that it operates on a single video track, so it is not suitable if you want to combine several videos and animate transitions. For combining multiple videos, inserting images as frames, or implementing animated transitions, consider the next approach – a custom video compositor implementing AVVideoCompositing.

However, if your task is to edit a single video, the AVAsynchronousCIImageFilteringRequest approach is suitable.

Conclusion

In this post, we learned how to set up a frame-by-frame video processing pipeline with Swift and AVFoundation. AVAsynchronousCIImageFilteringRequest can be suitable when the task pertains to a single video. This could include tasks such as cropping and resizing, applying filters, or adding captions.

Feel free to explore and experiment further with these concepts!

Tags

AVFoundation, AVAsynchronousCIImageFilteringRequest, AVVideoCompositing, CIFilter, CIImage, Swift