Hi there! 👋
Welcome to my feed on video editing with Swift and AVFoundation. Today, we’ll delve into the frame-by-frame video editing pipeline.
Frame-by-frame processing is essential for implementing common video editing tasks. If you’re developing a video editor app with Swift on iOS or macOS, a working video pipeline is indispensable. This pipeline should be able to request frames, render them, store them in files, or display them in previews.
Tasks
In general, when developing a video editor app, you’ll encounter the following common tasks:
- Creating a video from images
- Applying overlays or watermarks to videos
- Adding captions and subtitles to videos
- Cropping and resizing videos
- Combining multiple videos
- Implementing animated video transitions
- Adding animations
- Applying video filters and effects
All of these tasks require setting up a frame-by-frame video processing pipeline.
Requirements
To build a frame-by-frame video processing pipeline, we need:
- An image container supporting transform and pixel change operations
- A delivery mechanism for source frames with timings and the ability to alter frames
Solutions
I believe CIImage could work well as an image container since it’s convertible to CVPixelBuffer and vice versa, which is quite convenient. Additionally, CIImage offers transformation capabilities, and CIFilters can be applied.
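As a rough sketch of that convertibility (the pixel buffer here is assumed to come from your own capture or decoding code), the round trip between CVPixelBuffer and CIImage looks like this:

import CoreImage
import CoreVideo

let ciContext = CIContext()

// Wrap a pixel buffer (e.g. one delivered by AVFoundation) in a CIImage.
func image(from pixelBuffer: CVPixelBuffer) -> CIImage {
    CIImage(cvPixelBuffer: pixelBuffer)
}

// Render an edited CIImage back into a pixel buffer via a shared CIContext.
func render(_ image: CIImage, into pixelBuffer: CVPixelBuffer) {
    ciContext.render(image, to: pixelBuffer)
}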
As for the delivery mechanism, iOS and other Apple platforms offer two approaches that meet our requirements:
- AVVideoComposition which invokes AVAsynchronousCIImageFilteringRequest for each frame
- AVVideoComposition with a custom video compositor implementing AVVideoCompositing
In this post, we’ll focus on the AVAsynchronousCIImageFilteringRequest approach.
AVAsynchronousCIImageFilteringRequest
Let’s start with the simplest approach. You can follow along by modifying the source code from this post. To set up frame-by-frame video processing using this approach, simply initialize AVVideoComposition with a request handler:
func buildComposition() async -> AVVideoComposition {
    // `asset` is your source video wrapped in an AVAsset
    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage
        // TODO: edit the video frame here by changing the source image
        request.finish(with: sourceImage, context: nil)
    }
    return composition
}
This video composition essentially does nothing to the video; it merely passes the source video frame through to the resulting video. Let’s add a red rectangle over the video. Upgrade your buildComposition function:
AVMutableVideoComposition(asset: asset) { request in
    let rect = CGRect(x: 0.0, y: 0.0, width: 300.0, height: 300.0)
    let image = CIImage.red.cropped(to: rect)
    let targetImage = image.composited(over: request.sourceImage)
    request.finish(with: targetImage, context: nil)
}
After running the code, you’ll get a video with a red rectangle added.
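To preview the result in your app, you can attach the composition to an AVPlayerItem. A minimal sketch, assuming the same asset used to build the composition and the buildComposition() function from above, run from an async context:

let playerItem = AVPlayerItem(asset: asset)
playerItem.videoComposition = await buildComposition()
let player = AVPlayer(playerItem: playerItem)
player.play()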
Time
To get the time of the frame being processed, access the compositionTime property of the request:
let time = request.compositionTime
With the CIImage representing the frame and CMTime representing the frame timing, all that’s left to do is to edit the CIImage depending on your task.
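As a small, hypothetical sketch of a time-dependent edit, the following composition shows the red rectangle from earlier only during the first two seconds and passes later frames through unchanged:

AVMutableVideoComposition(asset: asset) { request in
    let time = request.compositionTime
    // Hypothetical rule: show the overlay only during the first two seconds.
    guard time.seconds < 2.0 else {
        request.finish(with: request.sourceImage, context: nil)
        return
    }
    let rect = CGRect(x: 0.0, y: 0.0, width: 300.0, height: 300.0)
    let overlay = CIImage.red.cropped(to: rect)
    request.finish(with: overlay.composited(over: request.sourceImage), context: nil)
}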
Applying a Filter Effect to the Video Frame
For example, you can apply a CIFilter to the frame using the following code:
private func buildComposition() async -> AVVideoComposition {
    // CIFilter.hexagonalPixellate() requires `import CoreImage.CIFilterBuiltins`
    let filter = CIFilter.hexagonalPixellate()
    filter.scale = 50.0
    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage
        filter.inputImage = sourceImage
        let outputImage = filter.outputImage ?? sourceImage
        request.finish(with: outputImage, context: nil)
    }
    return composition
}
Animation
You can even animate the CIFilter using the following code:
private func buildComposition() async throws -> AVVideoComposition {
    let maxScale = 200.0
    let filter = CIFilter.hexagonalPixellate()
    let duration = try await asset.load(.duration)
    let composition = AVMutableVideoComposition(asset: asset) { request in
        let sourceImage = request.sourceImage
        let time = request.compositionTime
        let ratio = time.seconds / duration.seconds
        filter.scale = Float(max(1.0, maxScale * ratio))
        filter.inputImage = sourceImage
        let outputImage = filter.outputImage ?? sourceImage
        request.finish(with: outputImage, context: nil)
    }
    return composition
}
The CIFilter has an input parameter, scale, which we animate. maxScale defines the highest possible value. The ratio changes from 0.0 to 1.0 while the video is playing or being exported, so frame by frame the scale value increases.
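The same composition drives export as well. Here is a minimal sketch of rendering the animated result to a file with AVAssetExportSession, assuming an outputURL of your choosing:

// Export the edited video to a file, applying the composition frame by frame.
func export(asset: AVAsset, composition: AVVideoComposition, to outputURL: URL) {
    guard let session = AVAssetExportSession(asset: asset,
                                              presetName: AVAssetExportPresetHighestQuality) else { return }
    session.videoComposition = composition
    session.outputURL = outputURL
    session.outputFileType = .mp4
    session.exportAsynchronously {
        // Inspect session.status and session.error to see how the export finished.
    }
}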
This approach seems straightforward, doesn’t it?
Yes, if your task is to edit a single video. It could involve cropping and resizing, adding captions and subtitles, or applying filters.
The main disadvantage of this approach is that it operates on a single video track, so it’s not suitable if you want to combine several videos and animate transitions between them. For combining multiple videos, inserting images as frames, or implementing animated transitions, consider the next approach: a custom video compositor implementing AVVideoCompositing.
However, if your task is to edit a single video, the AVAsynchronousCIImageFilteringRequest approach is suitable.
Conclusion
In this post, we learned how to set up a frame-by-frame video processing pipeline with Swift and AVFoundation. AVAsynchronousCIImageFilteringRequest can be suitable when the task pertains to a single video, such as cropping and resizing, applying filters, or adding captions.
Feel free to explore and experiment further with these concepts!
Tags
AVFoundation, AVAsynchronousCIImageFilteringRequest, AVVideoCompositing, CIFilter, CIImage, Swift