Abusing iOS’ Screenshot Cropping Mechanism

August 25, 2020

Alongside the slew of new features added in iOS 13 was a small (but delightful!) one: when cropping an image in the screenshot preview window, the crop boundaries would now subtly snap to certain elements in the image.

There was some speculation around how this feature worked — common guesses on a few sites included edge detection, machine learning, or just… something simpler.

Turns out, it’s that last one! Shoutout to Ryder Mackay for owning the only real Google result for SSScreenshotMetadataHarvester, the class that handles this logic and the focus of this post.

We’ll look at how the metadata harvester works, how to feed extra data to it, and how to completely break this form of screenshot cropping.

Note: Like so many things on this site, this is much more of a curiosity than something you should actually do in a production app, especially given how awkward the implementation is. If you did want to leverage Apple internals — why waste it on something like this? Add some confetti or something instead.

`SSScreenshotMetadataHarvester`

SSScreenshotMetadataHarvester (pronounced “ssssscreenshot-metadata-harvester”, as far as I’m concerned) is the primary class responsible for collecting screenshot information from your app.

When a screenshot is taken, SSScreenshotMetadataHarvester will work with your app to generate a full-page PDF-based screenshot if you provide one, but otherwise, it will simply collect two pieces of metadata: the current NSUserActivity’s title, and the bounds of any “important views” that are present on the screen.

You can easily see this bounds-collection system in action using lldb. First, set a breakpoint on _contentRectsForMetadata:

(lldb) b +[SSScreenshotMetadataHarvester _contentRectsForMetadata]
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.

Note that the breakpoint wasn’t able to resolve to an actual address — that’s because ScreenshotServices.framework is loaded dynamically, and (in my case) I haven’t taken a screenshot yet. But lldb will resolve the correct location as soon as the framework is loaded later anyways.

Next, we’ll take a screenshot, and execution will pause. We can then step out of _contentRectsForMetadata and inspect the return value in register x0:

// Breakpoint triggered in `_contentRectsForMetadata`:
(lldb) thread step-out
(lldb) po $x0
<__NSArrayM 0x282c29f80>(
NSRect: {{0, 0}, {375, 778}}
NSRect: {{0, 44}, {375, 734}}
)

And there we have it! That array contains the rects that our screenshots will snap to when cropping.

The views we’re collecting bounds from belong to two categories:

The root view of any view controllers that are currently added to the hierarchy
If visible, the frame of the on-screen keyboard (sometimes)

The second category here is somewhat interesting — the keyboard’s frame is based on a call to +[UIPeripheralHost visiblePeripheralFrame], which doesn’t account for the iPad’s floating or split keyboard modes.

We’re not here to talk about keyboards though, so let’s look at the more interesting side of the metadata: view controllers’ view bounds.

View Controller Traversal

The algorithm for finding each view frame traverses the view controller hierarchy by looking at each view controller’s child view controllers and presented view controller (if any).

For each of these, the bounds of the view controller’s root view are taken, minus any safe area insets; the bounds are then converted to the window’s coordinate space and appended to a list.

The resulting algorithm looks something like this:

/**
 Recursively returns the frame (relative to the key window)
 of each root view controller view available
 under `viewController`.
 */
func getFrames(in viewController: UIViewController) -> [CGRect] {
    // Check if view is loaded; no frames if not
    guard viewController.isViewLoaded else {
        return []
    }

    // Get the view's bounds, minus any safe area insets
    let safeAreaBounds = viewController.view.bounds.inset(
        by: viewController.view.safeAreaInsets)

    // Convert the bounds so they're relative to the window
    let convertedBounds = viewController.view.convert(
        safeAreaBounds,
        to: UIApplication.shared.keyWindow!)

    // Return this view's bounds...
    return [convertedBounds]
        // Plus those within any child view controllers...
        + viewController.children.flatMap(getFrames)
        // Plus those within the presented view controller
        + (viewController.presentedViewController.map(getFrames) ?? [])
}

/**
 Get the frames of all root view controller views available
 under the key window's root view controller.
 */
func getAllViewControllerFrames() -> [CGRect] {
    // Start with the root view controller
    let rootViewController = UIApplication.shared
        .keyWindow!
        .rootViewController!

    // Return all root view frames contained within it
    return getFrames(in: rootViewController)
}

So in a nutshell, view controllers are really what drives the snap areas here.
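
As a quick sanity check of the inset step, the two rects from the lldb session above are consistent with a 375×812-point root view whose safe area insets are 0/34 and 44/34 points (top/bottom). Those dimensions and insets are an assumption on my part, and `Insets` and `safeAreaRect` are hypothetical stand-ins for `UIEdgeInsets` and `inset(by:)` so the sketch runs without UIKit. Treat it as a rough illustration rather than a trace of what the harvester actually does:

```swift
import Foundation

/// Hypothetical stand-in for `UIEdgeInsets`, so the sketch runs without UIKit.
struct Insets {
    var top: CGFloat = 0
    var left: CGFloat = 0
    var bottom: CGFloat = 0
    var right: CGFloat = 0
}

/// Shrinks `bounds` by `insets`, mirroring what `bounds.inset(by:)` does in UIKit.
func safeAreaRect(bounds: CGRect, insets: Insets) -> CGRect {
    return CGRect(
        x: bounds.minX + insets.left,
        y: bounds.minY + insets.top,
        width: bounds.width - insets.left - insets.right,
        height: bounds.height - insets.top - insets.bottom)
}

// Assumed: a full-screen root view on a 375×812-point display
let fullScreen = CGRect(x: 0, y: 0, width: 375, height: 812)

// Assumed insets: bottom only, then top and bottom
let bottomOnly = safeAreaRect(
    bounds: fullScreen,
    insets: Insets(bottom: 34))
let topAndBottom = safeAreaRect(
    bounds: fullScreen,
    insets: Insets(top: 44, bottom: 34))

print(bottomOnly, topAndBottom)
```

Under those assumptions, the two results line up with the `{{0, 0}, {375, 778}}` and `{{0, 44}, {375, 734}}` values printed earlier.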

Not Abusing iOS’ Screenshot Cropping Mechanism

To start with, given that view controllers determine snap areas, it’s worth pointing out that strategic use of child view controllers can actually get you nicer snap areas somewhat naturally.

You can even take this to some extremes, like making collection view cell contents backed by their own view controllers, as discussed in Soroush Khanlou’s original article, or William Boles’ excellent follow-up. In fact, William’s demo project acts as a great example of this behavior.

I wouldn’t advocate for structuring your app any differently just for the minor benefit of nicer cropping for screenshots — especially something as involved as view controllers within cells.

But, if you find yourself in a scenario where you’re deciding whether or not to use child view controllers in a given situation — keep in mind that this is one interesting advantage, however small!

Defining Arbitrary Snap Areas

There are two options for modifying the screenshot snap areas, both of which are a bit terrible.

The first option is to simply add empty child view controllers defining the areas you’re interested in. By manipulating the view controller’s root view bounds and safe area insets, you can add arbitrary snap areas. You don’t even need to add the view to the parent’s view hierarchy — only a call to -[UIViewController addChildViewController:] is needed.

However, this is… questionable at best. In my case, I’d much rather just swizzle. The setup is made a bit more complex due to us not being able to use a category (since the class isn’t available at link or launch time), and due to it being a class method.

There are a few ways around this, like forcing a dependency on ScreenshotServices.framework, but this all-in-code setup is the cleanest I’ve come up with:

#import <dlfcn.h>
#import <objc/runtime.h>

// Holds the original implementation of
// `+[SSScreenshotMetadataHarvester _contentRectsForMetadata]`
static id(*originalImplementation)(id self, SEL _cmd);

+ (void)setupSwizzle {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // Manually load ScreenshotServices in case
        // it hasn't been loaded yet (i.e., a screenshot
        // hasn't been taken in this session).
        // Adjacent C string literals are concatenated
        // by the compiler, so no operator is needed.
        dlopen("/System/Library/PrivateFrameworks/"
               "ScreenshotServices.framework/"
               "ScreenshotServices", RTLD_NOW);

        // Get the original `_contentRectsForMetadata` method
        Method originalMethod = class_getClassMethod(
            NSClassFromString(@"SSScreenshotMetadataHarvester"),
            NSSelectorFromString(@"_contentRectsForMetadata"));

        // Along with our replacement method...
        Method replacementMethod = class_getClassMethod(
            self,
            @selector(swizzled_contentRectsForMetadata));

        // The standard `method_exchangeImplementations`
        // is problematic for class methods outside
        // of categories. Instead, we save the original
        // implementation, then replace it.
        originalImplementation = (void *)method_getImplementation(originalMethod);
        method_setImplementation(
            originalMethod,
            method_getImplementation(replacementMethod));
    });
}

+ (id)swizzled_contentRectsForMetadata {
    NSMutableArray *rects
        = [originalImplementation(self, _cmd) mutableCopy];

    // Modify `rects` as needed

    return rects;
}

With this setup, we can now add arbitrary bounds as desired — but that by itself is not too interesting. Instead, let’s look at how we can use this ability to modify the screenshot cropping system even more.

Building Stronger Snaps

There’s a fun consequence of the snapping implementation in that, if multiple snap areas are in close proximity, they’ll essentially merge together.

More concretely, once the cropping tool has snapped to one rect, it will continue snapping to rects further along in the returned values array.

This means order matters; if we return snapping rects with origin.x values of…

origin.x = 10, 11, 12, 13, 14

… then cropping to anywhere near the 10-14 range will result in a snap to 14. But if we reverse the rect order…

origin.x = 14, 13, 12, 11, 10

… then we will snap to 10 instead!
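
To make the ordering effect concrete, here is a toy model of the chained snapping in one dimension. This is only my reading of the observed behavior, not Apple's implementation: `simulateSnap` and the 2-point `threshold` are hypothetical, and the real cropping tool certainly does more than this.

```swift
import Foundation

/// Toy model (an assumption, not Apple's code): walk the candidates in
/// array order, and move the crop edge to each candidate that lies within
/// `threshold` points of the edge's current position.
func simulateSnap(from start: Double, candidates: [Double], threshold: Double) -> Double {
    var edge = start
    for candidate in candidates {
        if abs(candidate - edge) <= threshold {
            // Snapping moves the edge, which can bring the
            // next candidate in the array within range too
            edge = candidate
        }
    }
    return edge
}

let ascending: [Double] = [10, 11, 12, 13, 14]
let descending = Array(ascending.reversed())

// Drop the crop edge at x = 9, just outside the range
let snappedAscending = simulateSnap(from: 9, candidates: ascending, threshold: 2)
let snappedDescending = simulateSnap(from: 9, candidates: descending, threshold: 2)

// An edge far from every candidate is left alone
let untouched = simulateSnap(from: 100, candidates: ascending, threshold: 2)

print(snappedAscending, snappedDescending, untouched)
```

In this model the ascending order carries the edge up to 14 and the descending order carries it down to 10, matching the behavior described above.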

We can also add rects approaching from both directions to home in on an area from either side. This gives us the ability to inflate the snap distance from the default (of only a few points) to any arbitrarily large margin:

// In `swizzled_contentRectsForMetadata`...

// The end rect for which we want
// to have a more aggressive snap
CGRect snapToBounds = [testView
    convertRect:testView.bounds
         toView:testView.window];

// The distance on each
// side we want to snap by
static CGFloat margin = 30;

// First, start with a rectangle
// 30 points _outside_ our view
CGRect outerRect = CGRectInset(
    snapToBounds,
    -margin,
    -margin);

// Iteratively shrink the rectangle
// and add to our resulting rects
for (int i = 0; i < margin; i++) {
    [rects addObject:@(outerRect)];
    outerRect = CGRectInset(outerRect, 1, 1);
}

// Next, start with a rectangle
// 30 points _inside_ our view
CGRect innerRect = CGRectInset(
    snapToBounds,
    margin,
    margin);

// Iteratively grow the rectangle
// and add to our resulting rects
for (int i = 0; i < margin; i++) {
    [rects addObject:@(innerRect)];
    innerRect = CGRectInset(innerRect, -1, -1);
}

The above setup allows us to provide a much more aggressive snap area.

Entirely Prevent Cropping an Area

More aggressive snap areas are an interesting use case — maybe actually useful for some apps (though not nearly worth the hacky setup required).

But we still haven’t completely exploited this knowledge yet. There’s one more logical step here, which is to flood an entire rect with snap guides, rather than just an outer boundary.

We’ll start with an empty rect in the middle of our desired end bounds, then slowly grow it until the entire area is filled:

// Define the end rect we want to reach
CGRect endRect = [testView
    convertRect:testView.bounds
         toView:testView.window];

// Start with an empty rect at
// the center of our end rect
CGRect currentRect = CGRectMake(
    CGRectGetMidX(endRect),
    CGRectGetMidY(endRect),
    0,
    0
);

// Add our currentRect and grow it
// until it matches the endRect
while (!(CGRectEqualToRect(currentRect, endRect))) {
    [rects addObject:@(currentRect)];

    // Grow by 1 point on each side,
    // without going past `endRect`
    currentRect = CGRectIntersection(
        CGRectInset(currentRect, -1, -1),
        endRect);
}

With this setup, we can ensure that an entire portion of the screen remains intact when cropping — for the most part, it will either be entirely visible, or not at all.

There is some nuance here — the behavior is slightly different on the horizontal axis, and a sliver of the area can still be cropped if snapping would result in too small of a cropped area. But it does work as a (fairly user-hostile) way to ensure that parts of a view, like a footer or watermark, cannot be cropped out in the screenshot editor.
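
For a sense of how many snap rects this flood-fill generates, here is a rough Swift port of the loop as a pure function. `floodRects(filling:)` and the footer dimensions are hypothetical, chosen only for illustration. The port uses the same exact-equality stopping condition as the loop above, so the example sticks to whole-point coordinates.

```swift
import Foundation

/// Sketch of the flood-fill loop: start with an empty rect at the center
/// of `endRect`, then grow it by 1 point per side (clamped to `endRect`)
/// until it matches `endRect`, collecting every intermediate rect.
func floodRects(filling endRect: CGRect) -> [CGRect] {
    var rects: [CGRect] = []
    var current = CGRect(x: endRect.midX, y: endRect.midY, width: 0, height: 0)

    while current != endRect {
        rects.append(current)
        current = current
            .insetBy(dx: -1, dy: -1)
            .intersection(endRect)
    }
    return rects
}

// Hypothetical 375×100-point footer at the bottom of an 812-point-tall window
let footer = CGRect(x: 0, y: 712, width: 375, height: 100)
let rects = floodRects(filling: footer)

print(rects.count)
```

The count grows with half of the rect's longer side (here, 375 / 2 rounded up), so flooding a footer like this one adds a couple hundred entries to the returned array.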

Note: There is actually one further logical step here, since our current trend is to add more and more rects — and that’s to see what happens when we just add millions of them.
…
iOS then crashes a few seconds after you take a screenshot. Just for the record.

Practicality

So, is there any benefit to this knowledge?

If we ignore the complexities & hackiness of the implementation for a moment, there might actually be some interesting use cases for app-defined snap areas — from snapping to items in a vertically-scrolling list, to aligning with images, to one of my favorite potential use cases — easily-croppable screenshots for spreadsheets or other grid-like applications.

If we do take into account the complexities & hackiness — then dear lord, I hope not.

But do let me know if you find a use for this regardless!