Convincing Xcode to Map Vim Keys

I've tried to make it clear in the title, but hopefully you haven't gotten to this page thinking that Apple provides a built-in way to add custom key mappings in Xcode 13's new Vim Mode. Unfortunately, this functionality just isn't easily configurable.

Instead, this is one of those “but everything's a little configurable with enough swizzling & pointer arithmetic” kind of posts.



Background

First things first: Xcode 13 is the first Xcode release to offer a built-in Vim Mode!

Xcode 13's new Vim Mode

This is huge news - I've only really gotten into vim in the last few years, but it's one of those things that I don't think I could ever leave behind at this point.

I'm a big fan of the XVim plugin for this reason, so seeing vim functionality added directly to Xcode is great. This is especially true since XVim is the last Xcode plugin I still commonly use - proper vim functionality in Xcode might mean not having to worry about managing plugins at all in the future.

That being said, while this is a significant step forward either way, it's not exactly a full implementation, as indicated by Apple's own fairly toned-down description:

Many common key combinations and editing modes familiar to Vim users are supported directly within the code editor…

Now I really want to emphasize here - I'm already loving this feature.

Of course there are some bits of functionality that I've found myself missing when using it so far - apparently I like to jump to specific line numbers using commands like :25 a lot - but this feature has launched with enough functionality to cover what I want to do the majority of the time.

Except for one big thing, which I have so far been unable to get over - I do normally have my escape key mapped to something else.

Well, in insert mode, at least. That part of my .vimrc file looks like this:

imap jk <Esc>

… which is really wonderful - it maps a sequence of keys - the j key, followed by the k key in quick succession - to exit insert mode. It's a common mapping, and for good reason - it's just a quick double-tap on the home row to switch modes.

The keys used - along with the exact behavior, where you get a sort of “fake” j until it resolves based on another keypress or a timeout - make this a difficult thing to implement using software outside of Xcode.

But it's a core part of my muscle memory now (I was surprised how much the characters jk kept appearing in Xcode without this mapping available), and it's also something that I just don't want to get rid of; I found this to be my preference ages ago, and I don't want to learn something else just to appease Xcode.

So instead, let's see if we can bring Xcode over to our side.


Finding Xcode's Vim Implementation

As is often the case with this sort of thing, we only really have a couple of steps to cover:

  1. Figure out where this logic is being handled
  2. Change said logic

So let's start with the first. When using Vim Mode, Xcode includes a new bottom bar that lists some currently-available actions:

Xcode's Bottom Bar in Vim Mode

One good starting option is to search for those action names and try to find where they're used - it's not a super-precise method, but it's fast and often works.

% cd /Applications/Xcode-13.0b4.app
% rg --binary "yank \(copy\)" # rg = `ripgrep`
Binary file Contents/SharedFrameworks/SourceEditor.framework/Versions/A/SourceEditor matches

A single match in a framework called SourceEditor - if anything I'd expect a match in a localized file, containing something like "YANK_DESCRIPTION" = "yank (copy)", after which we would search for YANK_DESCRIPTION. But hey, works for me!

Next we can open SourceEditor in a disassembler like Hopper, and then search for our “yank (copy)” string again. This gives a result that appears to be referenced by a function named SourceEditor.ViYankToEndOfParagraphDownCommand.init:

; DATA XREF=_$s12SourceEditor33ViYankToEndOfParagraphDownCommandC13eventConsumerAcA0c5EventL0C_tcfcTf4gg_n+414
000000000043ffd0    db    "yank (copy) line"

A class name starting with Vi is good news! We have more digging to do, but the SourceEditor framework seems like the right place to start.


Finding Input Handling Logic

Tracing through the rest of this using a disassembler is probably more effort than it's worth.

Instead, let's open up Xcode 13, attach a debugger to it, and put a breakpoint on all of the SourceEditor.ViYankToEndOfParagraphDownCommand methods. That way, if we perform a yank-to-end-of-paragraph in Vim Mode, we should hit our breakpoint.

% lldb

(lldb) process attach --name Xcode
Process 20279 stopped

(lldb) b -r "SourceEditor.ViYankToEndOfParagraphDownCommand"
Breakpoint 1: 22 locations.

(lldb) continue
Process 20279 resuming

It is at this point that I am realizing that I have no idea how to yank to the end of a paragraph. Google says y}?

That doesn't seem to hit the breakpoint though, and it's unclear if it's from me doing the wrong thing or if this breakpoint didn't land somewhere useful.

Looking back in Hopper for other SourceEditor.Vi-prefixed classes, SourceEditor.ViReplaceCharacterCommandHandler jumps out as a good second option. I definitely know how to replace a character (should be r, followed by the replacement) and this class has a CommandHandler suffix rather than just Command - sounds like something that definitely should have a method called on it when a replacement happens.

After setting up the breakpoint and trying to replace a character, we're in luck!

Target 0: (Xcode) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.6
  * frame #0: 0x12adc4ec0 SourceEditor`protocol witness for SourceEditor.CommandHandler.selectionMode.getter : Swift.Optional<SourceEditor.SourceEditorView.SourceEditorSelectionMode> in conformance SourceEditor.ViReplaceCharacterCommandHandler : SourceEditor.CommandHandler in SourceEditor
    frame #1: 0x12afb1882 SourceEditor`SourceEditor.CommandInterface.performCommandWithSelector(_: ObjectiveC.Selector, sender: Swift.Optional<Any>, arguments: Swift.Optional<Any>...) throws -> () + 242
    frame #2: 0x12ade19fe SourceEditor`SourceEditor.SourceEditorView.perform(_: Swift.Optional<ObjectiveC.Selector>, with: Swift.Optional<Any>) -> Swift.Optional<Swift.Unmanaged<Swift.AnyObject>> + 286
    frame #3: 0x12ade1b0e SourceEditor`@objc SourceEditor.SourceEditorView.perform(_: Swift.Optional<ObjectiveC.Selector>, with: Swift.Optional<Any>) -> Swift.Optional<Swift.Unmanaged<Swift.AnyObject>> + 126
    frame #4: 0x12afc7b20 SourceEditor`SourceEditor.ViCommand.perform(actions: Swift.Array<ObjectiveC.Selector>, context: SourceEditor.ViEventConsumer.Context) -> SourceEditor.ViCommand.PerformResult + 288
    frame #5: 0x12afc7c1b SourceEditor`SourceEditor.ViCommand.perform(context: SourceEditor.ViEventConsumer.Context) -> SourceEditor.ViCommand.PerformResult + 43
    frame #6: 0x12afeb894 SourceEditor`SourceEditor.ViEventConsumer.(perform in _383C3123AEDAAFA7B0BF64D9906E584E)(command: SourceEditor.ViCommand, context: SourceEditor.ViEventConsumer.Context) -> SourceEditor.ViCommand.PerformResult + 52
    frame #7: 0x12afebfbb SourceEditor`SourceEditor.ViEventConsumer.(handle in _383C3123AEDAAFA7B0BF64D9906E584E)(commands: Swift.Array<SourceEditor.ViCommand>, context: SourceEditor.ViEventConsumer.Context) -> Swift.Optional<SourceEditor.ViCommand.PerformResult> + 1211
    frame #8: 0x12afec603 SourceEditor`SourceEditor.ViEventConsumer.handleKeyEvent(_: __C.NSEvent, in: SourceEditor.SourceEditorView) -> Swift.Bool + 1443
    frame #9: 0x13b38fe1a IDESourceEditor`IDESourceEditor.IDEViEventConsumer.handleKeyEvent(_: __C.NSEvent, in: SourceEditor.SourceEditorView) -> Swift.Bool + 938
    frame #10: 0x12b0de385 SourceEditor`SourceEditor.SourceEditorView.keyDown(with: __C.NSEvent) -> () + 405
    frame #11: 0x12b0de47f SourceEditor`@objc SourceEditor.SourceEditorView.keyDown(with: __C.NSEvent) -> () + 47
    frame #12: 0x7fff22e2f908 AppKit`-[NSWindow(NSEventRouting) _reallySendEvent:isDelayedEvent:] + 6482
    frame #13: 0x7fff22e2dd96 AppKit`-[NSWindow(NSEventRouting) sendEvent:] + 347
    frame #14: 0x104f16674 IDEKit`-[IDEWorkspaceWindow sendEvent:] + 154
    frame #15: 0x7fff22e2cc11 AppKit`-[NSApplication(NSEvent) sendEvent:] + 3021
    frame #16: 0x104f5a189 IDEKit`-[IDEApplication sendEvent:] + 857
    frame #17: 0x7fff23104f71 AppKit`-[NSApplication _handleEvent:] + 65
    frame #18: 0x7fff22c9506e AppKit`-[NSApplication run] + 623
    frame #19: 0x1039f4e84 DVTKit`-[DVTApplication run] + 54
    frame #20: 0x7fff22c6924c AppKit`NSApplicationMain + 816
    frame #21: 0x7fff203c1f3d libdyld.dylib`start + 1
    frame #22: 0x7fff203c1f3d libdyld.dylib`start + 1

From this stacktrace, we can see the following flow:

  1. Starting at the bottom, some general app/window logic that's slowly bubbling up an NSEvent (frame #17 up through #12)
  2. Xcode-specific handling of that event in SourceEditor.SourceEditorView.keyDown (frames #11 and #10)
  3. Vim-specific handling of that event in IDESourceEditor.IDEViEventConsumer.handleKeyEvent (frame #9)

There's more vim handling above that as well, but things become messier as we go.


Where to Hook

Now that we have even the smallest insight into what the control flow for a vim command looks like, it's clear that we're probably not going to be able to understand the whole system without a lot of effort: there are events, event consumers, event consumer contexts, commands, command interfaces, command handlers… I'd probably have trouble wrapping my head around it even with source code & documentation.

Instead, let's find a smaller area to focus on. We mostly just want to tell Xcode that it's actually receiving different inputs. That's not all we need to do - at some point, we will have to figure out how to do things like check which vim mode we're currently in - but we can burn that bridge when we get to it.

Let's start with the IDESourceEditor.IDEViEventConsumer.handleKeyEvent method that we called out above. It's the lowest vim-specific frame in the stack; low enough that it's only really dealing with NSEvent, so whatever logic handles mapping a keypress to a command likely happens after this point.

We can disable our original breakpoint and add a new one on this IDEViEventConsumer.handleKeyEvent method:

(lldb) br disable 2
1 breakpoints disabled.

(lldb) b IDESourceEditor.IDEViEventConsumer.handleKeyEvent
Breakpoint 3: where = IDESourceEditor`IDESourceEditor.IDEViEventConsumer.handleKeyEvent...

Back in Xcode, we can now hit any key to reach our breakpoint, since we're no longer breaking on character-replacement logic. I'll start by using the mouse to move to Line 1, and then hit the j key, which should move us down to Line 2 - and the breakpoint is indeed hit!

We can now print out some information about the event we're working with:

Target 0: (Xcode) stopped.

(lldb) po $arg1
NSEvent: type=KeyDown
          loc=(544.636,595.392)
         time=111097.2
        flags=0x100
          win=0x7fb9ace853b0
       winNum=7437
         ctxt=0x0
        chars="j"
   unmodchars="j"
       repeat=0
      keyCode=38

There's our j keypress - makes sense so far. If we type continue in lldb, and go back to Xcode, we'll see we've moved down to Line 2.

Now let's see if we can actually impact the result of a keypress. Back in Xcode, we can hit j again, starting the move down to Line 3.

In lldb though, we'll skip out on this method's logic entirely and simply return early. Based on the stacktrace above, our IDEViEventConsumer.handleKeyEvent method is expected to return a Bool - likely indicating whether or not it actually handled the key event. Let's start by returning true early:

Target 0: (Xcode) stopped.

(lldb) thread return true
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1

(lldb) continue

After checking back in Xcode, we're still on Line 2! This makes sense - we've effectively said that IDEViEventConsumer.handleKeyEvent has successfully handled the key event, but didn't actually do anything with it.

Let's try the inverse. We're still on Line 2; hitting j again brings us back to our breakpoint:

Target 0: (Xcode) stopped.

(lldb) thread return false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1

(lldb) continue

And looking back in Xcode… hm.

Xcode screenshot with a rogue j character at the start of line 2

Saying that IDEViEventConsumer did not handle the key event meant a j character was inserted into our line, despite us having been in normal mode (in fact, we still are).

This points to one of two possibilities:

  1. IDEViEventConsumer.handleKeyEvent is expected to handle all keypresses when Vim Mode is enabled. Our thread return false meant that we fell back to some non-vim-mode handler, which naturally just inserted a j character; this wouldn't happen in normal use.
  2. IDEViEventConsumer.handleKeyEvent actually only handles some keypresses in vim mode, like for navigation or switching between modes. General typing in insert mode is handled by some other event consumer later in the chain.

That distinction might affect our path forward here, so let's see if we can figure out which is true. The easiest way to do that is to check what values this method is returning in different scenarios.

Back in Xcode, another j press to hit our breakpoint - this time, let's step out of the frame, and then check the rax register to see what value was returned:

Note: If you're trying to replicate the steps in this post on an M1-based Mac, the return value here should be in $x0 instead; note that I haven't tested any part of this post on an M1 though - sorry!

(lldb) finish
* thread #1, queue = 'com.apple.main-thread', stop reason = step out

(lldb) p $rax
(unsigned long) $7 = 1

That 1 indicates that our method returned true, and was responsible for handling our j press accordingly. Next, let's enter insert mode; we can follow the same process to see a return value of 1 again for the i keypress that entered the mode.

Finally, another j press - but now when we check the return value:

(lldb) finish
* thread #1, queue = 'com.apple.main-thread', stop reason = step out

(lldb) p $rax
(unsigned long) $9 = 0

That confirms it - IDEViEventConsumer.handleKeyEvent is not actually responsible for handling events in insert mode. In that case, we'll want to hook into things one level down on the stack, in SourceEditor.SourceEditorView.keyDown.

This might make some things harder (like figuring out vim-specific context like what mode we're in; presumably, that would have been easier to access from a class called IDEViEventConsumer!), but it's necessary in order for us to manipulate events within insert mode.

Then again, it might make some things a lot easier - from the stacktrace above, it looks like there is an @objc entrance to SourceEditor.SourceEditorView.keyDown, meaning we should be able to swizzle it. That'll save us a good amount of headache over trying to hook a Swift-only method.
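
As a mental model of what the return values above suggest, here's a small sketch in plain C (which also compiles as Objective-C). Everything in it is hypothetical - the Editor struct, the function names, and the two-step chain are my own stand-ins, not Xcode's actual types - but it reproduces the three results we saw: handled for j in normal mode, handled for i, and not handled for j in insert mode.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical model of the editor; not Xcode's real types.
typedef enum { MODE_NORMAL, MODE_INSERT } Mode;

typedef struct {
    Mode mode;
    int  line;
    char text[64];
} Editor;

// Stand-in for the Vim event consumer.
// Returns true when it handled the key, false when it declined.
bool vim_consumer_handle(Editor *ed, char key) {
    if (ed->mode == MODE_INSERT) {
        // Plain typing in insert mode is not handled here.
        // (Leaving insert mode is deliberately not modeled.)
        return false;
    }
    if (key == 'j') {
        ed->line += 1;            // move down one line
    } else if (key == 'i') {
        ed->mode = MODE_INSERT;   // enter insert mode
    }
    return true;                  // normal mode consumes the key
}

// Stand-in for whatever handles the event when the Vim consumer declines.
void default_insert(Editor *ed, char key) {
    size_t len = strlen(ed->text);
    if (len + 1 < sizeof ed->text) {
        ed->text[len] = key;
        ed->text[len + 1] = '\0';
    }
}

// Stand-in for keyDown: try the Vim consumer first, then fall back.
void key_down(Editor *ed, char key) {
    if (!vim_consumer_handle(ed, key)) {
        default_insert(ed, key);
    }
}
```

In this model, forcing vim_consumer_handle to return false in normal mode would send the key to default_insert, which is the rogue j from the thread return false experiment.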


Making an Xcode Plugin

This could probably be split out into its own post at some point, since I always struggle to find a somewhat up-to-date set of steps for making an Xcode Plugin. I think there are a few templates around that could be useful, but I usually prefer to know what I'm changing, especially when there are few enough steps, so:

  1. Start with File > New Project, and pick the macOS Bundle template. Project Name (I'm using XcodeVimMap) and Organization Identifier are up to you, but change the Bundle Extension to xcplugin.

  2. Add a new Obj-C Header / Implementation file using File > New > File, Cocoa Class. I'll use my project name for this class name as well - XcodeVimMap.

    Here's what my implementation file looks like to start with:

    #import "XcodeVimMap.h"
    
    @implementation XcodeVimMap
    
    + (void)pluginDidLoad:(NSBundle *)plugin {
        NSLog(@"[XcodeVimMap] Plugin Loaded");
    }
    
    @end
    
  3. In the Info.plist of the bundle, add a new DVTPlugInCompatibilityUUIDs key of type array. Add the associated UUID of the Xcode app version you're developing for as a string item within the array; you can get the UUID using:

    defaults read /Applications/{PATH_TO_XCODE}.app/Contents/Info DVTPlugInCompatibilityUUID
    
  4. Add a boolean key named XC4Compatible with a value of 1.

  5. Add a boolean key named XCPluginHasUI with a value of 0.

  6. Update the Principal Class key to the name of the class you created. This is not strictly needed if you only have one class, but can save you some pain later.

  7. If this is the first Xcode plugin you're installing for this copy of Xcode, re-sign Xcode (this can cause issues with your Xcode installation, but I'm not really qualified to know what they are or speak much about them; probably do some research if this is your first foray into plugins).

  8. Build the plugin and copy the resulting product into ~/Library/Application\ Support/Developer/Shared/Xcode/Plug-ins/.

  9. Launch Xcode from the command line, searching the output for news from your plugin:

    /Applications/{PATH_TO_XCODE}.app/Contents/MacOS/Xcode 2>&1 | grep XcodeVimMap
    

    If you set up everything correctly, you should get a prompt within Xcode asking you if you want to load the bundle (Unexpected code bundle "..."). Note that the button you want is the non-highlighted one. After loading the bundle, you should see the [XcodeVimMap] Plugin Loaded output in your terminal.

    Note: If you see any output about the plugin not being loaded, such as Skipping plug-in at path '.../XcodeVimMap.xcplugin', it's likely related to the Info.plist values specified above; double-check them before anything else.

If everything worked, there are two additional steps you can do to make development quicker:

  1. In your target's Build Settings, change Installation Build Products Location to $(HOME), and Installation Directory to /Library/Application Support/Developer/Shared/Xcode/Plug-ins. This will prevent you from having to copy the product over after each build.

  2. In your scheme's settings, on the Run tab, and the Info sub-tab, change your Executable to be the copy of Xcode that you're developing for. This will let you hit the Run button to launch another instance of Xcode after building, see its log output, and debug it. Note that as of writing, you'll also have to uncheck Enable user interface debugging in the Options sub-tab, or you'll get an error message about being unable to load a debugger plugin.

    Xcode is fairly slow with a debugger attached, so if the debugger isn't a core part of your workflow, it's worth unchecking the Debug executable box in the Info sub-tab - though we will be using it for parts of this post.

With all that out of the way, we now have a general Xcode plugin set up - we can now start adding some vim-specific code to it.


Swizzling the Event Handler

We determined earlier that SourceEditor.SourceEditorView.keyDown was likely the ideal method to target here, so let's start by swizzling it in our plugin.

First, let's add a breakpoint on our NSLog statement in pluginDidLoad: so we can see if the class we want to swizzle is actually loaded by this point:

(lldb) po NSClassFromString(@"SourceEditor.SourceEditorView")
 nil

Looks like the class isn't loaded yet. We know it's defined in the SourceEditor framework, which we saw the path for near the top of this post; we can load it early ourselves using dlopen:

// #import <dlfcn.h> for dlopen

NSString *xcodePath = [[NSBundle mainBundle] bundlePath];
NSString *sourceEditorPath = [xcodePath stringByAppendingPathComponent:@"Contents/SharedFrameworks/SourceEditor.framework/Versions/A/SourceEditor"];
dlopen([sourceEditorPath cStringUsingEncoding:NSUTF8StringEncoding], RTLD_NOW);

NSLog(@"[XcodeVimMap] SourceEditor Loaded");

If we add a breakpoint on this new NSLog statement and re-run, we can confirm our class is now loaded:

(lldb) po NSClassFromString(@"SourceEditor.SourceEditorView")
SourceEditor.SourceEditorView

We should now be able to set up our swizzle accordingly. Note that I'm using a bit of a weird format where we keep a reference to the original implementation in a function pointer rather than just using method_exchangeImplementations; I ran into some oddness with the usual setup, and would rather save the related debugging effort for elsewhere in this post - something tells me I'll need it.

// #import <AppKit/AppKit.h> for NSEvent
// #import <objc/runtime.h> for runtime fun

// Holds the original method implementation
static BOOL(*originalKeyDown)(id self, SEL _cmd, NSEvent *event);

// Log keypress events, then forward to original implementation
- (BOOL)swizzled_keyDown:(NSEvent *)event {
    NSLog(@"[XcodeVimMap] Got Characters: %@", event.characters);
    return originalKeyDown(self, _cmd, event);
}

+ (void)pluginDidLoad:(NSBundle *)plugin {
    // ...

    Method originalMethod = class_getInstanceMethod(
        NSClassFromString(@"SourceEditor.SourceEditorView"),
        NSSelectorFromString(@"keyDown:"));

    Method replacementMethod = class_getInstanceMethod(
        [self class],
        @selector(swizzled_keyDown:));

    // Save the original implementation so we can
    // call it from `swizzled_keyDown`
    originalKeyDown = (void *)method_getImplementation(
            originalMethod);

    // Replace the method
    method_setImplementation(
        originalMethod,
        method_getImplementation(replacementMethod));
}
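
If the function-pointer approach looks unfamiliar, the same save-then-replace idea can be sketched without the Objective-C runtime at all. This is a plain C analogy, not plugin code: keyDownSlot is a hypothetical stand-in for the method's implementation pointer, and every name here is made up for illustration.

```c
#include <stddef.h>

// Hypothetical handler type; stands in for the method's IMP.
typedef int (*KeyHandler)(char key);

int keysSeenByOriginal = 0;
int keysSeenByHook = 0;

// The "original implementation".
int original_key_down(char key) {
    (void)key;
    keysSeenByOriginal += 1;
    return 1;
}

// The slot callers go through; stands in for the method table entry.
KeyHandler keyDownSlot = original_key_down;

// Holds the original implementation once the hook is installed.
KeyHandler savedOriginal = NULL;

// Replacement: do our own work, then forward to the original.
int hooked_key_down(char key) {
    keysSeenByHook += 1;
    return savedOriginal(key);
}

void install_hook(void) {
    if (keyDownSlot == hooked_key_down) {
        return;  // already installed; saving again would capture ourselves
    }
    savedOriginal = keyDownSlot;     // save first...
    keyDownSlot = hooked_key_down;   // ...then replace
}

// What a caller does: dispatch through the slot.
int send_key(char key) {
    return keyDownSlot(key);
}
```

The guard in install_hook matters for the same reason it would in a real swizzle: saving the "original" a second time would capture our own replacement and recurse forever.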

Now we can Build & Run, open a source file, and start typing:

[XcodeVimMap] Plugin Loaded
[XcodeVimMap] SourceEditor Loaded
[XcodeVimMap] Got Characters: G (// move to end of file)
[XcodeVimMap] Got Characters: i (// switch to insert mode)
[XcodeVimMap] Got Characters: H
[XcodeVimMap] Got Characters: e
[XcodeVimMap] Got Characters: l
[XcodeVimMap] Got Characters: l
[XcodeVimMap] Got Characters: o
[XcodeVimMap] Got Characters:  
[XcodeVimMap] Got Characters: W
[XcodeVimMap] Got Characters: o
[XcodeVimMap] Got Characters: r
[XcodeVimMap] Got Characters: l
[XcodeVimMap] Got Characters: d
[XcodeVimMap] Got Characters: !

Hooray! We've just built the world's worst Xcode keylogger!


Testing Our New Capabilities

Now that we have a hook into the root of Xcode's source editor input handling, there are a few things we should try out, since we'll need them for our final implementation; better to make sure they work now rather than later.

Sending Delayed Inputs

As mentioned earlier in this post, the remap functionality we're looking for adds a ‘fake’ j when the key is pressed, before fully committing to it; this way, vim can determine if the next key is a k (indicating we should exit) or if the user really just meant to type j.

There are two rough proxies of that that I can think of:

  1. Don't insert a j at all when we receive a keypress, until either the next key is pressed or a time threshold has passed. If the next key is k, don't insert the j at all, just exit.
  2. Do insert a j under the assumption that it is wanted. If the next key is k, delete the j, then exit.

I like the first option a lot more - anything that deals with deleting content starts to get tricky, even in a situation where you have full control over things - which we don't. Doing something simple like sending a backspace event leads to problems if, for example, the user has clicked on another line or highlighted some text with their mouse in the meantime (I guess that might introduce issues in either case, but still; option 1 feels the least destructive).
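
To make option 1 concrete, here's a rough sketch of the decision logic as a tiny state machine in plain C. It only decides what should happen for each key; actually holding, replaying, and timing events is left out. All of the names (JKState, JKAction, jk_handle_key, jk_handle_timeout) are hypothetical, as is the choice to keep holding when j is typed twice in a row.

```c
#include <stdbool.h>

// What the caller should do with the incoming key (hypothetical names).
typedef enum {
    JK_PASS_THROUGH,     // forward the key unchanged
    JK_HOLD,             // swallow the key for now; a "j" is pending
    JK_SEND_ESCAPE,      // drop the pending "j" and send Escape instead
    JK_FLUSH_THEN_PASS,  // send the pending "j", then forward the new key
    JK_FLUSH_THEN_HOLD   // send the pending "j", then hold the new "j"
} JKAction;

typedef struct {
    bool pendingJ;       // true while a "j" is being held back
} JKState;

// Decide what to do with `key`, given whether we're in insert mode.
JKAction jk_handle_key(JKState *state, char key, bool inInsertMode) {
    if (state->pendingJ) {
        state->pendingJ = false;
        if (inInsertMode && key == 'k') {
            return JK_SEND_ESCAPE;       // "jk" completed
        }
        if (inInsertMode && key == 'j') {
            state->pendingJ = true;      // "jj": emit the first, hold the second
            return JK_FLUSH_THEN_HOLD;
        }
        return JK_FLUSH_THEN_PASS;       // the "j" was meant literally
    }
    if (inInsertMode && key == 'j') {
        state->pendingJ = true;
        return JK_HOLD;
    }
    return JK_PASS_THROUGH;
}

// Called when the wait for a second key times out.
// Returns true if a held "j" should now be sent.
bool jk_handle_timeout(JKState *state) {
    bool shouldFlush = state->pendingJ;
    state->pendingJ = false;
    return shouldFlush;
}
```

Note that the inInsertMode flag is doing a lot of quiet work here; supplying it is its own problem.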

In either case, we need the ability to send an event at a time when it didn't originally occur. This is probably perfectly fine, but might as well test while we're here.

An easy way to do so would be to send the same input multiple times, with one coming after a delay. This isn't guaranteed to work if these events used some sort of nonce-based system, but that seems unlikely, and we can debug if needed. So let's try it:

- (BOOL)swizzled_keyDown:(NSEvent *)event {
    NSLog(@"[XcodeVimMap] Got Characters: %@", event.characters);

    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC), dispatch_get_main_queue(), ^{
        originalKeyDown(self, _cmd, event);
    });

    return originalKeyDown(self, _cmd, event);
}

And the result, after a bit of typing:

HelloH elwlo world!orld!

Sending Multiple Inputs per Frame

This is another one where I'd be surprised if we encountered issues, but better to confirm upfront.

If we want to send a queued j key as soon as we see a non-k key pressed, we likely want to send both at the same time. Let's use essentially the same setup, without the delay, to confirm that Xcode allows for this:

- (BOOL)swizzled_keyDown:(NSEvent *)event {
    NSLog(@"[XcodeVimMap] Got Characters: %@", event.characters);

    originalKeyDown(self, _cmd, event);
    return originalKeyDown(self, _cmd, event);
}

And the result:

HHeelllloo  WWoorrlldd!!

Modifying Inputs

If we want to be able to map some inputs on to others, we need to make sure we can actually modify and send different events than the ones we're originally given.

NSEvent is not mutable, but we can make a helper method to clone a given event with some differences applied. This won't work well if NSEvent has any internal state that isn't surfaced here, but we can start with this and worry about that if it comes up.

+ (NSEvent *)modifiedEvent:(NSEvent *)event withCharacters:(NSString *)characters {
    return [NSEvent keyEventWithType:event.type
                            location:event.locationInWindow
                       modifierFlags:event.modifierFlags
                           timestamp:event.timestamp
                        windowNumber:event.windowNumber
                             context:nil // event.context is deprecated and only returns `nil`
                          characters:characters
         charactersIgnoringModifiers:characters // unclear if we have to worry about this distinction
                           isARepeat:NO
                             keyCode:event.keyCode]; // wagering that no code actually reads this, so we don't have to change it accordingly
}

Now we can write something to try changing some events around. This should swap the j and k keys, both in normal mode (where navigating up and down across lines will be reversed) and in insert mode:

- (BOOL)swizzled_keyDown:(NSEvent *)event {
    if ([event.characters isEqualToString:@"j"]) {
        // Swap "j" with "k"
        NSEvent *modifiedEvent = [XcodeVimMap modifiedEvent:event
                                             withCharacters:@"k"];
        return originalKeyDown(self, _cmd, modifiedEvent);
    } else if ([event.characters isEqualToString:@"k"]) {
        // Swap "k" with "j"
        NSEvent *modifiedEvent = [XcodeVimMap modifiedEvent:event
                                             withCharacters:@"j"];
        return originalKeyDown(self, _cmd, modifiedEvent);
    }

    // If neither "j" nor "k", just call the original
    return originalKeyDown(self, _cmd, event);
}

Navigating in normal mode is indeed inverted, and the result of a quick typing test in insert mode:

kacj in the box

Creating an Escape Key Event

One last thing we should check - Escape key events might be handled differently.

First, it may be useful to compare a normal key event to an Escape key event. Here's the debug description of each, respectively:

// "q" keypress
NSEvent: type=KeyDown
          loc=(429.085,168.524)
         time=123860.4
        flags=0x100
          win=0x7ff7364feef0
       winNum=9305
         ctxt=0x0
        chars="q"
   unmodchars="q"
       repeat=0
      keyCode=12

// Escape keypress
NSEvent: type=KeyDown
          loc=(429.085,168.524)
         time=123860.4
        flags=0x100
          win=0x7ff7364feef0
       winNum=9305
         ctxt=0x0
        chars="" // different
   unmodchars="" // different
       repeat=0
      keyCode=53 // different

Ok, chars and unmodchars are both empty, and only the keyCode is different outside of that. I omitted that from the modifiedEvent: method earlier, so looks like we'll have to add it in.

Except… that doesn't work! Even after adding the keycode in, and getting a modified event whose debug description looks exactly like an original Escape key event, it doesn't actually work for exiting insert mode. There must be some underlying state that's not exposed here that's set for Escape key events. Capturing an Escape key event and replaying it in the future works, which supports this theory as well.

I was stuck on this for a while - I started looking at what private ivars NSEvent might have, trying to figure out what part of the vim implementation was responsible for checking the escape button press (with no luck), and so on.

At some point I ended up looking at a dumped NSEvent.h header to see what else might be relevant, and saw that there was an undocumented -[NSEvent _isEscapeKeyEvent:] method - looking at its implementation revealed the issue:

(lldb) po escapeEvent.characters


(lldb) p escapeEvent.characters.length
(NSUInteger) $3 = 1

I was too easily misled by the event's debug description; these weren't empty strings at all. They're @"\x1b" - a string containing only the escape character, which has character code 0x1b (not to be confused with the event's key code, 53).

Yea, that one's on me.

Good news is that [XcodeVimMap modifiedEvent:event withCharacters:@"\x1b"] does indeed work! Remapping from any key to this modified event, using a similar setup to the j and k swap above, lets us use that key to leave insert mode. That should cover everything we need from the NSEvent modification & replay side.
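
For what it's worth, the trap is easy to reproduce outside of Xcode. Here's a minimal sketch in plain C (the helper name is made up): the escape character is a real, one-character string, it just isn't printable, which is presumably why the debug description looked empty.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// The same one-character string used for the modified event above.
const char kEscapeCharacters[] = "\x1b";

// Hypothetical helper: true when `characters` is exactly the ESC character.
bool is_escape_characters(const char *characters) {
    return strlen(characters) == 1 && characters[0] == 0x1b;
}

// True when every character in the string is printable; an
// ESC-only string fails this even though its length is 1.
bool is_all_printable(const char *characters) {
    for (size_t i = 0; characters[i] != '\0'; i++) {
        if (!isprint((unsigned char)characters[i])) {
            return false;
        }
    }
    return true;
}
```

Checking the length, as in the lldb session above, is the quickest way to tell "empty" apart from "invisible".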


Determining Which Vim Mode We're In

If the end goal is to make a j input followed by a k input leave insert mode, then we have to have a way to tell if we're actually in insert mode. If we didn't care, we probably could have done this whole thing with software outside of Xcode.

But having an app-wide (or system-wide) delay on my j key sounds terrible, even if just for navigating in vim's normal mode. To do this properly, we really need to be able to find which mode we're in.

Some part of Xcode's Vim system has this knowledge, since it's making decisions about what keys should do at any given point in time; this might require some digging to find, but it's easier to start with objects we've seen before.

In the stacktrace we saw earlier for handling a Character Replace, there were a bunch of interesting classes referenced - one that really jumps out by name is SourceEditor.ViEventConsumer.Context.

From that same stacktrace, it looks like there's a SourceEditor.ViCommand.perform(...) function that gets this context as its first argument - let's switch back to the debugger and set a breakpoint there:

(lldb) b -r SourceEditor.ViCommand.perform
Breakpoint 1: 9 locations.

Now after hitting a movement key in Xcode:

Target 0: (Xcode) stopped.

(lldb) po $arg1
ViEventConsumer.Context<mode: .normal,
                   keyStroke: ViKeystroke(
keyEquivalents: [ViKeyEquivalent(characters: "l", modifierFlags: [])]),
   currentKeyEquivalentIndex: 0,
                    register: nil,
                       count: nil,
             partialCommands: [],
             operatorCommand: nil,
           lastMotionCommand: nil,
    lastMotionCommandRanges = nil>

First of all, this debug description switching from : to = for the very last property breaks my heart (and, coincidentally, my syntax highlighting). And second of all - there's our Vim Mode, right at the top!

There are a few hoops that we're going to have to jump through here. We're starting with a SourceEditor.SourceEditorView instance from our swizzled method; we need to use that to get an IDESourceEditor.IDEViEventConsumer instance (the owner of the method we were first investigating hooking). We can presumably use that to get this ViEventConsumer.Context instance, and from there, we need a way to access this mode property's value.

Let's start with the middle hoop, going from IDESourceEditor.IDEViEventConsumer to ViEventConsumer.Context.

Searching Ivars

We can start by pausing in the same method that we were working with previously:

(lldb) b IDESourceEditor.IDEViEventConsumer.handleKeyEvent
Breakpoint 1: where = IDESourceEditor`IDESourceEditor.IDEViEventConsumer.handleKeyEvent...

Now we can press any key to hit our breakpoint, and can then access our IDESourceEditor.IDEViEventConsumer instance using $r13 (the register that holds self for Swift methods on x86_64).

A good thing to check first is if we can find an ivar or method for accessing our context. I'm more used to iOS, where there's a nice built-in way to do this via -[NSObject _ivarDescription] and -[NSObject _methodDescription], added by an NSObject category in UIKitCore, but those won't help us on macOS.

There are also some lldb scripts that could help with this, but after a few attempts, they don't seem to work properly in this particular scenario.

Let's just look at the ivars manually:

(lldb) e -- unsigned int $ivarCount = 0;

(lldb) e -- Ivar *$ivars = (Ivar *)class_copyIvarList((Class)[(id)$r13 class], &$ivarCount);

(lldb) p/d $ivarCount
(unsigned int) $ivarCount = 6

Great, we've found some options! Are any of them our context?

(lldb) p (char *)ivar_getName($ivars[0])
(char *) $1 = "sourceEditorView"

(lldb) p (char *)ivar_getName($ivars[1])
(char *) $2 = "hasPartiallyMatchedCommands"

(lldb) p (char *)ivar_getName($ivars[2])
(char *) $3 = "$__lazy_storage_$_statusViewItem"

(lldb) p (char *)ivar_getName($ivars[3])
(char *) $4 = "$__lazy_storage_$_suggestionsViewItem"

(lldb) p (char *)ivar_getName($ivars[4])
(char *) $5 = "$__lazy_storage_$_toggleRegistersViewerShownBottomBarItem"

(lldb) p (char *)ivar_getName($ivars[5])
(char *) $6 = "registersViewerPopover"

… seems like not.

We do have one thing to try before moving on though - class_copyIvarList does not return ivars belonging to superclasses. Does IDESourceEditor.IDEViEventConsumer have a superclass?

(lldb) po [[$r13 class] superclass]
SourceEditor.ViEventConsumer

It does, and it's still vim-related, so it has a good chance of being the thing that owns the context. Using the same setup as before, but this time looking at this superclass’ ivars:

(lldb) e -- Ivar *$superivars = (Ivar *)class_copyIvarList((Class)[[(id)$r13 class] superclass], &$ivarCount);

(lldb) p/d $ivarCount
(unsigned int) $ivarCount = 7

(lldb) p (char *)ivar_getName($superivars[0])
(char *) $7 = "context"

Aha! There's our context!

We can write a basic method to help get this object by its name. It looks a lot like object_getInstanceVariable, but that function is not callable under ARC, hence writing our own instead:

+ (id)getIvar:(NSString *)ivarName from:(NSObject *)object {
    const char *ivarNameCString = [ivarName cStringUsingEncoding:NSUTF8StringEncoding];
    Ivar ivar = class_getInstanceVariable([object class], ivarNameCString);
    return object_getIvar(object, ivar);
}

And then test it in the same place in lldb, after recompiling:

(lldb) e -- id $context = [XcodeVimMap getIvar:@"context" from:(id)$r13]

(lldb) po $context
ViEventConsumer.Context<mode: .normal, ... >

Works great!

Note: One thing that surprises me here is that [(id)$r13 valueForKey:@"context"] does not work for this same purpose. My understanding is that valueForKey: should fall back to searching for ivars of the same name (both with and without underscore prefixes). This class isn't overriding the default valueForKey: implementation, and doesn't have accessInstanceVariablesDirectly overridden to return NO. Direct ivar access works either way, but still - let me know if you know why valueForKey: doesn't work here!

After a bit of searching through the ivars of ViEventConsumer.Context, it looks like we can then get the current vim mode using a similar strategy, this time by looking for an ivar named "mode".

One note here is that object_getIvar has a return type of id, but this is somewhat broken for primitive types - they aren't wrapped in an NSNumber or the like, but rather, their values are simply returned as-is by the function. Not a pointer to anything, just the raw scalar value. We have to do a bit of casting nonsense accordingly.

(lldb) e -- uint8_t $mode = (uint8_t)(long)[XcodeVimMap getIvar:@"mode" from:$context]

// In normal mode...
(lldb) p/d $mode
(uint8_t) $mode = 3

// Or in insert mode...
(lldb) p/d $mode
(uint8_t) $mode = 5
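Why does that double cast work? As far as I can tell, object_getIvar just loads a pointer-sized word from the ivar's offset, so for a one-byte ivar the value we want ends up in the lowest byte of that word (on a little-endian machine), and the uint8_t cast throws the rest away. Here's a minimal C sketch of that idea. FakeContext and both helper functions are made up for illustration, and the real Context layout is almost certainly different:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the context object's storage - the real
   layout is unknown; only "a one-byte mode at some offset" matters. */
struct FakeContext {
    uint64_t isa;       /* placeholder for the object header */
    uint8_t  mode;      /* the one-byte value we're after */
    uint8_t  other[7];  /* whatever happens to be stored next to it */
};

/* Mimics a pointer-sized load followed by the (uint8_t)(long) cast.
   Assumes a little-endian machine. */
static uint8_t mode_via_word(const struct FakeContext *ctx) {
    assert(ctx != NULL);
    uintptr_t word = 0;
    memcpy(&word, (const char *)ctx + offsetof(struct FakeContext, mode),
           sizeof word);
    return (uint8_t)word; /* keeps only the lowest byte */
}

/* Endian-independent alternative: read exactly one byte at the offset. */
static uint8_t mode_via_byte(const struct FakeContext *ctx) {
    assert(ctx != NULL);
    uint8_t mode = 0;
    memcpy(&mode, (const char *)ctx + offsetof(struct FakeContext, mode),
           sizeof mode);
    return mode;
}
```

With mode set to 5 and junk in the neighbouring bytes, both helpers should hand back 5 on a little-endian machine. The one-byte read is the variant I'd trust more if this ever had to run somewhere else.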

We've now successfully jumped through two of our three remaining hoops.

Manually Bridging a Swift Array

We now need to get access to our IDESourceEditor.IDEViEventConsumer instance from the SourceEditor.SourceEditorView that we have available to us in our hook. The above strategy of looking at ivars also pays off here.

There is an ivar named eventConsumers of type… uh…

(lldb) e -- Ivar $ivar = class_getInstanceVariable([self class], "eventConsumers");

(lldb) po object_getIvar(self, $ivar)
0x00007fd087110830

(lldb) po [object_getIvar(self, $ivar) class]
error: Execution was interrupted, reason: Attempted to dereference an invalid ObjC Object or send it an unrecognized selector.
The process has been returned to the state before expression evaluation.

(lldb) po ivar_getTypeEncoding($ivar)
<no value available>

So there's an ivar there, and it has a non-nil value, but it's unclear of what. Not being able to send it a class message makes sense - that seems plausible for certain Swift-defined types (including an array, which seems most likely based on its name) - but <no value available> from ivar_getTypeEncoding? Is that even allowed?

Let's look at what's actually visible in-memory from that location, using lldb to essentially treat each group of eight bytes as an id:

(lldb) memory read -c 10 -t id 0x00007fd087110830
(id) 0x7fd087110830 = 0x00007fd0d18550d8
(id) 0x7fd087110838 = 0x0000000000000003
(id) 0x7fd087110840 = 0x000000000000000d
(id) 0x7fd087110848 = 0x0000000000000020
(_TtkC15IDESourceEditor18IDEViEventConsumer *) 0x7fd087110850 = 0x0000600003622a00
(id) 0x7fd087110858 = 0x000000012eb68358
(_TtC8DeltaKit9ChangeBar *) 0x7fd087110860 = 0x00007fd0e15e35d0
(id) 0x7fd087110868 = 0x000000013eb71890
(_TtC12SourceEditor18SourceEditorGutter *) 0x7fd087110870 = 0x00007fd0d1426870
(id) 0x7fd087110878 = 0x000000012eb64ac8

We can actually see our IDEViEventConsumer there, just 0x20 bytes in!

Directly after it appears to be a pointer to some information related to that class’ protocol witness table for the SourceEditor.SourceEditorViewEventConsumer protocol:

(lldb) image lookup -a 0x000000012eb68358
      Address: SourceEditor[0x000000000049c358] (SourceEditor.__DATA_CONST.__const + 42072)
      Summary: SourceEditor`protocol witness table for SourceEditor.ViEventConsumer : SourceEditor.SourceEditorViewEventConsumer in SourceEditor

… a concept I don't understand well enough to do anything other than acknowledge that each item in this array appears to be stored as a pair of pointers (one to the instance itself, and the other related to the protocol witness table).

The first line of the output also clearly shows a pointer, though lldb doesn't print out any info about it here. Let's try to do so manually:

(lldb) po 0x00007fd0d18550d8
_TtGCs23_ContiguousArrayStorageP12SourceEditor29SourceEditorViewEventConsumer__$
// demangled: Swift._ContiguousArrayStorage<SourceEditor.SourceEditorViewEventConsumer>

So this is indeed an array (or at least, “array storage”?) but how do we get it back into a reasonable format?

I tried a few quick options - like passing this pointer to a Swift-based function and converting it there, or calling _bridgeToObjectiveC manually - but no luck. Possibly because this is not technically an array, or possibly because I'm doing something completely wrong.

I would spend more time digging, but it's hard to justify when the instance we want is right there. So let's just figure out how to read it ourselves.

First we need to know how many objects are in the array - we could make a guess based on some of the first few values printed by lldb above (0x03? 0x0d? 0x20?), but we also know that SourceEditor.SourceEditorView.keyDown must look at this memory to get the IDEViEventConsumer instance itself, so let's see if we can figure out what it does.

Looking through that method in a disassembler, just a bit before the call site that showed up in our initial stacktrace, we can see the following instructions:

mov   r15, qword [rax+0x10]
test  r15, r15
je    loc_3633ae

This looks like an initial check to see if our array is empty: it reads the value 0x10 bytes past the address in the rax register (which presumably, by this point, holds a pointer to our array storage), and then test r15, r15 followed by je checks whether that value is zero, skipping ahead if so. This implies that the array length is kept at offset 0x10 - in our case, that corresponds to the 0x0d value in the lldb output above, i.e., 13 items in the array.
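To make the guessed layout concrete, here's a small sketch in plain C that treats the storage as raw memory: a count at offset 0x10, elements starting at offset 0x20, and 0x10 bytes per element. The constant and function names are my own, and the offsets are only what the dump and disassembly above suggest for this particular Xcode build, not a documented ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets inferred from the lldb dump and the disassembly above - these
   are observations about one Xcode build, not a documented ABI. */
enum {
    GUESSED_COUNT_OFFSET    = 0x10, /* element count */
    GUESSED_ELEMENTS_OFFSET = 0x20, /* first element */
    GUESSED_ELEMENT_STRIDE  = 0x10  /* instance pointer + witness table pointer */
};

/* Reads the element count from the guessed storage layout. */
static int64_t guessed_count(const void *storage) {
    assert(storage != NULL);
    int64_t count = 0;
    memcpy(&count, (const char *)storage + GUESSED_COUNT_OFFSET, sizeof count);
    return count;
}

/* Returns the instance pointer (as an integer) of the element at `index`,
   or 0 if `index` is out of range. */
static uint64_t guessed_instance_at(const void *storage, int64_t index) {
    if (index < 0 || index >= guessed_count(storage)) {
        return 0;
    }
    uint64_t instance = 0;
    memcpy(&instance,
           (const char *)storage + GUESSED_ELEMENTS_OFFSET
               + (size_t)index * GUESSED_ELEMENT_STRIDE,
           sizeof instance);
    return instance;
}
```

Fed the ten words from the memory dump above, guessed_count gives 13 and guessed_instance_at(storage, 0) gives 0x0000600003622a00, which matches the IDEViEventConsumer pointer lldb printed. The bounds check is there because an off-by-one on memory we don't own is not something I want to debug inside Xcode.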

We now have enough information to read this data and convert it to an NSArray instance ourselves:

+ (NSArray *)arrayFromSwiftArrayStorage:(void *)swiftArrayStorage {
    // Create a mutable array to hold each encountered element
    NSMutableArray *results = [NSMutableArray new];

    // Read array length at offset 0x10
    long arrayLength = *(long *)((char *)swiftArrayStorage + 0x10);

    // Get each element of the array, every 0x10 bytes, starting at offset 0x20
    for (long i=0; i<arrayLength; i++) {
        void **elementPtr = (void **)((char *)swiftArrayStorage + 0x20 + (0x10 * i));
        id element = (__bridge NSObject *)(*elementPtr);
        [results addObject:element];
    }

    return results;
}

There's more casting, dereferencing, and double pointers there than I'm usually happy with, but before we judge it too harshly, let's see if it works:

(lldb) e -- id $eventConsumerStorage = [XcodeVimMap getIvar:@"eventConsumers" from:self]

(lldb) po [XcodeVimMap arrayFromSwiftArrayStorage:$eventConsumerStorage]
<__NSArrayM 0x60000af3f270>(
    <IDESourceEditor.IDEViEventConsumer: 0x600000bd7100>,
    DeltaKit.ChangeBar,
    <SourceEditor.SourceEditorGutter: 0x7fe4dd5213c0>,
    <DebuggerUI.DBGColumnBreakpointLayoutVisualization: 0x6000002059a0>,
    SourceEditor.StructuredEditingController,
    SourceEditor.SourceEditorLineAnnotationManager,
    SourceEditor.FoldingController,
    <SourceEditor.SourceEditorViewDraggingSource: 0x600000ebe800>,
    SourceEditor.SourceEditorEditAssistant,
    SourceEditor.ContextualMenuEventConsumer,
    <DVTSourceEditor.DVTSourceEditorTextFindPanel: 0x7fe4dd524210>,
    <IDESourceEditor.SourceCodeEditor: 0x7fe4fe18e200 representing: <DVTExtension 0x6000007d2c40: Xcode.IDEKit.EditorDocument.PegasusSourceCode from com.apple.dt.IDE.IDEPegasusSourceEditor>>,
    PathTokenVisualization: []
)

There we go! We now have access to all our event consumers - IDEViEventConsumer included.


And Finally… The Remap

Our solutions to problems are getting less sane as the post gets longer. Better wrap it up quick.

We finally have what we need to put everything together, and for me to finally have my normal escape functionality back.

First, let's define a new enum to represent the vim modes we care about (even if that is just one entry), along with a method to get the current vim mode:

typedef NS_ENUM(uint8_t, VimMode) {
    VimModeInsert = 5
};

+ (VimMode)vimModeFromSourceEditorView:(id)sourceEditorView {
    // Get our current event consumers
    void *eventConsumersStorage = (__bridge void *)([XcodeVimMap getIvar:@"eventConsumers" from:sourceEditorView]);
    NSArray *eventConsumers = [XcodeVimMap arrayFromSwiftArrayStorage:eventConsumersStorage];

    // Find the vim consumer
    id vimEventConsumer;
    for (id eventConsumer in eventConsumers) {
        if ([NSStringFromClass([eventConsumer class]) isEqualToString:@"IDESourceEditor.IDEViEventConsumer"]) {
            vimEventConsumer = eventConsumer;
            break;
        }
    }

    // Get the vim context
    id vimContext = [XcodeVimMap getIvar:@"context" from:vimEventConsumer];

    // Get the current vim mode
    uint8_t vimMode = (uint8_t)(long)[XcodeVimMap getIvar:@"mode" from:vimContext];
    return vimMode;
}

Now let's update our swizzle implementation to exit early if we're not currently in insert mode. I've also added a block to handle sending events back to the original implementation; this is mostly just so we don't have to write self and _cmd every time, though it could also be useful if we wanted to pass the block to any helper methods that might want to send events for us.

- (BOOL)swizzled_keyDown:(NSEvent *)event {
    // Block to send an event to the original implementation
    BOOL (^sendEvent)(NSEvent *) = ^BOOL(NSEvent *event) {
        return originalKeyDown(self, _cmd, event);
    };

    // Get the current vim mode
    VimMode vimMode = [XcodeVimMap vimModeFromSourceEditorView:self];

    // Exit early if we're not in insert mode
    if (vimMode != VimModeInsert) {
        return sendEvent(event);
    }

    // TODO: _something_

    // Fall back to default implementation
    return sendEvent(event);
}

Next, we need to listen for j keypresses, and if we see one, store it; we don't want to send it quite yet. Let's declare a new file-scoped variable that holds this ‘j’ key event while we're waiting to send it.

static NSEvent *queuedEvent;

Note: In a more proper setup, we could add this as an associated object on SourceEditorView. But this is not a more proper setup, and this post is getting quite long as-is.

Now back to the above TODO section; when we detect a j keypress in insert mode, we want to store it rather than sending immediately:

// Check if we've received a "j" keypress
if ([event.characters isEqualToString:@"j"]) {
    // Save a reference to the event for later sending
    queuedEvent = event;

    // Return early so that the default implementation
    // (which would apply our `j` press immediately)
    // is not called.
    return YES;
}

With this change, it's now impossible to type a j in insert mode. Let's add a new check above it, which will apply our prior j keypress whenever a new keypress comes in:

// Check if we have a previous `j` press queued up
if (queuedEvent != nil) {
    // Apply the `j` press
    sendEvent(queuedEvent);

    // Clear the queued event so it's not sent again
    queuedEvent = nil;
}

Here's the current result, shown by typing the start of the English alphabet with a roughly-constant time between keys; you can see the j key is not sent until the k key is pressed after it.

Now let's add timeout functionality. Within the existing block to handle j keypresses, we can add the following setup to send the j event after a one-second delay:

// Added at file scope:
static dispatch_block_t sendQueuedEventAfterTimeout;

// ...

// Create a block to send the event for use in the timeout functionality
sendQueuedEventAfterTimeout = dispatch_block_create(0, ^{
    // Send the queued event
    sendEvent(queuedEvent);

    // Clear out the queued event so that it won't be sent again
    queuedEvent = nil;
});

// Invoke the above block after 1 second. This invocation
// should be cancelled if the event is manually sent sooner.
dispatch_after(
    dispatch_time(DISPATCH_TIME_NOW, (int64_t)(1 * NSEC_PER_SEC)),
    dispatch_get_main_queue(), sendQueuedEventAfterTimeout);

We'll then also edit our queued event handling logic to cancel this timeout if another key was pressed:

// Check if we have a previous `j` press queued up
if (queuedEvent != nil) {
    // Apply the `j` press
    sendEvent(queuedEvent);

    // Clear the queued event so it's not sent again
    queuedEvent = nil;

    // New: Cancel the timeout-based application of the event,
    // since we've invoked it manually.
    dispatch_block_cancel(sendQueuedEventAfterTimeout);
}

Here's the same test as before, just typing up through j this time. Despite hitting j after the same delay as the other keys, you can see Xcode hold on to the press again, but this time automatically apply it after a second of inactivity:

Now there's only one component left to add. Before the existing check for any queued events, we check if the new event corresponds to a k keypress - and if we have a j key queued up, we know we want to exit insert mode instead of applying either.

// Check if we've received a `k` press while a `j` press is still queued
if ([event.characters isEqualToString:@"k"] && queuedEvent != nil) {
    // Clear the queued j press
    queuedEvent = nil;

    // And cancel its timeout-based-sending.
    dispatch_block_cancel(sendQueuedEventAfterTimeout);

    // Create an "escape" key event and send it,
    // returning early in the process.
    NSEvent *escapeEvent = [XcodeVimMap modifiedEvent:event withCharacters:@"\x1b"];
    return sendEvent(escapeEvent);
}
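Since the logic is now spread over several snippets, here's the whole thing boiled down to a tiny state machine in plain C, with Xcode, NSEvent, and GCD removed. Everything in it (Remapper, remap_key, remap_timeout) is a hypothetical stand-in for the plugin code rather than the plugin itself; the out buffer simply records what would have been forwarded to the original keyDown: implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REMAP_ESCAPE '\x1b'

/* Hypothetical stand-in for the plugin's state: `out` collects the keys
   that would have been forwarded to the original keyDown:. */
struct Remapper {
    bool   j_queued;
    char   out[64];
    size_t len;
};

static void remap_send(struct Remapper *r, char key) {
    assert(r->len < sizeof r->out);
    r->out[r->len++] = key;
}

/* Mirrors the swizzled keyDown: for a key typed in insert mode. */
static void remap_key(struct Remapper *r, char key) {
    /* `k` while a `j` is queued: drop the `j` and send escape instead. */
    if (key == 'k' && r->j_queued) {
        r->j_queued = false;
        remap_send(r, REMAP_ESCAPE);
        return;
    }
    /* Any other key flushes a queued `j` first. */
    if (r->j_queued) {
        r->j_queued = false;
        remap_send(r, 'j');
    }
    /* A new `j` is held back rather than sent. */
    if (key == 'j') {
        r->j_queued = true;
        return;
    }
    remap_send(r, key);
}

/* Mirrors the dispatch_after block firing. Doing nothing when no `j` is
   queued plays the role of dispatch_block_cancel. */
static void remap_timeout(struct Remapper *r) {
    if (r->j_queued) {
        r->j_queued = false;
        remap_send(r, 'j');
    }
}
```

Feeding it a, j, k leaves a followed by an escape in out; j followed by x leaves both letters untouched; and a lone j only shows up once remap_timeout runs. That's the same behaviour the three snippets above are meant to produce together.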

It's been a long journey, but with all of that finished, here's the end result, typing the start of the English alphabet again:

This time, the j key is held back as before, but once the k key is pressed afterward, we leave insert mode, and we have finally escaped.


Wrapping Up

I've made the code for this Xcode plugin available on GitHub, but it's certainly intended more for reference than as something I'd actually recommend using.

There's all sorts of additional work that would need to go into making this widely adoptable, and I haven't even used it enough yet to know if it's stable. With any luck, XVim will continue to exist as a more configurable option separately from (or on top of?) Apple's own offering.

But in the meantime, as I play around with the Xcode 13 betas more, I at least no longer have to wince every few seconds from my vim muscle memory biting me - and isn't that worth something?


Say Hello!