After building Gusto’s text editor from scratch, I had a good idea of the memory and performance limitations of the iPad, so I knew I’d be in for a challenge to provide really good quality code coloring for Gusto.
Criteria 1: Flexible Interpolation
It was for this reason that I decided I would need to make sure the syntax engine was able to switch gears, without caring about file type, current scope, etc.
I decided the best approach for solving the interpolation issue was to make sure languages weren’t identified by file type alone, but also by the syntax itself. This allowed me to write syntax definitions based on specific scopes, and not necessarily be limited to language. This solved many issues:
- Files with mixed languages could be parsed dynamically
- Common language constructs like double quoted strings could be referenced from multiple syntax language definitions without being redundant
- Fast! Because of the tree-like structure, hundreds of possible syntaxes were quickly reduced to a few after looking at the file type and first line of the code, resulting in huge performance increases
Criteria 2: Priority or Exclusion
Another concern was priority of coloring, for example: Should quoted strings be a different color if they are within a comment block?
I concluded that syntax was not linear, but instead a tree of nested ‘modes’ . This was an important distinction in choosing the correct data structure for storing the syntaxes and the parsed results.
Criteria 3: Expandable
I couldn’t very well sit and write hundreds of syntax parsers so I wanted to make sure I chose a method that was serializable, could be easily coded up by end users and plugged in to the eco-system without any fuss.
I constructed a very simple XML convention that allowed for syntax scopes to be mapped to about ten possible scopes like ‘boolean’ or ‘string’. These would be mapped to the colors within each theme.
Criteria 4: Performance
After dealing with performance issues building the text engine I immediately jumped in with binary trees for storage, however, after everything was wired up, I was still struggling with performance. I rethought my strategy.
Consider that every time a character changes in code, a ‘re-parse’ is needed. A single ‘*’ could complete a block comment ‘/*’ making the rest of the document part of a comment, requiring a tear-down of the entire tree we’d built, a potential time cost of O(log(n)) for lookup, O(log(n)) for deleting the rest of the tree, and another O(log(n)) for rebuilding the tree.
This being said, I determined that it was equally as important to focus on deletion performance as it was insertion performance. I moved away from my binary tree and onto a doubly linked list. Immediately, this allowed me to benefit from O(1) insertions and deletions, a considerable improvement over the binary tree. Now if you’re keen on time complexity of data structures, you may be asking, but aren’t linked list lookups O(n)?!?
Good eye! The answer is yes, but, all of this isn’t happening in a vacuum, it’s happening in a text editor. Because of this, the majority of lookups stem from changes to text that happen sequentially. Because of this I was able to majorly increase lookup performance by maintaining a reference to the last accessed item in the list, instead of starting over from the first item in the list. This still means worst case scenario O(n), however best case can be O(1)! Which like I said, is very common, because it’s part of a text editor.
All in all, building the syntax engine for code coloring Gusto was a great exercise in architecting a ecosystem through and through and was a very validating project to see it all come together.