KurtE that looks like quite a nice speedup, I love the work you're doing! I had a quick look at your patch on GitHub and I think there's a way to go faster: instead of caching the 'enabled' status of the output_e pin, just keep hold of the output_e pin itself. I'd do something like this:
* Add an array of mraa_gpio_context (it's already a pointer type), with one entry per element of outputen, inside the board configuration.
* On mraa_gpio_dir(), check if array[outputen[pin]] == NULL; if so, call mraa_gpio_init()
* If it's not NULL then you know you have an initialised pin, so just do a mraa_gpio_write() to it
* On mraa_gpio_close(), array[outputen[pin]] should be closed too for safety
How does that sound? I think there should be *some* speedup, since repeat calls skip the malloc/memset etc...
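Roughly what I have in mind, as a self-contained sketch rather than a real patch. Everything prefixed `stub_` is a hypothetical stand-in for the matching mraa call (`mraa_gpio_init()`, `mraa_gpio_write()`, `mraa_gpio_close()`) so it compiles without the library, the `outputen` values are made up, and `set_output_enable()`/`release_output_enable()` are illustrative names for the bits that would live inside `mraa_gpio_dir()` and `mraa_gpio_close()`. I've also assumed the cache is indexed by pin, with one slot per `outputen` entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mraa's opaque GPIO handle. */
typedef struct stub_gpio {
    int pin;
    int value;
} *stub_gpio_context;

/* Counts how often the expensive init path actually runs. */
static int stub_init_calls = 0;

/* Stand-in for mraa_gpio_init(): allocates a context (the cost to avoid). */
stub_gpio_context stub_gpio_init(int pin)
{
    stub_gpio_context dev = calloc(1, sizeof(*dev));
    if (dev != NULL) {
        dev->pin = pin;
        stub_init_calls++;
    }
    return dev;
}

/* Stand-in for mraa_gpio_write(): just records the value. */
int stub_gpio_write(stub_gpio_context dev, int value)
{
    dev->value = value;
    return 0;
}

/* Stand-in for mraa_gpio_close(): frees the context. */
void stub_gpio_close(stub_gpio_context dev)
{
    free(dev);
}

/* Output-enable GPIO number for each pin; values invented for the example. */
static const unsigned int outputen[] = { 40, 41, 42, 43 };
#define OUTPUTEN_COUNT (sizeof(outputen) / sizeof(outputen[0]))

/* One cached context per outputen entry; static storage starts out all NULL. */
static stub_gpio_context outputen_cache[OUTPUTEN_COUNT];

/* The part that would sit in mraa_gpio_dir(): init once, then only write. */
int set_output_enable(size_t pin, int enable)
{
    assert(pin < OUTPUTEN_COUNT);
    if (outputen_cache[pin] == NULL) {
        outputen_cache[pin] = stub_gpio_init((int) outputen[pin]);
        if (outputen_cache[pin] == NULL) {
            return -1; /* allocation failed */
        }
    }
    return stub_gpio_write(outputen_cache[pin], enable);
}

/* The part that would sit in mraa_gpio_close(): drop the cached context. */
void release_output_enable(size_t pin)
{
    assert(pin < OUTPUTEN_COUNT);
    if (outputen_cache[pin] != NULL) {
        stub_gpio_close(outputen_cache[pin]);
        outputen_cache[pin] = NULL;
    }
}
```

With this shape, calling `set_output_enable()` repeatedly on the same pin only hits the init path the first time; after `release_output_enable()` the slot is NULL again, so the next call re-initialises it.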
I didn't really understand your timing stuff, surely measuring with a scope/logic analyser is much easier/better? If you're curious about instruction count/cost etc. I'd recommend looking at sep/vtune and using the hardware performance counters. I should really get around to setting that up on my board, it would be fun to see.