This is from Douglas Thain's Cooperative Computing Laboratory.
An architecture class assignment had us trying to find the "best" CPU configuration (number of registers, issue width, etc.) to run a snippet of code. Most people in the class simulated around 20 configurations. Using an earlier iteration of the CCL Work Queue abstraction, I ran ~12,000 simulations in only about 6 hours, IIRC.
The point being, it's pretty easy to use. The source is mostly straight C and fairly easy to follow. I would start with chirp_server.c.
Disclaimer: I was one of Dr. Thain's students before I dropped out to go do some dream job or other.
I'm having trouble finding a detailed explanation of how this works and how it handles concurrency. http://www3.nd.edu/~dthain/papers/chirp-jgc.pdf seems to be the paper, but it has only a small section on this important subject.