Yeah, I have the same issue with context window size. Personally I'm just waiting for future LLMs with 10x-100x larger context windows. That said, someone else recently floated this idea:
> I feel like the next step is to quantize or otherwise downsize old tokens so that more of them fit in memory at once. Not sure what the implications of a mixed-float-size model would be.
https://news.ycombinator.com/item?id=35488291
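For what it's worth, here's a rough sketch of what that could look like applied to a KV cache (plain NumPy; every name is made up for illustration, not from any real inference library): keep the most recent tokens in full precision, and downsize anything older to int8 with a per-token scale, dequantizing on read.

```python
import numpy as np

# Purely illustrative sketch; MixedPrecisionCache, RECENT_WINDOW, etc.
# are invented names, not a real library's API.

RECENT_WINDOW = 4   # most recent tokens kept in float32
HEAD_DIM = 8        # stand-in for the per-token vector size

def quantize_token(vec):
    """Symmetric int8 quantization: int8 values plus one float scale."""
    scale = float(np.abs(vec).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(vec / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_token(q, scale):
    return q.astype(np.float32) * scale

class MixedPrecisionCache:
    """Float32 window of recent tokens; older ones are stored as int8."""
    def __init__(self):
        self.recent = []  # float32 vectors
        self.old = []     # (int8 vector, scale) pairs, ~4x smaller each

    def append(self, vec):
        self.recent.append(np.asarray(vec, dtype=np.float32))
        # Tokens that fall out of the recent window get quantized.
        while len(self.recent) > RECENT_WINDOW:
            self.old.append(quantize_token(self.recent.pop(0)))

    def materialize(self):
        """Rebuild the whole cache as float32 for use in attention."""
        old = [dequantize_token(q, s) for q, s in self.old]
        return np.stack(old + self.recent)

# Demo: append 10 tokens, then measure the error the old ones picked up.
rng = np.random.default_rng(0)
cache = MixedPrecisionCache()
tokens = [rng.standard_normal(HEAD_DIM) for _ in range(10)]
for t in tokens:
    cache.append(t)

full = cache.materialize()
err = np.abs(full - np.stack(tokens).astype(np.float32)).max()
print(f"{len(cache.old)} old tokens stored as int8, max error {err:.4f}")
```

The int8 entries take roughly a quarter of the memory of float32 ones, and the reconstruction error the demo prints is basically the open question from the quoted comment: how much does the model's attention actually care about that lost precision in distant tokens?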