
GPT-4 presumably.


I keep bumping into the context window size limit. I'm trying to figure out a "compression" step I can use in the general case, but nothing has been very satisfying so far.

The mozilla/readability library is a good first step though.
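mozilla/readability itself is a JavaScript library that scores candidate DOM nodes to find the article body. As a much cruder illustration of the same "strip the boilerplate first" idea, here is a stdlib-only Python sketch that just drops text inside tags that are usually chrome rather than content (the tag list is my own guess, not anything readability does):

```python
from html.parser import HTMLParser

# Tags whose text content is usually navigation/boilerplate, not article body.
# This is a rough heuristic, nowhere near readability's node-scoring approach.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we're not inside any skipped tag.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Even this naive pass can shave a lot of tokens off a typical page before the prompt is assembled.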


Yeah, I have the same issue with context window size. Personally I'm just waiting for future LLMs with a 10x-100x context window. However, someone else recently came up with this solution:

https://news.ycombinator.com/item?id=35488291


I feel like the next step is to quantize or otherwise downsize old tokens so that more of them fit in memory at once. Not sure what the implications of a mixed-float-size model would be.
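To make the memory arithmetic concrete: the idea would be to store older cached activations at 8-bit precision instead of 16/32-bit floats, trading some accuracy for a 2-4x saving per token. No shipped model is assumed to work exactly this way; this is just a minimal sketch of symmetric per-vector int8 quantization:

```python
def quantize_int8(values):
    """Map floats to int8 range using a single per-vector scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid scale == 0
    return [round(v / scale) for v in values], scale

def dequantize_int8(qvalues, scale):
    """Approximate reconstruction; error per entry is at most scale / 2."""
    return [q * scale for q in qvalues]
```

The reconstruction is lossy, which is why mixing precisions inside one model (fresh tokens at full precision, old tokens quantized) has unclear accuracy implications, as the comment above notes.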


Why do you want to compress this data? What's the final use case here?


Imagine a browser plugin that pops up a modal. I write a query related to the current page, e.g. "Please summarize this page in three paragraphs, and translate to Turkish" or "There's a recipe somewhere on this page. Please suggest some variations on the filling" or "Can you make me a list of all the people mentioned on this page". The whole page (or at least the meat of it) gets bundled up with the query and sent to OpenAI. What I'm trying to build is a simple in-browser Swiss Army knife.

Yes, I could try to figure out which bits of the page need to be sent along with the prompt, but that's hard in the general case. Squeezing a bit more out of the prompt window by stripping out unnecessary boilerplate is easy by comparison. (Multi-page articles are another headache).
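The "squeeze more into the prompt window" step the comment describes might look something like this sketch. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact count; a real plugin would use the target model's actual tokenizer, and the constants here are assumptions:

```python
CONTEXT_TOKENS = 4096    # assumed model limit
RESERVED_TOKENS = 1024   # leave room for the query and the model's reply

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4 + 1

def fit_to_budget(page_text: str, query: str) -> str:
    budget = CONTEXT_TOKENS - RESERVED_TOKENS - estimate_tokens(query)
    if estimate_tokens(page_text) <= budget:
        return page_text
    # Naive strategy: keep the start of the page, which usually holds the
    # article body once boilerplate has been stripped out.
    return page_text[: budget * 4]
```

Truncating from the front is obviously lossy, which is why stripping boilerplate first (so the budget is spent on actual content) matters so much.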


The constrained context window does indeed make this a problem, but solving it by just increasing the context window would really be brute-forcing the issue. I hope we can come up with something better. I'm betting on embeddings for this kind of thing in my personal projects, but even that seems like a jackhammer-for-a-nail solution for single web pages.


Compression so that the input data fits in the context, maybe? For example, if the context is 4096 tokens and the input is 6000, figuring out an appropriate way to run an operation over all 6000 tokens when any part of them might be relevant to the operation.
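One common workaround for exactly this situation is to split the input into overlapping chunks, run the operation per chunk, then run it once more over the concatenated partial results (a map-reduce pattern). A minimal sketch, where `summarize` is a placeholder for whatever LLM call you'd actually make:

```python
def chunk(words, size, overlap):
    """Split a word list into overlapping windows of `size` words."""
    step = size - overlap
    return [words[i : i + size] for i in range(0, len(words), step)]

def map_reduce(words, summarize, size=4096, overlap=256):
    """summarize: callable taking a list of words, returning a shorter list.

    Each chunk is summarized independently, then the partial summaries are
    combined (and summarized again if they still exceed the window).
    """
    partials = [summarize(c) for c in chunk(words, size, overlap)]
    merged = [w for p in partials for w in p]
    return summarize(merged) if len(merged) > size else merged
```

The overlap between consecutive chunks reduces the chance that a fact straddling a chunk boundary gets lost, though cross-chunk reasoning is still the weak spot of this approach.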


Maybe I'm not getting it, but I see this as an indexing problem. The goal shouldn't be to fit the entire document into the prompt; we should include only the relevant parts of the doc when we query it.

Edit: I'm thinking of something like LlamaIndex


Embedding chunks and finding chunks based on similarity is definitely in use now. But if you can increase context size cheaply then the model can figure out what's relevant.
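The embed-and-retrieve pattern the comment describes can be illustrated with a toy example. Real pipelines use learned embeddings from an embedding model; word-count vectors stand in here so the sketch stays self-contained, and all the function names are my own:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. Real systems would call
    # an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

Only the retrieved chunks get sent with the prompt, which is how these systems sidestep the context limit at the cost of possibly missing relevant text the similarity search didn't surface.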


Yeah, I get that; after all, attention is all you need. But unless you want to spend a bunch of money on the 32k-context version, I don't think there are other options than embeddings and an index.


What is the state of the art for running models locally in terms of context size?


I think the local-model SOTA is LLaMA, which has a 2048-token context[1].

[1] https://github.com/facebookresearch/llama/issues/16




