
GPT-4 presumably.


I keep bumping into the context window size limit. I'm trying to figure out a "compression" step I can use in the general case, but nothing has been very satisfying so far.

The mozilla/readability library is a good first step though.
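mozilla/readability itself is a JavaScript library that scores candidate DOM nodes to find the article body. As a much cruder illustration of the same "strip the boilerplate first" idea, here is a stdlib-only Python sketch that just drops text inside tags that are usually chrome rather than content (the tag list is my own guess, not anything readability does):

```python
from html.parser import HTMLParser

# Tags whose text content is usually navigation/boilerplate, not article body.
# This is a rough heuristic, nowhere near readability's node-scoring approach.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we're not inside any skipped tag.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Even this naive pass can shave a lot of tokens off a typical page before the prompt is assembled.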


Yeah, I have the same issue with context window size. Personally I'm just waiting for future LLMs with a 10x-100x context window. However, someone else recently came up with this solution:

https://news.ycombinator.com/item?id=35488291


I feel like the next step is to quantize or otherwise downsize old tokens so that more of them fit in memory at once. Not sure what the implications of a mixed-float-size model would be.
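To make the memory arithmetic concrete: the idea would be to store older cached activations at 8-bit precision instead of 16/32-bit floats, trading some accuracy for a 2-4x saving per token. No shipped model is assumed to work exactly this way; this is just a minimal sketch of symmetric per-vector int8 quantization:

```python
def quantize_int8(values):
    """Map floats to int8 range using a single per-vector scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid scale == 0
    return [round(v / scale) for v in values], scale

def dequantize_int8(qvalues, scale):
    """Approximate reconstruction; error per entry is at most scale / 2."""
    return [q * scale for q in qvalues]
```

The reconstruction is lossy, which is why mixing precisions inside one model (fresh tokens at full precision, old tokens quantized) has unclear accuracy implications, as the comment above notes.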


Why do you want to compress this data? What's the final use case here?


Imagine a browser plugin that pops up a modal. I write a query related to the current page, e.g. "Please summarize this page in three paragraphs, and translate to Turkish" or "There's a recipe somewhere on this page. Please suggest some variations on the filling" or "Can you make me a list of all the people mentioned on this page". The whole page (or at least the meat of it) gets bundled up with the query and sent to OpenAI. What I'm trying to build is a simple in-browser Swiss Army knife.

Yes, I could try to figure out which bits of the page need to be sent along with the prompt, but that's hard in the general case. Squeezing a bit more out of the prompt window by stripping out unnecessary boilerplate is easy by comparison. (Multi-page articles are another headache).
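The "squeeze more into the prompt window" step the comment describes might look something like this sketch. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact count; a real plugin would use the target model's actual tokenizer, and the constants here are assumptions:

```python
CONTEXT_TOKENS = 4096    # assumed model limit
RESERVED_TOKENS = 1024   # leave room for the query and the model's reply

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4 + 1

def fit_to_budget(page_text: str, query: str) -> str:
    budget = CONTEXT_TOKENS - RESERVED_TOKENS - estimate_tokens(query)
    if estimate_tokens(page_text) <= budget:
        return page_text
    # Naive strategy: keep the start of the page, which usually holds the
    # article body once boilerplate has been stripped out.
    return page_text[: budget * 4]
```

Truncating from the front is obviously lossy, which is why stripping boilerplate first (so the budget is spent on actual content) matters so much.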


The constrained context window does indeed make this a problem, but solving it by just increasing the context window would really be brute-forcing the issue. I hope we can come up with something better. I'm betting on embeddings for this kind of thing in my personal projects, but even that seems like a jackhammer-for-a-nail solution for single web pages.


Compression so that the input data fits in the context, maybe? For example, if the context is 4096 tokens and the input is 6000, figuring out an appropriate way to run an operation over all 6000 tokens when any part of them might be relevant to the operation.
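One common workaround for exactly this situation is to split the input into overlapping chunks, run the operation per chunk, then run it once more over the concatenated partial results (a map-reduce pattern). A minimal sketch, where `summarize` is a placeholder for whatever LLM call you'd actually make:

```python
def chunk(words, size, overlap):
    """Split a word list into overlapping windows of `size` words."""
    step = size - overlap
    return [words[i : i + size] for i in range(0, len(words), step)]

def map_reduce(words, summarize, size=4096, overlap=256):
    """summarize: callable taking a list of words, returning a shorter list.

    Each chunk is summarized independently, then the partial summaries are
    combined (and summarized again if they still exceed the window).
    """
    partials = [summarize(c) for c in chunk(words, size, overlap)]
    merged = [w for p in partials for w in p]
    return summarize(merged) if len(merged) > size else merged
```

The overlap between consecutive chunks reduces the chance that a fact straddling a chunk boundary gets lost, though cross-chunk reasoning is still the weak spot of this approach.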


Maybe I'm not getting it, but I see this as an indexing problem. The goal shouldn't be to fit the entire document into the prompt; we should include only the relevant parts of the doc when we query it.

Edit: I'm thinking of something like LlamaIndex


Embedding chunks and finding chunks based on similarity is definitely in use now. But if you can increase context size cheaply then the model can figure out what's relevant.
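The embed-and-retrieve pattern the comment describes can be illustrated with a toy example. Real pipelines use learned embeddings from an embedding model; word-count vectors stand in here so the sketch stays self-contained, and all the function names are my own:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. Real systems would call
    # an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

Only the retrieved chunks get sent with the prompt, which is how these systems sidestep the context limit at the cost of possibly missing relevant text the similarity search didn't surface.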


Yeah, I get that; after all, attention is all you need. But unless you want to spend a bunch of money on the 32k-context version, I don't think there are other options than embeddings and an index.


What is the state of the art for running models locally in terms of context size?


I think the local-model SOTA is LLaMA, which has a 2048-token context[1].

[1] https://github.com/facebookresearch/llama/issues/16




