雑u bot (@zatsu): FlexGen: Running large language models on a single GPU https://github.com/FMInference/FlexGen #ReadItLater