FAQ SHEET - Assessment - Coalition for Sustainable AI

Section A – Evaluation Setup

Which exact benchmarks are used for the quality gate? Could you confirm the complete list of benchmarks (and question counts) used to compute the quality score for Text-to-Text? Is the list fixed between Round 1 and Round 2? We will not disclose the benchmarks used for evaluation, but they will be the same for Round 1 and Round 2. We only release task categories. Note that we will not use benchmarks that require tool use or agents. The evaluation will include Indic benchmarks.
How is energy measured and aggregated? Energy consumption will be computed over the full benchmark suite. We do not normalize per token, because energy consumption is directly linked to the number of tokens the model produces: a model that is more verbose than another will consume more energy overall, and we want the final ranking to take this into account.
Will you publish baseline energy numbers before Round 1? We have not set an energy baseline. If you meet the 80% quality threshold, you will then be ranked according to your model’s energy consumption (lower is better).
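The aggregation and ranking rules above can be illustrated with a toy sketch. It assumes a simplified, hypothetical cost model in which every generated token costs the same fixed energy (real energy is measured by the organizing team, not derived this way), and it assumes `quality` is already expressed as a fraction of whatever reference the 80% threshold applies to, which this FAQ does not specify. All names and numbers are illustrative.

```python
from dataclasses import dataclass

QUALITY_THRESHOLD = 0.80  # the 80% quality gate stated above


@dataclass
class Submission:
    # Hypothetical fields for illustration only.
    name: str
    quality: float             # assumed: fraction of the reference quality score
    output_lengths: list[int]  # tokens generated for each benchmark question


def suite_energy(sub: Submission, joules_per_token: float) -> float:
    """Energy summed over the full suite, with no per-token normalization."""
    # Toy cost model (assumption): each generated token costs the same energy.
    return sum(n * joules_per_token for n in sub.output_lengths)


def rank(subs: list[Submission], joules_per_token: float) -> list[str]:
    """Drop submissions below the quality gate, then sort by energy (lower is better)."""
    eligible = [s for s in subs if s.quality >= QUALITY_THRESHOLD]
    eligible.sort(key=lambda s: suite_energy(s, joules_per_token))
    return [s.name for s in eligible]


subs = [
    Submission("concise", quality=0.85, output_lengths=[50, 80, 60]),
    Submission("verbose", quality=0.90, output_lengths=[200, 320, 240]),
    Submission("too_lossy", quality=0.70, output_lengths=[10, 10, 10]),
]
print(rank(subs, joules_per_token=0.5))  # ['concise', 'verbose']
```

In this toy, both eligible models cost 0.5 J per token, so per-token normalization could not separate them; summing over the suite (95 J vs. 380 J) ranks the concise model first even though the verbose one scores higher on quality. Once past the gate, quality no longer affects the order.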

Can we deliver our solutions in a container? For evaluation, the organizing team will use the commands ‘vllm serve --config vllm_config.yaml’ or ‘llama-server -hf model_hf’. As long as the submission works with these commands and the base model remains the primary model used at inference time, the submission will be allowed.
What is the best way to deliver my solution on Hugging Face? You will need to upload (i) the model weights, (ii) a README describing the process you applied to the model, and (iii) a config file for vLLM or llama.cpp.
Is our max_model_len setting in vllm_config.yaml respected as-is by the evaluation harness? Yes, the max_model_len parameter will not be overridden by the evaluation harness. Note that setting this parameter too low can hurt your performance on benchmarks.
Will the submission form accept both a vllm_config.yaml and a llama_config.yaml in the same repo, or must we pick one engine per submission? You must pick one engine.
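Before uploading, a quick local check of the repository layout can catch missing files. The sketch below is a hypothetical helper, not an official validator: it assumes weights are stored as `.safetensors` or `.gguf` files and that the README is named `README.md`; the two config file names come from the questions above.

```python
from pathlib import Path

ENGINE_CONFIGS = ("vllm_config.yaml", "llama_config.yaml")
WEIGHT_SUFFIXES = (".safetensors", ".gguf")  # assumption: common weight formats


def check_submission(repo_dir: str) -> list[str]:
    """Return the problems found in a local submission folder (empty list if none)."""
    root = Path(repo_dir)
    problems: list[str] = []

    # (i) model weights, searched recursively
    if not any(p.suffix in WEIGHT_SUFFIXES for p in root.rglob("*") if p.is_file()):
        problems.append("no model weights found")

    # (ii) README describing the process applied to the model
    if not (root / "README.md").is_file():
        problems.append("missing README.md")

    # (iii) exactly one engine config: vLLM or llama.cpp, not both
    present = [name for name in ENGINE_CONFIGS if (root / name).is_file()]
    if not present:
        problems.append("missing engine config (vllm_config.yaml or llama_config.yaml)")
    elif len(present) > 1:
        problems.append("both engine configs present; pick one engine")

    return problems
```

For example, `check_submission("path/to/local/repo")` returns an empty list when all three items are present and only one engine config is included.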

Is quantization allowed? Yes!
Is distillation allowed? The base model shall remain the primary model used at inference time. As such, participants may use distillation only if the base model is the student model.
Is fine-tuning on the evaluation allowed? We do not allow further fine-tuning after compression.
Is reducing num_experts_per_tok (top-k routing) permitted? Any optimization technique is allowed. Any modification to the model architecture shall be justified.
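For context, `num_experts_per_tok` is the k in a mixture-of-experts router's top-k selection: each token is sent to the k highest-scoring experts, so lowering k reduces the number of expert computations per token. The sketch below is a generic illustration of top-k routing with a softmax over the selected experts only (one common variant, e.g. Mixtral-style); it is an assumption that any given base model routes exactly this way, so check its own implementation before changing k.

```python
import math


def route(router_logits: list[float], num_experts_per_tok: int) -> dict[int, float]:
    """Pick the top-k experts for one token and return {expert_index: weight}.

    Weights are a softmax over the selected logits only; other models may
    normalize differently.
    """
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:num_experts_per_tok]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exp.values())
    return {i: e / total for i, e in exp.items()}


logits = [2.0, 0.5, 1.5, -1.0]  # hypothetical router scores for 4 experts
print(route(logits, 2))  # experts 0 and 2 run for this token
print(route(logits, 1))  # {0: 1.0}: only expert 0 runs
```

Going from k=2 to k=1 halves the expert computations per token in this sketch, but it also changes the model's outputs, which is why such a modification must be justified.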

Please remove all parameters from your vLLM config file that are specific to your infrastructure, for example: tensor-parallel-size, swap_space, no-enable-log-outputs.
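A quick way to catch leftover infrastructure-specific settings is to scan the config before submitting. This is a minimal sketch: it assumes a simple `key: value` layout (a real YAML parser such as PyYAML is safer for anything more complex), it treats hyphens and underscores in key names as equivalent, and its blocklist contains only the three examples given above, so it is not exhaustive.

```python
# Examples named in this FAQ; not an exhaustive list.
INFRA_SPECIFIC_KEYS = {"tensor-parallel-size", "swap_space", "no-enable-log-outputs"}


def normalize(key: str) -> str:
    """Treat 'swap_space', 'swap-space' and '--swap-space' as the same option."""
    return key.strip().lstrip("-").replace("_", "-")


def find_infra_keys(config_text: str) -> list[str]:
    """Return infrastructure-specific keys found in a simple 'key: value' config."""
    blocked = {normalize(k) for k in INFRA_SPECIFIC_KEYS}
    found = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments; only keys matter here
        if ":" not in line:
            continue
        key = line.split(":", 1)[0]
        if normalize(key) in blocked:
            found.append(key.strip())
    return found


example = """\
max_model_len: 8192   # kept: not infrastructure-specific
tensor-parallel-size: 2
swap_space: 4
"""
print(find_infra_keys(example))  # ['tensor-parallel-size', 'swap_space']
```

In practice you would pass the contents of your own file, e.g. `find_infra_keys(open("vllm_config.yaml").read())`.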
We remind participants that the evaluation runs on one L4 GPU for Image-to-Text and Audio-to-Text, and on one A100 GPU for Text-to-Text.
We have set a time limit for the model to generate an answer, roughly corresponding to the time the base model takes. We therefore warn participants that llama.cpp is much slower and far less efficient at batched inference, which results in slower inference and a poorer energy score.
Be sure to check that your model loads with your inference server, and keep a record of the exact environment you used (pip freeze, Docker container, etc.) so that we can debug with you in case of an issue at inference time on our side. If you are able to send your model relatively early (not on the day of the deadline), we will have more time to debug with you if the model cannot run on our side.
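To confirm that your model actually loads, you can start the server yourself with the same command the organizing team uses (for example `vllm serve --config vllm_config.yaml`) and poll it until it answers. The sketch below only covers the polling part; it assumes the server exposes an HTTP `/health` endpoint on `localhost:8000`, the default port of `vllm serve` (`llama-server` listens on port 8080 by default), so adjust the URL to your setup.

```python
import time
import urllib.request


def wait_until_ready(url: str = "http://localhost:8000/health",
                     timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while weights load) and connection
            # errors are all OSError subclasses: the server is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

If `wait_until_ready()` returns False, the server logs from that launch, together with your pip freeze output, are a good starting point for debugging with the evaluation team.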
Try to include a README.md that contains:
1) clear and detailed explanations of your compression approach (possibly with some benchmark scores you have run), so that we have a good understanding of what you did for the challenge;
2) explanations of any extra args you set in your vllm_config.yaml file;
3) any issue you identified at server launch or inference time that could help us run the evaluations.

The organizers’ Hugging Face account will be shared on submission day.

Should you have any further technical questions, please contact the evaluation team: resilientchallenge2026@peren.gouv.fr