Stop Clicking Buttons: Automate Gemini Fine-Tuning with GitHub Actions

The Google Cloud Console is great for exploration. But when you're experimenting with Gemini fine-tuning, manual clicking becomes a bottleneck.

If you need to iterate on training data—testing different examples, adjusting epochs, tuning hyperparameters—clicking through the Console UI to upload datasets and launch training jobs doesn't scale. Do it once? Fine. Do it ten times while iterating? Painful.

The solution: a fire-and-forget MLOps pipeline. Push a new training.jsonl to GitHub, and the rest happens automatically—data upload, job submission, training. No clicks, no manual API calls, just git push.

Why Automate Fine-Tuning?

Fine-tuning Gemini models can improve performance on domain-specific tasks, but the iteration process is tedious when done manually. Experimenting with different training datasets, epoch counts, and hyperparameters requires multiple training runs. Automating this workflow lets you iterate quickly without manual Console work—whether you're testing fine-tuning feasibility or running production jobs.

The Architecture: Fully Serverless

No complex infrastructure, no servers to maintain:

  • Compute: Google Vertex AI (serverless tuning jobs)
  • Model: Gemini 2.0 Flash (fast, cheap, effective)
  • Orchestration: GitHub Actions
  • Security: Workload Identity Federation—no long-lived keys
  • Storage: Google Cloud Storage for versioned datasets

Why GitHub Actions Over Self-Hosted CI?

I considered using Jenkins (I already have a private instance running). But GitHub Actions has a significant advantage for GCP workloads: native support for Workload Identity Federation.

GitHub's OIDC provider is already trusted by Google Cloud, making keyless authentication straightforward. With Jenkins, I'd either manage service account keys (the thing I'm trying to avoid) or configure a custom OIDC provider—more infrastructure to maintain.

For GCP-specific automation, GitHub Actions handles authentication better out of the box.

Step 1: Keyless Authentication with Workload Identity Federation

One of the biggest security risks in CI/CD pipelines is long-lived credentials. Service account keys stored in GitHub Secrets work, but they come with risks: they can leak, they don't expire automatically, and rotating them requires manual effort.

Google's Workload Identity Federation (WIF) offers a better approach. It allows GitHub to exchange its own OIDC token for a short-lived Google Cloud access token.

You configure a "trust bridge" once (an Identity Pool), and GitHub authenticates based purely on repository identity:

.github/workflows/tune-model.yml
- name: 'Authenticate to Google Cloud'
  uses: 'google-github-actions/auth@v2'
  with:
    workload_identity_provider: '${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}'
    service_account: '${{ secrets.GCP_SERVICE_ACCOUNT }}'

No keys, no rotation, no risk. The authentication setup requires a few gcloud commands once (see the repo for the full script), but after that, your pipeline is secure by default.

Step 2: Validate Before You Upload

The #1 reason tuning jobs fail is malformed training data. Vertex AI expects a specific JSONL structure (systemInstruction, contents, role: model, etc.). A single trailing comma or missing field crashes the job 30 minutes in with a vague error.

The repo includes a validation script (validate_jsonl.sh) you can run locally before pushing:

./scripts/validate_jsonl.sh data/training.jsonl
# Output:
# ✅ JSON syntax check passed.
# ✅ Schema validation passed.

The script checks:

  • Valid JSON syntax on each line
  • Required fields (contents, parts, text)
  • Correct roles (user vs model)
  • Optional systemInstruction structure
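The checks above can be sketched in Python. This is a minimal illustration of the same rules, not a replacement for the repo's validate_jsonl.sh; the field names (contents, parts, text, role, systemInstruction) mirror the list above:

```python
import json

def validate_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL training example."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]

    errors = []

    # Optional systemInstruction must still carry parts[0].text
    if 'systemInstruction' in example:
        parts = example['systemInstruction'].get('parts', [])
        if not parts or 'text' not in parts[0]:
            errors.append("systemInstruction missing parts[0].text")

    # contents must be a non-empty list of turns with valid roles
    turns = example.get('contents')
    if not isinstance(turns, list) or not turns:
        errors.append("missing or empty 'contents'")
    else:
        for i, turn in enumerate(turns):
            if turn.get('role') not in ('user', 'model'):
                errors.append(f"turn {i}: role must be 'user' or 'model'")
            parts = turn.get('parts', [])
            if not parts or 'text' not in parts[0]:
                errors.append(f"turn {i}: missing parts[0].text")
    return errors

def validate_file(path: str) -> bool:
    """Validate every line of a JSONL file; print problems, return overall status."""
    ok = True
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            for err in validate_line(line):
                print(f"line {lineno}: {err}")
                ok = False
    return ok
```

Wire this into a pre-commit hook or CI step and a bad example fails in seconds instead of mid-training.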

You could keep it generic—just verify valid JSONL and call it a day. But training is expensive and slow. A failed job 45 minutes in because of a malformed example is frustrating. Running validation locally before you push saves that debugging time.

Adding this as an automated step in the GitHub Actions workflow would be a valuable enhancement—it would catch errors before they reach the cloud.

Customize Validation for Your Schema

The validation script in the repo is a starting point that checks Gemini's expected format. With LLMs, customizing it for your specific use case is trivial—describe your training data schema to Claude or GPT-4, ask for a jq filter or Python validator, and you're done. Run it locally before you push to catch errors early.

Step 3: The "Fire-and-Forget" Workflow

Fine-tuning jobs take 30 minutes to a few hours. I didn't want my GitHub Action burning billable minutes waiting.

The workflow is asynchronous:

  1. Upload: Timestamps the dataset and uploads it to a structured GCS path: gs://my-bucket/data/20251203-120000/training.jsonl
  2. Trigger: Hits the Vertex AI REST API directly via curl
  3. Exit: Grabs the Job ID, prints it to the summary, and exits

This keeps CI/CD runtime under 30 seconds, regardless of dataset size or training duration.
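The "grab the Job ID and exit" step can be sketched in Python (the pipeline itself does this in shell). Assumptions: the create-job response contains a `name` field of the form `projects/.../locations/.../tuningJobs/JOB_ID`, per the Vertex AI tuningJobs resource, and `GITHUB_STEP_SUMMARY` is the file GitHub Actions provides for writing step summaries:

```python
import os

def report_job(response: dict) -> str:
    """Extract the tuning job ID from the create-job response and
    append it to the GitHub Actions step summary, if one is available."""
    # The job's resource name looks like:
    #   projects/PROJECT/locations/REGION/tuningJobs/JOB_ID
    job_id = response['name'].rsplit('/', 1)[-1]

    summary_path = os.environ.get('GITHUB_STEP_SUMMARY')
    if summary_path:  # only set inside a GitHub Actions run
        with open(summary_path, 'a') as f:
            f.write(f"### Tuning job submitted\n\nJob ID: `{job_id}`\n")
    return job_id
```

The job ID lands in the run summary, so you can find it later without digging through logs.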

Why curl Instead of gcloud?

The gcloud CLI sometimes lags behind the REST API for newer features. Using curl ensures I can access the latest API parameters immediately—especially important for new models like Gemini 2.0:

Submitting a Tuning Job
TIMESTAMP=$(date +"%Y%m%d-%H%M%S")
GCS_PATH="gs://${BUCKET}/data/${TIMESTAMP}/training.jsonl"

# Upload dataset
gcloud storage cp data/training.jsonl "$GCS_PATH"

# Submit tuning job via REST API
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/tuningJobs" \
  -d '{
    "baseModel": "gemini-2.0-flash-001",
    "supervisedTuningSpec": {
      "trainingDatasetUri": "'"${GCS_PATH}"'",
      "hyperParameters": {
        "epochCount": 3,
        "adapterSize": "ADAPTER_SIZE_ONE"
      }
    }
  }'

The API responds with a Job ID, which you can monitor in the Vertex AI Console.

Step 4: Cost Estimation Before Training

Training isn't free—Gemini Flash tuning costs approximately $3 per 1M tokens, billed for each training epoch. Before launching a job, I run a token counting workflow:

scripts/count_tokens.py
import json
import os
import sys

from google import genai

def count_tokens(file_path):
    client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

    with open(file_path) as f:
        data = [json.loads(line) for line in f]

    total_tokens = 0
    for example in data:
        # Count tokens in system instruction
        if 'systemInstruction' in example:
            system_text = example['systemInstruction']['parts'][0]['text']
            tokens = client.models.count_tokens(
                model='gemini-2.0-flash-001',
                contents=system_text
            )
            total_tokens += tokens.total_tokens

        # Count tokens in conversation
        for turn in example['contents']:
            text = turn['parts'][0]['text']
            tokens = client.models.count_tokens(
                model='gemini-2.0-flash-001',
                contents=text
            )
            total_tokens += tokens.total_tokens

    return total_tokens

if __name__ == '__main__':
    file_path = sys.argv[1]
    tokens = count_tokens(file_path)
    cost = (tokens / 1_000_000) * 3.0 * 3  # $3 per 1M tokens, 3 epochs
    print(f"Total tokens: {tokens:,}")
    print(f"Estimated cost: ${cost:.2f}")

Run this as a separate GitHub Action (count-tokens.yml) before committing to a full training run. It helps avoid surprises on your GCP bill.

Repository Structure

The full pipeline is open source:

gemini-tuning-pipeline/
├── .github/workflows/
│   ├── tune-model.yml          # Main training workflow
│   └── count-tokens.yml        # Cost estimation
├── data/
│   ├── training.jsonl          # Sample dataset
│   └── raw/                    # Preprocessing space
├── scripts/
│   ├── validate_jsonl.sh       # Schema validation
│   └── count_tokens.py         # Token counter
└── README.md                   # Complete setup guide

🔗 github.com/lidenlab/gemini-tuning-pipeline

Fork it, set up the Workload Identity Federation credentials, and you'll have your own tuning factory running in under 10 minutes.

Key Takeaways

1. Automate Early. Don't rely on Console UIs for production workflows. Manual clicks aren't reproducible or auditable.

2. Validate Locally First. Catch data errors before they leave your laptop. A 30-second validation script saves 45 minutes of failed training jobs.

3. Use Keyless Auth. Workload Identity Federation is both safer and easier than managing service account keys. No rotation, no leaks.

4. Keep CI/CD Fast. Fire-and-forget workflows keep your GitHub Actions runtime under 30 seconds. Let Vertex AI handle the long-running work.

5. Estimate Costs Upfront. Token counting workflows prevent surprise bills. Know what you're spending before you commit.

What's Next?

This workflow is evolving. Future enhancements include:

  • Automatic evaluation: Run test datasets against the tuned model post-training
  • Multi-region deployment: Distribute jobs across regions for faster training
  • Hyperparameter tuning: Automate epoch count and adapter size optimization
  • Model versioning: Track tuned models with semantic versioning in GCS

Contributions and feature requests are welcome!

Conclusion

Fine-tuning shouldn't require clicking through consoles. With GitHub Actions, Vertex AI's REST API, and Workload Identity Federation, you can build an automated fine-tuning workflow in an afternoon—a solid foundation whether you're experimenting or moving toward production.

The result: push code, get a tuned model. No infrastructure, no keys, no manual steps. Just automation.

Stop clicking buttons. Start shipping models. 🚀

👉 Try the pipeline