On cold starts: Cold starts weren't an issue in practice. LLM inference and workflow execution dominate latency, so a 2-3 second cold start is negligible in a 30+ second workflow. Containers are lightweight (Python-slim, no heavy ML frameworks since Gemini runs server-side). If you need always-on, configure --min-instances=1.
On coordination: The orchestrator prompt enforces sequential execution with explicit verification: each agent must return tool_output before the next one runs. Retries are configured, and errors trigger user notification rather than silent failures, which keeps the UX transparent. Agent Engine's session management also helped here: persistent state keeps the orchestrator warm and maintains context across the workflow.
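To make the coordination pattern concrete, here is a minimal sketch in plain Python. It is not the actual setup, where the sequencing lives in the orchestrator prompt; it only shows the same idea in code. The names run_sequential, StepFailed, and notify are hypothetical, and the agents are assumed to be plain callables that return a dict containing tool_output on success.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent shape: takes the shared context, returns a dict.
Agent = Callable[[Dict], Dict]


class StepFailed(Exception):
    """Raised when an agent never returns a tool_output."""


def run_sequential(
    agents: List[Tuple[str, Agent]],
    context: Dict,
    max_retries: int = 2,
    notify: Callable[[str], None] = print,
) -> Dict:
    """Run agents in order; each must return tool_output before the next starts."""
    for name, agent in agents:
        # One initial attempt plus max_retries retries.
        for attempt in range(1, max_retries + 2):
            try:
                result = agent(context)
            except Exception as exc:
                # Treat an exception like a missing tool_output.
                result = {"error": str(exc)}
            if result.get("tool_output") is not None:
                # Verified: store the output so later agents can use it.
                context[name] = result["tool_output"]
                break
            notify(f"{name}: attempt {attempt} returned no tool_output")
        else:
            # All attempts failed: tell the user and stop instead of failing silently.
            notify(f"{name}: giving up after {max_retries + 1} attempts")
            raise StepFailed(name)
    return context
```

Passing a different notify callable (for example, one that posts to the user's chat) is one way to surface failures without changing the loop.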
Recommendation: Run evals to measure end-to-end performance and identify bottlenecks. That data informs whether you need minimum instances or other optimizations.
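As a rough starting point for that kind of measurement, the sketch below times named steps and reports each step's share of the total. StepTimer and the step names are hypothetical, and this is only a local timing helper, not an eval framework; a real eval would also score output quality.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, List, Tuple


class StepTimer:
    """Accumulates wall-clock time per named step."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self.clock = clock  # injectable so the timer can be tested
        self.durations: Dict[str, float] = {}

    @contextmanager
    def step(self, name: str):
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self) -> List[Tuple[str, float, float]]:
        """Return (name, seconds, share of total), slowest step first."""
        total = sum(self.durations.values())
        rows = sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
        return [(n, s, s / total if total else 0.0) for n, s in rows]
```

Wrapping each stage in `with timer.step("..."):` and reading `timer.report()` shows whether startup time is a meaningful fraction of the run before you pay for minimum instances.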