    [Infra] Promote internal staging to main (#27245) · 6ff668c7
    yuneng-jiang authored
    
    
    * default requested_model to empty string on litellm-side rejects
    
    * Update litellm/router.py
    
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    
    * fix: scope key access_group_ids override by team's assigned groups
    
    A team member could set any access_group_ids on their key (e.g. a group
    assigned only to a different team) and override the team's model
    restriction. Intersect the key's access_group_ids with team_object.access_group_ids
    in _key_access_group_grants_model so foreign groups are dropped before
    model expansion. Adds a regression test that asserts expansion is never
    called for foreign groups.
    
    * [Fix] Proxy: Skip Personal Budget Hook When Reservation Covers Counter
    
    The reservation path (PR #26845) atomically pre-fills `spend:user:{user_id}`
    and admits at the strict-`<` boundary. The legacy `_PROXY_MaxBudgetLimiter`
    pre-call hook re-reads the same counter with `>=`, so a reservation that
    fills the counter to exactly `max_budget` (e.g. a request without a
    `max_tokens` cap that falls back to reserving the smallest remaining
    headroom) is rejected by the hook even though the reservation already
    admitted it.
    
    Skip the hook when the request's active `budget_reservation` covers
    `spend:user:{user_id}`. The reservation is the source of truth for that
    counter cross-pod; the legacy `>=` path remains in place for requests
    without a reservation (e.g. paths that bypass the reservation entirely).
    
    Reproduces as `tests/otel_tests/test_prometheus.py::test_user_budget_metrics`
    on a fresh user with `max_budget=10` calling `fake-openai-endpoint` without
    `max_tokens`. Adds focused unit coverage in
    `tests/test_litellm/proxy/hooks/test_max_budget_limiter.py`.
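
    A minimal sketch of the skip condition, using hypothetical field and key names
    (`budget_reservation`, `covered_counters`); only the `spend:user:{user_id}` counter name
    comes from the change itself:

      def should_run_legacy_budget_hook(user_id: str, request_metadata: dict) -> bool:
          """Return False when an active reservation already covers the user's spend counter."""
          counter_key = f"spend:user:{user_id}"
          reservation = request_metadata.get("budget_reservation") or {}
          # the reservation pre-filled the counter atomically and admitted at '<', so
          # re-reading the same counter with '>=' here would reject its own admission
          return counter_key not in reservation.get("covered_counters", [])

      # reservation covers the counter -> the legacy pre-call hook is skipped
      assert should_run_legacy_budget_hook(
          "user-123", {"budget_reservation": {"covered_counters": ["spend:user:user-123"]}}
      ) is False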
    
    * harden bedrock file bucket validation
    
    * Fix syntax errors from botched merge in router.py
    
    * Fix Vertex batch output edge cases
    
    * [Fix] RBAC: Drop management_routes Write Fallback for Admin Viewer
    
    Greptile P1: the unsafe-method branch of `_check_proxy_admin_viewer_access`
    ended with a blanket `if route in management_routes: return`. That set is a
    mix of reads (info/list — handled via the safe-method GET branch above) and
    writes. The fallback let Admin Viewer POST to write endpoints not enumerated
    in `_ADMIN_VIEWER_BLOCKED_WRITE_ROUTES`, including:
      - /team/block, /team/unblock, /team/permissions_update
      - /jwt/key/mapping/{new,update,delete}
      - /key/bulk_update
      - /key/{key_id}/reset_spend
    
    Remove the fallback. The two remaining allow sets (admin_viewer_routes and
    global_spend_tracking_routes) are both read-only, so removal does not affect
    the legitimate POST-as-read cases (e.g. /spend/calculate, which is in
    spend_tracking_routes ⊂ admin_viewer_routes).
    
    Tests:
      - 8 new parametrized cases pinning each previously-leaking management write
        endpoint to 403 on POST for PROXY_ADMIN_VIEW_ONLY.
    
    * fix(tests): anchor VCR redis cassette key to repo root
    
    `os.path.relpath` with no `start` arg uses the current working
    directory, so running pytest from a subdirectory produced a
    different Redis key than running from the repo root. CI-recorded
    cassettes and locally-replayed runs would silently miss each
    other's cache.
    
    Anchor the path to the repo root (derived from `__file__`) so the
    key is stable regardless of CWD.
    
    https://claude.ai/code/session_018uCx7pcrkdUJZrCVMaTdPx
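
    Illustrative only (the `..`/`..` depth from the test module to the repo root is assumed):

      import os

      THIS_FILE = os.path.abspath(__file__)

      # old behaviour: no `start` argument, so os.path.relpath falls back to the current
      # working directory and the key changes with the directory pytest was launched from
      cwd_dependent_key = os.path.relpath(THIS_FILE)

      # fix: anchor to a root derived from the module's own location so the key is stable
      REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(THIS_FILE), "..", ".."))
      stable_key = os.path.relpath(THIS_FILE, start=REPO_ROOT)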
    
    * fix: gate key access_group override on group's own assignment
    
    Replaces the previous intersect-with-team.access_group_ids check, which
    made the override unreachable in practice (the team-gate fallback already
    covered every case the intersection allowed). The override now resolves
    each of the key's access_group_ids via get_access_object and accepts the
    group only if its assigned_team_ids includes the key's team_id, or its
    assigned_key_ids includes the key's token. This fulfills the original ask
    (a key can extend a team's allow-list via a group the admin granted to
    that team or that specific key) while still rejecting foreign groups
    referenced by team members of other teams.
    
    * [Fix] Proxy/Key Management: Honor The team_member_permissions "/key/list" Grant In The /key/list Endpoint
    
    When a team grants /key/list via team_member_permissions, non-admin members
    should see all keys for that team — same as a team admin. Previously the
    classification in list_keys() only checked admin status, so permitted
    members fell into the service-account-only path and could not see other
    members' personal keys. Route those members into the full-visibility set instead.
    
    * Fix access-group bypass via litellm-model fallback path
    
    When _get_all_deployments returns 0 candidates and the litellm-model
    fallback branch (_get_deployment_by_litellm_model) finds deployments that
    the access-group filter then empties, _access_group_filter_emptied_candidates
    remained False (it was captured before that branch ran). The router would
    then proceed to default fallbacks; the fallback model could have no
    access_groups and short-circuit the filter, silently serving a caller
    blocked by access-group restrictions.
    
    Update the flag inside the litellm-model branch when filtering empties a
    non-empty candidate set so the default-fallback guard still triggers.
    
    * fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8)
    
    Many MCP integrations (Zapier, etc.) embed an upstream API key
    directly in the server URL, e.g.
    ``https://actions.zapier.com/mcp/<api-key>/sse``. The list and
    single-server endpoints were returning the full URL to any
    authenticated user — `_redact_mcp_credentials` only stripped the
    explicit ``credentials`` field, and `_sanitize_mcp_server_for_virtual_key`
    only ran for restricted virtual keys. Non-admin internal users could
    read the dashboard, click the unmask toggle, and exfiltrate the raw
    token.
    
    Add `_sanitize_mcp_server_for_non_admin` that runs on top of the
    existing credential redaction and clears the credential-bearing
    fields:
    
    - ``url`` (the primary leak vector)
    - ``spec_path`` (OpenAPI spec URLs that may carry tokens)
    - ``static_headers`` / ``extra_headers`` (Authorization)
    - ``env`` (arbitrary secrets)
    - ``authorization_url`` / ``token_url`` / ``registration_url``
    
    Identity fields (``server_id``, ``alias``, ``mcp_info``, etc.) are
    preserved so the UI can still list servers a non-admin's team has
    access to.
    
    Apply the new sanitizer in `fetch_all_mcp_servers` and the per-server
    fetch path right after the existing virtual-key branch. Update the
    existing `test_list_mcp_servers_non_admin_user_filtered` assertions
    that previously checked URL visibility.
    
    Frontend defense-in-depth: hide the URL unmask toggle on
    `mcp_server_view.tsx` unless the viewer is a proxy admin.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
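
    A minimal sketch of the field-clearing idea over a plain dict; the field names come from
    the list above, the helper shown here is a simplified stand-in for the real sanitizer:

      _CREDENTIAL_BEARING_FIELDS = (
          "url", "spec_path", "static_headers", "extra_headers",
          "env", "authorization_url", "token_url", "registration_url",
      )

      def sanitize_mcp_server_for_non_admin(server: dict) -> dict:
          """Blank credential-bearing fields; keep identity fields so the UI can still list the server."""
          sanitized = dict(server)
          for field in _CREDENTIAL_BEARING_FIELDS:
              if field in sanitized:
                  sanitized[field] = None
          return sanitized

      sanitized = sanitize_mcp_server_for_non_admin(
          {"server_id": "zap-1", "alias": "zapier", "url": "https://actions.zapier.com/mcp/<api-key>/sse"}
      )
      assert sanitized["url"] is None and sanitized["alias"] == "zapier"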
    
    * Fix runtime policy attachment initialization
    
    Mark runtime-created policies and attachments initialized so global policy attachments created from the policy builder apply immediately without requiring a restart.
    
    Co-authored-by: Cursor <cursoragent@cursor.com>
    
    * test(router): cover _try_early_resolve_deployments_for_model_not_in_names
    
    The router_code_coverage CI check requires every function in router.py to
    be referenced by at least one test under tests/{local_testing,
    router_unit_tests,test_litellm} in a file with "router" in its name.
    The recently-extracted helper had no direct test, so the check failed
    with "0.45% of functions in router.py are not tested".
    
    Add a focused test that exercises the four return paths: model already
    in self.model_names, no fallback applies, pattern-router match, and
    default_deployment substitution (also asserting the stored default
    isn't mutated).
    
    https://claude.ai/code/session_019AVp1XL7RT9RxRe4qRLkay
    
    
    
    * Fix policy registry teardown in tests
    
    Reset the policy ID index during policy engine test cleanup so stale policy versions cannot leak between tests.
    
    Co-authored-by: Cursor <cursoragent@cursor.com>
    
    * fix(batches): count non-chat tokens, validate batch-file model access (VERIA-39) (#27015)
    
    * fix(batches): count non-chat tokens and validate every model in batch file
    
    Two security control bypasses on POST /v1/batches:
    
    1. `_get_batch_job_input_file_usage` only summed tokens for
       `body.messages` (chat completions). Embedding (`input`) and text
       completion (`prompt`) batches reported zero, letting massive
       non-chat workloads slip past TPM rate limits. Extend the counter
       to handle string and list shapes for both fields.
    
    2. The batch input file was forwarded to the upstream provider
       without inspecting the models named inside the JSONL — only the
       outer `model` query parameter was checked against the caller's
       allowlist. A caller restricted to gpt-3.5 could submit a batch
       targeting gpt-4o and the upstream would execute it under the
       proxy's shared API key.
    
    Add `_get_models_from_batch_input_file_content` (returns the
    distinct `body.model` values) and call it from
    `_enforce_batch_file_model_access` in the pre-call hook, which runs
    each model through `can_key_call_model` so the same allowlist
    semantics (wildcards, access groups, all-proxy-models, team aliases)
    the proxy enforces on `/chat/completions` apply here too. Any
    unauthorized model raises a 403 before the file is forwarded.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
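
    A sketch of the model-collection step over raw JSONL, assuming the standard batch-file
    shape (one request per line with a `body.model` field); the allowlist check itself
    (`can_key_call_model`) is not reproduced here:

      import json
      from typing import Set

      def get_models_from_batch_input_file_content(file_content: bytes) -> Set[str]:
          """Collect the distinct body.model values named inside a batch JSONL file."""
          models: Set[str] = set()
          for line in file_content.decode("utf-8").splitlines():
              if not line.strip():
                  continue
              body = json.loads(line).get("body", {})
              if isinstance(body, dict) and body.get("model"):
                  models.add(body["model"])
          return models

      content = (
          b'{"custom_id": "1", "body": {"model": "gpt-4o", "messages": []}}\n'
          b'{"custom_id": "2", "body": {"model": "gpt-3.5-turbo", "input": "hi"}}\n'
      )
      assert get_models_from_batch_input_file_content(content) == {"gpt-4o", "gpt-3.5-turbo"}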
    
    * fix(batches): count pre-tokenized prompt/input shapes, classify 403 logs
    
    Two follow-ups from the Greptile review on the batch validation PR:
    
    1. P1 TPM bypass via integer token arrays. The OpenAI batch schema
       accepts ``prompt`` and ``input`` as ``list[int]`` (a single
       pre-tokenized prompt) or ``list[list[int]]`` (multiple) in addition
       to the string and ``list[str]`` shapes. Pre-fix only the string
       shapes were counted, so a caller could submit a batch with hundreds
       of millions of pre-tokenized tokens and the rate limiter would
       record zero. Extract the per-field logic into
       ``_count_prompt_or_input_tokens`` and count each int as one token.
    
    2. P2 access-denial logs were indistinguishable from I/O failures.
       ``count_input_file_usage`` caught every exception under a generic
       "Error counting input file usage" message, so an intentional 403
       from ``_enforce_batch_file_model_access`` looked the same in the
       logs as a missing file or a Prisma timeout. Catch ``HTTPException``
       separately and log 403s at WARNING level with a security-relevant
       message before re-raising.
    
    Tests cover the new shapes: single ``list[int]``, ``list[list[int]]``
    (the worst-case bypass vector), and embeddings ``input`` with
    pre-tokenized arrays.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
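
    A sketch of the per-field counting across the four shapes; the string branch uses a
    whitespace split as a stand-in for the real tokenizer-backed counter:

      from typing import List, Union

      def count_prompt_or_input_tokens(value: Union[str, List]) -> int:
          """Count tokens for the OpenAI batch shapes: str, list[str], list[int], list[list[int]]."""
          def count_text(text: str) -> int:
              return len(text.split())  # stand-in for the tokenizer-based counter

          if isinstance(value, str):
              return count_text(value)
          if isinstance(value, list):
              if value and all(isinstance(item, int) for item in value):
                  return len(value)              # single pre-tokenized prompt: each int is one token
              total = 0
              for item in value:
                  if isinstance(item, str):
                      total += count_text(item)
                  elif isinstance(item, list):
                      total += len(item)         # list[list[int]]: one token per int
              return total
          return 0

      assert count_prompt_or_input_tokens([[1] * 1000] * 5) == 5000  # the worst-case bypass shape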
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(proxy): re-validate user_id after /user/info re-parses query (#27009)
    
    * fix(proxy): re-validate user_id ownership after /user/info re-parses query
    
    The route-level access check in `RouteChecks.non_proxy_admin_allowed_routes_check`
    reads `request.query_params.get("user_id")`, which decodes literal `+` to
    spaces. The endpoint then re-parses the raw query string with `urllib.unquote`
    in `get_user_id_from_request` to preserve `+` characters (so plus-addressed
    emails work as user_ids). Those two paths produce different ids: a caller
    who registered a user_id containing a literal space could pass the route
    check and then read another user's row by sending the encoded `+` form.
    
    Add `_enforce_user_info_access` and call it after `_normalize_user_info_user_id`
    returns the final id. Proxy admin / view-only admin still bypass; everyone
    else must match the resolved user_id (or have no user_id, which falls back
    to the caller's own id later in the handler).
    
    Tests cover the admin bypass, owner-match path, and the cross-user lookup
    that this change blocks.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
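
    The divergence is reproducible with the standard library alone (the query string below is
    illustrative):

      from urllib.parse import parse_qs, unquote

      raw_query = "user_id=alice+ops@example.com"

      # what the route-level check sees: form decoding turns '+' into a space
      route_check_user_id = parse_qs(raw_query)["user_id"][0]   # 'alice ops@example.com'

      # what the endpoint resolves: unquote() leaves a literal '+' intact
      endpoint_user_id = unquote(raw_query.split("=", 1)[1])    # 'alice+ops@example.com'

      assert route_check_user_id != endpoint_user_id  # the two paths disagree, hence the re-check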
    
    * fix(proxy): apply user_info ownership check to PROXY_ADMIN_VIEW_ONLY
    
    `_enforce_user_info_access` was bypassing both PROXY_ADMIN and
    PROXY_ADMIN_VIEW_ONLY, but the upstream route check in
    `RouteChecks.non_proxy_admin_allowed_routes_check` only treats
    PROXY_ADMIN as a true admin for the `/user/info` route — view-only
    admins go through the `user_id == valid_token.user_id` enforcement
    along with regular users. Mirroring that asymmetry left the same
    encoded-`+` bypass open for view-only admins whose user_id contains a
    literal space.
    
    Drop the PROXY_ADMIN_VIEW_ONLY exemption so the post-decode re-check
    matches the upstream rule. Update tests: a view-only admin must now
    be blocked from cross-user lookups but still allowed to read their
    own row.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: yuneng-jiang <yuneng@berri.ai>
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs
    
    Adds LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS env var. When set to true and the
    proxy log level is INFO or above, spend-tracking error paths emit a single
    ERROR line without the full traceback. Stack traces are preserved at DEBUG
    and the Sentry / proxy_logging_obj.failure_handler path is unchanged.
    
    The new spend_log_error helper is wired through the spend write hot path:
      - DBSpendUpdateWriter (update_database, _update_*_db, batch upsert,
        redis-commit fallbacks)
      - _ProxyDBLogger._PROXY_track_cost_callback
      - get_logging_payload exception path
      - update_spend / update_daily_tag_spend / spend logs queue monitor
    
    Resolves LIT-2704.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend
    
    This call site previously logged a single-line error via verbose_proxy_logger.error()
    with no traceback. Switching it to spend_log_error(..., exc=e) caused a full stack
    trace to render by default (when LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS is unset),
    which contradicts the PR goal of leaving default behavior unchanged. Revert this
    specific site to the original error log call.
    
    * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend
    
    Bugbot caught a regression: the previous error log here was a single-line
    verbose_proxy_logger.error(...) with no traceback. spend_log_error attaches
    the active exception's traceback by default (when the suppression env var
    is unset), so swapping it in changed default behavior. Revert this one site
    to its original .error() call to keep the PR strictly opt-in.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * feat(spend-logs): suppress traceback in SpendLogs error_information row
    
    Extend LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS to the failure callback so the
    per-row Metadata pane in the UI no longer shows the stack trace when the
    opt-in env var is set, matching the existing console-side suppression.
    
    https://claude.ai/code/session_014dztoRbRnRvq54HL9EyHx6
    
    
    
    * [Fix] Proxy: Repair Merge Fallout In Router-Override Fallback Auth
    
    Conflict resolution for #26968 dropped the `Iterator` typing import
    (NameError at module load), left a dead `fallback_models = cast(...)`
    block, and the new tests called `_enforce_key_and_fallback_model_access`
    without the now-required `request` kwarg.
    
    * isolate dual OTEL handlers
    
    * harden cloud file compatibility path
    
    * harden cloud file compatibility path
    
    * [Fix] Proxy/Key Management: Align Key-Org Membership Checks On Generate And Regenerate
    
    Mirrors the membership rule on /key/update so that /key/generate and
    /key/{key}/regenerate apply the same `_validate_caller_can_assign_key_org`
    gate when the caller specifies an `organization_id`. Proxy admins bypass.
    The check no-ops when `organization_id` is not being set.
    
    * thread trusted params through vertex file content
    
    * trust only server legacy file flag
    
    * chore(proxy): keep public AI hub unauthenticated
    
    * fix(proxy): preserve low-detail readiness status
    
    * [Test] Anthropic: Replace Legacy Claude-4-Sonnet Alias With Haiku 4.5
    
    Three live-API tests pinned to claude-4-sonnet-20250514, which is a
    non-canonical alias of claude-sonnet-4-20250514. Anthropic's main API
    no longer resolves the legacy form under freshly issued keys, so the
    tests fail with not_found_error. The token counter test pinned to
    claude-sonnet-4-20250514 itself (deprecation_date 2026-05-14, two weeks
    out) was on borrowed time too.
    
    Bump all four to claude-haiku-4-5-20251001 — capability superset for what
    these tests exercise (streaming, parallel tool calling, extended thinking,
    token counting), no upcoming deprecation, cheaper per-token.
    
    * chore(proxy): move URL-valued model/file_id guard from SDK to proxy
    
    The previous per-provider guards in HuggingFace, Oobabooga, and Gemini
    files lived in the SDK layer, breaking SDK callers who legitimately pass
    URL-valued model identifiers. Move the check to the proxy boundary in
    add_litellm_data_to_request so SDK users keep working while proxy users
    default-deny URL-valued model and file_id, with admin opt-in via
    litellm.provider_url_destination_allowed_hosts.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * [Chore] Proxy/UI: Drop stray _experimental/out/chat/index.html
    
    This file is a regenerable UI build artifact that should not be tracked
    in source. Removing so the merge into litellm_internal_staging stays clean.
    
    * [Test] Anthropic Passthrough: Bump Streaming Cost-Injection Test To Haiku 4.5
    
    test_anthropic_messages_streaming_cost_injection hits the proxy's
    /v1/messages route, which routes via the anthropic/* wildcard to
    api.anthropic.com. The 404 surfaced in the test was Anthropic's own
    not_found_error propagated back through the proxy (visible from the
    x-litellm-model-id hash on the response — the proxy did route).
    
    Same root cause as the prior commit: the legacy claude-4-sonnet-20250514
    alias is no longer recognized by Anthropic's main API under the new key.
    Swap to claude-haiku-4-5-20251001 — same routing path, canonical model.
    
    * fix(proxy): handle ownership-recording failures after upstream create
    
    If record_container_owner raises after the upstream container is created,
    the user previously got a 500 with no usable container — they were billed
    for an unreachable resource. Move ownership recording into the create
    path's exception handling and split the two failure modes:
    
    - HTTPException from the recorder (auth conflicts) propagates verbatim
      so the client sees the real status code, not a generic LLM error.
    - Unexpected exceptions are logged and swallowed; the response is
      returned to the caller so they aren't billed for a container they
      can't address. The DB row stays untracked until an operator reconciles.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(guardrails): close post-call coverage gaps
    
    * fix(types): add /team/permissions_bulk_update to management_routes
    
    The blocklist check in _check_proxy_admin_viewer_access only fires for
    routes that match LiteLLMRoutes.management_routes — the bulk-update
    endpoint was missing from that list, so the test for view-only admins
    on /team/permissions_bulk_update fell through to "allow."
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * [Test] Anthropic Passthrough: Bump Thinking Tests Off Legacy Sonnet 4 Alias
    
    base_anthropic_messages_test.test_anthropic_messages_with_thinking and
    test_anthropic_streaming_with_thinking still pinned to
    claude-4-sonnet-20250514 — the same legacy alias Anthropic no longer
    recognizes under freshly issued keys. The other four tests in this base
    class already use claude-sonnet-4-5-20250929; these two were missed.
    
    Bump to claude-haiku-4-5-20251001 (supports_reasoning=true, no upcoming
    deprecation). Subclasses including TestAnthropicPassthroughBasic
    inherit these methods.
    
    * fix(guardrails): cover multi-choice output variants
    
    * fix(proxy): preserve public ai hub ui setting
    
    * fix(scim): cascade FK cleanup on user delete and surface block status in UI
    
    SCIM DELETE /Users/{id} previously called litellm_usertable.delete without
    clearing rows that FK back to the user, so Postgres rejected the delete with
    LiteLLM_InvitationLink_user_id_fkey and the SCIM caller saw a 500. Add a
    helper to drop invitation_link, organization_membership, and team_membership
    rows before the user delete (mirrors /user/delete in internal_user_endpoints).
    
    Also add a Status column to the Virtual Keys and Internal Users tables so
    admins can see at a glance which keys are blocked and which users SCIM has
    deactivated. SCIM-blocked keys carry a tooltip explaining the origin.
    
    Pin the dashboard's Node version to 20 via .nvmrc to match CI.
    
    * chore: update Next.js build artifacts (2026-05-02 03:21 UTC, node v20.20.2)
    
    * perf(proxy): cache container/skill ownership reads on the hot path
    
    Container ownership and skill rows are looked up on every retrieve /
    delete / list / file-content / chat-completion-with-skill call. The new
    stores wrapped raw Prisma queries with no cache, putting one DB
    round-trip on each request. Add an in-process TTL'd cache mirroring the
    _byok_cred_cache pattern in mcp_server/server.py: per-key (value,
    monotonic_timestamp), 60s TTL, 10000-entry cap with full-clear on
    overflow, invalidated by every write. Negative results (`None`) are
    cached too so untracked-resource checks also skip the DB.
    
    Tests cover: cache-after-first-hit, negative caching, write
    invalidation, no-caching-on-DB-error, TTL expiry, capacity eviction.
    56 tests pass.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
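
    A minimal sketch of the cache shape described above (60s TTL, 10000-entry cap with a full
    clear on overflow, negative results cached); names are illustrative:

      import time
      from typing import Any, Dict, Optional, Tuple

      _TTL_SECONDS = 60
      _MAX_ENTRIES = 10_000
      _cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, monotonic timestamp)

      def cache_get(key: str) -> Tuple[bool, Optional[Any]]:
          """Return (hit, value); a cached None still counts as a hit, so untracked lookups skip the DB."""
          entry = _cache.get(key)
          if entry is None:
              return False, None
          value, stored_at = entry
          if time.monotonic() - stored_at > _TTL_SECONDS:
              _cache.pop(key, None)
              return False, None
          return True, value

      def cache_set(key: str, value: Optional[Any]) -> None:
          if len(_cache) >= _MAX_ENTRIES:
              _cache.clear()                      # full clear on overflow, mirroring _byok_cred_cache
          _cache[key] = (value, time.monotonic())

      def cache_invalidate(key: str) -> None:
          _cache.pop(key, None)                   # every write path invalidates its key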
    
    * chore: update Next.js build artifacts (2026-05-02 03:39 UTC, node v20.20.2)
    
    * fix: remove the traceback key instead of setting it to ""
    
    * fix: linting error
    
    * fix(scim): preserve scim_active on PUT when client omits the field
    
    A SCIM PUT may legally omit `active` (full-replace with the field
    absent). Pydantic fills the SCIMUser.active default of True, so the PUT
    handler was overwriting metadata.scim_active with True even when the
    client never sent it — silently reactivating a previously SCIM-blocked
    user and unblocking their keys.
    
    Use model_fields_set to detect whether the client actually sent
    `active`. If omitted, preserve the prior scim_active value and skip
    the cascade to virtual keys.
    
    Also drop comments added in this PR that just narrate what the code
    does; keep only the docstrings and the SQL-NULL pitfall note that
    explain non-obvious behaviour.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
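
    The detection relies on Pydantic v2's model_fields_set; a small sketch with the SCIMUser
    shape trimmed to the relevant field:

      from pydantic import BaseModel

      class SCIMUser(BaseModel):
          userName: str
          active: bool = True  # the default fills in whenever the client omits the field

      put_without_active = SCIMUser.model_validate({"userName": "alice"})
      put_with_active = SCIMUser.model_validate({"userName": "alice", "active": True})

      # only trust `active` when the client actually sent it; otherwise keep the prior scim_active
      assert "active" not in put_without_active.model_fields_set
      assert "active" in put_with_active.model_fields_set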
    
    * fix(proxy): use set lookup for permitted agent filters
    
    * fix(mcp): redact command fields for non-admin server views
    
    * fix(proxy): forward decoded container ids after ownership checks
    
    * fix(caching): handle stale isolated Redis semantic index
    
    * fix(cloudflare): support response_text in streaming chunk parser
    
    Newer Cloudflare Workers AI models (e.g. Nemotron) emit 'response_text'
    instead of 'response' on streamed chunks. The non-streaming path was
    already updated to fall back to 'response_text' (#26385), but the
    streaming chunk parser still only read 'response', which caused
    streaming requests against those models to silently produce empty
    content.
    
    Mirror the non-streaming fallback in CloudflareChatResponseIterator.chunk_parser
    and add a streaming test for the response_text shape.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
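
    A sketch of the chunk-level read; the field names are from the change, the function name is
    illustrative:

      def extract_streamed_text(parsed_chunk: dict) -> str:
          """Prefer the legacy 'response' field, fall back to 'response_text' for newer models."""
          text = parsed_chunk.get("response")
          if text is None:
              text = parsed_chunk.get("response_text")
          return text or ""

      assert extract_streamed_text({"response": "hello"}) == "hello"
      assert extract_streamed_text({"response_text": "hello"}) == "hello"  # Nemotron-style chunks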
    
    * Fix code qa
    
    * Address bugbot: drop dead encode/decode helpers; preserve empty custom_id
    
    - Remove unused _encode_gcp_label_value / _decode_gcp_label_value singular
      helpers; only the _chunks variants are actually called.
    - Use 'is not None' check for custom_id so empty-string custom_ids are
      still labeled and round-trip through batch outputs.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * Forward Vertex file content logging context
    
    * test vertex file content logging forwarding
    
    Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
    
    * Fix Vertex batch output logging mutation
    
    * fix: don't mutate caller's logging_obj in _try_transform_vertex_batch_output_to_openai
    
    The method was overwriting logging_obj.optional_params, logging_obj.model,
    and logging_obj.start_time on the caller's Logging instance. When invoked
    from llm_http_handler.py's generic framework path, the framework's own
    logging_obj (which already went through pre_call) had its properties
    clobbered, causing model and start_time to reflect the last batch line's
    values rather than the original call context.
    
    Fix: create a fresh local Logging instance for the per-line transformation
    instead of mutating the incoming logging_obj. The caller's object is now
    left entirely untouched regardless of whether a logging_obj was passed in
    or not.
    
    Regression tests added to verify model, start_time, and optional_params
    are not mutated on the caller's logging_obj.
    
    Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
    
    * feat: add opt-out flag for Vertex batch output transformation
    
    Adds litellm.disable_vertex_batch_output_transformation (default False).
    When True, afile_content returns raw Vertex predictions.jsonl untouched
    so users that parse candidates/modelVersion directly are not broken.
    
    * fix(anthropic,bedrock): omit thinking/output_config when reasoning_effort="none"
    
    Setting reasoning_effort="none" on Anthropic chat models (direct, Bedrock
    Invoke, Bedrock Converse, Vertex AI Anthropic, Azure AI Anthropic) crashed
    LiteLLM with:
    
      litellm.APIConnectionError: 'NoneType' object has no attribute 'get'
    
    Both the Anthropic chat transformation and Bedrock Converse called
    ``AnthropicConfig._map_reasoning_effort`` and assigned the ``None`` it returns
    for ``"none"`` directly to ``optional_params["thinking"]``. Downstream
    ``is_thinking_enabled`` then did ``optional_params["thinking"].get("type")``
    and crashed.
    
    Pop ``thinking`` (and on Claude 4.6/4.7, ``output_config``) instead of
    assigning ``None``, restoring the documented contract that
    ``reasoning_effort="none"`` means "do not enable thinking". This also
    prevents downstream Anthropic 400s ("thinking: Input should be an object",
    "output_config.effort: Input should be ...") if the bug were ever masked.
    
    Verified end-to-end against the live Anthropic API and Bedrock Converse
    on claude-opus-4-{5,6,7} and claude-sonnet-4-6, plus Bedrock Invoke for
    Claude 4.5/4.6. Vertex AI Anthropic and Azure AI Anthropic inherit the
    fixed ``map_openai_params`` from ``AnthropicConfig`` and need no further
    changes.
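
    The shape of the fix, sketched over a plain optional_params dict (the helper name is
    illustrative; _map_reasoning_effort returning None for "none" is from the change):

      from typing import Optional

      def apply_thinking_param(optional_params: dict, mapped_thinking: Optional[dict]) -> dict:
          if mapped_thinking is None:
              # reasoning_effort="none": do not enable thinking, and drop output_config,
              # rather than assigning None (which downstream .get() calls crashed on)
              optional_params.pop("thinking", None)
              optional_params.pop("output_config", None)
          else:
              optional_params["thinking"] = mapped_thinking
          return optional_params

      params = apply_thinking_param({"thinking": {"type": "enabled"}}, mapped_thinking=None)
      assert "thinking" not in params  # is_thinking_enabled never sees a None value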
    
    * fix(vertex-ai): set response=null on batch error entries per OpenAI spec
    
    The Vertex batch output transformer was emitting both a populated 'response' and 'error' for failed batch entries. The OpenAI Batch output spec defines them as mutually exclusive: on error 'response' MUST be null. This broke any consumer using 'result["response"] is None' to detect failures.
    
    * test(vertex-ai): cover transformation_error path emits response=null
    
    * fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers
    
    DotpromptManager was hardened to render through
    ImmutableSandboxedEnvironment. The three sibling managers (gitlab,
    arize, bitbucket) were missed and still instantiate plain
    jinja2.Environment(), leaving the same attribute-traversal SSTI
    primitive open: a template fetched from a GitLab/BitBucket repo or
    Arize Phoenix workspace can reach __class__.__init__.__globals__ and
    execute arbitrary Python on the proxy host.
    
    Match the dotprompt pattern by switching all three to
    ImmutableSandboxedEnvironment. The sandbox blocks the dunder-traversal
    chain while leaving normal {{ var }} substitution intact, so the
    template surface is unchanged for legitimate use.
    
    Adds tests/test_litellm/integrations/test_prompt_manager_ssti.py
    (18 cases) verifying each manager's jinja_env is a sandbox, that
    classic SSTI payloads raise SecurityError, and that ordinary variable
    rendering still works.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
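
    The swap is a one-environment change; the blocked/allowed behaviour can be checked directly:

      from jinja2.exceptions import SecurityError
      from jinja2.sandbox import ImmutableSandboxedEnvironment

      env = ImmutableSandboxedEnvironment()

      # ordinary substitution is unchanged for legitimate templates
      assert env.from_string("Hello {{ name }}").render(name="world") == "Hello world"

      # the classic dunder-traversal primitive is rejected at render time
      try:
          env.from_string("{{ ''.__class__.__mro__[1].__subclasses__() }}").render()
          raise AssertionError("sandbox should have blocked the dunder traversal")
      except SecurityError:
          pass  # attribute access on underscore-prefixed names is refused by the sandbox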
    
    * chore(proxy): drop client-supplied pricing fields from request bodies
    
    The proxy currently forwards request-body pricing parameters (the fields
    on `CustomPricingLiteLLMParams`, plus `metadata.model_info`) into the
    core call path. Those fields belong to deployment configuration, not to
    per-request input — sending them from a client mutates the request's
    recorded cost and, via `litellm.completion` → `register_model`, the
    process-wide `litellm.model_cost` map for every later caller in the
    worker. Strip them at the boundary.
    
    The strip set is built from `CustomPricingLiteLLMParams.model_fields` so
    pricing fields added later are covered automatically. Operators who do
    want clients to supply per-request pricing can opt back in per key or
    team via `metadata.allow_client_pricing_override = true`, mirroring the
    existing `allow_client_mock_response` and
    `allow_client_message_redaction_opt_out` flags.
    
    Tests cover the strip set's coverage, root and metadata strips, the
    opt-in skip on both key and team metadata, and a regression check that
    the global `litellm.model_cost` map is unmutated after a stripped
    request.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
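
    A sketch of the strip, with a stand-in Pydantic model in place of CustomPricingLiteLLMParams
    (whose model_fields drive the real strip set):

      from typing import Optional
      from pydantic import BaseModel

      class PricingParamsStandIn(BaseModel):  # stand-in for CustomPricingLiteLLMParams
          input_cost_per_token: Optional[float] = None
          output_cost_per_token: Optional[float] = None

      # derive the strip set from the model so later-added pricing fields are covered automatically
      PRICING_FIELDS = set(PricingParamsStandIn.model_fields)

      def strip_client_pricing(request_body: dict, allow_override: bool = False) -> dict:
          if allow_override:  # metadata.allow_client_pricing_override opt-in
              return request_body
          for field in PRICING_FIELDS:
              request_body.pop(field, None)
          metadata = request_body.get("metadata")
          if isinstance(metadata, dict):
              metadata.pop("model_info", None)  # pricing can also ride in via metadata.model_info
          return request_body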
    
    * chore(proxy): log stripped pricing fields at debug for operator visibility
    
    Operators upgrading would otherwise see client-supplied pricing overrides
    silently stop applying with no diagnostic. Emit a debug-level line listing
    the dropped fields and pointing at the opt-in flag when any are stripped;
    stay silent on the no-op path so the log isn't filled with noise.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(proxy): move pricing strip below the litellm_metadata JSON-string parse
    
    The strip ran before the proxy parses ``litellm_metadata`` from a JSON
    string into a dict (a path used by multipart/form-data and ``extra_body``
    callers), so ``isinstance(metadata, dict)`` was False and ``model_info``
    survived the strip. Move the call to the same post-parse position the
    ``user_api_key_*`` strip already uses for the same reason. Adds a
    regression test exercising the JSON-string ``litellm_metadata`` path.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * test(responses): replace legacy claude-4-sonnet alias in multiturn tool-call test
    
    Anthropic's main API no longer resolves the non-canonical 'claude-4-sonnet-20250514'
    alias for freshly issued keys, returning 404 not_found_error. PR #27031 already
    swept three other live tests pinned to this alias to claude-haiku-4-5-20251001
    but missed test_multiturn_tool_calls in the responses API suite, which is now
    failing reliably on PR CI runs (e.g. PR #27074, job 1603363).
    
    Bump the two model references in test_multiturn_tool_calls to the same
    claude-haiku-4-5-20251001 snapshot used by PR #27031 -- it covers everything
    this test exercises (tool calling, multi-turn) and isn't on a deprecation
    schedule.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * chore(proxy): close callback-config and observability-credential side channels
    
    Two related gaps in the proxy's request bouncer:
    
    1. ``is_request_body_safe`` (auth_utils.py) walked the request-body root
       and the ``litellm_embedding_config`` nested dict, but not ``metadata``
       or ``litellm_metadata``. The same fields it bans at root — Langfuse /
       Langsmith / Arize / PostHog / Braintrust / Phoenix / W&B Weave / GCS /
       Humanloop / Lunary credentials and routing — were silently accepted
       when the caller put them inside metadata, retargeting observability
       callbacks to a caller-controlled host with caller-supplied creds.
       Walk both metadata containers (and parse the JSON-string form sent via
       multipart / ``extra_body``) through the same banned-params helper, so
       the existing ``allow_client_side_credentials`` opt-in covers both
       paths consistently.
    
    2. The banned-params list was hand-maintained and lagged the canonical
       ``_supported_callback_params`` allow-list in
       ``initialize_dynamic_callback_params``. Derive the observability bans
       from that allow-list (minus a small ``_SAFE_CLIENT_CALLBACK_PARAMS``
       set for informational fields like ``langfuse_prompt_version`` and
       ``langsmith_sampling_rate``) so future integrations are covered
       automatically; ``_EXTRA_BANNED_OBSERVABILITY_PARAMS`` carries the
       handful of fields integrations read but the allow-list hasn't caught
       up to. A guard test fails CI if a new entry is added to
       ``_supported_callback_params`` without an explicit safe-list decision.
    
    Separately in ``litellm_pre_call_utils.py``: add ``callbacks``,
    ``service_callback``, ``logger_fn``, and ``litellm_disabled_callbacks``
    to ``_UNTRUSTED_ROOT_CONTROL_FIELDS``. The first three are appended to
    worker-wide ``litellm.{input,success,failure,_async_*,service}_callback``
    lists / ``litellm.user_logger_fn`` from inside ``function_setup`` — one
    request poisons every subsequent caller in that worker. The last is the
    inverse primitive: the legitimate path reads it from key/team metadata,
    the request-body version silently disables admin-configured audit /
    observability for the call.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(auth): per-param allow must continue, not return early
    
    A pre-existing logic bug in ``_check_banned_params``: when the
    deployment-level ``configurable_clientside_auth_params`` permitted one
    banned field, the loop ``return``-ed on the first match instead of
    ``continue``-ing, so any other banned param later in the same body or
    metadata dict was never checked. This PR's metadata walk multiplies the
    surface where that bypass matters — a body pairing an allowed
    ``api_base`` with an observability credential like ``langfuse_host``
    would silently pass.
    
    Proxy-wide ``allow_client_side_credentials`` keeps ``return`` (it's a
    global opt-in for every banned param). The per-param branch becomes
    ``continue`` so only the one explicitly-permitted field is skipped.
    
    Adds a regression test that exercises the api_base + langfuse_host pair.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
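
    The corrected loop shape, in simplified form (names and signature are illustrative):

      def check_banned_params(body: dict, banned: set, per_param_allowed: set, allow_all: bool) -> None:
          for param in body:
              if param not in banned:
                  continue
              if allow_all:
                  return               # proxy-wide allow_client_side_credentials: stop checking entirely
              if param in per_param_allowed:
                  continue             # per-param allow: skip only this field, keep scanning the rest
              raise ValueError(f"{param} is not allowed in the request body")

      # api_base is explicitly permitted, but langfuse_host later in the body must still be rejected
      try:
          check_banned_params(
              {"api_base": "https://x", "langfuse_host": "https://attacker.example"},
              banned={"api_base", "langfuse_host"},
              per_param_allowed={"api_base"},
              allow_all=False,
          )
          raise AssertionError("langfuse_host should have been rejected")
      except ValueError:
          pass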
    
    * fix(vector_store): resolve embedding config at request time, never persist creds
    
    The vector store create/update path previously called
    ``_resolve_embedding_config`` against the admin-configured router/DB
    model and persisted the resolved ``litellm_embedding_config`` dict
    (``api_key`` / ``api_base`` / ``api_version``) into the
    ``litellm_managedvectorstorestable.litellm_params`` column. Because the
    resolver expanded ``os.environ/...`` references via ``get_secret``, the
    DB row carried cleartext provider credentials, and the
    ``/vector_store/{new,info,update,list}`` responses returned them to any
    authenticated caller who could supply a known admin model name.
    
    Move the auto-resolve out of ``create_vector_store_in_db`` and out of
    the update path. Persist only the user-supplied ``litellm_embedding_model``
    reference. Resolve at request-handling time inside
    ``_update_request_data_with_litellm_managed_vector_store_registry`` so
    the resolved config lives in the per-request ``data`` dict and is
    garbage-collected after the response. Legacy rows that were created by
    an earlier proxy version and already carry a resolved
    ``litellm_embedding_config`` skip the re-resolution and pass through
    unchanged so embedding calls keep working.
    
    The ``new_vector_store`` response now also runs the existing
    ``_redact_sensitive_litellm_params`` masker (already used by ``info``,
    ``update``, and ``list``), defending against caller-supplied cleartext
    on the create path and against legacy rows whose persisted credentials
    are still in the database.
    
    Existing tests that asserted the old write-time-resolve behaviour are
    updated to assert the new persistence shape (no embedding config
    stored, just the model reference). Two new tests cover the use-time
    path: one asserting fresh resolution happens when a row carries only
    the model reference, the other asserting legacy rows with persisted
    config skip re-resolution and continue to work.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(vector_store): tighten registry-mutation comment and dedupe test helpers
    
    * fix(vector_store): cache use-time embedding-config resolution
    
    Hold the resolved config in a process-memory TTL cache so the
    request-handling path doesn't run litellm_proxymodeltable.find_first
    on every vector-store call.
    
    * fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort
    
    Follow-up bugs surfaced by the QA sweep on PR #27039
    (https://github.com/BerriAI/litellm/pull/27039#issuecomment-4363363610).
    
    1. Stop stripping output_config.effort on Bedrock + Vertex adaptive routes.
       - Vertex AI Claude 4.6/4.7 accepts output_config.effort on rawPredict
         (verified end-to-end against us-east5 / global). The strip helper now
         no-ops for effort.
       - Bedrock Converse routes output_config into additionalModelRequestFields
         for anthropic base models so the requested adaptive tier (low/medium/
         high/xhigh/max) actually reaches the wire instead of all collapsing to
         identical thinking.
       - Bedrock Invoke chat transformation (AmazonAnthropicClaudeConfig) stops
         popping output_config from the post-AnthropicConfig request body.
       - Bedrock Invoke /v1/messages allowlist (BedrockInvokeAnthropicMessagesRequest)
         now lists output_config so the runtime allowlist filter forwards it.
    
    2. Validate effort across Bedrock Converse so 'disabled' / 'invalid' / '' /
       unsupported tiers (xhigh/max on Sonnet 4.6 or budget-mode 4.5 models)
       surface as a clean 400 BadRequestError instead of 500.
    
    3. ValueError -> BadRequestError throughout (AnthropicConfig.map_openai_params,
       _apply_output_config, AmazonConverseConfig._handle_reasoning_effort_parameter).
       Empty-string effort is now rejected (was silently passing the
       'if effort and ...' short-circuit).
    
    4. Floor reasoning_effort='minimal' at the Anthropic provider minimum
       (1024 budget_tokens) via new ANTHROPIC_MIN_THINKING_BUDGET_TOKENS so it's
       a usable tier on direct Anthropic / Azure AI Anthropic / Vertex AI Anthropic /
       Bedrock Invoke (all of which 400 below 1024).
    
    5. model_prices: dedupe duplicate supports_max_reasoning_effort key on
       claude-opus-4-7 / claude-opus-4-7-20260416.
    
    Adds regression tests across all five affected paths; existing tests asserting
    the silent-strip behavior were updated to reflect the new pass-through and
    clean 400 surfaces.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(constants): make ANTHROPIC_MIN_THINKING_BUDGET_TOKENS a plain constant
    
    The documentation CI test (tests/documentation_tests/test_env_keys.py)
    asserts every os.getenv() key in the source has a matching entry in the
    litellm-docs config_settings.md table. ANTHROPIC_MIN_THINKING_BUDGET_TOKENS
    tracks Anthropic's published wire-protocol minimum (1024) — it's not a
    user-tunable, so making it env-overridable was wrong anyway. Drop the
    os.getenv() wrapper; the value is now a plain literal.
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(anthropic,bedrock): correct effort error message and dedupe effort_map
    
    - Remove 'none' from the Bedrock _validate_anthropic_adaptive_effort error
      message; it was listed as a valid value but rejected by the membership
      check, leaving users in a feedback loop if they tried 'none'.
    - Hoist the duplicated reasoning_effort -> output_config.effort mapping
      out of AnthropicConfig.map_openai_params and
      AmazonConverseConfig._handle_reasoning_effort_parameter into a single
      AnthropicConfig.REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT class constant
      so the two routes cannot drift.
    
    * fix(anthropic): translate reasoning_effort on /v1/messages route
    
    Closes the remaining QA-sweep gap on PR #27074: Bedrock Invoke
    /v1/messages was silently ignoring ``reasoning_effort`` because the
    shared param filter only kept native Anthropic keys, so every effort
    tier collapsed to the same behavior on the wire (27/231 cells failing
    across opus-4-5 / opus-4-6 / sonnet-4-6).
    
    Map ``reasoning_effort`` to native Anthropic ``thinking`` /
    ``output_config.effort`` at the ``AnthropicMessagesConfig`` layer so
    all four /v1/messages routes (direct Anthropic, Azure AI, Vertex AI,
    Bedrock Invoke) inherit the same translation:
    
    - Add ``reasoning_effort`` to ``AnthropicMessagesRequestOptionalParams``
      so the param filter in
      ``AnthropicMessagesRequestUtils.get_requested_anthropic_messages_optional_param``
      no longer drops it before the transformation runs.
    
    - Add ``_translate_reasoning_effort_to_anthropic`` and call it from
      ``transform_anthropic_messages_request``. Mirrors
      ``AnthropicConfig.map_openai_params`` on the chat completion path
      (re-uses ``_map_reasoning_effort`` and
      ``REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT``) so the two routes
      cannot drift. Pops ``reasoning_effort`` so it never reaches the wire.
    
    - Caller-supplied native ``thinking`` / ``output_config.effort`` always
      win — same precedence as
      ``_translate_legacy_thinking_for_adaptive_model``.
    
    - Garbage values (``""``, ``"disabled"``, ``"invalid"``) raise
      ``AnthropicError(status_code=400)`` instead of falling through and
      surfacing as 500s from the provider.
    
    - ``"none"`` clears thinking + output_config so callers can opt out
      per request.
    
    Also restores the non-adaptive-model test coverage on Bedrock Invoke
    /v1/messages that the previous commit lost when
    ``test_bedrock_messages_strips_output_config`` was renamed to the
    ``forwards`` variant on Opus 4.7.
    
    Adds a new test file
    ``test_reasoning_effort_translation.py`` covering the translation at
    the shared config level (adaptive + non-adaptive models, none, garbage,
    caller precedence) so all four /v1/messages routes are exercised by a
    single suite.
    
    Adds parametrized + behavioral tests on the Bedrock Invoke /v1/messages
    suite covering: minimal/low/medium/high/xhigh/max mapping for adaptive
    models, thinking-budget mapping for non-adaptive Opus 4.5, ``none``
    clears both, garbage raises 400, explicit ``output_config`` wins.
    
    Refs: https://github.com/BerriAI/litellm/pull/27074
    
    
    
    * fix(anthropic,bedrock): reject unmapped reasoning_effort at mapping site
    
    Both the chat completion path (AnthropicConfig.map_openai_params) and the
    Bedrock Converse path (_handle_reasoning_effort_parameter) used
    REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(value, value) which falls
    back to the raw input on unmapped keys. Combined with _map_reasoning_effort
    returning type='adaptive' for any string on Claude 4.6/4.7, garbage values
    (e.g. 'disabled') could leak into optional_params['output_config']['effort']
    unvalidated if map_openai_params ran without the downstream transform_request
    or _validate_anthropic_adaptive_effort check.
    
    Mirror the /v1/messages pattern: use .get(value) (no fallback) and raise
    BadRequestError immediately when the value is unmapped, co-locating
    validation with the mapping for defense in depth.
    
    * style: black formatting
    
    Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(anthropic): stop class-attr leak; gate xhigh/max on every route
    
    The reasoning-effort mapping dict was a public class attribute on
    AnthropicConfig, so BaseConfig.get_config returned it as a request
    parameter and every Anthropic-backed call (Anthropic / Azure / Vertex /
    Bedrock Invoke) hit a 400 'REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT:
    Extra inputs are not permitted' from the provider. Move the mapping
    to a module-level constant.
    
    _supports_effort_level only looked the model up under
    custom_llm_provider='anthropic', so bedrock-prefixed model ids
    (e.g. bedrock/invoke/us.anthropic.claude-opus-4-7) returned False
    for both 'max' and 'xhigh' even when the underlying model entry has
    the flag set. Strip known provider prefixes and retry the lookup
    against litellm.model_cost directly so per-model gating works on
    every route.
    
    Mirror the per-model xhigh/max gate from
    AnthropicConfig._apply_output_config in
    AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic so
    the /v1/messages route also raises a clean 400 instead of forwarding
    the unsupported tier.
    
    * feat(anthropic,bedrock): strip output_config under drop_params for non-effort models
    
    When a proxy fronts Claude Code (which always sends `output_config.effort`)
    at a pre-4.5 Anthropic model — haiku-3, sonnet-3.5, opus-3, sonnet-4 — the
    forwarded knob causes a forced 400 the client can't fix. Gating a strip
    behind the existing `drop_params` flag lets operators opt into silent
    fixup once and stop worrying about per-model param hygiene.
    
    Default (`drop_params=False`) still forwards and surfaces the provider's
    error, preserving the strict, debuggable contract from #27074.
    
    Per https://platform.claude.com/docs/en/build-with-claude/effort the
    supporting set is Opus 4.5+, Sonnet 4.6+, and Mythos Preview; everything
    else is dropped (with a verbose_logger warning so the strip is visible).
    Recognition uses model-name patterns plus a fallback to any
    `supports_*_reasoning_effort` flag in the model map for forward
    compatibility with new entries.
    
    https://claude.ai/code/session_01WjHq31rvXT6xYNdVmSJvRp
    
    
    
    (cherry picked from commit 1233943e7861ba8a9062f792310ebd401cb03db8)
    
    * fix(base_llm): filter all _-prefixed class attrs from get_config
    
    The drop_params strip work added `AnthropicConfig._EFFORT_SUPPORTING_MODEL_PATTERNS`
    as a private class-level lookup tuple. `BaseConfig.get_config()` only
    filtered the `__`-prefixed names plus `_abc` / `_is_base_class`, so
    `_EFFORT_SUPPORTING_MODEL_PATTERNS` would have leaked into the request
    body the same way `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` did before
    the previous commit.
    
    Generalize the existing `_abc` / `_is_base_class` carve-outs to skip
    every `_`-prefixed name. `AmazonConverseConfig.get_config()` overrides
    the base method, so apply the same change there.
    
    Also unblocks future internal helpers from accidentally serialising into
    the wire body.
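
    A sketch of the generalised filter rule (the real get_config has more to it; only the
    underscore-prefix skip is the point here):

      class ConfigSketch:
          supports_streaming = True                        # public knob: exposed by get_config
          _EFFORT_SUPPORTING_MODEL_PATTERNS = ("opus-4",)  # private lookup table: must never hit the wire

          @classmethod
          def get_config(cls) -> dict:
              return {
                  name: value
                  for name, value in vars(cls).items()
                  if not name.startswith("_")              # skip every underscore-prefixed name
                  and not callable(value)
                  and not isinstance(value, (classmethod, staticmethod, property))
              }

      assert ConfigSketch.get_config() == {"supports_streaming": True}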
    
    * fix(anthropic): drive output_config.effort support from model map flags
    
    Replace hardcoded _EFFORT_SUPPORTING_MODEL_PATTERNS with a JSON-backed
    check that uses supports_*_reasoning_effort flags from the model map.
    Add supports_minimal_reasoning_effort: true to opus-4-5 and mythos-preview
    entries (which previously only carried supports_reasoning) so the JSON
    remains the single source of truth for effort capability.
    
    * fix(anthropic,bedrock,databricks): four reasoning_effort follow-ups
    
    - claude-sonnet-4-6 + reasoning_effort=max no longer 400s. Renamed
      _is_opus_4_6_model to _is_claude_4_6_model at three sites and added
      supports_max_reasoning_effort: true to 12 model entries in the JSON
      cost map (10 sonnet 4.6 ids + OpenRouter opus 4.6/4.7).
    - _map_reasoning_effort now raises BadRequestError(400) directly with
      llm_provider, instead of letting Databricks (and similar callers)
      surface its raw ValueError as a 500.
    - output_config.effort on Opus 4.5 over Bedrock no longer 400s for
      missing effort-2025-11-24 beta. Flipped JSON to "effort-2025-11-24"
      for bedrock + bedrock_converse and added an auto-attach branch in
      _process_tools_and_beta for non-adaptive Anthropic + output_config
      on Converse.
    - reasoning_effort=xhigh / =max on legacy budget-mode models
      (Haiku 4.5, Sonnet 4.5, Opus 4.5) now map to thinking.budget_tokens
      8192 / 16384 instead of returning 400. Added two constants in
      litellm/constants.py.
    
    Tests updated for all four flips. Validated end-to-end via 306-cell
    live proxy matrix (6 model families x 3 routes x 17 effort cases),
    all pass.
    
    * fix(databricks): validate reasoning_effort and set output_config on adaptive Claude
    
    The Databricks path called `AnthropicConfig._map_reasoning_effort` for
    Claude models but never validated the effort string nor set
    `output_config.effort` for adaptive models (Claude 4.6/4.7). Since
    `_map_reasoning_effort` returns `type=adaptive` for ANY non-None /
    non-"none" string on adaptive models (including "disabled",
    "invalid", ""), Databricks silently accepted garbage and emitted a
    request without an `output_config.effort`, collapsing every adaptive
    tier to identical behavior.
    
    Match the Anthropic native, Bedrock Converse, Bedrock Invoke, and
    /v1/messages paths: when the resolved `thinking` is non-None on a
    4.6/4.7 model, look up the value in
    `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` and either raise a clean
    `BadRequestError` or set `optional_params["output_config"]`.
    
    * fix(azure): omit model from image generation and image edit deployment requests
    
    Azure OpenAI routes image gen/edit by deployment in the URL; sending the
    deployment id in model breaks gpt-image-2 (invalid_value). Strip model from
    JSON for deployments/.../images/generations and from multipart data for
    .../images/edits. Non-deployment URLs (e.g. Azure AI FLUX) unchanged.
    
    Fixes #26316.
    
    Co-authored-by: Cursor <cursoragent@cursor.com>
    
    * test(azure): exercise image gen JSON filter via HTTP client; dedupe image edit URL
    
    - Image generation tests patch HTTPHandler.post / get_async_httpx_client so
      make_*_azure_httpx_request runs and wire json is asserted on call kwargs.
    - Azure image edit: strip model in finalize_image_edit_multipart_data using the
      same URL string the handler passes to POST (no second get_complete_url in
      transform). BaseImageEditConfig default finalize is a no-op.
    
    Co-authored-by: Cursor <cursoragent@cursor.com>
    
    * fix(azure_ai/anthropic): promote output_config out of extra_body so validation runs
    
    `azure_ai` is registered in `litellm.openai_compatible_providers`, so
    `add_provider_specific_params_to_optional_params` (litellm/utils.py)
    auto-stuffs any non-OpenAI kwarg (e.g. `output_config={"effort": "..."}`)
    into `optional_params["extra_body"]`. `AzureAnthropicConfig.transform_request`
    then strips `extra_body` entirely on the way out, silently dropping the
    param — and `AnthropicConfig._apply_output_config` never sees it, so
    `effort="invalid"` / `effort="xhigh"` on a non-supporting model
    quietly reaches the model with default behavior instead of returning a
    clean 400 (as the native `anthropic` provider does).
    
    Promote the keys back to top-level `optional_params` (using `setdefault`
    so explicit top-level values win) before delegating to the parent
    `AnthropicConfig`. Apply in both `validate_environment` and
    `transform_request` so flag detection (`is_mcp_server_used`, etc.) and
    output-config validation both run.
    
    Surfaced by the QA matrix expansion on PR #27074: 20 cells where Azure
    returned 200 while `anthropic` returned 400 — all `output_config` mode
    across haiku_4_5, sonnet_4_5, opus_4_5, sonnet_4_6, opus_4_6, opus_4_7
    families with `effort` in {invalid, xhigh, max, low, medium, high}.
    
    Tests:
    * `test_output_config_promoted_from_extra_body`: valid effort reaches data
    * `test_invalid_output_config_effort_raises_via_extra_body`: 400 on bad effort
    * `test_unsupported_effort_xhigh_raises_via_extra_body`: 400 on xhigh-on-Sonnet-4.6
    * `test_extra_body_promotion_does_not_clobber_top_level`: setdefault semantics
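
    The promotion itself is small; a sketch with the setdefault semantics (the key list and
    function name are illustrative):

      PROMOTED_KEYS = ("output_config",)

      def promote_extra_body_params(optional_params: dict) -> dict:
          extra_body = optional_params.get("extra_body") or {}
          for key in PROMOTED_KEYS:
              if key in extra_body:
                  # setdefault: an explicit top-level value always wins over the extra_body copy
                  optional_params.setdefault(key, extra_body[key])
          return optional_params

      params = promote_extra_body_params({"extra_body": {"output_config": {"effort": "high"}}})
      assert params["output_config"] == {"effort": "high"}  # now visible to output-config validation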
    
    * test(image_gen): expect no model in Azure image edit multipart (#26316)
    
    Align test_azure_image_edit_litellm_sdk with deployment-scoped Azure edits.
    
    Co-authored-by: Cursor <cursoragent@cursor.com>
    
    * refactor(anthropic): extract _validate_effort_for_model to prevent drift
    
    The chat completion path (`_apply_output_config`) and the /v1/messages
    pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`)
    both gate `max` / `xhigh` per model. The two sites had diverged from
    near-identical copies into separately maintained blocks, creating a real
    drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor
    could update one site and miss the other.
    
    Centralise the gating in `AnthropicConfig._validate_effort_for_model`,
    which returns an error message string or `None`. Each call site keeps
    its own provider-appropriate exception type (`BadRequestError` for the
    chat path, `AnthropicError` for the /v1/messages pass-through) but the
    gating decision now comes from one place. Net -11 LOC.
    
    Adds a parametrised unit test exercising the helper directly across
    4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort
    inputs. Existing tests at both call sites continue to pass unchanged.
    
    Addresses Greptile finding on PR #27074.
    
    * fix(databricks): narrow reasoning_effort_value to str for mypy
    
    `non_default_params.get("reasoning_effort")` returns `Any | None`,
    but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`.
    Mypy flagged this on the strict pass. Narrow with `isinstance` before
    the lookup; non-strings fall through to the existing `BadRequestError`
    below with a clean validation message, so behavior is unchanged.
    
    Fixes a regression introduced by 1a10746e95 in this PR.
    
    * feat(proxy): add health_check_reasoning_effort for model health checks
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(image_gen): align Azure image gen fixture with body omitting model
    
    Expected JSON matches deployment-scoped Azure POST (#26316).
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(anthropic/chat): force PR-local model_cost map via autouse fixture
    
    CI runs without LITELLM_LOCAL_MODEL_COST_MAP=True, so litellm.model_cost
    is loaded from main-branch JSON (default model_cost_map_url) instead of
    the PR's checked-out model_prices_and_context_window.json. Tests that
    assert per-model flags added in this PR (supports_max_reasoning_effort,
    supports_xhigh_reasoning_effort) therefore pass locally but fail in CI
    with 'AssertionError: assert False is True' on 5 cases:
    
      - test_anthropic_model_supports_effort_param_recognizes_supporting_models
        [anthropic.claude-mythos-preview, bedrock/.../mythos-preview,
         claude-opus-4-5-20251101]
      - test_supports_effort_level_handles_provider_prefixes
        [bedrock/invoke/us.anthropic.claude-sonnet-4-6-max-True,
         claude-sonnet-4-6-max-True]
    
    Add an autouse fixture at tests/test_litellm/llms/anthropic/chat/conftest.py
    that monkey-patches litellm.model_cost to the PR-local JSON for every test
    in this directory. The parent conftest already snapshots+restores
    litellm.model_cost per-function, so the mutation is contained.
    
    This is a scoped workaround. The proper fix is to set the env var
    globally in the test workflow once the ~10 inline self-set test files
    are audited; tracking that as a follow-up issue.
    
    * [Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests
    
    The previous pins resolved to single-platform amd64 manifests, so buildx
    pulled the same amd64 base for both linux/amd64 and linux/arm64 targets.
    The published OCI index then advertised an arm64 entry whose layers are
    byte-identical to amd64 -- arm64 users got an amd64 binary.
    
    Switch all three Dockerfiles to the multi-arch image-index digests:
      - cgr.dev/chainguard/wolfi-base   (index has linux/amd64 + linux/arm64)
      - ghcr.io/astral-sh/uv:0.11.7     (index has linux/amd64 + linux/arm64)
    
    Resolved with `docker buildx imagetools inspect <ref>` -- that returns
    the index digest. `docker pull` + `docker inspect` returns the per-host
    platform digest, which is what slipped in last time.
    
    * [Fix] Docker: Pin Uv To Multi-Arch Index Digest In Remaining Dockerfiles
    
    Apply the same fix to the three Dockerfiles not in the release pipeline
    today (alpine, dev, health_check) so they stay correct if/when they're
    built for arm64 in the future.
    
    Wolfi pins are not present in these files; the python:3.11-alpine and
    python:3.13-slim digests they already use are multi-arch indexes that
    include arm64/v8, so only the uv pin needed swapping.
    
    * fix(xai): fold reasoning_tokens into completion_tokens to satisfy OpenAI invariant
    
    xAI's chat completions API accounts reasoning_tokens separately from
    completion_tokens, but rolls them into total_tokens. This breaks the
    OpenAI invariant total_tokens == prompt_tokens + completion_tokens
    that downstream consumers (including litellm's own _usage_format_tests
    in tests/llm_translation/base_llm_unit_tests.py:58) rely on.
    
    Live capture (grok-3-mini-beta, 2026-05-04):
        prompt=14, completion=10, total=336, reasoning=312
        14 + 10 = 24, NOT 336.
    
    OpenAI's o1/o3 reasoning models include reasoning_tokens in
    completion_tokens, leaving the prompt+completion=total invariant
    intact. xAI deviates. This patch aligns xAI to OpenAI semantics by
    folding reasoning_tokens into completion_tokens after the parent
    OpenAI parser runs.
    
    The fold is idempotent and defensive:
    - Only fires when total_tokens == prompt_tokens + completion_tokens
      + reasoning_tokens (the documented xAI shape). Refuses to fold if
      the gap doesn't match, guarding against silent corruption when xAI
      changes accounting.
    - Skips if completion_tokens already covers the gap (already
      normalised — e.g. cost calc replays a previously-folded Usage).
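
    A minimal sketch of that fold, assuming a plain dict-shaped usage
    block (the real code operates on litellm's Usage object; field names
    follow the text above):

    ```python
    from typing import Dict

    def fold_reasoning_tokens(usage: Dict[str, int]) -> Dict[str, int]:
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        total = usage.get("total_tokens", 0)
        reasoning = usage.get("reasoning_tokens", 0)
        if reasoning == 0:
            return usage  # nothing to fold
        if prompt + completion == total:
            return usage  # already OpenAI-normalised (idempotent on replay)
        if prompt + completion + reasoning == total:
            # Documented xAI raw shape: fold reasoning into completion so the
            # OpenAI invariant total == prompt + completion holds again.
            usage["completion_tokens"] = completion + reasoning
        # Any other gap is left untouched to avoid silent corruption.
        return usage

    raw = {"prompt_tokens": 14, "completion_tokens": 10, "total_tokens": 336, "reasoning_tokens": 312}
    folded = fold_reasoning_tokens(raw)
    assert folded["prompt_tokens"] + folded["completion_tokens"] == folded["total_tokens"]
    ```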
    
    xai.cost_calculator.cost_per_token already added reasoning_tokens to
    the visible completion count for billing. Post-fold the Usage block
    now satisfies that invariant directly, so the cost calc would
    double-bill. Updated cost_per_token to detect the OpenAI-normalised
    shape (total == prompt + completion) and skip the reasoning add-on
    in that case, falling through to the legacy raw-shape behaviour for
    callers that bypass the transformation (e.g. proxy log replay).
    
    Tests:
    - Adds TestXAIReasoningTokenFolding covering: gap-explained-fold,
      idempotent-no-double-fold, no-reasoning-skip, gap-mismatch-skip.
    - Adds test_already_normalised_usage_does_not_double_count_reasoning
      to lock the cost-calc idempotency.
    - Updates 7 pre-existing cost-calc tests whose total_tokens was
      internally inconsistent (used the OpenAI-normalised total but kept
      reasoning_tokens external) to use the documented xAI raw shape
      total = prompt + visible completion + reasoning. Pre-existing
      values masked the missing-fold by accident.
    
    Verified end-to-end against the live xAI API:
        LITELLM_LOCAL_MODEL_COST_MAP=False (CI default) +
        XAI_API_KEY set +
        pytest tests/llm_translation/test_xai.py::TestXAIChat::test_prompt_caching
            -> PASSED in 18.81s (was: AssertionError on
            usage.total_tokens == usage.prompt_tokens + usage.completion_tokens)
    
    20/20 tests in tests/test_litellm/llms/xai/test_xai_cost_calculator.py
    and 8/8 in tests/test_litellm/llms/xai/test_xai_chat_transformation.py
    pass.
    
    * refactor(bedrock/converse): delegate effort gating to AnthropicConfig._validate_effort_for_model
    
    Removes the duplicated max/xhigh gating logic in
    _validate_anthropic_adaptive_effort and the now-unused
    _supports_effort_level_on_bedrock helper. Per-model gating now flows
    through the centralized AnthropicConfig._validate_effort_for_model
    (whose _supports_effort_level already strips Bedrock prefixes), so the
    chat completion, /v1/messages, and Bedrock Converse paths can't drift
    when a new gated effort tier is added.
    
    * Implement normalize_nonempty_secret_str function to trim whitespace from secrets and treat empty values as unset. Update proxy_server to use this function for Grafana credentials. Enhance tests to validate the new normalization behavior.
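
    A plausible shape for that helper, inferred from the description; the
    real signature in the codebase may differ:

    ```python
    from typing import Optional

    def normalize_nonempty_secret_str(value: Optional[str]) -> Optional[str]:
        """Strip surrounding whitespace; treat empty or whitespace-only secrets as unset."""
        if value is None:
            return None
        stripped = value.strip()
        return stripped or None

    assert normalize_nonempty_secret_str("  grafana-token  ") == "grafana-token"
    assert normalize_nonempty_secret_str("   ") is None
    assert normalize_nonempty_secret_str(None) is None
    ```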
    
    * Fix qdrant semantic cache miss metadata
    
    * chore(deps): refresh dependency locks
    
    * chore(deps): authorize pytest license
    
    * fix: preserve tokenizer decode round trips
    
    * refactor(anthropic): drive adaptive-thinking gate via supports_adaptive_thinking flag
    
    Three of greptile's open comments on #27074 (P2 converse:512, P1
    databricks:361, and the underlying capability-flag policy rule) flagged
    the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...)
    used inline as a runtime 'is this an adaptive-thinking model?' check.
    That requires a code release each time a new adaptive Claude lands.
    
    Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model,
    and switch the helper itself to read a new supports_adaptive_thinking
    flag from `model_prices_and_context_window.json` via `_supports_factory`,
    falling back to the family pattern only when the model-map entry doesn't
    carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants
    that route through the same code path with non-canonical ids).
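
    Roughly the lookup order described above, with a hypothetical
    model-map slice and family regex standing in for the real
    `_supports_factory` plumbing:

    ```python
    import re
    from typing import Any, Dict

    # Hypothetical slice of model_prices_and_context_window.json
    MODEL_MAP: Dict[str, Dict[str, Any]] = {
        "claude-opus-4-6": {"supports_adaptive_thinking": True},
    }
    _FAMILY_PATTERN = re.compile(r"(opus|sonnet)-4-[67]")

    def is_adaptive_thinking_model(model: str) -> bool:
        entry = MODEL_MAP.get(model, {})
        if entry.get("supports_adaptive_thinking") is True:
            return True  # model-map flag wins when the entry carries it
        # Fallback for prefixed / non-canonical ids (bedrock/, openrouter/, ...)
        return bool(_FAMILY_PATTERN.search(model))

    assert is_adaptive_thinking_model("claude-opus-4-6")
    assert is_adaptive_thinking_model("bedrock/invoke/us.anthropic.claude-sonnet-4-6")
    assert not is_adaptive_thinking_model("claude-3-haiku")
    ```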
    
    Adds `supports_adaptive_thinking: true` to the four 4.6/4.7 anthropic
    entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed
    and Vertex-prefixed entries don't need the flag because both fall back
    through the family pattern (the helper short-circuits early on True from
    either path) and the bedrock/vertex Claude IDs all match the existing
    opus-4-{6,7} / sonnet-4-{6,7} pattern.
    
    Affected call sites:
    
    - `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter`
    - `anthropic/chat/transformation.py:_map_reasoning_effort`
    - `anthropic/chat/transformation.py:map_openai_params` (output_config branch)
    - `databricks/chat/transformation.py:map_openai_params` (output_config branch)
    
    The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references
    in `AnthropicConfig._validate_effort_for_model` and
    `AnthropicConfig.get_supported_openai_params` are intentionally retained:
    they're per-model gating fallbacks for variants whose model-map entries
    don't yet carry the `supports_max_reasoning_effort` /
    `supports_reasoning` flag. Those are documented in-place.
    
    Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * chore(deps): address dependency review notes
    
    * test(model_prices): add supports_adaptive_thinking to schema
    
    `test_aaamodel_prices_and_context_window_json_is_valid` validates the
    model-map JSON against an explicit schema with `additionalProperties`,
    so the new `supports_adaptive_thinking` flag added in
    98ced0ae43 needs a matching schema entry.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * refactor: remove unnecessary comments from #27074
    
    Strip out the explanatory and historical comments that don't carry
    business-logic justification. Comments that simply narrate what code
    does — or that explain prior behavior, what was changed, or which PR
    introduced a fix — are removed. Docstrings are reduced to a one-line
    summary where the long form repeated information already evident from
    the code or test data.
    
    No code-behavior changes. All 643 affected unit tests still pass.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test: keep decode token test local
    
    * chore(deps): align dashboard node engine
    
    * feat: selectively apply routing strategy according to model name
    
    * style: make _model_supports_effort_param more concise
    
    * refactor(anthropic,bedrock): hoist drop_params output_config warning to module constant
    
    Three call sites (anthropic chat, bedrock converse, bedrock invoke messages)
    emitted the same '...Effort is only supported on Opus 4.5+, Sonnet 4.6+, and
    Mythos Preview' warning verbatim. Extract DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING
    in litellm/llms/anthropic/chat/transformation.py and import it from the bedrock
    sites so future copy edits live in one place.
    
    Addresses Michael's review on PR #27074.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * refactor(anthropic,bedrock,databricks): factor BadRequestError for unknown reasoning_effort
    
    Three call sites raised the same BadRequestError("Invalid reasoning_effort:
    ... Must be one of 'minimal', 'low', ...") block when REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT
    returned None: anthropic chat map_openai_params, bedrock converse
    _handle_reasoning_effort_parameter, and databricks chat reasoning_effort path.
    
    Extract AnthropicConfig._raise_invalid_reasoning_effort(model, value, llm_provider)
    so future copy edits / valid-set changes happen in one place. Typed as NoReturn
    so type-checkers correctly narrow control flow at call sites.
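
    The `NoReturn` pattern in isolation, with a generic `ValueError` and
    an illustrative effort table standing in for litellm's
    `BadRequestError` and the real mapping:

    ```python
    from typing import NoReturn

    VALID_EFFORTS = ("minimal", "low", "medium", "high", "max", "xhigh")  # illustrative set

    def raise_invalid_reasoning_effort(model: str, value: str, llm_provider: str) -> NoReturn:
        # NoReturn tells type-checkers that control flow never continues past this call.
        raise ValueError(
            f"Invalid reasoning_effort: {value!r} for {llm_provider}/{model}. "
            f"Must be one of {VALID_EFFORTS}."
        )

    def map_effort(model: str, effort: str) -> str:
        mapping = {"low": "low", "high": "high"}  # illustrative subset
        mapped = mapping.get(effort)
        if mapped is None:
            raise_invalid_reasoning_effort(model, effort, "anthropic")
        return mapped  # mypy narrows mapped to str here thanks to NoReturn
    ```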
    
    Addresses Michael's review on PR #27074.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * Clean up Redis semantic cache isolation fallback
    
    * fix(guardrails): align banned_keywords + azure_content_safety call_type gates with runtime route_type
    
    The hooks gated on ``call_type == "completion"`` but the proxy ingress
    passes ``route_type`` straight through as ``call_type`` —
    ``"acompletion"`` for /v1/chat/completions and ``"aresponses"`` for
    /v1/responses. Tests passed because they used the literal sync
    ``"completion"`` value, masking the gap.
    
    Switch both hooks to ``is_text_content_call_type`` (matches the
    canonical runtime values: completion / acompletion / aresponses) and
    update existing tests to assert against runtime values, plus parametrize
    a regression test that pins the gate.
    
    * fix: remove unused import
    
    * Add semantic cache legacy migration flag
    
    * Treat 0 team_member_budget as no cap
    
    * chore(caching): annotate qdrant quantization_params dict type
    
    Mypy infers the dict's value type from the first branch
    (Dict[str, bool]) which clashes with the scalar branch's mixed-type
    inner dict. Explicit Dict[str, Any] annotation lifts the inference.
    
    * chore(caching): remove allow_legacy_unscoped_cache_hits opt-in
    
    The flag was an opt-in escape hatch for the cross-tenant leak the rest
    of the patch closes — flipping it on (env var or constructor param)
    re-enables exactly the VERIA-54 primitive on either backend. There is
    no operational need that the secure path doesn't already meet:
    
    - Qdrant: legacy points without ``litellm_cache_key`` payload are
      excluded by the must-clause filter and treated as misses; new sets
      populate the cache key, so cold-start lasts only as long as the
      natural cache rebuild.
    - Redis: existing unscoped index can't carry the new schema; the init
      path falls back to ``{name}_isolated`` (and recreates it on stale
      schema), leaving the legacy index untouched.
    
    Drop the constructor param, env-var fallback, ``_using_legacy_unscoped_index``
    flag, the legacy-reuse branch in ``_init_semantic_cache``, and the
    matching guards in set/get paths. Update tests to drop the legacy-mode
    cases and assert the secure-only behaviour.
    
    * fix(container): keep ownership-filter exceptions out of the LLM-error path
    
    filter_container_list_response runs after the upstream call has
    already succeeded; treating an ownership-lookup failure as an LLM-API
    error fires post_call_failure_hook for a successful upstream call and
    returns a misleading provider-shaped error to the client. Run the
    filter outside the try/except so genuine LLM errors stay scoped to
    the upstream call.
    
    * chore(container,skills): LRU eviction for owner caches; widen file_purpose Literal
    
    Two cleanups from the /simplify pass:
    
    * ``_CONTAINER_OWNER_CACHE`` and ``_SKILL_CACHE`` now LRU-evict via
      ``OrderedDict.popitem(last=False)`` instead of full ``clear()`` at
      capacity. Full clears converted a steady-state cached workload into a
      periodic full-DB-load oscillation as the cache repopulated from zero
      and cleared again. Reads now ``move_to_end`` so the just-touched
      entry survives the next eviction (see the sketch after this list).
      Mirrors the pre-existing LRU pattern in ``_remember_container_owner``.
    
    * ``LiteLLM_ManagedObjectTable.file_purpose`` Literal now includes
      ``"container"`` so Pydantic validation accepts rows written by the
      ownership store.
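
    A minimal sketch of that eviction policy, assuming a small
    illustrative cap; the real caches use their own limits and value
    types:

    ```python
    from collections import OrderedDict
    from typing import Optional

    MAX_ENTRIES = 4  # illustrative cap

    class LruOwnerCache:
        def __init__(self) -> None:
            self._data: "OrderedDict[str, str]" = OrderedDict()

        def get(self, key: str) -> Optional[str]:
            value = self._data.get(key)
            if value is not None:
                self._data.move_to_end(key)  # just-touched entry survives the next eviction
            return value

        def set(self, key: str, value: str) -> None:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > MAX_ENTRIES:
                self._data.popitem(last=False)  # evict only the least-recently-used entry

    cache = LruOwnerCache()
    for i in range(5):
        cache.set(f"container-{i}", f"owner-{i}")
    assert cache.get("container-0") is None       # only the oldest entry was evicted
    assert cache.get("container-4") == "owner-4"  # the rest of the cache survives
    ```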
    
    * chore(container,skills): drop legacy-access opt-out env vars
    
    LITELLM_ALLOW_UNTRACKED_CONTAINER_ACCESS and
    LITELLM_ALLOW_UNOWNED_SKILL_ACCESS were operator-toggleable opt-outs
    for the cross-tenant access primitive this PR closes — flipping either
    on re-enabled exactly the VERIA-20 read path. Default-secure with no
    escape hatch matches sibling fixes (vector-store cred isolation, semantic
    cache key isolation, user_config strip): all rejected the
    opt-out-of-security pattern.
    
    Untracked containers and unowned skills (rows that pre-date this
    enforcement) are admin-only. Non-admin owners need to either re-create
    via the now-tracked flow or have an admin assign ``created_by`` on the
    existing row. Update tests to assert the strict-only behaviour.
    
    * fix(ownership): reject identity-less callers instead of sharing a sentinel scope
    
    UNSCOPED_RESOURCE_OWNER_SCOPE collapsed every caller without an
    identity field (no user_id / team_id / org_id / api_key / token) into
    a single shared owner — a cross-tenant access primitive: any two such
    callers could see and delete each other's containers and skills.
    
    Drop the sentinel. ``get_primary_resource_owner_scope`` returns
    ``None`` and ``get_resource_owner_scopes`` returns ``[]`` for
    identity-less callers. ``record_container_owner`` and
    ``LiteLLMSkillsHandler.create_skill`` now reject creates from
    identity-less callers with a 403 instead of stamping the placeholder.
    Read paths already deny ``owner is None`` correctly so legacy rows
    (if any) are admin-only.
    
    * fix(proxy): include request-blocked callback params in auth bans
    
    * fix: keep skills handler FastAPI-free; fold gcs deny list into the body bouncer
    
    Two cleanups:
    
    * ``LiteLLMSkillsHandler.create_skill`` raised ``HTTPException`` for
      identity-less callers, importing FastAPI from a ``litellm/llms/``
      module — that violates the project rule that FastAPI lives only
      under ``proxy/``. Switch to ``ValueError`` (the same shape the rest
      of the handler uses for not-found/forbidden) and update the test.
    
    * The proxy-auth body bouncer derived its observability ban list from
      ``_supported_callback_params`` only, missing
      ``_request_blocked_callback_params`` (where ``gcs_bucket_name`` and
      ``gcs_path_service_account`` live). Two recently-merged sibling PRs
      (#27019 added the deny list, #27081 added the test asserting these
      are rejected at the request body root) crossed without folding them
      together. Union the GCS deny list into the bouncer's derivation so
      the single source of truth covers both code paths.
    
    * fix(proxy): normalize managed resource team owner field
    
    * chore: simplify ownership tracking — drop thin stores, in-memory fallback, hand-rolled cache
    
    Substantial reduction (~765 LOC) without changing the security
    boundary:
    
    * Drop ContainerOwnershipStore and LiteLLMSkillsStore — both were
      one-method-per-Prisma-call wrappers. Inline the calls instead,
      matching the established pattern in vector_store_endpoints,
      agent_endpoints, and mcp_server/db.py.
    
    * Drop the prisma_client is None in-memory fallback. Production
      deploys always have Prisma; running ownership-critical paths on a
      process-local dict is a security footgun in the dev-mode case it
      was meant to support, and complicates every code path with a
      branch. Fail-secure: skip recording if Prisma is unavailable, and
      treat reads as "not found" (admin-only).
    
    * Drop the hand-rolled module-level cache. Replace with the existing
      litellm.caching.in_memory_cache.InMemoryCache, which already has
      TTL + max-size + eviction tested in its own module. Sentinel string
      for negative caching since InMemoryCache can't disambiguate "miss"
      from "cached as None".
    
    * Tests: drop coverage for removed code paths (in-memory fallback,
      hand-rolled cache internals). Keep tests for actual behavior (cache
      hit-rate, negative caching, owner check, list filtering,
      identity-less reject, admin bypass).
    
    * fix(container): cache list-allow-set, track admin-created containers
    
    Address Greptile P2 follow-ups from the prior round:
    
    * Cache ``_get_allowed_container_ids`` (60s LRU/TTL keyed by sorted
      owner-scope tuple) so ``GET /v1/containers`` doesn't issue a fresh
      ``find_many`` against ``litellm_managedobjecttable`` on every list
      call. Invalidate the caller's own cache entry when they record a
      new owner so the just-created container shows up on their next list.
    
    * Tighten the admin early-return in ``record_container_owner`` to skip
      ONLY when there's literally no container ID to stamp. An admin with
      identity (the master-key path populates ``user_id`` + ``api_key``)
      flows through the normal record path so admin-created containers are
      tracked like any other caller's. The truly-identity-less admin case
      still falls through to the 403 below — correct fail-secure default.
    
    Skill-cache invalidation gap (also flagged by Greptile) is moot: there
    is no skill update endpoint exposed; ownership-affecting mutations are
    only delete (already invalidates) and create (new ID, no cache entry
    to update).
    
    * chore(container): use delete_cache, json-encode scope key, clean test
    
    /simplify follow-ups:
    
    * Replace the two-``pop`` reach into ``cache_dict``/``ttl_dict`` with
      the existing public ``InMemoryCache.delete_cache(key)`` — the same
      idiom used elsewhere in the proxy. Bonus: ``delete_cache`` calls
      ``_remove_key`` which also handles ``expiration_heap`` consistency
      the direct pops were silently leaking.
    
    * JSON-encode the sorted scope list for the cache key instead of
      ``"|".join``. ``user_id`` / ``team_id`` / ``org_id`` / ``api_key``
      are free-form strings and could contain a literal ``|`` — JSON
      quoting escapes any in-string separator unambiguously.
    
    * Extract ``_allowed_container_ids_cache_key()`` so the read and
      invalidation sites compute the key the same way.
    
    * Fix a placeholder-then-overwrite test construction: the
      ``__module__.split(".")[0] and "proxy_admin"`` line evaluated to a
      literal string that was immediately overwritten with the real enum
      value. Hoist the import and construct directly.
    
    * [Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5
    
    OpenRouter has dropped active endpoints for anthropic/claude-3.7-sonnet,
    causing test_reasoning_content_completion to fail with a 404 "No endpoints
    found" error. Switch to anthropic/claude-sonnet-4.5, which is current and
    supports reasoning streaming.
    
    * feat: routing groups ui
    
    * fix(security): prevent secret_fields from leaking into spend logs
    
    secret_fields (containing raw HTTP headers including Authorization
    Bearer tokens) was being included in proxy_server_request['body']
    because the body snapshot was a copy.copy(data) of the full request
    dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
    table, exposing user credentials in the database.
    
    Root cause: data['secret_fields'] was set before the body snapshot at
    data['proxy_server_request']['body'] = copy.copy(data), so the full
    raw headers (including auth tokens) ended up in the snapshot.
    
    Fix (defense in depth):
    1. Exclude 'secret_fields' when creating the body snapshot in
       litellm_pre_call_utils.py (primary fix)
    2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
       as a secondary safeguard
    
    secret_fields remains available on the live data dict for legitimate
    downstream consumers (MCP, Responses API).
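
    A minimal sketch of the snapshot-time exclusion (the primary fix),
    assuming a plain dict request payload; the secondary safeguard
    applies the same strip to already-built spend-log payloads:

    ```python
    import copy
    from typing import Any, Dict

    def snapshot_request_body(data: Dict[str, Any]) -> Dict[str, Any]:
        """Copy the request dict for spend-log persistence, minus raw header material."""
        return {k: copy.copy(v) for k, v in data.items() if k != "secret_fields"}

    data = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "hi"}],
        "secret_fields": {"raw_headers": {"authorization": "Bearer sk-..."}},
    }
    data["proxy_server_request"] = {"body": snapshot_request_body(data)}

    assert "secret_fields" not in data["proxy_server_request"]["body"]
    assert "secret_fields" in data  # still available to live downstream consumers
    ```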
    
    Co-authored-by: default avatarKrrish Dholakia <krrish-berri-2@users.noreply.github.com>
    
    * chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2)
    
    * [Fix] Proxy: Break managed-resources import cycle on Python 3.13
    
    The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
    ImportError when loading the managed files hook chain:
    
      litellm.proxy.hooks/__init__ (mid-import)
        -> enterprise.enterprise_hooks
        -> litellm_enterprise.proxy.hooks.managed_files
        -> litellm.llms.base_llm.managed_resources.isolation
        -> litellm.proxy.management_endpoints.common_utils
        -> litellm.proxy.utils  (re-enters litellm.proxy.hooks)
    
    The except ImportError block in hooks/__init__.py silently swallowed the
    failure, leaving managed_files unregistered and POST /files returning
    500 "Managed files hook not found".
    
    Two-layer fix:
    - Inline the 3-line _user_has_admin_view check in isolation.py instead
      of importing it from litellm.proxy.management_endpoints.common_utils.
      litellm.llms.* should not depend on litellm.proxy.* — removing this
      layering violation breaks the cycle at its root.
    - Define PROXY_HOOKS and get_proxy_hook before the conditional
      enterprise import in litellm/proxy/hooks/__init__.py, so any future
      re-entry resolves the public names instead of hitting an
      ImportError on a partially-initialized module.
    
    Also fold in two unrelated CCI repairs surfaced in the same staging run:
    - tests/otel_tests/test_key_logging_callbacks.py: per-key
      gcs_bucket_name / gcs_path_service_account are now stripped by
      initialize_dynamic_callback_params, so the GCS client falls through
      to the env-only branch. Update the assertion to match the new
      "GCS_BUCKET_NAME is not set" message.
    - .circleci/config.yml: tests/pass_through_tests now resolves
      google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
      which uses dynamic ESM imports Jest 29 cannot load without
      --experimental-vm-modules. Pass that flag in the Vertex JS test step.
    
    Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
    regression guard: managed_files / managed_vector_stores must register,
    and isolation.py must not transitively import litellm.proxy.utils.
    
    * [Fix] Proxy: Address Greptile feedback on hook-cycle PR
    
    - Move _user_has_admin_view to litellm.proxy._types as
      user_api_key_has_admin_view (single source of truth). common_utils.py
      and isolation.py both import from there now, removing the duplicated
      role-check that could silently diverge if new admin roles are added.
    - Add pytest.importorskip("litellm_enterprise") to the two regression
      tests that assert managed_files / managed_vector_stores are registered;
      those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
      unconditionally in a checkout without the enterprise extra installed.
    
    * [Fix] Lint: Mark _user_has_admin_view re-export in common_utils
    
    Ruff F401 flagged the aliased import as unused within common_utils.py
    because the name is consumed only by external modules (~15 callers
    across guardrails, spend tracking, MCP, agents, management endpoints).
    Add `# noqa: F401  re-exported` so the alias survives lint while
    keeping a single source of truth in litellm.proxy._types.
    
    * refactor(azure): move image gen JSON helper; rename image edit finalize hook
    
    - Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
      from azure.py (keeps AzureChatCompletion focused on chat).
    - Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
      docstring covering multipart and JSON POST payloads (review feedback).
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(proxy): cover health_check_reasoning_effort for completion mode
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * [Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans
    
    /otel-spans now requires proxy admin (returns 401 'Only proxy admin
    can be used to generate, delete, update info for new keys/users/teams.
    Route=/otel-spans' for non-admin callers). Switch the GET call to use
    the master key sk-1234 while keeping the generated key for the
    chat-completion request that produces the spans.
    
    * [Fix] Tests: Pick chat-completion OTEL trace by content, not recency
    
    The /otel-spans endpoint returns process-wide spans and tags
    most_recent_parent by max start_time. After tightening that route to
    proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
    spans that beat the chat-completion spans on start_time, so
    most_recent_parent now points at the request's own auth trace
    (['postgres', 'postgres']) and the >=5-span assertion fails.
    
    Pick the chat-completion trace by content: it is the only trace whose
    span list is a superset of {postgres, redis, raw_gen_ai_request,
    batch_write_to_db}. Verified locally end-to-end against
    otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
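
    The content-based selection reduces to a subset check over span names
    per trace; a minimal sketch, assuming spans are already grouped by
    trace id into name lists:

    ```python
    from typing import Dict, List, Optional

    REQUIRED_SPANS = {"postgres", "redis", "raw_gen_ai_request", "batch_write_to_db"}

    def pick_chat_completion_trace(traces: Dict[str, List[str]]) -> Optional[str]:
        """Return the trace id whose span names cover the chat-completion signature."""
        for trace_id, span_names in traces.items():
            if REQUIRED_SPANS.issubset(span_names):
                return trace_id
        return None

    traces = {
        "auth-trace": ["postgres", "postgres"],  # emitted by the /otel-spans GET itself
        "chat-trace": ["postgres", "redis", "raw_gen_ai_request",
                       "batch_write_to_db", "other_span"],  # any extra spans are fine
    }
    assert pick_chat_completion_trace(traces) == "chat-trace"
    ```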
    
    * [Fix] CI: Enable VCR replay for test_azure_o_series
    
    The Azure o-series tests were excluded from the conftest's VCR auto-marker
    because of a respx/vcrpy transport-patching conflict, but the only respx
    reference in the file was an unused `MockRouter` import. Drop the dead
    import and remove the file from the conflict set so cassettes record on
    first run and replay thereafter, eliminating the 60-95s live Azure latency
    that was crashing xdist workers under --timeout=120 thread-mode timeouts.
    
    * [Fix] Tests: Restore /metrics access for prometheus test suite
    
    /metrics now requires auth by default; tests/otel_tests/test_prometheus.py
    makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
    every prometheus test in CI now fails the metric assertion.
    
    Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
    to opt out for this test job, which scrapes /metrics directly. Verified
    locally: 8/8 prometheus tests green (one flaky retry on
    test_proxy_success_metrics that pre-dates this PR).
    
    Also drop the -x stop-on-first-failure flag from the otel test command
    so all failures in the job surface in a single CI run rather than
    hiding behind whichever one trips first.
    
    * [Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job
    
    The cimg/python:3.12-browsers base image already ships every Chromium
    system dependency Playwright needs (libnss3, libatk-bridge2.0-0,
    libcups2, etc. — the install log shows them all as "already the newest
    version"). Passing --with-deps to `npx playwright install` therefore
    runs an apt-get update + install for nothing, but pays the full cost of
    hitting Ubuntu mirrors. On a recent run those mirrors stalled hard:
    apt-get update alone took 6m53s at 81.5 kB/s with several archives
    returning connection refused.
    
    Drop --with-deps and persist ~/.cache/ms-playwright alongside
    node_modules so the Chromium binary is also reused across runs. Bump
    the cache key to v2 so the stale v1 entry (which only contained
    node_modules) is not restored, which would have skipped caching the
    new browser path.
    
    * [Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds
    
    PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in
    docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this
    forces prisma to download the amd64 schema-engine into an arm64 image,
    so 'prisma migrate deploy' fails at startup with 'Could not find
    schema-engine binary'.
    
    Removing the env lets prisma auto-detect per build platform: amd64
    builds still resolve to debian-openssl-3.0.x (Wolfi falls back to
    debian, same binary as before), and arm64 builds now correctly fetch
    linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is
    preserved — only which binaries fill the cache changes.
    
    Fixes #19458
    
    * [Fix] UI: Clear Admin Session Cookies Before Establishing Invited User's Session (#27227)
    
    The invite-signup form was writing the new user's token via raw
    `document.cookie` at `path=/`, while the rest of the auth surface uses
    `storeLoginToken` (which writes at `path=/ui` and mirrors to
    sessionStorage). After signup the inviter's `path=/ui` cookie kept
    winning path-specificity matching, and sessionStorage still held the
    inviter's token, so the dashboard rendered as the inviter rather than
    the newly created user.
    
    Treat invite signup as a principal-change boundary — clear prior
    session cookies first, then store the new token via the canonical
    helper.
    
    * test: add 24hr Redis-backed VCR cache to additional test suites (#27159)
    
    * test: add 24hr Redis-backed VCR cache to additional test suites
    
    Extracts the existing llm_translation VCR plumbing into a reusable helper
    (tests/_vcr_conftest_common.py) and wires it into the conftest.py files
    of the test directories listed in LIT-2787:
    
      audio_tests, batches_tests, guardrails_tests, image_gen_tests,
      litellm_utils_tests, local_testing, logging_callback_tests,
      pass_through_unit_tests, router_unit_tests, unified_google_tests
    
    The same helper is also adopted by the pre-existing llm_translation and
    llm_responses_api_testing conftests to remove the copy-pasted VCR setup.
    
    Each consuming conftest:
    - registers the Redis persister via pytest_recording_configure
    - auto-marks collected tests with pytest.mark.vcr (skipping respx-using
      files where applicable, since respx and vcrpy both patch httpx)
    - gates cassette writes on test success via _vcr_outcome_gate
    
    The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled
    and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still
    forces a bypass for ad-hoc local runs.
    
    Test directories that run LiteLLM proxy in Docker (build_and_test,
    proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests)
    are intentionally not included: VCR.py patches the in-process httpx
    transport and cannot intercept calls made from inside a Docker container.
    The installing_litellm_on_python* jobs make no LLM calls and don't
    benefit from caching.
    
    https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites
    
    
    
    * test(vcr): add safe-body matcher to handle JSONL and binary request bodies
    
    vcrpy's stock body matcher inspects Content-Type and unconditionally
    runs json.loads on application/json bodies. JSON Lines payloads (used
    by the Bedrock batch S3 PUT and other upload paths) crash that with
    json.JSONDecodeError: Extra data, before the matcher can return
    'not a match'.
    
    This was the root cause of the batches_testing CI job failing on
    test_async_create_file once VCR auto-marking was applied to the
    batches_tests directory.
    
    Add a conservative byte-equality body matcher and use it in place of
    'body' in the shared match_on tuple. The matcher is strictly more
    conservative than vcrpy's default — the only thing it gives up is
    'different JSON key order is treated as the same body', which doesn't
    apply to deterministic litellm-built request payloads. It can never
    produce a false positive that the default would have rejected, so
    there is no cross-contamination risk.
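
    A sketch of the byte-equality comparison itself; the real matcher
    wraps something like this and is registered with vcrpy in place of
    the stock 'body' matcher (names here are illustrative):

    ```python
    from typing import Optional, Union

    Body = Optional[Union[str, bytes]]

    def bodies_match(body_a: Body, body_b: Body) -> bool:
        """Byte equality only: no Content-Type sniffing and no json.loads,
        so JSONL / binary payloads can never crash the comparison."""
        if body_a == body_b:
            return True

        def to_bytes(body: Body) -> Optional[bytes]:
            return body.encode("utf-8") if isinstance(body, str) else body

        return to_bytes(body_a) == to_bytes(body_b)

    # A JSON Lines payload that would crash a json.loads-based matcher:
    jsonl = b'{"a": 1}\n{"b": 2}\n'
    assert bodies_match(jsonl, jsonl)
    assert not bodies_match(jsonl, b'{"a": 1}\n')
    ```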
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): exclude tests that VCR replay actively breaks
    
    A few tests are incompatible with cassette replay and were failing on
    the latest CI run after VCR auto-marking was extended to local_testing
    and logging_callback_tests:
    
    - test_amazing_s3_logs.py (logging_callback_tests): the test asserts on
      a per-run response_id that should round-trip through a real S3
      PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays
      stale keys, so the freshly-generated id is never found.
    - test_async_embedding_azure (logging_callback_tests) and
      test_amazing_sync_embedding (local_testing): the failure branches
      deliberately pass api_key='my-bad-key' to assert that the failure
      callback fires. We scrub auth headers from cassettes (so the bad-key
      request matches the prior good-key request), and vcrpy replays the
      recorded 200 — the failure callback never fires.
    - test_assistants.py (local_testing): the OpenAI Assistants polling
      APIs mint fresh thread/run IDs every recording session and then poll
      until status=='completed'. Replays of those polled GETs can never
      match a freshly-generated run id, so every CI run effectively
      re-records and the suite blows past the 15m no_output_timeout.
    
    Skip these from VCR auto-marking so they continue to hit live providers
    as they did before this change. The remaining tests in each directory
    still get cached.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): expand skip lists for second batch of incompatible tests
    
    Follow-up to the previous commit. After re-running CI on the rebuilt
    branch, a further batch of tests surfaced as VCR-replay-incompatible:
    
    - litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key
      Calls GET /v1/models with api_key='123' to assert the result is empty.
      We scrub auth headers, so the bad-key request matches the prior
      good-key cassette and replays the recorded model list.
    - litellm_utils_testing :: test_litellm_overhead.py
      Measures litellm_overhead_time_ms as a percentage of total wall-clock
      time. With cached responses the upstream 'network' time collapses to
      microseconds, blowing past the 40% threshold the test asserts on.
      Skip the whole file (every parametrization is at risk).
    - local_testing_part1 :: test_async_custom_handler_completion and
      test_async_custom_handler_embedding
      Same bad-key failure-callback pattern as the already-skipped
      test_amazing_sync_embedding.
    - litellm_router_testing :: test_router_caching.py
      Asserts on litellm's own router-level response cache by comparing
      response1.id to response2.id across repeat upstream calls (test
      bypasses litellm cache via ttl=0 and expects upstream to return a
      *new* id). With VCR replay both upstream calls return the same
      cassette body, so the ids are identical. Skip the whole file.
    - logging_callback_tests :: test_async_chat_azure (preemptive)
      Same shape as already-skipped test_async_embedding_azure; was masked
      by upstream OpenAI rate-limit failures on baseline.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): use item.path and tighten matcher docstring
    
    - Replace pytest's deprecated item.fspath with item.path in
      apply_vcr_auto_marker_to_items so we don't emit deprecation
      warnings under pytest 8.
    - Clarify _safe_body_matcher docstring to reflect actual behavior
      (direct == first, then UTF-8 bytes comparison, no repr fallback).
    
    Addresses Greptile review feedback on PR #27159.
    
    * test(vcr): swallow all RedisError on cassette save/load
    
    Cassette persistence is strictly best-effort: any Redis-side failure
    (connection blip, timeout, OutOfMemoryError when the maxmemory cap is
    hit, READONLY replicas, etc.) should degrade to 'test passed but
    cassette not cached' rather than fail the test on teardown.
    
    Previously the persister only caught ConnectionError and TimeoutError,
    so OutOfMemoryError — which Redis Cloud raises when the cassette cache
    hits its memory cap and there are no evictable keys — propagated out of
    vcrpy's autouse fixture and ERRORed otherwise-passing tests on
    teardown. This caused the litellm_utils_testing CircleCI job to fail on
    the latest commit's run, even though the underlying test was a unit
    test that used mock_response and produced no real upstream traffic
    (the cassette was dirtied by a background langfuse callback). The
    rerun only succeeded because Redis evictions happened to free enough
    room before the SET — i.e. it was timing-dependent flakiness.
    
    Catch redis.exceptions.RedisError (the common base of all server- and
    client-side Redis exceptions) on both save and load, and parametrize
    the regression tests across ConnectionError, TimeoutError, and
    OutOfMemoryError to pin the new behavior.
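
    The save path, roughly, with illustrative names, assuming a redis-py
    client and an already-serialized cassette; the real persister also
    records the health counters described in the next commit:

    ```python
    import logging

    import redis
    from redis.exceptions import RedisError

    logger = logging.getLogger(__name__)

    def save_cassette_best_effort(client: "redis.Redis", key: str, serialized: bytes) -> bool:
        """Persist a cassette; any Redis-side failure degrades to 'not cached', never a test error."""
        try:
            client.set(key, serialized, ex=24 * 60 * 60)  # 24h TTL
            return True
        except RedisError as exc:
            # Base class covers ConnectionError, TimeoutError, OutOfMemoryError, READONLY, ...
            logger.warning("cassette save skipped: %s", exc)
            return False
    ```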
    
    * test(vcr): surface cassette-cache failures with warnings + session banner
    
    When the persister silently swallows a Redis OOM (or any RedisError) on
    save/load there is otherwise no visible signal that the cache is
    degraded — tests pass, the cassette just isn't persisted, and the next
    session still hits the same Redis at the same near-cap memory.
    
    Add three layers of observability so that failure mode is loud:
    
    1. Per-process health counters ("save_failures", "load_failures", and
       the last error string for each), exposed via cassette_cache_health()
       and reset via reset_cassette_cache_health(). The persister
       increments these in addition to logging.
    
    2. VCRCassetteCacheWarning (UserWarning subclass) emitted via
       warnings.warn() inside the persister's except block. Pytest's
       built-in warnings summary at session end automatically lists every
       such warning, so the failure is visible in CI logs without any
       conftest-level wiring.
    
    3. Session-end banner via emit_cassette_cache_session_banner() and a
       stderr-fallback atexit handler registered from
       register_persister_if_enabled(). Two states:
         - red "VCR CASSETTE CACHE DEGRADED" when save_failures or
           load_failures > 0
         - yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but
           used_memory >= 85% of maxmemory) so the next session knows
           the Redis is approaching OOM before any SET actually fails
    
    Capacity comes from a best-effort INFO memory probe
    (cassette_cache_capacity_snapshot) that returns None on any failure or
    when maxmemory is uncapped. The atexit handler skips xdist workers so
    only the controller emits.
    
    Tests: parametrize the existing save/load swallow-error tests across
    ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for
    the health counters and warning emission, and a new
    test_vcr_conftest_common_banner.py covering banner output for every
    state (silent/red/yellow/disabled/xdist-worker).
    
    * test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips
    
    Tests that deliberately call an LLM API with a bad key (e.g. to assert
    that the failure callback fires, or that check_valid_key returns False)
    were being silently served the prior good-key cassette: we scrub the
    real Authorization / x-api-key header from the cassette before storing
    it, so a follow-up bad-key call is byte-identical to the good-key call
    under the existing match_on tuple.
    
    Add a 'key_fingerprint' custom matcher that distinguishes requests by
    the SHA-256 of their API-key headers. The fingerprint is stamped into
    a synthetic 'x-litellm-key-fp' header by a new before_record_request
    hook, which then strips the real auth headers (we have to do the
    scrubbing here instead of via vcrpy's filter_headers knob, because
    filter_headers runs *first* and would erase the value we want to hash).
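
    A sketch of the fingerprint stamping over a plain header dict; the
    real hook operates on vcrpy request objects, and the early-return
    guard reflects the idempotence fix further below:

    ```python
    import hashlib
    from typing import Dict

    AUTH_HEADERS = ("authorization", "x-api-key")
    FP_HEADER = "x-litellm-key-fp"

    def stamp_key_fingerprint(headers: Dict[str, str]) -> Dict[str, str]:
        """Replace real auth headers with a one-way hash used only for cassette matching."""
        if FP_HEADER in headers:
            return headers  # a second pass must not overwrite the real fingerprint
        material = "|".join(headers.get(h, "") for h in AUTH_HEADERS)
        if material.strip("|"):
            headers[FP_HEADER] = hashlib.sha256(material.encode()).hexdigest()
        else:
            headers[FP_HEADER] = "no-key"
        for h in AUTH_HEADERS:
            headers.pop(h, None)  # the secret itself never reaches the cassette
        return headers

    good = stamp_key_fingerprint({"authorization": "Bearer sk-good"})
    bad = stamp_key_fingerprint({"authorization": "Bearer sk-bad"})
    assert good[FP_HEADER] != bad[FP_HEADER]          # bad-key calls get their own bucket
    assert "authorization" not in good
    assert stamp_key_fingerprint(dict(good)) == good  # re-applying the hook is a no-op
    ```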
    
    Bad-key requests now get a different cassette bucket than good-key
    requests, so vcrpy will not replay a recorded 200 in place of the
    expected 401. The fingerprint is a one-way hash of the secret, so
    cassettes never contain the key.
    
    This permanently removes the 'bad-key' category of skips:
    
    - tests/local_testing: dropped ::test_amazing_sync_embedding,
      ::test_async_custom_handler_completion,
      ::test_async_custom_handler_embedding
    - tests/logging_callback_tests: dropped ::test_async_chat_azure,
      ::test_async_embedding_azure
    - tests/litellm_utils_tests: dropped
      ::test_get_valid_models_from_dynamic_api_key
    
    Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py
    covering header stripping, fingerprint determinism, no-auth bucketing,
    good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination,
    and idempotence under replay.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): drop redundant comments and docstrings
    
    Trim narration of code that is already self-evident from function and
    variable names. Keep the two genuinely non-obvious bits:
    
    - ordering constraint between filter_headers and before_record_request,
      which would invite a maintainer to re-introduce the bug if removed
    - the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why
      exactly is this skipped' is not knowable from the test name alone
    
    Also drop the 40-line commented-out drop-in conftest snippet at the
    bottom of _vcr_conftest_common.py — the consuming conftests are the
    canonical reference.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): make _before_record_request idempotent
    
    vcrpy invokes before_record_request more than once per request:
    can_play_response_for calls it, then __contains__ /
    _responses (reached via play_response) call it again on the
    result. The second invocation sees a request whose auth headers we
    already stripped, so a naive recompute yields "no-key" and
    overwrites the real fingerprint stored in the header.
    
    This makes can_play_response_for and play_response disagree on
    matchability — the former says "yes, we have a stored response for
    this" (matching no-key to no-key) and the latter throws
    UnhandledHTTPRequestError because it computes a fresh real
    fingerprint that doesn't match the stored no-key.
    
    In CI this manifested as ~30 failing tests across guardrails_testing,
    audio_testing, batches_testing, image_gen_testing, llm_responses_api,
    litellm_router_unit_testing, etc. Skip the recompute when the header
    is already set, so re-applying the hook is a no-op.
    
    Adds a regression test that fires the hook twice on the same dict and
    asserts the fingerprint stays put.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): drop more redundant docstrings and headers
    
    * test(vcr): enable 24hr cache for ocr_tests and search_tests
    
    These two directories were the only non-dockerized test suites in the
    build_and_test workflow that make live LLM/provider API calls but were
    not VCR-enabled by this PR. Together they account for 96 tests:
    
    - tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document
      Intelligence, Vertex AI OCR. Pure-unit tests inside the same files
      (e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls
      and become benign VCR NOOPs.
    - tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa,
      Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI,
      Searxng, Serper, Tavily.
    
    Both directories use the canonical minimal conftest pattern from
    tests/audio_tests/conftest.py with no skip lists. None of the test
    files use respx, none assert on per-call upstream non-determinism
    (no response1.id != response2.id, no overhead-as-fraction-of-total,
    no live polling), so the default match_on tuple should cache cleanly.
    If a flake surfaces during the first cassette-recording CI run, we
    can add a targeted skip the same way we did for the other dirs.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: default avatarClaude <noreply@anthropic.com>
    Co-authored-by: default avatarCursor Agent <cursoragent@cursor.com>
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * [Fix] Team UI: handle legacy dict shape for metadata.guardrails (#27224)
    
    * [Fix] Team UI: handle legacy dict shape for metadata.guardrails
    
    A team can have metadata.guardrails stored as {"modify_guardrails": bool}
    (the permission-flag shape introduced in PR #4810) rather than the
    expected string[]. The opt-out logic added in PR #25575 calls .filter()
    on this field, which throws TypeError on a dict and crashes the team
    detail page.
    
    Add a safeGuardrailsList helper that returns [] when the field is not
    an array, and route the three read sites through it.
    
    * [Fix] Team UI: inline Array.isArray guards for guardrails metadata
    
    Replace the safeGuardrailsList helper with inline Array.isArray checks
    at each call site, and apply the same guard to opted_out_global_guardrails
    for consistency. No known legacy dict rows for opted_out_global_guardrails,
    but the unguarded `|| []` pattern is the same shape risk.
    
    Six call sites now defended directly: three for metadata.guardrails
    and three for metadata.opted_out_global_guardrails.
    
    * chore: update Next.js build artifacts (2026-05-05 22:45 UTC, node v20.20.2) (#27240)
    
    * [Infra] Bump deps (#27157)
    
    * bump: version 0.4.70 → 0.4.71
    
    * bump: version 0.1.39 → 0.1.40
    
    * uv lock
    
    ---------
    
    Co-authored-by: default avatarMichael Riad Zaky <michaelr@Mac.localdomain>
    Co-authored-by: default avatarMateo Wang <277851410+mateo-berri@users.noreply.github.com>
    Co-authored-by: default avatargreptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    Co-authored-by: default avatarryan-crabbe-berri <ryan@berri.ai>
    Co-authored-by: default avataruser <70670632+stuxf@users.noreply.github.com>
    Co-authored-by: default avatarCursor Agent <cursoragent@cursor.com>
    Co-authored-by: default avatarClaude <noreply@anthropic.com>
    Co-authored-by: default avatarshivam <shivam@berri.ai>
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    Co-authored-by: default avatarYassin Kortam <yassin@berri.ai>
    Co-authored-by: default avatarKrrish Dholakia <krrish+github@berri.ai>
    Co-authored-by: default avatarshin-berri <shin-laptop@berri.ai>
    Co-authored-by: default avatarSameer Kankute <sameer@berri.ai>
    Co-authored-by: default avatarSameer Kankute <Sameerlite@users.noreply.github.com>
    Co-authored-by: default avatarMichael-RZ-Berri <michael@berri.ai>
    Co-authored-by: default avatarharish-berri <harish@berri.ai>
    Co-authored-by: default avatarYassin Kortam <yassinkortam@g.ucla.edu>
    Co-authored-by: default avatarMichael Riad Zaky <michaelr@Michaels-MacBook-Air.local>
    Co-authored-by: default avatarKrrish Dholakia <krrish-berri-2@users.noreply.github.com>
    
    * fix: gate key access_group override on group's own assignment
    
    Replaces the previous intersect-with-team.access_group_ids check, which
    made the override unreachable in practice (the team-gate fallback already
    covered every case the intersection allowed). The override now resolves
    each of the key's access_group_ids via get_access_object and accepts the
    group only if its assigned_team_ids includes the key's team_id, or its
    assigned_key_ids includes the key's token. This fulfills the original ask
    (a key can extend a team's allow-list via a group the admin granted to
    that team or that specific key) while still rejecting foreign groups
    referenced by team members of other teams.
    
    * [Fix] Proxy/Key Management: Honor team_member_permissions /key/list In /key/list Endpoint
    
    When a team grants /key/list via team_member_permissions, non-admin members
    should see all keys for that team — same as a team admin. Previously the
    classification in list_keys() only checked admin status, so permitted
    members fell into the service-account-only path and could not see other
    members' personal keys. This change routes those members into the full-visibility set.
    
    * Fix access-group bypass via litellm-model fallback path
    
    When _get_all_deployments returns 0 candidates and the litellm-model
    fallback branch (_get_deployment_by_litellm_model) finds deployments that
    the access-group filter then empties, _access_group_filter_emptied_candidates
    remained False (it was captured before that branch ran). The router would
    then proceed to default fallbacks; the fallback model could have no
    access_groups and short-circuit the filter, silently serving a caller
    blocked by access-group restrictions.
    
    Update the flag inside the litellm-model branch when filtering empties a
    non-empty candidate set so the default-fallback guard still triggers.
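
    The control-flow shape of the fix, reduced to a sketch with hypothetical
    helper names standing in for the router internals:
    
    ```python
    from typing import Callable, List, Tuple
    
    
    def select_candidates(
        primary: List[dict],
        fallback_by_litellm_model: List[dict],
        apply_access_group_filter: Callable[[List[dict]], List[dict]],
    ) -> Tuple[List[dict], bool]:
        """Return (candidates, filter_emptied_candidates)."""
        filter_emptied_candidates = False
    
        candidates = apply_access_group_filter(primary)
        if primary and not candidates:
            filter_emptied_candidates = True
    
        if not candidates and fallback_by_litellm_model:
            filtered = apply_access_group_filter(fallback_by_litellm_model)
            if not filtered:
                # Previously this branch never set the flag, so the router fell
                # through to default fallbacks and bypassed the restriction.
                filter_emptied_candidates = True
            candidates = filtered
    
        return candidates, filter_emptied_candidates
    
    
    def drop_all(deployments: List[dict]) -> List[dict]:
        return []  # caller lacks the access group, so everything is filtered out
    
    
    print(select_candidates([], [{"model": "gpt-4o"}], drop_all))
    # -> ([], True): the default-fallback guard now fires
    ```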
    
    * fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8)
    
    Many MCP integrations (Zapier, etc.) embed an upstream API key
    directly in the server URL, e.g.
    ``https://actions.zapier.com/mcp/<api-key>/sse``. The list and
    single-server endpoints were returning the full URL to any
    authenticated user — `_redact_mcp_credentials` only stripped the
    explicit ``credentials`` field, and `_sanitize_mcp_server_for_virtual_key`
    only ran for restricted virtual keys. Non-admin internal users could
    read the dashboard, click the unmask toggle, and exfiltrate the raw
    token.
    
    Add `_sanitize_mcp_server_for_non_admin` that runs on top of the
    existing credential redaction and clears the credential-bearing
    fields:
    
    - ``url`` (the primary leak vector)
    - ``spec_path`` (OpenAPI spec URLs that may carry tokens)
    - ``static_headers`` / ``extra_headers`` (Authorization)
    - ``env`` (arbitrary secrets)
    - ``authorization_url`` / ``token_url`` / ``registration_url``
    
    Identity fields (``server_id``, ``alias``, ``mcp_info``, etc.) are
    preserved so the UI can still list servers a non-admin's team has
    access to.
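
    A sketch of the clearing step, assuming a plain-dict view of the server
    record; the field tuple mirrors the list above, but the function body is
    illustrative rather than the proxy's implementation:
    
    ```python
    # Credential-bearing fields cleared for non-admin viewers; identity fields
    # (server_id, alias, mcp_info, ...) are left untouched so the UI can still
    # list the servers a team has access to.
    _CREDENTIAL_FIELDS = (
        "url",
        "spec_path",
        "static_headers",
        "extra_headers",
        "env",
        "authorization_url",
        "token_url",
        "registration_url",
    )
    
    
    def sanitize_mcp_server_for_non_admin(server: dict) -> dict:
        """Return a copy of the server record with credential-bearing fields removed."""
        sanitized = dict(server)
        for field_name in _CREDENTIAL_FIELDS:
            sanitized.pop(field_name, None)
        return sanitized
    
    
    record = {
        "server_id": "zapier-1",
        "alias": "zapier",
        "url": "https://actions.zapier.com/mcp/<api-key>/sse",
        "static_headers": {"Authorization": "Bearer secret"},
    }
    print(sanitize_mcp_server_for_non_admin(record))
    # -> {'server_id': 'zapier-1', 'alias': 'zapier'}
    ```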
    
    Apply the new sanitizer in `fetch_all_mcp_servers` and the per-server
    fetch path right after the existing virtual-key branch. Update the
    existing `test_list_mcp_servers_non_admin_user_filtered` assertions
    that previously checked URL visibility.
    
    Frontend defense-in-depth: hide the URL unmask toggle on
    `mcp_server_view.tsx` unless the viewer is a proxy admin.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * Fix runtime policy attachment initialization
    
    Mark runtime-created policies and attachments initialized so global policy attachments created from the policy builder apply immediately without requiring a restart.
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(router): cover _try_early_resolve_deployments_for_model_not_in_names
    
    The router_code_coverage CI check requires every function in router.py to
    be referenced by at least one test under tests/{local_testing,
    router_unit_tests,test_litellm} in a file with "router" in its name.
    The recently-extracted helper had no direct test, so the check failed
    with "0.45% of functions in router.py are not tested".
    
    Add a focused test that exercises the four return paths: model already
    in self.model_names, no fallback applies, pattern-router match, and
    default_deployment substitution (also asserting the stored default
    isn't mutated).
    
    https://claude.ai/code/session_019AVp1XL7RT9RxRe4qRLkay
    
    
    
    * Fix policy registry teardown in tests
    
    Reset the policy ID index during policy engine test cleanup so stale policy versions cannot leak between tests.
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * fix(batches): count non-chat tokens, validate batch-file model access (VERIA-39) (#27015)
    
    * fix(batches): count non-chat tokens and validate every model in batch file
    
    Two security control bypasses on POST /v1/batches:
    
    1. `_get_batch_job_input_file_usage` only summed tokens for
       `body.messages` (chat completions). Embedding (`input`) and text
       completion (`prompt`) batches reported zero, letting massive
       non-chat workloads slip past TPM rate limits. Extend the counter
       to handle string and list shapes for both fields.
    
    2. The batch input file was forwarded to the upstream provider
       without inspecting the models named inside the JSONL — only the
       outer `model` query parameter was checked against the caller's
       allowlist. A caller restricted to gpt-3.5 could submit a batch
       targeting gpt-4o and the upstream would execute it under the
       proxy's shared API key.
    
    Add `_get_models_from_batch_input_file_content` (returns the
    distinct `body.model` values) and call it from
    `_enforce_batch_file_model_access` in the pre-call hook, which runs
    each model through `can_key_call_model` so the same allowlist
    semantics (wildcards, access groups, all-proxy-models, team aliases)
    the proxy enforces on `/chat/completions` apply here too. Any
    unauthorized model raises a 403 before the file is forwarded.
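
    A rough illustration of the extraction step, assuming OpenAI-style batch
    JSONL with a `body.model` per line (the follow-on `can_key_call_model`
    enforcement is only indicated in a comment):
    
    ```python
    import json
    from typing import Set
    
    
    def get_models_from_batch_input_file_content(file_content: bytes) -> Set[str]:
        """Collect the distinct body.model values named inside a batch JSONL file."""
        models: Set[str] = set()
        for line in file_content.decode("utf-8", errors="ignore").splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed lines are handled by other validation
            model = (entry.get("body") or {}).get("model")
            if isinstance(model, str):
                models.add(model)
        return models
    
    
    jsonl = b"""
    {"custom_id": "1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": []}}
    {"custom_id": "2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "hi"}}
    """
    print(sorted(get_models_from_batch_input_file_content(jsonl)))
    # -> ['gpt-4o', 'text-embedding-3-small']
    # Each distinct model would then be checked against the caller's allowlist
    # before the file is forwarded upstream.
    ```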
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(batches): count pre-tokenized prompt/input shapes, classify 403 logs
    
    Two follow-ups from the Greptile review on the batch validation PR:
    
    1. P1 TPM bypass via integer token arrays. The OpenAI batch schema
       accepts ``prompt`` and ``input`` as ``list[int]`` (a single
       pre-tokenized prompt) or ``list[list[int]]`` (multiple) in addition
       to the string and ``list[str]`` shapes. Pre-fix only the string
       shapes were counted, so a caller could submit a batch with hundreds
       of millions of pre-tokenized tokens and the rate limiter would
       record zero. Extract the per-field logic into
       ``_count_prompt_or_input_tokens`` and count each int as one token.
    
    2. P2 access-denial logs were indistinguishable from I/O failures.
       ``count_input_file_usage`` caught every exception under a generic
       "Error counting input file usage" message, so an intentional 403
       from ``_enforce_batch_file_model_access`` looked the same in the
       logs as a missing file or a Prisma timeout. Catch ``HTTPException``
       separately and log 403s at WARNING level with a security-relevant
       message before re-raising.
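
    A minimal sketch of the per-field counting described in (1) above;
    `count_text_tokens` is a stand-in for the real tokenizer:
    
    ```python
    from typing import Any, Callable, List, Union
    
    
    def count_prompt_or_input_tokens(
        value: Union[str, List[Any], None],
        count_text_tokens: Callable[[str], int],
    ) -> int:
        """Count tokens for a batch-line 'prompt'/'input' field across its legal shapes."""
        if value is None:
            return 0
        if isinstance(value, str):
            return count_text_tokens(value)
        if isinstance(value, list):
            if all(isinstance(item, int) for item in value):
                return len(value)  # list[int]: a single pre-tokenized prompt
            total = 0
            for item in value:
                if isinstance(item, str):
                    total += count_text_tokens(item)  # list[str]
                elif isinstance(item, list) and all(isinstance(tok, int) for tok in item):
                    total += len(item)  # list[list[int]]: multiple pre-tokenized prompts
            return total
        return 0
    
    
    def naive(text: str) -> int:
        return len(text.split())  # stand-in tokenizer for the example
    
    
    print(count_prompt_or_input_tokens("hello world", naive))        # 2
    print(count_prompt_or_input_tokens([1, 2, 3, 4], naive))         # 4
    print(count_prompt_or_input_tokens([[1, 2], [3, 4, 5]], naive))  # 5
    ```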
    
    Tests cover the new shapes: single ``list[int]``, ``list[list[int]]``
    (the worst-case bypass vector), and embeddings ``input`` with
    pre-tokenized arrays.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(proxy): re-validate user_id after /user/info re-parses query (#27009)
    
    * fix(proxy): re-validate user_id ownership after /user/info re-parses query
    
    The route-level access check in `RouteChecks.non_proxy_admin_allowed_routes_check`
    reads `request.query_params.get("user_id")`, which decodes literal `+` to
    spaces. The endpoint then re-parses the raw query string with `urllib.unquote`
    in `get_user_id_from_request` to preserve `+` characters (so plus-addressed
    emails work as user_ids). Those two paths produce different ids: a caller
    who registered a user_id containing a literal space could pass the route
    check and then read another user's row by sending the encoded `+` form.
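
    The mismatch can be reproduced with urllib alone (illustrative only; the
    proxy reads the framework's `request.query_params` rather than calling
    `parse_qs` directly):
    
    ```python
    from urllib.parse import parse_qs, unquote
    
    raw_query = "user_id=alice+smith%40example.com"
    
    # Path 1: form-style query parsing treats '+' as an encoded space.
    route_check_view = parse_qs(raw_query)["user_id"][0]
    
    # Path 2: unquote() only decodes %XX escapes and leaves '+' intact.
    endpoint_view = unquote(raw_query.split("=", 1)[1])
    
    print(route_check_view)  # 'alice smith@example.com'
    print(endpoint_view)     # 'alice+smith@example.com'
    # The route check and the handler resolved different ids; the post-decode
    # ownership re-check closes that gap.
    ```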
    
    Add `_enforce_user_info_access` and call it after `_normalize_user_info_user_id`
    returns the final id. Proxy admin / view-only admin still bypass; everyone
    else must match the resolved user_id (or have no user_id, which falls back
    to the caller's own id later in the handler).
    
    Tests cover the admin bypass, owner-match path, and the cross-user lookup
    that this change blocks.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(proxy): apply user_info ownership check to PROXY_ADMIN_VIEW_ONLY
    
    `_enforce_user_info_access` was bypassing both PROXY_ADMIN and
    PROXY_ADMIN_VIEW_ONLY, but the upstream route check in
    `RouteChecks.non_proxy_admin_allowed_routes_check` only treats
    PROXY_ADMIN as a true admin for the `/user/info` route — view-only
    admins go through the `user_id == valid_token.user_id` enforcement
    along with regular users. Mirroring that asymmetry left the same
    encoded-`+` bypass open for view-only admins whose user_id contains a
    literal space.
    
    Drop the PROXY_ADMIN_VIEW_ONLY exemption so the post-decode re-check
    matches the upstream rule. Update tests: a view-only admin must now
    be blocked from cross-user lookups but still allowed to read their
    own row.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: default avataryuneng-jiang <yuneng@berri.ai>
    Co-authored-by: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs
    
    Adds LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS env var. When set to true and the
    proxy log level is INFO or above, spend-tracking error paths emit a single
    ERROR line without the full traceback. Stack traces are preserved at DEBUG
    and the Sentry / proxy_logging_obj.failure_handler path is unchanged.
    
    The new spend_log_error helper is wired through the spend write hot path:
      - DBSpendUpdateWriter (update_database, _update_*_db, batch upsert,
        redis-commit fallbacks)
      - _ProxyDBLogger._PROXY_track_cost_callback
      - get_logging_payload exception path
      - update_spend / update_daily_tag_spend / spend logs queue monitor
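
    A sketch of what an opt-in helper of this shape can look like; the name
    matches the one described above, but the body is illustrative and uses the
    stdlib logger:
    
    ```python
    import logging
    import os
    
    logger = logging.getLogger("spend_tracking_example")
    
    
    def spend_log_error(message: str, exc: BaseException) -> None:
        """Emit one ERROR line; attach the traceback only when suppression is off
        or the logger is running at DEBUG."""
        suppress = os.getenv("LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS", "").lower() == "true"
        if suppress and not logger.isEnabledFor(logging.DEBUG):
            logger.error("%s: %s", message, exc)  # single line, no stack trace
        else:
            logger.error(message, exc_info=exc)   # full traceback preserved
    
    
    try:
        raise RuntimeError("db write failed")
    except RuntimeError as e:
        spend_log_error("Error updating spend in the DB", e)
    ```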
    
    Resolves LIT-2704.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend
    
    This call site previously logged a single-line error via verbose_proxy_logger.error()
    with no traceback. Switching it to spend_log_error(..., exc=e) caused a full stack
    trace to render by default (when LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS is unset),
    which contradicts the PR goal of leaving default behavior unchanged. Revert this
    specific site to the original error log call.
    
    * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend
    
    Bugbot caught a regression: the previous error log here was a single-line
    verbose_proxy_logger.error(...) with no traceback. spend_log_error attaches
    the active exception's traceback by default (when the suppression env var
    is unset), so swapping it in changed default behavior. Revert this one site
    to its original .error() call to keep the PR strictly opt-in.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * feat(spend-logs): suppress traceback in SpendLogs error_information row
    
    Extend LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS to the failure callback so the
    per-row Metadata pane in the UI no longer shows the stack trace when the
    opt-in env var is set, matching the existing console-side suppression.
    
    https://claude.ai/code/session_014dztoRbRnRvq54HL9EyHx6
    
    
    
    * [Fix] Proxy: Repair Merge Fallout In Router-Override Fallback Auth
    
    Conflict resolution for #26968 dropped the `Iterator` typing import
    (NameError at module load), left a dead `fallback_models = cast(...)`
    block, and the new tests called `_enforce_key_and_fallback_model_access`
    without the now-required `request` kwarg.
    
    * isolate dual OTEL handlers
    
    * harden cloud file compatibility path
    
    * harden cloud file compatibility path
    
    * [Fix] Proxy/Key Management: Align Key-Org Membership Checks On Generate And Regenerate
    
    Mirrors the membership rule on /key/update so that /key/generate and
    /key/{key}/regenerate apply the same `_validate_caller_can_assign_key_org`
    gate when the caller specifies an `organization_id`. Proxy admins bypass.
    The check no-ops when `organization_id` is not being set.
    
    * thread trusted params through vertex file content
    
    * trust only server legacy file flag
    
    * chore(proxy): keep public AI hub unauthenticated
    
    * fix(proxy): preserve low-detail readiness status
    
    * [Test] Anthropic: Replace Legacy Claude-4-Sonnet Alias With Haiku 4.5
    
    Three live-API tests pinned to claude-4-sonnet-20250514, which is a
    non-canonical alias of claude-sonnet-4-20250514. Anthropic's main API
    no longer resolves the legacy form under freshly issued keys, so the
    tests fail with not_found_error. The token counter test pinned to
    claude-sonnet-4-20250514 itself (deprecation_date 2026-05-14, two weeks
    out) was on borrowed time too.
    
    Bump all four to claude-haiku-4-5-20251001 — capability superset for what
    these tests exercise (streaming, parallel tool calling, extended thinking,
    token counting), no upcoming deprecation, cheaper per-token.
    
    * chore(proxy): move URL-valued model/file_id guard from SDK to proxy
    
    The previous per-provider guards in HuggingFace, Oobabooga, and Gemini
    files lived in the SDK layer, breaking SDK callers who legitimately pass
    URL-valued model identifiers. Move the check to the proxy boundary in
    add_litellm_data_to_request so SDK users keep working while proxy users
    default-deny URL-valued model and file_id, with admin opt-in via
    litellm.provider_url_destination_allowed_hosts.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * [Chore] Proxy/UI: Drop stray _experimental/out/chat/index.html
    
    This file is a regenerable UI build artifact that should not be tracked
    in source. Removing so the merge into litellm_internal_staging stays clean.
    
    * [Test] Anthropic Passthrough: Bump Streaming Cost-Injection Test To Haiku 4.5
    
    test_anthropic_messages_streaming_cost_injection hits the proxy's
    /v1/messages route, which routes via the anthropic/* wildcard to
    api.anthropic.com. The 404 surfaced in the test was Anthropic's own
    not_found_error propagated back through the proxy (visible from the
    x-litellm-model-id hash on the response — the proxy did route).
    
    Same root cause as the prior commit: the legacy claude-4-sonnet-20250514
    alias is no longer recognized by Anthropic's main API under the new key.
    Swap to claude-haiku-4-5-20251001 — same routing path, canonical model.
    
    * fix(proxy): handle ownership-recording failures after upstream create
    
    If record_container_owner raises after the upstream container is created,
    the user previously got a 500 with no usable container — they were billed
    for an unreachable resource. Move ownership recording into the create
    path's exception handling and split the two failure modes:
    
    - HTTPException from the recorder (auth conflicts) propagates verbatim
      so the client sees the real status code, not a generic LLM error.
    - Unexpected exceptions are logged and swallowed; the response is
      returned to the caller so they aren't billed for a container they
      can't address. The DB row stays untracked until an operator reconciles.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(guardrails): close post-call coverage gaps
    
    * fix(types): add /team/permissions_bulk_update to management_routes
    
    The blocklist check in _check_proxy_admin_viewer_access only fires for
    routes that match LiteLLMRoutes.management_routes — the bulk-update
    endpoint was missing from that list, so the test for view-only admins
    on /team/permissions_bulk_update fell through to "allow."
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * [Test] Anthropic Passthrough: Bump Thinking Tests Off Legacy Sonnet 4 Alias
    
    base_anthropic_messages_test.test_anthropic_messages_with_thinking and
    test_anthropic_streaming_with_thinking still pinned to
    claude-4-sonnet-20250514 — the same legacy alias Anthropic no longer
    recognizes under freshly issued keys. The other four tests in this base
    class already use claude-sonnet-4-5-20250929; these two were missed.
    
    Bump to claude-haiku-4-5-20251001 (supports_reasoning=true, no upcoming
    deprecation). Subclasses including TestAnthropicPassthroughBasic
    inherit these methods.
    
    * fix(guardrails): cover multi-choice output variants
    
    * fix(proxy): preserve public ai hub ui setting
    
    * fix(scim): cascade FK cleanup on user delete and surface block status in UI
    
    SCIM DELETE /Users/{id} previously called litellm_usertable.delete without
    clearing rows that FK back to the user, so Postgres rejected the delete with
    LiteLLM_InvitationLink_user_id_fkey and the SCIM caller saw a 500. Add a
    helper to drop invitation_link, organization_membership, and team_membership
    rows before the user delete (mirrors /user/delete in internal_user_endpoints).
    
    Also add a Status column to the Virtual Keys and Internal Users tables so
    admins can see at a glance which keys are blocked and which users SCIM has
    deactivated. SCIM-blocked keys carry a tooltip explaining the origin.
    
    Pin the dashboard's Node version to 20 via .nvmrc to match CI.
    
    * chore: update Next.js build artifacts (2026-05-02 03:21 UTC, node v20.20.2)
    
    * perf(proxy): cache container/skill ownership reads on the hot path
    
    Container ownership and skill rows are looked up on every retrieve /
    delete / list / file-content / chat-completion-with-skill call. The new
    stores wrapped raw Prisma queries with no cache, putting one DB
    round-trip on each request. Add an in-process TTL'd cache mirroring the
    _byok_cred_cache pattern in mcp_server/server.py: per-key (value,
    monotonic_timestamp), 60s TTL, 10000-entry cap with full-clear on
    overflow, invalidated by every write. Negative results (`None`) are
    cached too so untracked-resource checks also skip the DB.
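
    A compact sketch of that cache shape (60s TTL, 10k-entry cap with full
    clear on overflow, negative caching); the function names and the wrapped
    DB query are placeholders:
    
    ```python
    import time
    from typing import Any, Callable, Dict, Optional, Tuple
    
    _OWNERSHIP_CACHE: Dict[str, Tuple[Any, float]] = {}
    _TTL_SECONDS = 60.0
    _MAX_ENTRIES = 10_000
    
    
    def get_owner_cached(
        resource_id: str, fetch_from_db: Callable[[str], Optional[str]]
    ) -> Optional[str]:
        """Return the owner for resource_id, hitting the DB at most once per TTL window.
        None results are cached too, so untracked resources also skip the DB."""
        now = time.monotonic()
        hit = _OWNERSHIP_CACHE.get(resource_id)
        if hit is not None and now - hit[1] < _TTL_SECONDS:
            return hit[0]
    
        value = fetch_from_db(resource_id)  # may legitimately return None
    
        if len(_OWNERSHIP_CACHE) >= _MAX_ENTRIES:
            _OWNERSHIP_CACHE.clear()  # full clear on overflow keeps bookkeeping trivial
        _OWNERSHIP_CACHE[resource_id] = (value, now)
        return value
    
    
    def invalidate_owner(resource_id: str) -> None:
        """Every write path drops the cached entry so readers never see stale ownership."""
        _OWNERSHIP_CACHE.pop(resource_id, None)
    ```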
    
    Tests cover: cache-after-first-hit, negative caching, write
    invalidation, no-caching-on-DB-error, TTL expiry, capacity eviction.
    56 tests pass.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * chore: update Next.js build artifacts (2026-05-02 03:39 UTC, node v20.20.2)
    
    * fix: remove traceback key instead of setting it to ""
    
    * fix: linting error
    
    * fix(scim): preserve scim_active on PUT when client omits the field
    
    A SCIM PUT may legally omit `active` (full-replace with the field
    absent). Pydantic fills the SCIMUser.active default of True, so the PUT
    handler was overwriting metadata.scim_active with True even when the
    client never sent it — silently reactivating a previously SCIM-blocked
    user and unblocking their keys.
    
    Use model_fields_set to detect whether the client actually sent
    `active`. If omitted, preserve the prior scim_active value and skip
    the cascade to virtual keys.
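
    The `model_fields_set` distinction the fix relies on, shown on a minimal
    Pydantic v2 model (the model here is illustrative, not the proxy's
    SCIMUser):
    
    ```python
    from pydantic import BaseModel
    
    
    class SCIMUserExample(BaseModel):
        userName: str
        active: bool = True  # default fills in even when the client never sent the field
    
    
    sent = SCIMUserExample.model_validate({"userName": "mateo", "active": False})
    omitted = SCIMUserExample.model_validate({"userName": "mateo"})
    
    print(sent.active, "active" in sent.model_fields_set)        # False True
    print(omitted.active, "active" in omitted.model_fields_set)  # True False
    
    # Only act on `active` when the client actually supplied it:
    if "active" in omitted.model_fields_set:
        ...  # update scim_active and cascade to the user's keys
    else:
        ...  # preserve the stored scim_active value
    ```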
    
    Also drop comments added in this PR that just narrate what the code
    does; keep only the docstrings and the SQL-NULL pitfall note that
    explain non-obvious behaviour.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(proxy): use set lookup for permitted agent filters
    
    * fix(mcp): redact command fields for non-admin server views
    
    * fix(proxy): forward decoded container ids after ownership checks
    
    * fix(caching): handle stale isolated Redis semantic index
    
    * fix(cloudflare): support response_text in streaming chunk parser
    
    Newer Cloudflare Workers AI models (e.g. Nemotron) emit 'response_text'
    instead of 'response' on streamed chunks. The non-streaming path was
    already updated to fall back to 'response_text' (#26385), but the
    streaming chunk parser still only read 'response', which caused
    streaming requests against those models to silently produce empty
    content.
    
    Mirror the non-streaming fallback in CloudflareChatResponseIterator.chunk_parser
    and add a streaming test for the response_text shape.
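
    A minimal sketch of the fallback, assuming the streamed chunk has already
    been decoded to a dict (the function is illustrative, not the actual
    iterator method):
    
    ```python
    from typing import Optional
    
    
    def extract_chunk_text(chunk: dict) -> Optional[str]:
        """Prefer 'response', fall back to 'response_text', mirroring the non-streaming path."""
        text = chunk.get("response")
        if text is None:
            text = chunk.get("response_text")
        return text
    
    
    print(extract_chunk_text({"response": "Hello"}))       # 'Hello' (older models)
    print(extract_chunk_text({"response_text": "Hello"}))  # 'Hello' (e.g. Nemotron)
    print(extract_chunk_text({}))                          # None (no content in this chunk)
    ```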
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * Fix code qa
    
    * Address bugbot: drop dead encode/decode helpers; preserve empty custom_id
    
    - Remove unused _encode_gcp_label_value / _decode_gcp_label_value singular
      helpers; only the _chunks variants are actually called.
    - Use 'is not None' check for custom_id so empty-string custom_ids are
      still labeled and round-trip through batch outputs.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * Forward Vertex file content logging context
    
    * test vertex file content logging forwarding
    
    Co-authored-by: default avatarSameer Kankute <Sameerlite@users.noreply.github.com>
    
    * Fix Vertex batch output logging mutation
    
    * fix: don't mutate caller's logging_obj in _try_transform_vertex_batch_output_to_openai
    
    The method was overwriting logging_obj.optional_params, logging_obj.model,
    and logging_obj.start_time on the caller's Logging instance. When invoked
    from llm_http_handler.py's generic framework path, the framework's own
    logging_obj (which already went through pre_call) had its properties
    clobbered, causing model and start_time to reflect the last batch line's
    values rather than the original call context.
    
    Fix: create a fresh local Logging instance for the per-line transformation
    instead of mutating the incoming logging_obj. The caller's object is now
    left entirely untouched regardless of whether a logging_obj was passed in
    or not.
    
    Regression tests added to verify model, start_time, and optional_params
    are not mutated on the caller's logging_obj.
    
    Co-authored-by: default avatarSameer Kankute <Sameerlite@users.noreply.github.com>
    
    * feat: add opt-out flag for Vertex batch output transformation
    
    Adds litellm.disable_vertex_batch_output_transformation (default False).
    When True, afile_content returns raw Vertex predictions.jsonl untouched
    so users that parse candidates/modelVersion directly are not broken.
    
    * fix(anthropic,bedrock): omit thinking/output_config when reasoning_effort="none"
    
    Setting reasoning_effort="none" on Anthropic chat models (direct, Bedrock
    Invoke, Bedrock Converse, Vertex AI Anthropic, Azure AI Anthropic) crashed
    LiteLLM with:
    
      litellm.APIConnectionError: 'NoneType' object has no attribute 'get'
    
    Both the Anthropic chat transformation and Bedrock Converse called
    ``AnthropicConfig._map_reasoning_effort`` and assigned the ``None`` it returns
    for ``"none"`` directly to ``optional_params["thinking"]``. Downstream
    ``is_thinking_enabled`` then did ``optional_params["thinking"].get("type")``
    and crashed.
    
    Pop ``thinking`` (and on Claude 4.6/4.7, ``output_config``) instead of
    assigning ``None``, restoring the documented contract that
    ``reasoning_effort="none"`` means "do not enable thinking". This also
    prevents downstream Anthropic 400s ("thinking: Input should be an object",
    "output_config.effort: Input should be ...") if the bug were ever masked.
    
    Verified end-to-end against the live Anthropic API and Bedrock Converse
    on claude-opus-4-{5,6,7} and claude-sonnet-4-6, plus Bedrock Invoke for
    Claude 4.5/4.6. Vertex AI Anthropic and Azure AI Anthropic inherit the
    fixed ``map_openai_params`` from ``AnthropicConfig`` and need no further
    changes.
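
    A minimal illustration of the pop-instead-of-assign pattern described
    above, with the mapping helper stubbed to return ``None`` for ``"none"``:
    
    ```python
    from typing import Optional
    
    
    def map_reasoning_effort(effort: Optional[str]) -> Optional[dict]:
        # Stub of the mapping helper: "none" (and None) mean "do not enable thinking".
        if effort in (None, "none"):
            return None
        return {"type": "enabled", "budget_tokens": 1024}
    
    
    def apply_reasoning_effort(optional_params: dict, effort: Optional[str]) -> dict:
        thinking = map_reasoning_effort(effort)
        if thinking is None:
            # Buggy version assigned optional_params["thinking"] = None, which later
            # crashed on optional_params["thinking"].get("type"). Pop the keys instead.
            optional_params.pop("thinking", None)
            # (The real fix only pops output_config on adaptive 4.6/4.7 models;
            # that condition is omitted here.)
            optional_params.pop("output_config", None)
        else:
            optional_params["thinking"] = thinking
        return optional_params
    
    
    print(apply_reasoning_effort({"thinking": {"type": "enabled"}}, "none"))  # {}
    print(apply_reasoning_effort({}, "low"))  # {'thinking': {...}}
    ```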
    
    * fix(vertex-ai): set response=null on batch error entries per OpenAI spec
    
    The Vertex batch output transformer was emitting both a populated 'response' and 'error' for failed batch entries. The OpenAI Batch output spec defines them as mutually exclusive: on error 'response' MUST be null. This broke any consumer using 'result["response"] is None' to detect failures.
    
    * test(vertex-ai): cover transformation_error path emits response=null
    
    * fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers
    
    DotpromptManager was hardened to render through
    ImmutableSandboxedEnvironment. The three sibling managers (gitlab,
    arize, bitbucket) were missed and still instantiate plain
    jinja2.Environment(), leaving the same attribute-traversal SSTI
    primitive open: a template fetched from a GitLab/BitBucket repo or
    Arize Phoenix workspace can reach __class__.__init__.__globals__ and
    execute arbitrary Python on the proxy host.
    
    Match the dotprompt pattern by switching all three to
    ImmutableSandboxedEnvironment. The sandbox blocks the dunder-traversal
    chain while leaving normal {{ var }} substitution intact, so the
    template surface is unchanged for legitimate use.
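
    For reference, the sandbox behaviour jinja2 itself provides (a standalone
    snippet, not the managers' code):
    
    ```python
    from jinja2.exceptions import SecurityError
    from jinja2.sandbox import ImmutableSandboxedEnvironment
    
    env = ImmutableSandboxedEnvironment()
    
    # Ordinary variable substitution is unaffected by the sandbox.
    print(env.from_string("Hello {{ name }}!").render(name="prompt-user"))
    
    # The dunder-traversal primitive used for SSTI is blocked.
    payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
    try:
        env.from_string(payload).render()
    except SecurityError as err:
        print("blocked:", err)
    ```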
    
    Adds tests/test_litellm/integrations/test_prompt_manager_ssti.py
    (18 cases) verifying each manager's jinja_env is a sandbox, that
    classic SSTI payloads raise SecurityError, and that ordinary variable
    rendering still works.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * chore(proxy): drop client-supplied pricing fields from request bodies
    
    The proxy currently forwards request-body pricing parameters (the fields
    on `CustomPricingLiteLLMParams`, plus `metadata.model_info`) into the
    core call path. Those fields belong to deployment configuration, not to
    per-request input — sending them from a client mutates the request's
    recorded cost and, via `litellm.completion` → `register_model`, the
    process-wide `litellm.model_cost` map for every later caller in the
    worker. Strip them at the boundary.
    
    The strip set is built from `CustomPricingLiteLLMParams.model_fields` so
    pricing fields added later are covered automatically. Operators who do
    want clients to supply per-request pricing can opt back in per key or
    team via `metadata.allow_client_pricing_override = true`, mirroring the
    existing `allow_client_mock_response` and
    `allow_client_message_redaction_opt_out` flags.
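
    A rough sketch of deriving a strip set from a Pydantic model's fields and
    dropping those keys from an incoming body; the pricing model below is a
    stand-in, not `CustomPricingLiteLLMParams`:
    
    ```python
    from typing import Optional
    
    from pydantic import BaseModel
    
    
    class ExamplePricingParams(BaseModel):
        # Stand-in for the pricing fields; the real strip set comes from
        # CustomPricingLiteLLMParams.model_fields so new fields are covered automatically.
        input_cost_per_token: Optional[float] = None
        output_cost_per_token: Optional[float] = None
    
    
    PRICING_FIELDS = set(ExamplePricingParams.model_fields)
    
    
    def strip_client_pricing(body: dict, allow_override: bool = False) -> dict:
        """Drop client-supplied pricing fields unless the key/team opted in."""
        if allow_override:
            return body
        stripped = {k: v for k, v in body.items() if k not in PRICING_FIELDS}
        metadata = stripped.get("metadata")
        if isinstance(metadata, dict):
            metadata.pop("model_info", None)  # deployment config, not per-request input
        return stripped
    
    
    body = {
        "model": "gpt-4o",
        "input_cost_per_token": 0.0,  # would otherwise poison the recorded cost
        "metadata": {"model_info": {"input_cost_per_token": 0.0}},
    }
    print(strip_client_pricing(body))
    # -> {'model': 'gpt-4o', 'metadata': {}}
    ```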
    
    Tests cover the strip set's coverage, root and metadata strips, the
    opt-in skip on both key and team metadata, and a regression check that
    the global `litellm.model_cost` map is unmutated after a stripped
    request.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * chore(proxy): log stripped pricing fields at debug for operator visibility
    
    Operators upgrading would otherwise see client-supplied pricing overrides
    silently stop applying with no diagnostic. Emit a debug-level line listing
    the dropped fields and pointing at the opt-in flag when any are stripped;
    stay silent on the no-op path so the log isn't filled with noise.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(proxy): move pricing strip below the litellm_metadata JSON-string parse
    
    The strip ran before the proxy parses ``litellm_metadata`` from a JSON
    string into a dict (a path used by multipart/form-data and ``extra_body``
    callers), so ``isinstance(metadata, dict)`` was False and ``model_info``
    survived the strip. Move the call to the same post-parse position the
    ``user_api_key_*`` strip already uses for the same reason. Adds a
    regression test exercising the JSON-string ``litellm_metadata`` path.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * test(responses): replace legacy claude-4-sonnet alias in multiturn tool-call test
    
    Anthropic's main API no longer resolves the non-canonical 'claude-4-sonnet-20250514'
    alias for freshly issued keys, returning 404 not_found_error. PR #27031 already
    swept three other live tests pinned to this alias to claude-haiku-4-5-20251001
    but missed test_multiturn_tool_calls in the responses API suite, which is now
    failing reliably on PR CI runs (e.g. PR #27074, job 1603363).
    
    Bump the two model references in test_multiturn_tool_calls to the same
    claude-haiku-4-5-20251001 snapshot used by PR #27031 -- it covers everything
    this test exercises (tool calling, multi-turn) and isn't on a deprecation
    schedule.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * chore(proxy): close callback-config and observability-credential side channels
    
    Two related gaps in the proxy's request bouncer:
    
    1. ``is_request_body_safe`` (auth_utils.py) walked the request-body root
       and the ``litellm_embedding_config`` nested dict, but not ``metadata``
       or ``litellm_metadata``. The same fields it bans at root — Langfuse /
       Langsmith / Arize / PostHog / Braintrust / Phoenix / W&B Weave / GCS /
       Humanloop / Lunary credentials and routing — were silently accepted
       when the caller put them inside metadata, retargeting observability
       callbacks to a caller-controlled host with caller-supplied creds.
       Walk both metadata containers (and parse the JSON-string form sent via
       multipart / ``extra_body``) through the same banned-params helper, so
       the existing ``allow_client_side_credentials`` opt-in covers both
       paths consistently.
    
    2. The banned-params list was hand-maintained and lagged the canonical
       ``_supported_callback_params`` allow-list in
       ``initialize_dynamic_callback_params``. Derive the observability bans
       from that allow-list (minus a small ``_SAFE_CLIENT_CALLBACK_PARAMS``
       set for informational fields like ``langfuse_prompt_version`` and
       ``langsmith_sampling_rate``) so future integrations are covered
       automatically; ``_EXTRA_BANNED_OBSERVABILITY_PARAMS`` carries the
       handful of fields integrations read but the allow-list hasn't caught
       up to. A guard test fails CI if a new entry is added to
       ``_supported_callback_params`` without an explicit safe-list decision.
    
    Separately in ``litellm_pre_call_utils.py``: add ``callbacks``,
    ``service_callback``, ``logger_fn``, and ``litellm_disabled_callbacks``
    to ``_UNTRUSTED_ROOT_CONTROL_FIELDS``. The first three are appended to
    worker-wide ``litellm.{input,success,failure,_async_*,service}_callback``
    lists / ``litellm.user_logger_fn`` from inside ``function_setup`` — one
    request poisons every subsequent caller in that worker. The last is the
    inverse primitive: the legitimate path reads it from key/team metadata,
    the request-body version silently disables admin-configured audit /
    observability for the call.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(auth): per-param allow must continue, not return early
    
    A pre-existing logic bug in ``_check_banned_params``: when the
    deployment-level ``configurable_clientside_auth_params`` permitted one
    banned field, the loop ``return``-ed on the first match instead of
    ``continue``-ing, so any other banned param later in the same body or
    metadata dict was never checked. This PR's metadata walk multiplies the
    surface where that bypass matters — a body pairing an allowed
    ``api_base`` with an observability credential like ``langfuse_host``
    would silently pass.
    
    Proxy-wide ``allow_client_side_credentials`` keeps ``return`` (it's a
    global opt-in for every banned param). The per-param branch becomes
    ``continue`` so only the one explicitly-permitted field is skipped.
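
    The loop bug reduced to a minimal sketch (names and the banned set are
    illustrative):
    
    ```python
    BANNED_PARAMS = {"api_base", "langfuse_host", "langfuse_secret_key"}
    
    
    def check_banned_params(body: dict, per_param_allowed: set, allow_all: bool) -> None:
        for param in body:
            if param not in BANNED_PARAMS:
                continue
            if allow_all:
                return  # proxy-wide opt-in: every banned param is permitted
            if param in per_param_allowed:
                # Bug: `return` here skipped the rest of the body, letting an
                # observability credential ride along with an allowed api_base.
                continue  # only this explicitly-permitted field is skipped
            raise ValueError(f"{param} is not allowed in the request body")
    
    
    # An allowed api_base no longer masks the langfuse_host next to it.
    try:
        check_banned_params(
            {"api_base": "https://example", "langfuse_host": "https://evil"},
            per_param_allowed={"api_base"},
            allow_all=False,
        )
    except ValueError as err:
        print(err)  # langfuse_host is not allowed in the request body
    ```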
    
    Adds a regression test that exercises the api_base + langfuse_host pair.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(vector_store): resolve embedding config at request time, never persist creds
    
    The vector store create/update path previously called
    ``_resolve_embedding_config`` against the admin-configured router/DB
    model and persisted the resolved ``litellm_embedding_config`` dict
    (``api_key`` / ``api_base`` / ``api_version``) into the
    ``litellm_managedvectorstorestable.litellm_params`` column. Because the
    resolver expanded ``os.environ/...`` references via ``get_secret``, the
    DB row carried cleartext provider credentials, and the
    ``/vector_store/{new,info,update,list}`` responses returned them to any
    authenticated caller who could supply a known admin model name.
    
    Move the auto-resolve out of ``create_vector_store_in_db`` and out of
    the update path. Persist only the user-supplied ``litellm_embedding_model``
    reference. Resolve at request-handling time inside
    ``_update_request_data_with_litellm_managed_vector_store_registry`` so
    the resolved config lives in the per-request ``data`` dict and is
    garbage-collected after the response. Legacy rows that were created by
    an earlier proxy version and already carry a resolved
    ``litellm_embedding_config`` skip the re-resolution and pass through
    unchanged so embedding calls keep working.
    
    The ``new_vector_store`` response now also runs the existing
    ``_redact_sensitive_litellm_params`` masker (already used by ``info``,
    ``update``, and ``list``), defending against caller-supplied cleartext
    on the create path and against legacy rows whose persisted credentials
    are still in the database.
    
    Existing tests that asserted the old write-time-resolve behaviour are
    updated to assert the new persistence shape (no embedding config
    stored, just the model reference). Two new tests cover the use-time
    path: one asserting fresh resolution happens when a row carries only
    the model reference, the other asserting legacy rows with persisted
    config skip re-resolution and continue to work.
    
    Co-Authored-By: default avatarClaude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * fix(vector_store): tighten registry-mutation comment and dedupe test helpers
    
    * fix(vector_store): cache use-time embedding-config resolution
    
    Hold the resolved config in a process-memory TTL cache so the
    request-handling path doesn't run litellm_proxymodeltable.find_first
    on every vector-store call.
    
    * fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort
    
    Follow-up bugs surfaced by the QA sweep on PR #27039
    (https://github.com/BerriAI/litellm/pull/27039#issuecomment-4363363610).
    
    1. Stop stripping output_config.effort on Bedrock + Vertex adaptive routes.
       - Vertex AI Claude 4.6/4.7 accepts output_config.effort on rawPredict
         (verified end-to-end against us-east5 / global). The strip helper now
         no-ops for effort.
       - Bedrock Converse routes output_config into additionalModelRequestFields
         for anthropic base models so the requested adaptive tier (low/medium/
         high/xhigh/max) actually reaches the wire instead of all collapsing to
         identical thinking.
       - Bedrock Invoke chat transformation (AmazonAnthropicClaudeConfig) stops
         popping output_config from the post-AnthropicConfig request body.
       - Bedrock Invoke /v1/messages allowlist (BedrockInvokeAnthropicMessagesRequest)
         now lists output_config so the runtime allowlist filter forwards it.
    
    2. Validate effort across Bedrock Converse so 'disabled' / 'invalid' / '' /
       unsupported tiers (xhigh/max on Sonnet 4.6 or budget-mode 4.5 models)
       surface as a clean 400 BadRequestError instead of 500.
    
    3. ValueError -> BadRequestError throughout (AnthropicConfig.map_openai_params,
       _apply_output_config, AmazonConverseConfig._handle_reasoning_effort_parameter).
       Empty-string effort is now rejected (was silently passing the
       'if effort and ...' short-circuit).
    
    4. Floor reasoning_effort='minimal' at the Anthropic provider minimum
       (1024 budget_tokens) via new ANTHROPIC_MIN_THINKING_BUDGET_TOKENS so it's
       a usable tier on direct Anthropic / Azure AI Anthropic / Vertex AI Anthropic /
       Bedrock Invoke (all of which 400 below 1024).
    
    5. model_prices: dedupe duplicate supports_max_reasoning_effort key on
       claude-opus-4-7 / claude-opus-4-7-20260416.
    
    Adds regression tests across all five affected paths; existing tests asserting
    the silent-strip behavior were updated to reflect the new pass-through and
    clean 400 surfaces.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(constants): make ANTHROPIC_MIN_THINKING_BUDGET_TOKENS a plain constant
    
    The documentation CI test (tests/documentation_tests/test_env_keys.py)
    asserts every os.getenv() key in the source has a matching entry in the
    litellm-docs config_settings.md table. ANTHROPIC_MIN_THINKING_BUDGET_TOKENS
    tracks Anthropic's published wire-protocol minimum (1024) — it's not a
    user-tunable, so making it env-overridable was wrong anyway. Drop the
    os.getenv() wrapper; the value is now a plain literal.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(anthropic,bedrock): correct effort error message and dedupe effort_map
    
    - Remove 'none' from the Bedrock _validate_anthropic_adaptive_effort error
      message; it was listed as a valid value but rejected by the membership
      check, leaving users in a feedback loop if they tried 'none'.
    - Hoist the duplicated reasoning_effort -> output_config.effort mapping
      out of AnthropicConfig.map_openai_params and
      AmazonConverseConfig._handle_reasoning_effort_parameter into a single
      AnthropicConfig.REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT class constant
      so the two routes cannot drift.
    
    * fix(anthropic): translate reasoning_effort on /v1/messages route
    
    Closes the remaining QA-sweep gap on PR #27074: Bedrock Invoke
    /v1/messages was silently ignoring ``reasoning_effort`` because the
    shared param filter only kept native Anthropic keys, so every effort
    tier collapsed to the same behavior on the wire (27/231 cells failing
    across opus-4-5 / opus-4-6 / sonnet-4-6).
    
    Map ``reasoning_effort`` to native Anthropic ``thinking`` /
    ``output_config.effort`` at the ``AnthropicMessagesConfig`` layer so
    all four /v1/messages routes (direct Anthropic, Azure AI, Vertex AI,
    Bedrock Invoke) inherit the same translation:
    
    - Add ``reasoning_effort`` to ``AnthropicMessagesRequestOptionalParams``
      so the param filter in
      ``AnthropicMessagesRequestUtils.get_requested_anthropic_messages_optional_param``
      no longer drops it before the transformation runs.
    
    - Add ``_translate_reasoning_effort_to_anthropic`` and call it from
      ``transform_anthropic_messages_request``. Mirrors
      ``AnthropicConfig.map_openai_params`` on the chat completion path
      (re-uses ``_map_reasoning_effort`` and
      ``REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT``) so the two routes
      cannot drift. Pops ``reasoning_effort`` so it never reaches the wire.
    
    - Caller-supplied native ``thinking`` / ``output_config.effort`` always
      win — same precedence as
      ``_translate_legacy_thinking_for_adaptive_model``.
    
    - Garbage values (``""``, ``"disabled"``, ``"invalid"``) raise
      ``AnthropicError(status_code=400)`` instead of falling through and
      surfacing as 500s from the provider.
    
    - ``"none"`` clears thinking + output_config so callers can opt out
      per request.
    
    Also restores the non-adaptive-model test coverage on Bedrock Invoke
    /v1/messages that the previous commit lost when
    ``test_bedrock_messages_strips_output_config`` was renamed to the
    ``forwards`` variant on Opus 4.7.
    
    Adds a new test file
    ``test_reasoning_effort_translation.py`` covering the translation at
    the shared config level (adaptive + non-adaptive models, none, garbage,
    caller precedence) so all four /v1/messages routes are exercised by a
    single suite.
    
    Adds parametrized + behavioral tests on the Bedrock Invoke /v1/messages
    suite covering: minimal/low/medium/high/xhigh/max mapping for adaptive
    models, thinking-budget mapping for non-adaptive Opus 4.5, ``none``
    clears both, garbage raises 400, explicit ``output_config`` wins.
    
    Refs: https://github.com/BerriAI/litellm/pull/27074
    
    
    
    * fix(anthropic,bedrock): reject unmapped reasoning_effort at mapping site
    
    Both the chat completion path (AnthropicConfig.map_openai_params) and the
    Bedrock Converse path (_handle_reasoning_effort_parameter) used
    REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(value, value) which falls
    back to the raw input on unmapped keys. Combined with _map_reasoning_effort
    returning type='adaptive' for any string on Claude 4.6/4.7, garbage values
    (e.g. 'disabled') could leak into optional_params['output_config']['effort']
    unvalidated if map_openai_params ran without the downstream transform_request
    or _validate_anthropic_adaptive_effort check.
    
    Mirror the /v1/messages pattern: use .get(value) (no fallback) and raise
    BadRequestError immediately when the value is unmapped, co-locating
    validation with the mapping for defense in depth.
    
    * style: black formatting
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * fix(anthropic): stop class-attr leak; gate xhigh/max on every route
    
    The reasoning-effort mapping dict was a public class attribute on
    AnthropicConfig, so BaseConfig.get_config returned it as a request
    parameter and every Anthropic-backed call (Anthropic / Azure / Vertex /
    Bedrock Invoke) hit a 400 'REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT:
    Extra inputs are not permitted' from the provider. Move the mapping
    to a module-level constant.
    
    _supports_effort_level only looked the model up under
    custom_llm_provider='anthropic', so bedrock-prefixed model ids
    (e.g. bedrock/invoke/us.anthropic.claude-opus-4-7) returned False
    for both 'max' and 'xhigh' even when the underlying model entry has
    the flag set. Strip known provider prefixes and retry the lookup
    against litellm.model_cost directly so per-model gating works on
    every route.
    
    Mirror the per-model xhigh/max gate from
    AnthropicConfig._apply_output_config in
    AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic so
    the /v1/messages route also raises a clean 400 instead of forwarding
    the unsupported tier.
    
    * feat(anthropic,bedrock): strip output_config under drop_params for non-effort models
    
    When a proxy fronts Claude Code (which always sends `output_config.effort`)
    at a pre-4.5 Anthropic model — haiku-3, sonnet-3.5, opus-3, sonnet-4 — the
    forwarded knob causes a forced 400 the client can't fix. Gating a strip
    behind the existing `drop_params` flag lets operators opt into silent
    fixup once and stop worrying about per-model param hygiene.
    
    Default (`drop_params=False`) still forwards and surfaces the provider's
    error, preserving the strict, debuggable contract from #27074.
    
    Per https://platform.claude.com/docs/en/build-with-claude/effort the
    supporting set is Opus 4.5+, Sonnet 4.6+, and Mythos Preview; everything
    else is dropped (with a verbose_logger warning so the strip is visible).
    Recognition uses model-name patterns plus a fallback to any
    `supports_*_reasoning_effort` flag in the model map for forward
    compatibility with new entries.
    
    https://claude.ai/code/session_01WjHq31rvXT6xYNdVmSJvRp
    
    
    
    (cherry picked from commit 1233943e7861ba8a9062f792310ebd401cb03db8)
    
    * fix(base_llm): filter all _-prefixed class attrs from get_config
    
    The drop_params strip work added `AnthropicConfig._EFFORT_SUPPORTING_MODEL_PATTERNS`
    as a private class-level lookup tuple. `BaseConfig.get_config()` only
    filtered the `__`-prefixed names plus `_abc` / `_is_base_class`, so
    `_EFFORT_SUPPORTING_MODEL_PATTERNS` would have leaked into the request
    body the same way `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` did before
    the previous commit.
    
    Generalize the existing `_abc` / `_is_base_class` carve-outs to skip
    every `_`-prefixed name. `AmazonConverseConfig.get_config()` overrides
    the base method, so apply the same change there.
    
    Also unblocks future internal helpers from accidentally serialising into
    the wire body.
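
    A minimal sketch of the generalized filtering rule (class and attribute
    names are illustrative):
    
    ```python
    class ExampleProviderConfig:
        # Public class attrs are treated as request parameters by get_config().
        max_tokens = 1024
    
        # Private lookup tables must never serialize into the wire body.
        _EFFORT_PATTERNS = ("opus-4-5", "sonnet-4-6")
    
        @classmethod
        def get_config(cls) -> dict:
            return {
                name: value
                for name, value in vars(cls).items()
                # Old filter: only skipped dunders plus `_abc` / `_is_base_class`;
                # new filter: skip every underscore-prefixed name (and methods).
                if not name.startswith("_")
                and not callable(value)
                and not isinstance(value, classmethod)
            }
    
    
    print(ExampleProviderConfig.get_config())  # {'max_tokens': 1024}
    ```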
    
    * fix(anthropic): drive output_config.effort support from model map flags
    
    Replace hardcoded _EFFORT_SUPPORTING_MODEL_PATTERNS with a JSON-backed
    check that uses supports_*_reasoning_effort flags from the model map.
    Add supports_minimal_reasoning_effort: true to opus-4-5 and mythos-preview
    entries (which previously only carried supports_reasoning) so the JSON
    remains the single source of truth for effort capability.
    
    * fix(anthropic,bedrock,databricks): four reasoning_effort follow-ups
    
    - claude-sonnet-4-6 + reasoning_effort=max no longer 400s. Renamed
      _is_opus_4_6_model to _is_claude_4_6_model at three sites and added
      supports_max_reasoning_effort: true to 12 model entries in the JSON
      cost map (10 sonnet 4.6 ids + OpenRouter opus 4.6/4.7).
    - _map_reasoning_effort now raises BadRequestError(400) directly with
      llm_provider, instead of letting Databricks (and similar callers)
      surface its raw ValueError as a 500.
    - output_config.effort on Opus 4.5 over Bedrock no longer 400s for
      missing effort-2025-11-24 beta. Flipped JSON to "effort-2025-11-24"
      for bedrock + bedrock_converse and added an auto-attach branch in
      _process_tools_and_beta for non-adaptive Anthropic + output_config
      on Converse.
    - reasoning_effort=xhigh / =max on legacy budget-mode models
      (Haiku 4.5, Sonnet 4.5, Opus 4.5) now map to thinking.budget_tokens
      8192 / 16384 instead of returning 400. Added two constants in
      litellm/constants.py.
    
    Tests updated for all four flips. Validated end-to-end via 306-cell
    live proxy matrix (6 model families x 3 routes x 17 effort cases),
    all pass.
    
    * fix(databricks): validate reasoning_effort and set output_config on adaptive Claude
    
    The Databricks path called `AnthropicConfig._map_reasoning_effort` for
    Claude models but never validated the effort string nor set
    `output_config.effort` for adaptive models (Claude 4.6/4.7). Since
    `_map_reasoning_effort` returns `type=adaptive` for ANY non-None /
    non-"none" string on adaptive models (including "disabled",
    "invalid", ""), Databricks silently accepted garbage and emitted a
    request without an `output_config.effort`, collapsing every adaptive
    tier to identical behavior.
    
    Match the Anthropic native, Bedrock Converse, Bedrock Invoke, and
    /v1/messages paths: when the resolved `thinking` is non-None on a
    4.6/4.7 model, look up the value in
    `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` and either raise a clean
    `BadRequestError` or set `optional_params["output_config"]`.
    
    * fix(azure): omit model from image generation and image edit deployment requests
    
    Azure OpenAI routes image gen/edit by deployment in the URL; sending the
    deployment id in model breaks gpt-image-2 (invalid_value). Strip model from
    JSON for deployments/.../images/generations and from multipart data for
    .../images/edits. Non-deployment URLs (e.g. Azure AI FLUX) unchanged.
    
    Fixes #26316.
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(azure): exercise image gen JSON filter via HTTP client; dedupe image edit URL
    
    - Image generation tests patch HTTPHandler.post / get_async_httpx_client so
      make_*_azure_httpx_request runs and wire json is asserted on call kwargs.
    - Azure image edit: strip model in finalize_image_edit_multipart_data using the
      same URL string the handler passes to POST (no second get_complete_url in
      transform). BaseImageEditConfig default finalize is a no-op.
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * fix(azure_ai/anthropic): promote output_config out of extra_body so validation runs
    
    `azure_ai` is registered in `litellm.openai_compatible_providers`, so
    `add_provider_specific_params_to_optional_params` (litellm/utils.py)
    auto-stuffs any non-OpenAI kwarg (e.g. `output_config={"effort": "..."}`)
    into `optional_params["extra_body"]`. `AzureAnthropicConfig.transform_request`
    then strips `extra_body` entirely on the way out, silently dropping the
    param — and `AnthropicConfig._apply_output_config` never sees it, so
    `effort="invalid"` / `effort="xhigh"` on a non-supporting model
    quietly reaches the model with default behavior instead of returning a
    clean 400 (as the native `anthropic` provider does).
    
    Promote the keys back to top-level `optional_params` (using `setdefault`
    so explicit top-level values win) before delegating to the parent
    `AnthropicConfig`. Apply in both `validate_environment` and
    `transform_request` so flag detection (`is_mcp_server_used`, etc.) and
    output-config validation both run.
    
    Surfaced by the QA matrix expansion on PR #27074: 20 cells where Azure
    returned 200 while `anthropic` returned 400 — all `output_config` mode
    across haiku_4_5, sonnet_4_5, opus_4_5, sonnet_4_6, opus_4_6, opus_4_7
    families with `effort` in {invalid, xhigh, max, low, medium, high}.
    
    Tests:
    * `test_output_config_promoted_from_extra_body`: valid effort reaches data
    * `test_invalid_output_config_effort_raises_via_extra_body`: 400 on bad effort
    * `test_unsupported_effort_xhigh_raises_via_extra_body`: 400 on xhigh-on-Sonnet-4.6
    * `test_extra_body_promotion_does_not_clobber_top_level`: setdefault semantics
    
    * test(image_gen): expect no model in Azure image edit multipart (#26316)
    
    Align test_azure_image_edit_litellm_sdk with deployment-scoped Azure edits.
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * refactor(anthropic): extract _validate_effort_for_model to prevent drift
    
    The chat completion path (`_apply_output_config`) and the /v1/messages
    pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`)
    both gate `max` / `xhigh` per model. The two sites had diverged from
    near-identical copies into separately maintained blocks, creating a real
    drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor
    could update one site and miss the other.
    
    Centralise the gating in `AnthropicConfig._validate_effort_for_model`,
    which returns an error message string or `None`. Each call site keeps
    its own provider-appropriate exception type (`BadRequestError` for the
    chat path, `AnthropicError` for the /v1/messages pass-through) but the
    gating decision now comes from one place. Net -11 LOC.
    
    Adds a parametrised unit test exercising the helper directly across
    4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort
    inputs. Existing tests at both call sites continue to pass unchanged.
    
    Addresses Greptile finding on PR #27074.
    
    * fix(databricks): narrow reasoning_effort_value to str for mypy
    
    `non_default_params.get("reasoning_effort")` returns `Any | None`,
    but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`.
    Mypy flagged this on the strict pass. Narrow with `isinstance` before
    the lookup; non-strings fall through to the existing `BadRequestError`
    below with a clean validation message, so behavior is unchanged.
    
    Fixes a regression introduced by 1a10746e95 in this PR.
    
    * feat(proxy): add health_check_reasoning_effort for model health checks
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(image_gen): align Azure image gen fixture with body omitting model
    
    Expected JSON matches deployment-scoped Azure POST (#26316).
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(anthropic/chat): force PR-local model_cost map via autouse fixture
    
    CI runs without LITELLM_LOCAL_MODEL_COST_MAP=True, so litellm.model_cost
    is loaded from main-branch JSON (default model_cost_map_url) instead of
    the PR's checked-out model_prices_and_context_window.json. Tests that
    assert per-model flags added in this PR (supports_max_reasoning_effort,
    supports_xhigh_reasoning_effort) therefore pass locally but fail in CI
    with 'AssertionError: assert False is True' on 5 cases:
    
      - test_anthropic_model_supports_effort_param_recognizes_supporting_models
        [anthropic.claude-mythos-preview, bedrock/.../mythos-preview,
         claude-opus-4-5-20251101]
      - test_supports_effort_level_handles_provider_prefixes
        [bedrock/invoke/us.anthropic.claude-sonnet-4-6-max-True,
         claude-sonnet-4-6-max-True]
    
    Add an autouse fixture at tests/test_litellm/llms/anthropic/chat/conftest.py
    that monkey-patches litellm.model_cost to the PR-local JSON for every test
    in this directory. The parent conftest already snapshots+restores
    litellm.model_cost per-function, so the mutation is contained.
    
    This is a scoped workaround. The proper fix is to set the env var
    globally in the test workflow once the ~10 inline self-set test files
    are audited; tracking that as a follow-up issue.
    
    * [Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests
    
    The previous pins resolved to single-platform amd64 manifests, so buildx
    pulled the same amd64 base for both linux/amd64 and linux/arm64 targets.
    The published OCI index then advertised an arm64 entry whose layers are
    byte-identical to amd64 -- arm64 users got an amd64 binary.
    
    Switch all three Dockerfiles to the multi-arch image-index digests:
      - cgr.dev/chainguard/wolfi-base   (index has linux/amd64 + linux/arm64)
      - ghcr.io/astral-sh/uv:0.11.7     (index has linux/amd64 + linux/arm64)
    
    Resolved with `docker buildx imagetools inspect <ref>` -- that returns
    the index digest. `docker pull` + `docker inspect` returns the per-host
    platform digest, which is what slipped in last time.
    
    * [Fix] Docker: Pin Uv To Multi-Arch Index Digest In Remaining Dockerfiles
    
    Apply the same fix to the three Dockerfiles not in the release pipeline
    today (alpine, dev, health_check) so they stay correct if/when they're
    built for arm64 in the future.
    
    Wolfi pins are not present in these files; the python:3.11-alpine and
    python:3.13-slim digests they already use are multi-arch indexes that
    include arm64/v8, so only the uv pin needed swapping.
    
    * fix(xai): fold reasoning_tokens into completion_tokens to satisfy OpenAI invariant
    
    xAI's chat completions API accounts reasoning_tokens separately from
    completion_tokens, but rolls them into total_tokens. This breaks the
    OpenAI invariant total_tokens == prompt_tokens + completion_tokens
    that downstream consumers (including litellm's own _usage_format_tests
    in tests/llm_translation/base_llm_unit_tests.py:58) rely on.
    
    Live capture (grok-3-mini-beta, 2026-05-04):
        prompt=14, completion=10, total=336, reasoning=312
        14 + 10 = 24, NOT 336.
    
    OpenAI's o1/o3 reasoning models include reasoning_tokens in
    completion_tokens, leaving the prompt+completion=total invariant
    intact. xAI deviates. This patch aligns xAI to OpenAI semantics by
    folding reasoning_tokens into completion_tokens after the parent
    OpenAI parser runs.
    
    The fold is idempotent and defensive:
    - Only fires when total_tokens == prompt_tokens + completion_tokens
      + reasoning_tokens (the documented xAI shape). Refuses to fold if
      the gap doesn't match, guarding against silent corruption when xAI
      changes accounting.
    - Skips if completion_tokens already covers the gap (already
      normalised — e.g. cost calc replays a previously-folded Usage).
    
    xai.cost_calculator.cost_per_token already added reasoning_tokens to
    the visible completion count for billing. Post-fold, completion_tokens
    already includes reasoning_tokens, so the cost calc would
    double-bill. Updated cost_per_token to detect the OpenAI-normalised
    shape (total == prompt + completion) and skip the reasoning add-on
    in that case, falling through to the legacy raw-shape behaviour for
    callers that bypass the transformation (e.g. proxy log replay).
    
    Tests:
    - Adds TestXAIReasoningTokenFolding covering: gap-explained-fold,
      idempotent-no-double-fold, no-reasoning-skip, gap-mismatch-skip.
    - Adds test_already_normalised_usage_does_not_double_count_reasoning
      to lock the cost-calc idempotency.
    - Updates 7 pre-existing cost-calc tests whose total_tokens was
      internally inconsistent (used the OpenAI-normalised total but kept
      reasoning_tokens external) to use the documented xAI raw shape
      total = prompt + visible completion + reasoning. Pre-existing
      values masked the missing-fold by accident.
    
    Verified end-to-end against the live xAI API:
        LITELLM_LOCAL_MODEL_COST_MAP=False (CI default) +
        XAI_API_KEY set +
        pytest tests/llm_translation/test_xai.py::TestXAIChat::test_prompt_caching
            -> PASSED in 18.81s (was: AssertionError on
            usage.total_tokens == usage.prompt_tokens + usage.completion_tokens)
    
    20/20 tests in tests/test_litellm/llms/xai/test_xai_cost_calculator.py
    and 8/8 in tests/test_litellm/llms/xai/test_xai_chat_transformation.py
    pass.
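
    Roughly what the fold amounts to, sketched against a generic
    OpenAI-shaped usage object (helper name and attribute access are
    illustrative, not the exact transformation code):

        def fold_reasoning_into_completion(usage):
            """Sketch: align an xAI-shaped Usage block to OpenAI semantics."""
            details = getattr(usage, "completion_tokens_details", None)
            reasoning = getattr(details, "reasoning_tokens", 0) or 0
            if reasoning == 0:
                return usage  # nothing to fold
            gap = usage.total_tokens - usage.prompt_tokens - usage.completion_tokens
            if gap == 0:
                return usage  # already OpenAI-normalised (idempotent on replay)
            if gap != reasoning:
                return usage  # unexplained gap: refuse to fold rather than corrupt
            usage.completion_tokens += reasoning  # total now == prompt + completion
            return usage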
    
    * refactor(bedrock/converse): delegate effort gating to AnthropicConfig._validate_effort_for_model
    
    Removes the duplicated max/xhigh gating logic in
    _validate_anthropic_adaptive_effort and the now-unused
    _supports_effort_level_on_bedrock helper. Per-model gating now flows
    through the centralized AnthropicConfig._validate_effort_for_model
    (whose _supports_effort_level already strips Bedrock prefixes), so the
    chat completion, /v1/messages, and Bedrock Converse paths can't drift
    when a new gated effort tier is added.
    
    * Implement normalize_nonempty_secret_str function to trim whitespace from secrets and treat empty values as unset. Update proxy_server to use this function for Grafana credentials. Enhance tests to validate the new normalization behavior.
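
    The normalization is presumably little more than the following (a
    sketch; the real helper may also plug into the secret-manager path):

        from typing import Optional

        def normalize_nonempty_secret_str(value: Optional[str]) -> Optional[str]:
            """Trim surrounding whitespace; treat empty/whitespace-only as unset."""
            if value is None:
                return None
            trimmed = value.strip()
            return trimmed or None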
    
    * Fix qdrant semantic cache miss metadata
    
    * chore(deps): refresh dependency locks
    
    * chore(deps): authorize pytest license
    
    * fix: preserve tokenizer decode round trips
    
    * refactor(anthropic): drive adaptive-thinking gate via supports_adaptive_thinking flag
    
    Three of greptile's open comments on #27074 (P2 converse:512, P1
    databricks:361, and the underlying capability-flag policy rule) flagged
    the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...)
    used inline as a runtime 'is this an adaptive-thinking model?' check.
    That requires a code release each time a new adaptive Claude lands.
    
    Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model,
    and switch the helper itself to read a new supports_adaptive_thinking
    flag from `model_prices_and_context_window.json` via `_supports_factory`,
    falling back to the family pattern only when the model-map entry doesn't
    carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants
    that route through the same code path with non-canonical ids).
    
    Adds `supports_adaptive_thinking: true` to the four 4.6/4.7 anthropic
    entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed
    and Vertex-prefixed entries don't need the flag because both fall back
    through the family pattern (the helper short-circuits early on True from
    either path) and the bedrock/vertex Claude IDs all match the existing
    opus-4-{6,7} / sonnet-4-{6,7} pattern.
    
    Affected call sites:
    
    - `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter`
    - `anthropic/chat/transformation.py:_map_reasoning_effort`
    - `anthropic/chat/transformation.py:map_openai_params` (output_config branch)
    - `databricks/chat/transformation.py:map_openai_params` (output_config branch)
    
    The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references
    in `AnthropicConfig._validate_effort_for_model` and
    `AnthropicConfig.get_supported_openai_params` are intentionally retained:
    they're per-model gating fallbacks for variants whose model-map entries
    don't yet carry the `supports_max_reasoning_effort` /
    `supports_reasoning` flag. Those are documented in-place.
    
    Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass.
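
    In spirit, the consolidated helper reads something like this (a
    sketch; the real code goes through `_supports_factory`, and the family
    regex here only approximates the opus/sonnet 4.6/4.7 pattern):

        import re
        import litellm

        _ADAPTIVE_FAMILY = re.compile(r"claude-(opus|sonnet)-4-[67]")

        def is_adaptive_thinking_model(model: str) -> bool:
            # Prefer the model-map flag when the entry carries it ...
            if litellm.model_cost.get(model, {}).get("supports_adaptive_thinking"):
                return True
            # ... and fall back to the family pattern so prefixed /
            # non-canonical ids (openrouter/, bedrock/, vertex_ai/) still match.
            return bool(_ADAPTIVE_FAMILY.search(model))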
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * chore(deps): address dependency review notes
    
    * test(model_prices): add supports_adaptive_thinking to schema
    
    `test_aaamodel_prices_and_context_window_json_is_valid` validates the
    model-map JSON against an explicit schema with `additionalProperties`,
    so the new `supports_adaptive_thinking` flag added in
    98ced0ae43 needs a matching schema entry.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * refactor: remove unnecessary comments from #27074
    
    Strip out the explanatory and historical comments that don't carry
    business-logic justification. Comments that simply narrate what code
    does — or that explain prior behavior, what was changed, or which PR
    introduced a fix — are removed. Docstrings are reduced to a one-line
    summary where the long form repeated information already evident from
    the code or test data.
    
    No code-behavior changes. All 643 affected unit tests still pass.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test: keep decode token test local
    
    * chore(deps): align dashboard node engine
    
    * feat: selectively apply routing strategy according to model name
    
    * style: make _model_supports_effort_param more concise
    
    * refactor(anthropic,bedrock): hoist drop_params output_config warning to module constant
    
    Three call sites (anthropic chat, bedrock converse, bedrock invoke messages)
    emitted the same '...Effort is only supported on Opus 4.5+, Sonnet 4.6+, and
    Mythos Preview' warning verbatim. Extract DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING
    in litellm/llms/anthropic/chat/transformation.py and import it from the bedrock
    sites so future copy edits live in one place.
    
    Addresses Michael's review on PR #27074.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * refactor(anthropic,bedrock,databricks): factor BadRequestError for unknown reasoning_effort
    
    Three call sites raised the same BadRequestError("Invalid reasoning_effort:
    ... Must be one of 'minimal', 'low', ...") block when REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT
    returned None: anthropic chat map_openai_params, bedrock converse
    _handle_reasoning_effort_parameter, and databricks chat reasoning_effort path.
    
    Extract AnthropicConfig._raise_invalid_reasoning_effort(model, value, llm_provider)
    so future copy edits / valid-set changes happen in one place. Typed as NoReturn
    so type-checkers correctly narrow control flow at call sites.
    
    Addresses Michael's review on PR #27074.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * Clean up Redis semantic cache isolation fallback
    
    * fix(guardrails): align banned_keywords + azure_content_safety call_type gates with runtime route_type
    
    The hooks gated on ``call_type == "completion"`` but the proxy ingress
    passes ``route_type`` straight through as ``call_type`` —
    ``"acompletion"`` for /v1/chat/completions and ``"aresponses"`` for
    /v1/responses. Tests passed because they used the literal sync
    ``"completion"`` value, masking the gap.
    
    Switch both hooks to ``is_text_content_call_type`` (matches the
    canonical runtime values: completion / acompletion / aresponses) and
    update existing tests to assert against runtime values, plus parametrize
    a regression test that pins the gate.
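
    The gate now amounts to membership in the runtime text-content call
    types rather than the single sync literal (a sketch approximating the
    is_text_content_call_type helper named above):

        TEXT_CONTENT_CALL_TYPES = {"completion", "acompletion", "aresponses"}

        def is_text_content_call_type(call_type: str) -> bool:
            return call_type in TEXT_CONTENT_CALL_TYPES

        # inside the hook (sketch):
        # if not is_text_content_call_type(call_type):
        #     return  # not a text-generation route; guardrail does not apply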
    
    * fix: remove unused import
    
    * Add semantic cache legacy migration flag
    
    * Treat 0 team_member_budget as no cap
    
    * chore(caching): annotate qdrant quantization_params dict type
    
    Mypy infers the dict's value type from the first branch
    (Dict[str, bool]), which clashes with the scalar branch's mixed-type
    inner dict. An explicit Dict[str, Any] annotation widens the inferred type.
    
    * chore(caching): remove allow_legacy_unscoped_cache_hits opt-in
    
    The flag was an opt-in escape hatch for the cross-tenant leak the rest
    of the patch closes — flipping it on (env var or constructor param)
    re-enables exactly the VERIA-54 primitive on either backend. There is
    no operational need that the secure path doesn't already meet:
    
    - Qdrant: legacy points without ``litellm_cache_key`` payload are
      excluded by the must-clause filter and treated as misses; new sets
      populate the cache key, so cold-start lasts only as long as the
      natural cache rebuild.
    - Redis: existing unscoped index can't carry the new schema; the init
      path falls back to ``{name}_isolated`` (and recreates it on stale
      schema), leaving the legacy index untouched.
    
    Drop the constructor param, env-var fallback, ``_using_legacy_unscoped_index``
    flag, the legacy-reuse branch in ``_init_semantic_cache``, and the
    matching guards in set/get paths. Update tests to drop the legacy-mode
    cases and assert the secure-only behaviour.
    
    * fix(container): keep ownership-filter exceptions out of the LLM-error path
    
    filter_container_list_response runs after the upstream call has
    already succeeded; treating an ownership-lookup failure as an LLM-API
    error fires post_call_failure_hook for a successful upstream call and
    returns a misleading provider-shaped error to the client. Run the
    filter outside the try/except so genuine LLM errors stay scoped to
    the upstream call.
    
    * chore(container,skills): LRU eviction for owner caches; widen file_purpose Literal
    
    Two cleanups from the /simplify pass:
    
    * ``_CONTAINER_OWNER_CACHE`` and ``_SKILL_CACHE`` now LRU-evict via
      ``OrderedDict.popitem(last=False)`` instead of full ``clear()`` at
      capacity. Full clears converted a steady-state cached workload into a
      periodic full-DB-load oscillation as the cache repopulated from zero
      and cleared again. Reads now ``move_to_end`` so the just-touched
      entry survives the next eviction. Mirrors the pre-existing LRU
      pattern in ``_remember_container_owner`` (see the sketch after this list).
    
    * ``LiteLLM_ManagedObjectTable.file_purpose`` Literal now includes
      ``"container"`` so Pydantic validation accepts rows written by the
      ownership store.
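
    A rough sketch of the eviction pattern (generic names and an
    illustrative capacity, not the actual module constants):

        from collections import OrderedDict

        _OWNER_CACHE = OrderedDict()
        _OWNER_CACHE_MAX_SIZE = 1024  # illustrative cap

        def cache_owner(resource_id: str, owner: str) -> None:
            if resource_id in _OWNER_CACHE:
                _OWNER_CACHE.move_to_end(resource_id)
            elif len(_OWNER_CACHE) >= _OWNER_CACHE_MAX_SIZE:
                _OWNER_CACHE.popitem(last=False)  # evict least-recently-used entry
            _OWNER_CACHE[resource_id] = owner

        def get_cached_owner(resource_id: str):
            owner = _OWNER_CACHE.get(resource_id)
            if owner is not None:
                _OWNER_CACHE.move_to_end(resource_id)  # touched entries survive eviction
            return owner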
    
    * chore(container,skills): drop legacy-access opt-out env vars
    
    LITELLM_ALLOW_UNTRACKED_CONTAINER_ACCESS and
    LITELLM_ALLOW_UNOWNED_SKILL_ACCESS were operator-toggleable opt-outs
    for the cross-tenant access primitive this PR closes — flipping either
    on re-enabled exactly the VERIA-20 read path. Default-secure with no
    escape hatch matches sibling fixes (vector-store cred isolation, semantic
    cache key isolation, user_config strip): all rejected the
    opt-out-of-security pattern.
    
    Untracked containers and unowned skills (rows that pre-date this
    enforcement) are admin-only. Non-admin owners need to either re-create
    via the now-tracked flow or have an admin assign ``created_by`` on the
    existing row. Update tests to assert the strict-only behaviour.
    
    * fix(ownership): reject identity-less callers instead of sharing a sentinel scope
    
    UNSCOPED_RESOURCE_OWNER_SCOPE collapsed every caller without an
    identity field (no user_id / team_id / org_id / api_key / token) into
    a single shared owner — a cross-tenant access primitive: any two such
    callers could see and delete each other's containers and skills.
    
    Drop the sentinel. ``get_primary_resource_owner_scope`` returns
    ``None`` and ``get_resource_owner_scopes`` returns ``[]`` for
    identity-less callers. ``record_container_owner`` and
    ``LiteLLMSkillsHandler.create_skill`` now reject creates from
    identity-less callers with a 403 instead of stamping the placeholder.
    Read paths already deny ``owner is None`` correctly so legacy rows
    (if any) are admin-only.
    
    * fix(proxy): include request-blocked callback params in auth bans
    
    * fix: keep skills handler FastAPI-free; fold gcs deny list into the body bouncer
    
    Two cleanups:
    
    * ``LiteLLMSkillsHandler.create_skill`` raised ``HTTPException`` for
      identity-less callers, importing FastAPI from a ``litellm/llms/``
      module — that violates the project rule that FastAPI lives only
      under ``proxy/``. Switch to ``ValueError`` (the same shape the rest
      of the handler uses for not-found/forbidden) and update the test.
    
    * The proxy-auth body bouncer derived its observability ban list from
      ``_supported_callback_params`` only, missing
      ``_request_blocked_callback_params`` (where ``gcs_bucket_name`` and
      ``gcs_path_service_account`` live). Two recently-merged sibling PRs
      (#27019 added the deny list, #27081 added the test asserting these
      are rejected at the request body root) crossed without folding them
      together. Union the GCS deny list into the bouncer's derivation so
      the single source of truth covers both code paths.
    
    * fix(proxy): normalize managed resource team owner field
    
    * chore: simplify ownership tracking — drop thin stores, in-memory fallback, hand-rolled cache
    
    Substantial reduction (~765 LOC) without changing the security
    boundary:
    
    * Drop ContainerOwnershipStore and LiteLLMSkillsStore — both were
      one-method-per-Prisma-call wrappers. Inline the calls instead,
      matching the established pattern in vector_store_endpoints,
      agent_endpoints, and mcp_server/db.py.
    
    * Drop the prisma_client is None in-memory fallback. Production
      deploys always have Prisma; running ownership-critical paths on a
      process-local dict is a security footgun in the dev-mode case it
      was meant to support, and complicates every code path with a
      branch. Fail-secure: skip recording if Prisma is unavailable, and
      treat reads as "not found" (admin-only).
    
    * Drop the hand-rolled module-level cache. Replace with the existing
      litellm.caching.in_memory_cache.InMemoryCache, which already has
      TTL + max-size + eviction tested in its own module. Sentinel string
      for negative caching since InMemoryCache can't disambiguate "miss"
      from "cached as None".
    
    * Tests: drop coverage for removed code paths (in-memory fallback,
      hand-rolled cache internals). Keep tests for actual behavior (cache
      hit-rate, negative caching, owner check, list filtering,
      identity-less reject, admin bypass).
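
    The sentinel idea, roughly (assuming InMemoryCache's
    get_cache/set_cache interface; the sentinel value and wrapper are
    illustrative):

        from litellm.caching.in_memory_cache import InMemoryCache

        _owner_cache = InMemoryCache()
        _NOT_FOUND = "__owner_not_found__"  # illustrative sentinel string

        async def get_owner(resource_id: str, load_from_db):
            cached = _owner_cache.get_cache(resource_id)
            if cached == _NOT_FOUND:
                return None           # negative hit: known not to exist
            if cached is not None:
                return cached         # positive hit
            owner = await load_from_db(resource_id)
            _owner_cache.set_cache(resource_id, owner if owner is not None else _NOT_FOUND)
            return owner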
    
    * fix(container): cache list-allow-set, track admin-created containers
    
    Address Greptile P2 follow-ups from the prior round:
    
    * Cache ``_get_allowed_container_ids`` (60s LRU/TTL keyed by sorted
      owner-scope tuple) so ``GET /v1/containers`` doesn't issue a fresh
      ``find_many`` against ``litellm_managedobjecttable`` on every list
      call. Invalidate the caller's own cache entry when they record a
      new owner so the just-created container shows up on their next list.
    
    * Tighten the admin early-return in ``record_container_owner`` to skip
      ONLY when there's literally no container ID to stamp. An admin with
      identity (the master-key path populates ``user_id`` + ``api_key``)
      flows through the normal record path so admin-created containers are
      tracked like any other caller's. The truly-identity-less admin case
      still falls through to the 403 below — correct fail-secure default.
    
    Skill-cache invalidation gap (also flagged by Greptile) is moot: there
    is no skill update endpoint exposed; ownership-affecting mutations are
    only delete (already invalidates) and create (new ID, no cache entry
    to update).
    
    * chore(container): use delete_cache, json-encode scope key, clean test
    
    /simplify follow-ups:
    
    * Replace the two-``pop`` reach into ``cache_dict``/``ttl_dict`` with
      the existing public ``InMemoryCache.delete_cache(key)`` — the same
      idiom used elsewhere in the proxy. Bonus: ``delete_cache`` calls
      ``_remove_key`` which also handles ``expiration_heap`` consistency
      the direct pops were silently leaking.
    
    * JSON-encode the sorted scope list for the cache key instead of
      ``"|".join``. ``user_id`` / ``team_id`` / ``org_id`` / ``api_key``
      are free-form strings and could contain a literal ``|`` — JSON
      quoting escapes any in-string separator unambiguously (sketch after
      this list).
    
    * Extract ``_allowed_container_ids_cache_key()`` so the read and
      invalidation sites compute the key the same way.
    
    * Fix a placeholder-then-overwrite test construction: the
      ``__module__.split(".")[0] and "proxy_admin"`` line evaluated to a
      literal string that was immediately overwritten with the real enum
      value. Hoist the import and construct directly.
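
    The key derivation is essentially (a sketch; the prefix string is
    illustrative, the function name matches the one extracted above):

        import json
        from typing import List

        def _allowed_container_ids_cache_key(owner_scopes: List[str]) -> str:
            # json.dumps quotes each scope, so a literal "|" (or any other
            # separator) inside user_id/team_id/org_id/api_key cannot collide
            # with a different scope list.
            return "allowed_container_ids:" + json.dumps(sorted(owner_scopes))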
    
    * [Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5
    
    OpenRouter has dropped active endpoints for anthropic/claude-3.7-sonnet,
    causing test_reasoning_content_completion to fail with a 404 "No endpoints
    found" error. Switch to anthropic/claude-sonnet-4.5, which is current and
    supports reasoning streaming.
    
    * feat: routing groups ui
    
    * fix(security): prevent secret_fields from leaking into spend logs
    
    secret_fields (containing raw HTTP headers including Authorization
    Bearer tokens) was being included in proxy_server_request['body']
    because the body snapshot was a copy.copy(data) of the full request
    dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
    table, exposing user credentials in the database.
    
    Root cause: data['secret_fields'] was set before the body snapshot at
    data['proxy_server_request']['body'] = copy.copy(data), so the full
    raw headers (including auth tokens) ended up in the snapshot.
    
    Fix (defense in depth):
    1. Exclude 'secret_fields' when creating the body snapshot in
       litellm_pre_call_utils.py (primary fix)
    2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
       as a secondary safeguard
    
    secret_fields remains available on the live data dict for legitimate
    downstream consumers (MCP, Responses API).
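
    The primary fix is roughly the following (a sketch of the snapshot
    construction; the helper name is illustrative):

        def build_body_snapshot(data: dict) -> dict:
            """Copy the request dict for spend-log persistence, minus secret_fields."""
            return {k: v for k, v in data.items() if k != "secret_fields"}

        # usage (sketch):
        # data["proxy_server_request"]["body"] = build_body_snapshot(data)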
    
    Co-authored-by: default avatarKrrish Dholakia <krrish-berri-2@users.noreply.github.com>
    
    * chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2)
    
    * [Fix] Proxy: Break managed-resources import cycle on Python 3.13
    
    The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
    ImportError when loading the managed files hook chain:
    
      litellm.proxy.hooks/__init__ (mid-import)
        -> enterprise.enterprise_hooks
        -> litellm_enterprise.proxy.hooks.managed_files
        -> litellm.llms.base_llm.managed_resources.isolation
        -> litellm.proxy.management_endpoints.common_utils
        -> litellm.proxy.utils  (re-enters litellm.proxy.hooks)
    
    The except ImportError block in hooks/__init__.py silently swallowed the
    failure, leaving managed_files unregistered and POST /files returning
    500 "Managed files hook not found".
    
    Two-layer fix:
    - Inline the 3-line _user_has_admin_view check in isolation.py instead
      of importing it from litellm.proxy.management_endpoints.common_utils.
      litellm.llms.* should not depend on litellm.proxy.* — removing this
      layering violation breaks the cycle at its root.
    - Define PROXY_HOOKS and get_proxy_hook before the conditional
      enterprise import in litellm/proxy/hooks/__init__.py, so any future
      re-entry resolves the public names instead of hitting an
      ImportError on a partially-initialized module.
    
    Also fold in two unrelated CCI repairs surfaced in the same staging run:
    - tests/otel_tests/test_key_logging_callbacks.py: per-key
      gcs_bucket_name / gcs_path_service_account are now stripped by
      initialize_dynamic_callback_params, so the GCS client falls through
      to the env-only branch. Update the assertion to match the new
      "GCS_BUCKET_NAME is not set" message.
    - .circleci/config.yml: tests/pass_through_tests now resolves
      google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
      which uses dynamic ESM imports Jest 29 cannot load without
      --experimental-vm-modules. Pass that flag in the Vertex JS test step.
    
    Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
    regression guard: managed_files / managed_vector_stores must register,
    and isolation.py must not transitively import litellm.proxy.utils.
    
    * [Fix] Proxy: Address Greptile feedback on hook-cycle PR
    
    - Move _user_has_admin_view to litellm.proxy._types as
      user_api_key_has_admin_view (single source of truth). common_utils.py
      and isolation.py both import from there now, removing the duplicated
      role-check that could silently diverge if new admin roles are added.
    - Add pytest.importorskip("litellm_enterprise") to the two regression
      tests that assert managed_files / managed_vector_stores are registered;
      those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
      unconditionally in a checkout without the enterprise extra installed.
    
    * [Fix] Lint: Mark _user_has_admin_view re-export in common_utils
    
    Ruff F401 flagged the aliased import as unused within common_utils.py
    because the name is consumed only by external modules (~15 callers
    across guardrails, spend tracking, MCP, agents, management endpoints).
    Add `# noqa: F401  re-exported` so the alias survives lint while
    keeping a single source of truth in litellm.proxy._types.
    
    * refactor(azure): move image gen JSON helper; rename image edit finalize hook
    
    - Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
      from azure.py (keeps AzureChatCompletion focused on chat).
    - Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
      docstring covering multipart and JSON POST payloads (review feedback).
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * test(proxy): cover health_check_reasoning_effort for completion mode
    
    Co-authored-by: default avatarCursor <cursoragent@cursor.com>
    
    * [Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans
    
    /otel-spans now requires proxy admin (returns 401 'Only proxy admin
    can be used to generate, delete, update info for new keys/users/teams.
    Route=/otel-spans' for non-admin callers). Switch the GET call to use
    the master key sk-1234 while keeping the generated key for the
    chat-completion request that produces the spans.
    
    * [Fix] Tests: Pick chat-completion OTEL trace by content, not recency
    
    The /otel-spans endpoint returns process-wide spans and tags
    most_recent_parent by max start_time. After tightening that route to
    proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
    spans that beat the chat-completion spans on start_time, so
    most_recent_parent now points at the request's own auth trace
    (['postgres', 'postgres']) and the >=5-span assertion fails.
    
    Pick the chat-completion trace by content: it is the only trace whose
    span list is a superset of {postgres, redis, raw_gen_ai_request,
    batch_write_to_db}. Verified locally end-to-end against
    otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
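
    Picking by content rather than recency is a set-superset check over
    the /otel-spans payload (a sketch; the response shape assumed here is
    a trace-id -> span-name-list mapping):

        REQUIRED_SPANS = {"postgres", "redis", "raw_gen_ai_request", "batch_write_to_db"}

        def pick_chat_completion_trace(traces: dict) -> str:
            """Return the trace id whose span names cover REQUIRED_SPANS."""
            for trace_id, span_names in traces.items():
                if REQUIRED_SPANS.issubset(set(span_names)):
                    return trace_id
            raise AssertionError("no trace contains the chat-completion span set")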
    
    * [Fix] CI: Enable VCR replay for test_azure_o_series
    
    The Azure o-series tests were excluded from the conftest's VCR auto-marker
    because of a respx/vcrpy transport-patching conflict, but the only respx
    reference in the file was an unused `MockRouter` import. Drop the dead
    import and remove the file from the conflict set so cassettes record on
    first run and replay thereafter, eliminating the 60-95s live Azure latency
    that was crashing xdist workers under --timeout=120 thread-mode timeouts.
    
    * [Fix] Tests: Restore /metrics access for prometheus test suite
    
    /metrics now requires auth by default; tests/otel_tests/test_prometheus.py
    makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
    every prometheus test in CI now fails the metric assertion.
    
    Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
    to opt out for this test job, which scrapes /metrics directly. Verified
    locally: 8/8 prometheus tests green (one flaky retry on
    test_proxy_success_metrics that pre-dates this PR).
    
    Also drop the -x stop-on-first-failure flag from the otel test command
    so all failures in the job surface in a single CI run rather than
    hiding behind whichever one trips first.
    
    * [Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job
    
    The cimg/python:3.12-browsers base image already ships every Chromium
    system dependency Playwright needs (libnss3, libatk-bridge2.0-0,
    libcups2, etc. — the install log shows them all as "already the newest
    version"). Passing --with-deps to `npx playwright install` therefore
    runs an apt-get update + install for nothing, but pays the full cost of
    hitting Ubuntu mirrors. On a recent run those mirrors stalled hard:
    apt-get update alone took 6m53s at 81.5 kB/s with several archives
    returning connection refused.
    
    Drop --with-deps and persist ~/.cache/ms-playwright alongside
    node_modules so the Chromium binary is also reused across runs. Bump
    the cache key to v2 so the existing v1 entry (which only contained
    node_modules) is not restored in place of a fresh cache that also
    covers the new browser path.
    
    * [Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds
    
    PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in
    docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this
    forces prisma to download the amd64 schema-engine into an arm64 image,
    so 'prisma migrate deploy' fails at startup with 'Could not find
    schema-engine binary'.
    
    Removing the env lets prisma auto-detect per build platform: amd64
    builds still resolve to debian-openssl-3.0.x (Wolfi falls back to
    debian, same binary as before), and arm64 builds now correctly fetch
    linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is
    preserved — only which binaries fill the cache changes.
    
    Fixes #19458
    
    * [Fix] UI: Clear Admin Session Cookies Before Establishing Invited User's Session (#27227)
    
    The invite-signup form was writing the new user's token via raw
    `document.cookie` at `path=/`, while the rest of the auth surface uses
    `storeLoginToken` (which writes at `path=/ui` and mirrors to
    sessionStorage). After signup the inviter's `path=/ui` cookie kept
    winning path-specificity matching, and sessionStorage still held the
    inviter's token, so the dashboard rendered as the inviter rather than
    the newly created user.
    
    Treat invite signup as a principal-change boundary — clear prior
    session cookies first, then store the new token via the canonical
    helper.
    
    * test: add 24hr Redis-backed VCR cache to additional test suites (#27159)
    
    * test: add 24hr Redis-backed VCR cache to additional test suites
    
    Extracts the existing llm_translation VCR plumbing into a reusable helper
    (tests/_vcr_conftest_common.py) and wires it into the conftest.py files
    of the test directories listed in LIT-2787:
    
      audio_tests, batches_tests, guardrails_tests, image_gen_tests,
      litellm_utils_tests, local_testing, logging_callback_tests,
      pass_through_unit_tests, router_unit_tests, unified_google_tests
    
    The same helper is also adopted by the pre-existing llm_translation and
    llm_responses_api_testing conftests to remove the copy-pasted VCR setup.
    
    Each consuming conftest:
    - registers the Redis persister via pytest_recording_configure
    - auto-marks collected tests with pytest.mark.vcr (skipping respx-using
      files where applicable, since respx and vcrpy both patch httpx)
    - gates cassette writes on test success via _vcr_outcome_gate
    
    The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled
    and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still
    forces a bypass for ad-hoc local runs.
    
    Test directories that run LiteLLM proxy in Docker (build_and_test,
    proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests)
    are intentionally not included: VCR.py patches the in-process httpx
    transport and cannot intercept calls made from inside a Docker container.
    The installing_litellm_on_python* jobs make no LLM calls and don't
    benefit from caching.
    
    https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites
    
    
    
    * test(vcr): add safe-body matcher to handle JSONL and binary request bodies
    
    vcrpy's stock body matcher inspects Content-Type and unconditionally
    runs json.loads on application/json bodies. JSON Lines payloads (used
    by the Bedrock batch S3 PUT and other upload paths) crash that with
    json.JSONDecodeError: Extra data, before the matcher can return
    'not a match'.
    
    This was the root cause of the batches_testing CI job failing on
    test_async_create_file once VCR auto-marking was applied to the
    batches_tests directory.
    
    Add a conservative byte-equality body matcher and use it in place of
    'body' in the shared match_on tuple. The matcher is strictly more
    conservative than vcrpy's default — the only thing it gives up is
    'different JSON key order is treated as the same body', which doesn't
    apply to deterministic litellm-built request payloads. It can never
    produce a false positive that the default would have rejected, so
    there is no cross-contamination risk.
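
    A byte-equality matcher in this spirit (a sketch; the str/bytes
    normalisation and the assert-plus-return shape are illustrative of
    vcrpy's matcher protocol, not the exact helper):

        def _safe_body_matcher(r1, r2):
            b1, b2 = r1.body, r2.body
            # Normalise str vs bytes so a UTF-8-encoded body compares equal to
            # its decoded twin; never json.loads, so JSONL/binary cannot crash it.
            if isinstance(b1, str):
                b1 = b1.encode("utf-8")
            if isinstance(b2, str):
                b2 = b2.encode("utf-8")
            assert b1 == b2, "request bodies differ"
            return True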
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): exclude tests that VCR replay actively breaks
    
    A few tests are incompatible with cassette replay and were failing on
    the latest CI run after VCR auto-marking was extended to local_testing
    and logging_callback_tests:
    
    - test_amazing_s3_logs.py (logging_callback_tests): the test asserts on
      a per-run response_id that should round-trip through a real S3
      PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays
      stale keys, so the freshly-generated id is never found.
    - test_async_embedding_azure (logging_callback_tests) and
      test_amazing_sync_embedding (local_testing): the failure branches
      deliberately pass api_key='my-bad-key' to assert that the failure
      callback fires. We scrub auth headers from cassettes (so the bad-key
      request matches the prior good-key request), and vcrpy replays the
      recorded 200 — the failure callback never fires.
    - test_assistants.py (local_testing): the OpenAI Assistants polling
      APIs mint fresh thread/run IDs every recording session and then poll
      until status=='completed'. Replays of those polled GETs can never
      match a freshly-generated run id, so every CI run effectively
      re-records and the suite blows past the 15m no_output_timeout.
    
    Skip these from VCR auto-marking so they continue to hit live providers
    as they did before this change. The remaining tests in each directory
    still get cached.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): expand skip lists for second batch of incompatible tests
    
    Followup to the previous commit. After re-running CI on the rebuilt
    branch, three more tests surfaced as VCR-replay-incompatible:
    
    - litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key
      Calls GET /v1/models with api_key='123' to assert the result is empty.
      We scrub auth headers, so the bad-key request matches the prior
      good-key cassette and replays the recorded model list.
    - litellm_utils_testing :: test_litellm_overhead.py
      Measures litellm_overhead_time_ms as a percentage of total wall-clock
      time. With cached responses the upstream 'network' time collapses to
      microseconds, blowing past the 40% threshold the test asserts on.
      Skip the whole file (every parametrization is at risk).
    - local_testing_part1 :: test_async_custom_handler_completion and
      test_async_custom_handler_embedding
      Same bad-key failure-callback pattern as the already-skipped
      test_amazing_sync_embedding.
    - litellm_router_testing :: test_router_caching.py
      Asserts on litellm's own router-level response cache by comparing
      response1.id to response2.id across repeat upstream calls (test
      bypasses litellm cache via ttl=0 and expects upstream to return a
      *new* id). With VCR replay both upstream calls return the same
      cassette body, so the ids are identical. Skip the whole file.
    - logging_callback_tests :: test_async_chat_azure (preemptive)
      Same shape as already-skipped test_async_embedding_azure; was masked
      by upstream OpenAI rate-limit failures on baseline.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): use item.path and tighten matcher docstring
    
    - Replace pytest's deprecated item.fspath with item.path in
      apply_vcr_auto_marker_to_items so we don't emit deprecation
      warnings under pytest 8.
    - Clarify _safe_body_matcher docstring to reflect actual behavior
      (direct == first, then UTF-8 bytes comparison, no repr fallback).
    
    Addresses Greptile review feedback on PR #27159.
    
    * test(vcr): swallow all RedisError on cassette save/load
    
    Cassette persistence is strictly best-effort: any Redis-side failure
    (connection blip, timeout, OutOfMemoryError when the maxmemory cap is
    hit, READONLY replicas, etc.) should degrade to 'test passed but
    cassette not cached' rather than fail the test on teardown.
    
    Previously the persister only caught ConnectionError and TimeoutError,
    so OutOfMemoryError — which Redis Cloud raises when the cassette cache
    hits its memory cap and there are no evictable keys — propagated out of
    vcrpy's autouse fixture and ERRORed otherwise-passing tests on
    teardown. This caused the litellm_utils_testing CircleCI job to fail on
    the latest commit's run, even though the underlying test was a unit
    test that used mock_response and produced no real upstream traffic
    (the cassette was dirtied by a background langfuse callback). The
    rerun only succeeded because Redis evictions happened to free enough
    room before the SET — i.e. it was timing-dependent flakiness.
    
    Catch redis.exceptions.RedisError (the common base of all server- and
    client-side Redis exceptions) on both save and load, and parametrize
    the regression tests across ConnectionError, TimeoutError, and
    OutOfMemoryError to pin the new behavior.
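
    The save path ends up shaped like this (a sketch of the best-effort
    pattern only; function name, TTL, and logging are illustrative):

        import logging
        import redis

        def save_cassette_best_effort(client: redis.Redis, key: str, payload: bytes) -> None:
            """Cassette persistence must never fail a passing test on teardown."""
            try:
                client.set(key, payload, ex=24 * 60 * 60)  # 24h TTL
            except redis.exceptions.RedisError as exc:
                # RedisError is the common base of ConnectionError, TimeoutError,
                # OutOfMemoryError, READONLY-replica errors, etc.
                logging.getLogger(__name__).warning("cassette save skipped: %s", exc)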
    
    * test(vcr): surface cassette-cache failures with warnings + session banner
    
    When the persister silently swallows a Redis OOM (or any RedisError) on
    save/load there is otherwise no visible signal that the cache is
    degraded — tests pass, the cassette just isn't persisted, and the next
    session still hits the same Redis at the same near-cap memory.
    
    Add three layers of observability so that failure mode is loud:
    
    1. Per-process health counters ("save_failures", "load_failures", and
       the last error string for each), exposed via cassette_cache_health()
       and reset via reset_cassette_cache_health(). The persister
       increments these in addition to logging.
    
    2. VCRCassetteCacheWarning (UserWarning subclass) emitted via
       warnings.warn() inside the persister's except block. Pytest's
       built-in warnings summary at session end automatically lists every
       such warning, so the failure is visible in CI logs without any
       conftest-level wiring.
    
    3. Session-end banner via emit_cassette_cache_session_banner() and a
       stderr-fallback atexit handler registered from
       register_persister_if_enabled(). Two states:
         - red "VCR CASSETTE CACHE DEGRADED" when save_failures or
           load_failures > 0
         - yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but
           used_memory >= 85% of maxmemory) so the next session knows
           the Redis is approaching OOM before any SET actually fails
    
    Capacity comes from a best-effort INFO memory probe
    (cassette_cache_capacity_snapshot) that returns None on any failure or
    when maxmemory is uncapped. The atexit handler skips xdist workers so
    only the controller emits.
    
    Tests: parametrize the existing save/load swallow-error tests across
    ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for
    the health counters and warning emission, and a new
    test_vcr_conftest_common_banner.py covering banner output for every
    state (silent/red/yellow/disabled/xdist-worker).
    
    * test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips
    
    Tests that deliberately call an LLM API with a bad key (e.g. to assert
    that the failure callback fires, or that check_valid_key returns False)
    were being silently served the prior good-key cassette: we scrub the
    real Authorization / x-api-key header from the cassette before storing
    it, so a follow-up bad-key call is byte-identical to the good-key call
    under the existing match_on tuple.
    
    Add a 'key_fingerprint' custom matcher that distinguishes requests by
    the SHA-256 of their API-key headers. The fingerprint is stamped into
    a synthetic 'x-litellm-key-fp' header by a new before_record_request
    hook, which then strips the real auth headers (we have to do the
    scrubbing here instead of via vcrpy's filter_headers knob, because
    filter_headers runs *first* and would erase the value we want to hash).
    
    Bad-key requests now get a different cassette bucket than good-key
    requests, so vcrpy will not replay a recorded 200 in place of the
    expected 401. The fingerprint is a one-way hash of the secret, so
    cassettes never contain the key.
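
    The hook's core is a one-way hash of whichever auth header is
    present, stamped before the real headers are dropped (a sketch;
    header values are treated as plain strings and the "no-key" sentinel
    is illustrative):

        import hashlib

        AUTH_HEADERS = ("authorization", "x-api-key")
        FP_HEADER = "x-litellm-key-fp"

        def stamp_key_fingerprint(headers: dict) -> dict:
            """Replace real auth headers with a SHA-256 fingerprint header."""
            secret = ""
            for header_name in list(headers):
                if header_name.lower() in AUTH_HEADERS:
                    secret += str(headers.pop(header_name))
            headers[FP_HEADER] = (
                hashlib.sha256(secret.encode()).hexdigest() if secret else "no-key"
            )
            return headers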
    
    This permanently removes the 'bad-key' category of skips:
    
    - tests/local_testing: dropped ::test_amazing_sync_embedding,
      ::test_async_custom_handler_completion,
      ::test_async_custom_handler_embedding
    - tests/logging_callback_tests: dropped ::test_async_chat_azure,
      ::test_async_embedding_azure
    - tests/litellm_utils_tests: dropped
      ::test_get_valid_models_from_dynamic_api_key
    
    Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py
    covering header stripping, fingerprint determinism, no-auth bucketing,
    good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination,
    and idempotence under replay.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): drop redundant comments and docstrings
    
    Trim narration of code that is already self-evident from function and
    variable names. Keep the two genuinely non-obvious bits:
    
    - ordering constraint between filter_headers and before_record_request,
      which would invite a maintainer to re-introduce the bug if removed
    - the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why
      exactly is this skipped' is not knowable from the test name alone
    
    Also drop the 40-line commented-out drop-in conftest snippet at the
    bottom of _vcr_conftest_common.py — the consuming conftests are the
    canonical reference.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): make _before_record_request idempotent
    
    vcrpy invokes before_record_request more than once per request:
    can_play_response_for calls it, then __contains__ /
    _responses (reached via play_response) call it again on the
    result. The second invocation sees a request whose auth headers we
    already stripped, so a naive recompute yields "no-key" and
    overwrites the real fingerprint stored in the header.
    
    This makes can_play_response_for and play_response disagree on
    matchability — the former says "yes, we have a stored response for
    this" (matching no-key to no-key) and the latter throws
    UnhandledHTTPRequestError because it computes a fresh real
    fingerprint that doesn't match the stored no-key.
    
    In CI this manifested as ~30 failing tests across guardrails_testing,
    audio_testing, batches_testing, image_gen_testing, llm_responses_api,
    litellm_router_unit_testing, etc. Skip the recompute when the header
    is already set, so re-applying the hook is a no-op.
    
    Adds a regression test that fires the hook twice on the same dict and
    asserts the fingerprint stays put.
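
    The guard is a short-circuit at the top of the hook (sketch,
    continuing the illustrative names from the fingerprint sketch above):

        def before_record_request(request):
            # Later invocations see already-stripped auth headers; recomputing
            # would overwrite the real fingerprint with the "no-key" sentinel.
            if FP_HEADER in request.headers:
                return request
            request.headers = stamp_key_fingerprint(dict(request.headers))
            return request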
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * test(vcr): drop more redundant docstrings and headers
    
    * test(vcr): enable 24hr cache for ocr_tests and search_tests
    
    These two directories were the only non-dockerized test suites in the
    build_and_test workflow that make live LLM/provider API calls but were
    not VCR-enabled by this PR. Together they account for 96 tests:
    
    - tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document
      Intelligence, Vertex AI OCR. Pure-unit tests inside the same files
      (e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls
      and become benign VCR NOOPs.
    - tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa,
      Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI,
      Searxng, Serper, Tavily.
    
    Both directories use the canonical minimal conftest pattern from
    tests/audio_tests/conftest.py with no skip lists. None of the test
    files use respx, none assert on per-call upstream non-determinism
    (no response1.id != response2.id, no overhead-as-fraction-of-total,
    no live polling), so the default match_on tuple should cache cleanly.
    If a flake surfaces during the first cassette-recording CI run, we
    can add a targeted skip the same way we did for the other dirs.
    
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: default avatarClaude <noreply@anthropic.com>
    Co-authored-by: default avatarCursor Agent <cursoragent@cursor.com>
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    
    * [Fix] Team UI: handle legacy dict shape for metadata.guardrails (#27224)
    
    * [Fix] Team UI: handle legacy dict shape for metadata.guardrails
    
    A team can have metadata.guardrails stored as {"modify_guardrails": bool}
    (the permission-flag shape introduced in PR #4810) rather than the
    expected string[]. The opt-out logic added in PR #25575 calls .filter()
    on this field, which throws TypeError on a dict and crashes the team
    detail page.
    
    Add a safeGuardrailsList helper that returns [] when the field is not
    an array, and route the three read sites through it.
    
    * [Fix] Team UI: inline Array.isArray guards for guardrails metadata
    
    Replace the safeGuardrailsList helper with inline Array.isArray checks
    at each call site, and apply the same guard to opted_out_global_guardrails
    for consistency. No known legacy dict rows for opted_out_global_guardrails,
    but the unguarded `|| []` pattern is the same shape risk.
    
    Six call sites now defended directly: three for metadata.guardrails
    and three for metadata.opted_out_global_guardrails.
    
    * chore: update Next.js build artifacts (2026-05-05 22:45 UTC, node v20.20.2) (#27240)
    
    * [Infra] Bump deps (#27157)
    
    * bump: version 0.4.70 → 0.4.71
    
    * bump: version 0.1.39 → 0.1.40
    
    * uv lock
    
    ---------
    
    Co-authored-by: default avatarMichael Riad Zaky <michaelr@Mac.localdomain>
    Co-authored-by: default avatarMateo Wang <277851410+mateo-berri@users.noreply.github.com>
    Co-authored-by: default avatargreptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    Co-authored-by: default avatarryan-crabbe-berri <ryan@berri.ai>
    Co-authored-by: default avataruser <70670632+stuxf@users.noreply.github.com>
    Co-authored-by: default avatarCursor Agent <cursoragent@cursor.com>
    Co-authored-by: default avatarClaude <noreply@anthropic.com>
    Co-authored-by: default avatarshivam <shivam@berri.ai>
    Co-authored-by: default avatarMateo Wang <mateo-berri@users.noreply.github.com>
    Co-authored-by: default avatarYassin Kortam <yassin@berri.ai>
    Co-authored-by: default avatarKrrish Dholakia <krrish+github@berri.ai>
    Co-authored-by: default avatarshin-berri <shin-laptop@berri.ai>
    Co-authored-by: default avatarSameer Kankute <sameer@berri.ai>
    Co-authored-by: default avatarSameer Kankute <Sameerlite@users.noreply.github.com>
    Co-authored-by: default avatarMichael-RZ-Berri <michael@berri.ai>
    Co-authored-by: default avatarharish-berri <harish@berri.ai>
    Co-authored-by: default avatarYassin Kortam <yassinkortam@g.ucla.edu>
    Co-authored-by: default avatarMichael Riad Zaky <michaelr@Michaels-MacBook-Air.local>
    Co-authored-by: default avatarKrrish Dholakia <krrish-berri-2@users.noreply.github.com>