Routes
Routes are the paths that define your application's behavior.
Routes are the paths that accept incoming requests and define your application's behavior. They are defined in the routes
section of your app.
Each route is defined by a path and can have one or more providers to proxy the requests.
Path
The path is the URL that the route will match. It can contain only alphanumeric characters, hyphens, underscores, and slashes.
You can use paths to have different behavior for different use cases. For example, you can have a route that matches /openai
to proxy requests to the OpenAI API and another route that matches /anthropic
to proxy requests to the Anthropic API.
Schema
Defines which provider schema to use for the route. This will determine the structure of the request and response objects. You can use the OpenAI schema for input and output but have the request proxied to the Anthropic API or any other provider.
Routify automatically maps the request and response objects from the schema you specify to the provider's schema used in the route providers.
This allows you to switch providers without changing your application code.
Failover
If enabled, the request is retried with the next provider if the current provider fails to respond. This is useful when you have multiple providers for a route and want to ensure high availability.
You need to have at least two providers for the failover to work.
Load Balancing
If enabled, the request is proxied to the providers in a round-robin fashion, with custom weights for each provider. This is useful when you have multiple providers for a route and want to distribute the load evenly.
If you have only one provider, the request is always proxied to that provider.
If you have more than one provider, you can specify the weight for each provider. The weight determines the probability of the provider being selected.
You can perform A/B testing by changing the weights of the providers and observing the results.
Caching
If enabled, the response from the provider is cached for a specified duration. Subsequent requests for the same input are served from the cache.
Caching is applying on route provider level. If you have multiple providers for a route, the response from each provider is cached separately.
If you have load balancing enabled, it will impact the cache hit rate. If the request is proxied to a different provider, the cache will not be used.
Caching is applied on exact input match. If you change the input even slightly, including temperature or max tokens, the cache will not be used.
Cached responses are not calculated in the cost statistics.
Cost limit
The cost limit is the maximum cost allowed for a request to the route. If the cost of the request exceeds the limit, the request is rejected.
The response for requests that exceed the cost limit is a 429 status code.
The cost is calculated based on the provider's cost per input and output tokens.
You can specify the cost limit per day or per month. The cost is reset at the beginning of the day or month, depending on the limit you set.
Learn more about cost estimation.
Providers
Providers are the API services that offer generative AI models. They are defined in the providers
section of your app. You can configure more than one provider for a route.
When a request is made to a route, the request is proxied to the providers in the order they are defined. The first provider that returns a response is used to respond to the request.
If you enable failover, the request is retried with the next provider if the current provider fails to respond.
If you enable load balancing, the request is proxied to the providers in a round-robin fashion, with custom weights for each provider.
If you enable caching, the response from the provider is cached for a specified duration. Subsequent requests for the same input are served from the cache.
Weight
The weight determines the probability of the provider being selected when load balancing is enabled. The higher the weight, the higher the probability of the provider being selected.
You can specify the weight for each provider. If you have only one provider, the weight is always 1.
Using values from 1 to 100 allows you to perform A/B testing by changing the weights of the providers and observing the results.
Timeout
The timeout is the maximum time allowed for a request to the route. If the request takes longer than the timeout, the request is rejected.
Keep in mind that the timeout is applied for the complete outgoing request to the provider. If the provider takes longer to respond, the request is rejected. Also, the provider API geo-location can impact the response time.
Model
The provider model is the generative AI model used to generate the response. Each provider can have multiple models, and you can specify the model to use for the request.
This allows you to switch models without changing your application code. The selected model will override the model used in request input.