REST API Interview Questions (2026)

100 real interview questions with in-depth answers — 30 basic, 40 intermediate, 30 advanced. Updated April 2026.

REST (Representational State Transfer) is an architectural style for distributed hypermedia systems defined by Roy Fielding in his 2000 dissertation. The six constraints are: (1) **Client–Server** — concerns are separated so the UI and data storage can evolve independently; (2) **Statelessness** — every request from the client must contain all information the server needs; session state is never stored server-side; (3) **Cacheability** — responses must declare whether they can be cached to reduce load; (4) **Uniform Interface** — resources are identified in requests, manipulated through representations, messages are self-descriptive, and HATEOAS drives application state; (5) **Layered System** — clients cannot tell whether they are connected directly to the origin server or an intermediary; (6) **Code on Demand** (optional) — servers may extend client functionality by transferring executable code such as JavaScript. An API that satisfies all mandatory constraints is said to be RESTful.

An API (Application Programming Interface) is a defined contract that allows two software systems to communicate. It specifies what operations are available, what inputs they accept, and what outputs they return — without exposing internal implementation details. Web APIs typically communicate over HTTP using JSON or XML payloads. The client sends a request to a well-known URL endpoint; the server processes it and returns a structured response. APIs enable loose coupling: a mobile app, a web frontend, and a third-party partner can all consume the same backend service without knowing how it is built internally.

The five most common HTTP methods map onto CRUD operations: **GET** retrieves a resource without side effects — never use it to modify data. **POST** creates a new resource or triggers a non-idempotent action; the server determines the new resource URI. **PUT** replaces a resource entirely at a known URI; the client sends the full representation. **PATCH** applies a partial update — only the fields provided are changed. **DELETE** removes a resource.

```http
GET    /orders/42          # read order 42
POST   /orders             # create a new order
PUT    /orders/42          # replace order 42 wholesale
PATCH  /orders/42          # update specific fields
DELETE /orders/42          # remove order 42
```

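Use GET and HEAD for safe, read-only operations. Use PUT or DELETE when the action is idempotent. Use POST for actions that are not idempotent, such as creating a resource whose URI the server assigns. POST does not guarantee that an action runs only once; it is the method where duplicate submissions have to be guarded against explicitly.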

An operation is idempotent if calling it once produces the same server state as calling it N times. GET, HEAD, PUT, DELETE, OPTIONS, and TRACE are all idempotent. POST is NOT idempotent — submitting the same order form twice creates two orders. PATCH is technically not guaranteed to be idempotent (a patch that appends to a list is not), though many implementations treat simple field updates as idempotent. Idempotency matters for reliability: if a network request times out and you are unsure whether the server received it, you can safely retry an idempotent method without fear of duplicate side effects.

A method is safe if it does not modify server state — it is purely a read operation. GET, HEAD, OPTIONS, and TRACE are safe. Safe methods are a subset of idempotent methods: all safe methods are also idempotent, but not all idempotent methods are safe (DELETE is idempotent but not safe). Browsers and caches rely on the safety guarantee to prefetch GET requests and allow them to be replayed without asking the user. Violating this — for example, using GET to delete a record — breaks cache correctness and can cause data loss when crawlers or prefetch mechanisms visit the URL.
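
The relationship between safe and idempotent methods can be sketched in a few lines. This is a minimal illustration rather than a library API: the helper names (`isSafe`, `isIdempotent`, `canRetryAfterTimeout`) and the retry policy itself are assumptions made for this example.

```javascript
// Method properties as listed in the text above.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'TRACE']);
const IDEMPOTENT_METHODS = new Set([...SAFE_METHODS, 'PUT', 'DELETE']);

function isSafe(method) {
  return SAFE_METHODS.has(method.toUpperCase());
}

function isIdempotent(method) {
  return IDEMPOTENT_METHODS.has(method.toUpperCase());
}

// Hypothetical policy: after a timeout, retry automatically only when
// repeating the request cannot create duplicate side effects.
function canRetryAfterTimeout(method, hasIdempotencyKey = false) {
  return isIdempotent(method) || hasIdempotencyKey;
}

console.log(canRetryAfterTimeout('DELETE')); // idempotent, so a retry is acceptable
console.log(canRetryAfterTimeout('POST'));   // not idempotent, so no blind retry
```

Building the idempotent set from the safe set mirrors the subset relationship: every safe method is idempotent, while PUT and DELETE are idempotent without being safe.
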

HTTP status codes are three-digit numbers grouped into five classes: **1xx Informational** — the request was received and processing is continuing (e.g., 100 Continue, 101 Switching Protocols). **2xx Success** — the request was successfully received, understood, and accepted. **3xx Redirection** — further action is needed to complete the request (e.g., 301 Moved Permanently, 302 Found, 304 Not Modified). **4xx Client Error** — the request contains bad syntax or cannot be fulfilled — this is the client's fault. **5xx Server Error** — the server failed to fulfil a valid request — this is the server's fault. Correctly using these families lets clients react programmatically: a 4xx means "don't retry without fixing the request"; a 5xx means "retry may succeed later".
**200 OK** is the generic success code for GET, PUT, PATCH, and DELETE when the response body contains content. **201 Created** signals that a POST (or PUT) created a new resource; the response should include a `Location` header pointing to the new resource URI. **204 No Content** means success but there is nothing to return, which is common for DELETE or PATCH operations where the caller does not need the updated resource.

```http
HTTP/1.1 201 Created
Location: /users/99
Content-Type: application/json

{ "id": 99, "email": "alice@example.com" }
```

Using 201 vs 200 correctly helps clients know whether a new resource was created so they can cache or navigate to it.
**400 Bad Request** — the request body or parameters are malformed or fail validation. **401 Unauthorized** — authentication is required but missing or invalid (misleadingly named; it means "unauthenticated"). **403 Forbidden** — the client is authenticated but not authorised for this resource. **404 Not Found** — the resource does not exist at this URI. **409 Conflict** — the request conflicts with current state (e.g., duplicate unique key). **422 Unprocessable Entity** — the request is syntactically valid JSON but semantically invalid (preferred by many REST APIs over 400 for validation errors). **429 Too Many Requests** — rate limit exceeded; the response should include `Retry-After` so clients back off.
**500 Internal Server Error** — a catch-all for unhandled exceptions; the server crashed or encountered an unexpected condition. **502 Bad Gateway** — a gateway or proxy received an invalid response from an upstream server (common when a load balancer cannot reach the app). **503 Service Unavailable** — the server is temporarily overloaded or down for maintenance; clients should respect the `Retry-After` header. **504 Gateway Timeout** — the gateway did not receive a timely response from the upstream server. The key distinction: 5xx errors tell the client it did nothing wrong and a retry (with backoff) may succeed, whereas 4xx errors require the client to change the request.
A **URI** (Uniform Resource Identifier) is the broadest concept — a string that identifies a resource. A **URL** (Uniform Resource Locator) is a URI that also provides the means to locate the resource, i.e., it includes the scheme and network location (`https://api.example.com/users/1`). A **URN** (Uniform Resource Name) is a URI that names a resource persistently without implying its location (`urn:isbn:978-0-596-51774-8`). In practice, every URL is a URI, but not every URI is a URL. Web APIs almost always deal with URLs. The important components of a URL are: scheme, host, port (optional), path, query string, and fragment.
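
As a rough illustration of the URL components listed above, the standard WHATWG `URL` class (available as a global in Node.js and browsers) can split a URL into its parts. The example URL, its port, and the `parts` object are made up for this sketch.

```javascript
// Hypothetical URL chosen so that every component is present.
const url = new URL('https://api.example.com:8443/users/1?role=admin&page=2#profile');

const parts = {
  scheme: url.protocol.replace(':', ''),     // 'https'
  host: url.hostname,                        // 'api.example.com'
  port: url.port,                            // '8443' (empty string for the scheme default)
  path: url.pathname,                        // '/users/1'
  query: Object.fromEntries(url.searchParams), // { role: 'admin', page: '2' }
  fragment: url.hash.slice(1),               // 'profile'
};

console.log(parts);
```

Note that the fragment is handled by the client and is not sent to the server, so API servers never see it.
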
HTTP headers are key-value metadata sent with requests and responses that convey information beyond the body: content type, caching instructions, authentication tokens, compression, and more. Common **request headers**: `Authorization` (credentials), `Content-Type` (body format), `Accept` (desired response format), `User-Agent`, `Cookie`, `If-None-Match`, `X-Request-ID`. Common **response headers**: `Content-Type`, `Content-Length`, `Cache-Control`, `ETag`, `Location` (redirect target), `Set-Cookie`, `Retry-After`, `Access-Control-Allow-Origin` (CORS).

```http
GET /products/5 HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJ...
Accept: application/json
```
`Content-Type` is an HTTP header that tells the receiver what media type (MIME type) the body of a request or response contains. For API requests with a body (POST, PUT, PATCH), the client must set `Content-Type: application/json` so the server knows how to parse it. For responses, the server sets it so the client can deserialise correctly. Missing or wrong `Content-Type` causes parsers to fail silently or throw errors. Common API values: `application/json`, `application/x-www-form-urlencoded`, `multipart/form-data`, `text/plain`, `application/octet-stream`.
The `Accept` header is sent by the client to declare which media types it can process in the response. The server uses content negotiation to pick the best match and sets the response `Content-Type` accordingly. The value can include multiple types with quality factors (`q` values).

```http
Accept: application/json, text/html;q=0.9, */*;q=0.8
```

If the server cannot produce any of the accepted types, it returns **406 Not Acceptable**. APIs that only speak JSON often ignore `Accept` entirely and always return `application/json`, but properly honouring it enables serving the same endpoint in multiple formats (JSON, XML, CSV).
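
Content negotiation with `q` values can be sketched as follows. This is a simplified illustration, not a complete implementation of the HTTP rules: it ranks only by `q` and ignores the specificity precedence that the specification also defines. The function names (`parseAccept`, `matches`, `negotiate`) are hypothetical.

```javascript
// Parse an Accept header into [{ type, q }] sorted by descending q.
function parseAccept(header) {
  return header
    .split(',')
    .map((part) => {
      const [type, ...params] = part.trim().split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith('q='));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1; // default q is 1
      return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.type !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q);
}

// Does a media range such as 'text/*' or '*/*' cover a concrete media type?
function matches(range, mediaType) {
  if (range === '*/*') return true;
  const [rangeType, rangeSub] = range.split('/');
  const [type, sub] = mediaType.split('/');
  return rangeType === type && (rangeSub === '*' || rangeSub === sub);
}

// Returns the best supported type, or null (the caller would answer 406).
function negotiate(acceptHeader, supported) {
  for (const { type: range } of parseAccept(acceptHeader)) {
    const hit = supported.find((mediaType) => matches(range, mediaType));
    if (hit) return hit;
  }
  return null;
}

console.log(negotiate('application/json, text/html;q=0.9, */*;q=0.8',
  ['text/html', 'application/json']));
```

In a real service an established negotiation library is usually the safer choice; the sketch only shows the mechanics of ranking by quality factor.
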
CORS (Cross-Origin Resource Sharing) is a browser mechanism that lets a server relax the same-origin policy, which by default stops a web page from reading responses to HTTP requests made to a different origin (scheme + host + port) than the one that served the page. Without the same-origin policy, a malicious site could silently call your bank's API using your cookies and read the result. When a browser makes a cross-origin request, it either sends the request with an `Origin` header and checks the response's `Access-Control-Allow-Origin` header (simple requests), or sends a preflight `OPTIONS` request first. The server must explicitly permit the origin. CORS is a **browser** restriction: `curl` and server-to-server calls are unaffected.
**Path parameters** are embedded in the URL path and identify a specific resource: `/users/{id}` → `/users/42`. **Query parameters** appear after `?` and are used for filtering, sorting, pagination, and optional inputs: `/users?role=admin&page=2`. **Request body** carries structured data (JSON, form fields, binary) for creating or updating resources. Bodies are conventionally sent with POST, PUT, and PATCH; HTTP does not forbid a body on GET or DELETE, but it has no defined semantics there and some servers and intermediaries reject or drop it. Conventions: use path params to identify resources, query params for modifiers that do not change the resource identity, and body for complex payloads. Never put sensitive data in query params, because they appear in server logs and browser history.
JSON (JavaScript Object Notation) is a lightweight, human-readable data-interchange format based on a subset of JavaScript syntax. It supports six types: string, number, boolean, null, array, and object. JSON became the dominant API format because: it is natively parseable in browsers, widely supported in every server language, more compact than XML, easy to read and debug, and maps directly to data structures most languages use. The `Content-Type` for JSON is `application/json`. One limitation: JSON has no native date type — dates are encoded as ISO 8601 strings by convention.
Statelessness means the server stores no client session state between requests. Every HTTP request must contain all context the server needs to process it — authentication credentials, user identity, pagination cursors — typically in headers or the request body. The server never stores a "session" for a client. This makes REST services horizontally scalable: any server instance can handle any request without needing shared session storage. The trade-off is that each request may be slightly larger (it carries credentials every time), and clients are responsible for maintaining their own state.
A resource is any named piece of information that can be addressed and manipulated through its representation. Resources are the fundamental building block of REST — everything is a resource: a user, an order, a document, a collection of products, or even a concept like a "search result". Each resource has a unique URI that identifies it. Clients interact with resources through their representations (JSON, XML, binary), not directly with the underlying data. The server can send different representations of the same resource depending on the client's `Accept` header. Resources should be modelled as nouns (things), not verbs (actions).
The key conventions are: (1) use **nouns not verbs** — `/orders` not `/getOrders`; the HTTP method expresses the action. (2) Use **plural nouns** for collection resources — `/users`, `/products`. (3) Use a **nested path** for sub-resources — `/users/42/orders`. (4) Use **lowercase and hyphens** for readability — `/job-applications` not `/jobApplications`. (5) Do not include the file extension — no `/users.json`. (6) Avoid deep nesting beyond two levels — prefer `/orders?userId=42` over `/users/42/orders/7/items/3`. Following these conventions makes APIs predictable and learnable without documentation.
An endpoint is a specific URL at which an API can be accessed — it represents a combination of a URI and an HTTP method. For example, `GET /users` (list all users) and `POST /users` (create a user) are two distinct endpoints on the same URI. Endpoints are the touchpoints clients interact with. A well-designed REST API groups endpoints around resources rather than actions. Documentation tools like Swagger/OpenAPI model each endpoint as an "operation" with its method, path, parameters, request body schema, and possible response codes.
API versioning lets you make breaking changes without disrupting existing clients. The three common strategies are: (1) **URI versioning** (`/v1/users`, `/v2/users`): most visible and easy to route; the default choice for public APIs. (2) **Query parameter versioning** (`/users?version=2`): less clean, often used internally. (3) **Header versioning** (`API-Version: 2024-01-01` or `Accept: application/vnd.api+json;version=2`): keeps URIs clean but is harder to test in a browser. Stripe uses date-based header versioning, and GitHub's REST API also selects versions with a date-based header (`X-GitHub-Api-Version`). There is no universally correct answer; URI versioning is the common choice for public REST APIs because of its cacheability and discoverability.
The base URL (also called base URI or base endpoint) is the root address of an API against which all endpoint paths are resolved. It typically includes the scheme, host, and a path prefix:

```
https://api.example.com/v1
```

All endpoints are relative to this: `GET https://api.example.com/v1/users`. API documentation and SDKs are configured once with the base URL so consumers do not repeat it. Separate base URLs are usually provided for production and sandbox environments (e.g., `https://sandbox.api.example.com/v1`). A well-structured base URL includes the version so all endpoints under it are versioned together.
**Authentication** answers "who are you?" — it verifies the identity of the caller. Common mechanisms are API keys, JWTs, and OAuth access tokens. **Authorization** answers "are you allowed to do this?" — it checks whether the authenticated identity has permission to access the requested resource. Authentication must happen before authorization. A request can be authenticated (valid JWT) but still receive 403 Forbidden if the user lacks the required role or ownership. Many bugs come from conflating the two: returning 401 when the issue is actually a missing permission (should be 403), or granting access based solely on authentication without checking scope.
An API key is a static secret string issued to a client application (not a user) that is sent with every request — typically in a header (`X-API-Key: abc123`) or query parameter. The server looks up the key in its database to identify and authorise the caller. API keys are simple to implement and ideal for server-to-server communication, but they have no expiry by default, cannot carry claims, and if leaked must be rotated immediately. They should always be transmitted over HTTPS and never embedded in client-side JavaScript. Unlike OAuth tokens, API keys do not encode scope or identity — the server must look them up on every request.
Basic Auth sends credentials as a base64-encoded `username:password` string in the `Authorization` header on every request.

```http
Authorization: Basic dXNlcjpwYXNz
```

Base64 is encoding, not encryption, so the credentials are essentially in plaintext and Basic Auth **must** only be used over HTTPS. The server decodes the header, looks up the user, and verifies the password hash. Basic Auth is stateless (no session), simple to implement, and supported natively by browsers and HTTP clients. It is appropriate for machine-to-machine calls with low security requirements or for protecting internal tools behind a gateway. For user-facing login flows, OAuth or token-based auth is preferred.
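
To make the "encoding, not encryption" point concrete, the sketch below builds and reverses a Basic `Authorization` header using Node's `Buffer`. The helper names are hypothetical, and the password check itself (hash comparison) is left out.

```javascript
// Build the header value a client would send.
function encodeBasicAuth(username, password) {
  const raw = `${username}:${password}`;
  return `Basic ${Buffer.from(raw, 'utf8').toString('base64')}`;
}

// Recover the credentials on the server; returns null for anything malformed.
function decodeBasicAuth(headerValue) {
  const [scheme, encoded] = headerValue.split(' ');
  if (!encoded || scheme.toLowerCase() !== 'basic') return null;
  const decoded = Buffer.from(encoded, 'base64').toString('utf8');
  const separator = decoded.indexOf(':'); // split on the first ':' only
  if (separator === -1) return null;
  return {
    username: decoded.slice(0, separator),
    password: decoded.slice(separator + 1), // the password may itself contain ':'
  };
}

console.log(encodeBasicAuth('user', 'pass')); // Basic dXNlcjpwYXNz
console.log(decodeBasicAuth('Basic dXNlcjpwYXNz'));
```

Anyone who can observe the header can run the same decode step, which is why the transport has to provide the confidentiality.
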
Bearer token auth sends an opaque or structured token (often a JWT) in the `Authorization` header:

```http
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...
```

The word "Bearer" means "whoever holds this token is granted access." The server validates the token, either by looking it up in a database (opaque tokens) or by verifying the signature and claims (JWTs). Bearer tokens are the standard mechanism for OAuth 2.0. They expire (unlike API keys), can carry identity claims, and can be scoped to specific permissions. Because anyone who obtains the token can use it, they must be transmitted only over HTTPS and stored securely (never in localStorage for high-value tokens).
HTTPS is HTTP transported over TLS (Transport Layer Security). TLS provides three guarantees: **confidentiality** (data is encrypted in transit so eavesdroppers cannot read it), **integrity** (data cannot be tampered with without detection), and **authentication** (the server's identity is verified via its TLS certificate). APIs require HTTPS because they transmit credentials, tokens, and sensitive data. Without it, a network attacker can read API keys and tokens in plaintext (man-in-the-middle) or modify responses. Modern browsers and HTTP clients warn or refuse to send credentials over plain HTTP. Any API handling auth, PII, or payments must enforce HTTPS and redirect HTTP to HTTPS.
A request timeout is a maximum duration the client will wait for the server to respond before giving up and returning an error. Without timeouts, a hung server or slow network can cause the client to block indefinitely, exhausting connection pools and causing cascading failures. Timeouts are typically set at two levels: **connection timeout** (time to establish the TCP connection, e.g., 3 s) and **read timeout** (time to receive the response after connection, e.g., 30 s). Servers should also enforce their own timeouts to free resources. For long-running operations, the pattern of returning 202 Accepted immediately with a polling URL is preferable to increasing timeouts arbitrarily.
API caching stores a copy of a response so subsequent identical requests can be served faster without hitting the origin server. Caching can occur at multiple layers: the **client** (browser or HTTP client cache), a **CDN** or reverse proxy, an **API gateway**, or the **application layer** (Redis, Memcached). The server controls caching behaviour via response headers: `Cache-Control: max-age=60` tells caches to store the response for 60 seconds; `Cache-Control: no-store` prohibits caching entirely. GET requests are cacheable by default; POST, PUT, DELETE are not. Effective caching reduces latency, lowers server load, and improves scalability. The hardest problem in caching is invalidation — knowing when to evict stale data.
An **API** is the interface contract: the set of HTTP endpoints, methods, parameters, and data formats that a service exposes. An **SDK** (Software Development Kit) is a language-specific library that wraps the API and provides a more ergonomic experience for developers. The SDK handles low-level concerns like constructing HTTP requests, serialising/deserialising JSON, managing authentication headers, handling retries and pagination, and exposing typed objects.

```js
// Raw API
fetch('/users/42', { headers: { Authorization: 'Bearer ' + token } });

// SDK equivalent
const user = await client.users.get(42);
```

You can always use an API without its SDK; the SDK just makes development faster and less error-prone.
The Richardson Maturity Model (RMM) describes four levels of REST maturity. **Level 0 (Swamp of POX)**: one endpoint, one HTTP method, everything is an RPC-style POST: `POST /api { "action": "getUser", "id": 42 }`. **Level 1 (Resources)**: separate URIs per resource but still uses only GET/POST: `POST /users/42`. **Level 2 (HTTP Verbs)**: correct HTTP method semantics, proper status codes: `GET /users/42` returns 200, `DELETE /users/42` returns 204. **Level 3 (HATEOAS)**: responses include hypermedia links describing available actions.

```json
{
  "id": 42,
  "_links": { "self": "/users/42", "orders": "/users/42/orders" }
}
```

Most production REST APIs operate at Level 2. Level 3 is rare due to implementation complexity.
HATEOAS (Hypermedia As The Engine Of Application State) is part of REST's uniform interface constraint and corresponds to Level 3 of the Richardson Maturity Model: every response includes links to related resources and available actions, so clients discover the API dynamically rather than hardcoding URIs. The idea is that a client should only need to know the entry-point URL and then follow links, similar to a web browser.

```json
{
  "orderId": 7,
  "_links": {
    "cancel": { "href": "/orders/7/cancel", "method": "POST" },
    "invoice": { "href": "/orders/7/invoice" }
  }
}
```

In practice, most teams skip HATEOAS because: client developers hardcode URLs anyway, the added payload size and server complexity are not justified, tooling support is immature, and the discoverability claim rarely holds for typed API consumers. API documentation (OpenAPI) serves the discoverability goal more practically.
In **HTTP/1.1**, connections are persistent by default (`Connection: keep-alive`) but head-of-line (HOL) blocking means only one request can be in-flight per connection at a time; browsers work around this with 6 parallel connections per origin. **HTTP/2** introduced binary framing and multiplexing: multiple request/response streams are interleaved on a single TCP connection, eliminating HTTP-level HOL blocking and reducing connection overhead. It also adds header compression (HPACK) and server push. However, TCP-level HOL blocking still exists — a lost packet stalls all streams. **HTTP/3** (RFC 9114) replaces TCP with QUIC (over UDP), which implements independent stream handling so a dropped packet only blocks the affected stream. QUIC also makes TLS 1.3 handshake faster (0-RTT). For REST APIs behind a modern CDN/gateway, HTTP/2 is the practical sweet spot today.
PUT replaces the entire resource representation: you must send the complete object, and fields you omit are cleared. PATCH applies a partial update: only the fields in the request body are changed. Two standardised PATCH formats exist: **JSON Merge Patch (RFC 7396)**: send a plain JSON object with only the fields to change; set a field to `null` to delete it. **JSON Patch (RFC 6902)**: send an array of operation objects (`add`, `remove`, `replace`, `move`, `copy`, `test`) for fine-grained, atomic mutation. Each format has its own media type, which the request declares in `Content-Type`.

```http
# JSON Merge Patch
PATCH /users/42
Content-Type: application/merge-patch+json

{ "email": "new@example.com" }

# JSON Patch
PATCH /users/42
Content-Type: application/json-patch+json

[{ "op": "replace", "path": "/email", "value": "new@example.com" }]
```

JSON Patch is more powerful (supports `test` for optimistic concurrency) but verbose. JSON Merge Patch is simpler and widely adopted.
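
The merge rules of JSON Merge Patch are small enough to sketch directly. The function below follows the algorithm described in RFC 7396 (objects merge recursively, `null` deletes a member, anything else replaces the target); the names `mergePatch` and `isPlainObject` and the sample data are assumptions for this example.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Returns a new document; the target passed in is not mutated.
function mergePatch(target, patch) {
  if (!isPlainObject(patch)) return patch; // scalars and arrays replace wholesale
  const result = isPlainObject(target) ? { ...target } : {};
  for (const [name, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[name]; // null means "remove this member"
    } else {
      result[name] = mergePatch(result[name], value);
    }
  }
  return result;
}

const user = { id: 42, email: 'old@example.com', phone: '555-0100', address: { city: 'Oslo', zip: '0150' } };
const patched = mergePatch(user, { email: 'new@example.com', phone: null, address: { zip: '0151' } });
console.log(patched);
```

One consequence of these rules is that a merge patch cannot set a field to `null` or update a single array element; those cases are where JSON Patch becomes necessary.
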
An idempotency key is a client-generated unique token (UUID) sent with a non-idempotent POST request, allowing the server to detect and deduplicate retries. If the client times out and retries, the server recognises the same key and returns the cached response instead of executing the operation again, preventing duplicate charges, double orders, etc.

```http
POST /payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{ "amount": 5000, "currency": "usd" }
```

Server-side implementation: store the idempotency key in a table with the response payload and status. On incoming request, check if the key exists; if so, return the stored response immediately without re-executing. Set a TTL (Stripe uses 24 hours). Use a unique index on the key to prevent race conditions. Scope keys per user to prevent cross-user replay attacks.
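
The server-side lookup described above might look roughly like this. It is a deliberately simplified, synchronous, in-memory sketch: the `IdempotencyStore` class, its method names, and the injectable clock are all hypothetical, and it does not model the unique-index protection against concurrent duplicates that a real database-backed version would need.

```javascript
// Hypothetical in-memory stand-in for the idempotency table.
class IdempotencyStore {
  constructor(ttlMs = 24 * 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, useful for tests
    this.entries = new Map(); // scoped key -> { response, expiresAt }
  }

  // Runs `operation` at most once per (userId, key) within the TTL.
  execute(userId, key, operation) {
    const scopedKey = `${userId}:${key}`; // scope per user, as the text advises
    const existing = this.entries.get(scopedKey);
    if (existing && existing.expiresAt > this.now()) {
      return existing.response; // replay the stored response
    }
    const response = operation();
    this.entries.set(scopedKey, { response, expiresAt: this.now() + this.ttlMs });
    return response;
  }
}

let charges = 0;
const store = new IdempotencyStore();
const charge = () => ({ status: 201, body: { chargeId: ++charges } });

const first = store.execute('user_1', '550e8400-e29b-41d4-a716-446655440000', charge);
const retry = store.execute('user_1', '550e8400-e29b-41d4-a716-446655440000', charge);
console.log(first === retry, charges); // the retry reuses the stored response
```
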
Conditional requests let clients avoid downloading unchanged resources and let servers prevent lost updates. The server generates an **ETag** (entity tag, typically a hash or version of the resource) and includes it in the response. On subsequent GET requests, the client sends `If-None-Match: "etag-value"`; the server returns 304 Not Modified (no body) if unchanged, saving bandwidth. For writes, the client sends `If-Match: "etag-value"` with PUT/PATCH; the server processes the request only if the ETag still matches, returning 412 Precondition Failed if it has changed since the client last fetched it (optimistic concurrency). `Last-Modified` / `If-Modified-Since` works analogously using timestamps instead of hashes, but ETags are more precise.

```http
GET /articles/1                           → ETag: "abc123"
GET /articles/1  If-None-Match: "abc123"  → 304 Not Modified
```
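
A hash-based ETag check can be sketched with Node's built-in `crypto` module. The helpers below (`computeEtag`, `conditionalGet`, `checkIfMatch`) are illustrative names, not a framework API, and the sketch assumes strong ETags only; weak validators (`W/"..."`) are not handled.

```javascript
const { createHash } = require('crypto');

// Derive a strong ETag from the response body (truncated SHA-256, quoted).
function computeEtag(body) {
  const digest = createHash('sha256').update(body).digest('hex').slice(0, 16);
  return `"${digest}"`;
}

// Decide how to answer a GET given the client's If-None-Match header.
function conditionalGet(body, ifNoneMatch) {
  const etag = computeEtag(body);
  const candidates = (ifNoneMatch ?? '').split(',').map((s) => s.trim());
  if (candidates.includes(etag) || candidates.includes('*')) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body };
}

// Decide whether a PUT/PATCH may proceed given the client's If-Match header.
function checkIfMatch(currentBody, ifMatch) {
  return ifMatch === '*' || ifMatch === computeEtag(currentBody) ? 200 : 412;
}

const firstResponse = conditionalGet('{"title":"Hello"}', undefined);
const secondResponse = conditionalGet('{"title":"Hello"}', firstResponse.headers.ETag);
console.log(firstResponse.status, secondResponse.status);
```
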
`Cache-Control` is the primary HTTP caching directive. Key values: **`max-age=N`** — cache the response for N seconds from the time it was fetched. **`s-maxage=N`** — like max-age but applies only to shared caches (CDN, proxy); overrides max-age for them. **`no-cache`** — the response can be stored but must be revalidated with the origin server before use (sends conditional request). **`no-store`** — the response must never be stored at all (use for sensitive data). **`must-revalidate`** — once stale, must revalidate before serving; never serve stale. **`stale-while-revalidate=N`** — serve a stale response immediately while fetching a fresh one in the background for up to N seconds. **`private`** — only the end-user browser can cache it (not a CDN). **`public`** — any cache may store it, including CDNs.
Three main strategies: **Offset/limit** (`?offset=40&limit=20`) — simple to implement, allows random page jumps, but suffers from data drift (items inserted/deleted during pagination cause rows to be skipped or duplicated) and is slow on large tables (database must scan all preceding rows). **Cursor-based** (`?cursor=eyJpZCI6NDB9&limit=20`) — encodes a pointer to the last seen item, typically the primary key; no drift, efficient index seeks, but cannot jump to arbitrary pages. **Keyset pagination** — a variant of cursor using composite values (e.g., `?after_id=40`) tied to a sort column with an index; the most performant for large datasets. At scale (millions of rows), offset pagination with large offsets becomes untenable; cursor or keyset is the only viable option.
A cursor encodes the position of the last returned item as an opaque, URL-safe token (base64 of JSON `{"id": 42, "created_at": "2024-01-15T10:00:00Z"}`). The client passes this in the next request; the server decodes it and uses it as a WHERE clause filter.

```sql
SELECT * FROM jobs
WHERE (created_at, id) < (:cursor_ts, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

Requirements: (1) **Sort stability**: the sort key(s) must be stable and part of a unique index, otherwise rows can appear in different positions between pages. Adding the primary key as a tiebreaker ensures uniqueness. (2) The cursor must encode all sort columns. (3) Treat the cursor as opaque and never ask clients to build one from raw database IDs. Base64 only encodes the values, so sign or encrypt the cursor if the IDs inside it are sensitive or must be tamper-proof. Respond with `next_cursor: null` when on the last page.
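
The cursor round trip can be sketched without a database. In the example below, `encodeCursor`, `decodeCursor`, and `fetchPage` are hypothetical helpers, and an in-memory array stands in for the `jobs` table. It assumes `created_at` values are ISO 8601 strings in the same format and time zone, so plain string comparison orders them chronologically, and it uses unsigned base64url for brevity.

```javascript
function encodeCursor(lastRow) {
  const payload = JSON.stringify({ id: lastRow.id, created_at: lastRow.created_at });
  return Buffer.from(payload, 'utf8').toString('base64url');
}

function decodeCursor(cursor) {
  try {
    const parsed = JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));
    if (typeof parsed.id !== 'number' || typeof parsed.created_at !== 'string') return null;
    return parsed;
  } catch {
    return null; // malformed cursor
  }
}

// In-memory stand-in for the SQL query: ORDER BY created_at DESC, id DESC.
function fetchPage(rows, cursor, limit) {
  const sorted = [...rows].sort((a, b) =>
    a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : b.id - a.id,
  );
  const after = cursor ? decodeCursor(cursor) : null;
  if (cursor && !after) throw new Error('invalid cursor');
  // Equivalent of WHERE (created_at, id) < (:cursor_ts, :cursor_id).
  const remaining = after
    ? sorted.filter(
        (r) =>
          r.created_at < after.created_at ||
          (r.created_at === after.created_at && r.id < after.id),
      )
    : sorted;
  const items = remaining.slice(0, limit);
  const hasMore = remaining.length > limit;
  return { items, next_cursor: hasMore ? encodeCursor(items[items.length - 1]) : null };
}
```

A client would call `fetchPage(rows, null, 20)` first and keep passing back `next_cursor` until it is `null`.
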
Common conventions: `?status=active&role=admin` for filtering, `?sort=created_at&order=desc` for sorting. Never pass sort column names directly to SQL: maintain an allowlist of sortable fields and map them to column names server-side to prevent SQL injection and column enumeration.

```js
const SORTABLE = { created_at: 'jobs.created_at', title: 'jobs.title' };
const col = SORTABLE[req.query.sort] ?? 'jobs.created_at';
```

For complex filters, some APIs accept an object notation (`?filter[status]=active`) or a simple query DSL. Validate and sanitise every parameter. Avoid filtering on non-indexed columns; add database indexes for all commonly filtered fields. Document which filters are supported to prevent clients from relying on undocumented behaviour that could change.
Field masks let clients request only a subset of fields in the response: `GET /users?fields=id,name,email`. This reduces payload size and over-fetching, which is useful for mobile clients on slow connections or list views that need minimal data.

```http
GET /orders?fields=id,total,status
→ [{ "id": 1, "total": 99.99, "status": "shipped" }]
```

Implement field masks when: responses are large (many fields), the API is used heavily from mobile, or different consumers need drastically different projections. The server parses the `fields` parameter, validates each field name against an allowlist, and projects the query accordingly. Google's APIs use a `fields` query parameter; GraphQL solves this problem more elegantly. Avoid implementing field masks for small responses: the overhead of parsing and projecting outweighs the savings.
Rate limiting protects APIs from abuse and ensures fair usage. Common algorithms: **Token bucket**: a bucket holds up to `capacity` tokens; requests consume one token each; tokens refill at a fixed rate. Allows short bursts up to capacity. **Sliding window counter**: counts requests in a rolling time window (e.g., last 60 s) using a Redis sorted set with timestamps; smoother than fixed windows, which allow double the rate at window boundaries. **Fixed window counter**: simplest; reset the counter every minute. Vulnerable to boundary bursts. Clients should receive the conventional rate-limit headers (the `X-RateLimit-*` names are a widely used convention rather than a formal standard) so they can back off correctly:

```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1716912000
Retry-After: 58
```

Return 429 Too Many Requests when the limit is exceeded.
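
The token bucket algorithm can be sketched in a small class. This is a single-process illustration with an injectable clock; the `TokenBucket` name and its method names are assumptions, and a production limiter shared across instances would typically keep this state in a store such as Redis.

```javascript
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;          // injectable clock (milliseconds)
    this.tokens = capacity;  // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last call, capped at capacity.
  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
  }

  // Returns { allowed, retryAfterSeconds } for one incoming request.
  tryConsume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Time until one whole token is available; suitable for a Retry-After header.
    const retryAfterSeconds = Math.ceil((1 - this.tokens) / this.refillPerSecond);
    return { allowed: false, retryAfterSeconds };
  }
}

const bucket = new TokenBucket(2, 1); // burst of 2, then 1 request per second
console.log(bucket.tryConsume().allowed, bucket.tryConsume().allowed);
```

When `allowed` is false, the server would respond with 429 and use `retryAfterSeconds` for the `Retry-After` header.
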
OAuth 2.0 is an authorisation framework for delegated access. **Authorization Code** flow: user is redirected to the auth server, approves access, and the server returns a short-lived `code` to the redirect URI; the backend exchanges the code for an access token + refresh token server-to-server. The code is useless without the client secret. **Client Credentials** flow: no user involved; the client authenticates with its own credentials (`client_id` + `client_secret`) to get an access token — used for machine-to-machine APIs. **Implicit** flow (deprecated — tokens exposed in URL fragment). **PKCE** (Proof Key for Code Exchange): an extension to Authorization Code for public clients (SPAs, mobile apps) that cannot keep a client secret; the client generates a `code_verifier`, hashes it to a `code_challenge` sent with the auth request, then sends the original verifier at token exchange time. This prevents code interception attacks without needing a client secret.
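
The PKCE verifier and challenge derivation can be sketched with Node's `crypto` module. The sketch assumes the `S256` challenge method (base64url of the SHA-256 of the verifier) as defined in RFC 7636; the function names are hypothetical.

```javascript
const { createHash, randomBytes } = require('crypto');

// Client: create a high-entropy verifier (32 random bytes -> 43 URL-safe chars).
function generateCodeVerifier() {
  return randomBytes(32).toString('base64url');
}

// Client: derive the challenge sent with the authorization request.
function codeChallengeS256(verifier) {
  return createHash('sha256').update(verifier, 'ascii').digest('base64url');
}

// Auth server at token exchange: recompute from the verifier and compare
// with the challenge stored alongside the authorization code.
function verifyPkce(verifier, storedChallenge) {
  return codeChallengeS256(verifier) === storedChallenge;
}

const verifier = generateCodeVerifier();
const challenge = codeChallengeS256(verifier);
console.log(verifyPkce(verifier, challenge)); // the original verifier matches
```

An attacker who intercepts the authorization code sees only the challenge, and cannot produce the verifier that hashes to it.
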
A JWT (JSON Web Token) is three base64url-encoded segments separated by dots: `header.payload.signature`. The **header** contains the token type and signing algorithm (`{ "alg": "RS256", "typ": "JWT" }`). The **payload** contains claims, which are statements about the subject: registered claims (`iss` issuer, `sub` subject/user ID, `aud` audience, `exp` expiry timestamp, `iat` issued-at, `jti` JWT ID for revocation), and custom claims (`role`, `email`). The **signature** is created by signing `base64url(header) + "." + base64url(payload)` with a secret (HMAC) or private key (RSA/EC).

```
eyJhbGciOiJSUzI1NiJ9
.eyJzdWIiOiJ1c2VyXzQyIiwiZXhwIjoxNzE2OTEyMDAwfQ
.SflKxwRJSMeKKF2QT4fw...
```

The payload is not encrypted, only signed. Never store secrets in JWT claims.
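
To show that the payload is readable by anyone, the sketch below splits a token and decodes its first two segments. It performs no signature verification and is for inspection only; `decodeJwtUnverified` and `base64urlJson` are hypothetical helper names, and the signature segment used here is a placeholder.

```javascript
// Encode a JSON value as one base64url JWT segment.
function base64urlJson(value) {
  return Buffer.from(JSON.stringify(value), 'utf8').toString('base64url');
}

// Splits a JWT and decodes header and payload WITHOUT verifying the signature.
function decodeJwtUnverified(token) {
  const segments = token.split('.');
  if (segments.length !== 3) throw new Error('expected header.payload.signature');
  const [header, payload] = segments
    .slice(0, 2)
    .map((segment) => JSON.parse(Buffer.from(segment, 'base64url').toString('utf8')));
  return { header, payload, signature: segments[2] };
}

// Placeholder signature: this token would fail real validation.
const token = [
  base64urlJson({ alg: 'RS256' }),
  base64urlJson({ sub: 'user_42', exp: 1716912000 }),
  'placeholder-signature',
].join('.');

console.log(decodeJwtUnverified(token).payload); // claims are visible without any key
```

Decoding is not validation: a server must still verify the signature and claims with a well-tested library before trusting anything in the payload.
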
Full JWT validation must check: (1) **Signature** — verify using the correct key; for RS256/ES256, fetch the public key from the issuer's JWKS endpoint and cache it. (2) **`exp`** — reject if the current time is past expiry. (3) **`nbf`** (not-before) — reject if present and current time is before it. (4) **`iss`** — verify it matches the expected issuer. (5) **`aud`** — verify your service is the intended audience. Common mistakes: (A) accepting `"alg": "none"` — an attacker can strip the signature entirely; explicitly reject `none`. (B) using the client-provided algorithm — always pin to expected algorithms server-side. (C) trusting the payload without verifying the signature first. (D) not checking `exp` — expired tokens remain valid forever. Use a well-tested library (jose, jsonwebtoken) — never roll your own JWT parser.
Access tokens are short-lived (15 min – 1 hr) to limit damage if stolen. Refresh tokens are long-lived credentials the client uses to obtain new access tokens without re-authenticating. **Rotation**: every time a refresh token is used, it is invalidated and a new one is issued. If an attacker uses a stolen refresh token, the legitimate client's next use will fail (the token was already rotated); the server sees an already-used token being presented and should revoke the entire token family. **Absolute expiry**: refresh tokens have a hard maximum lifetime (e.g., 90 days) after which the user must log in again regardless of activity. **Revocation via blocklist**: since JWTs are self-contained, invalidating them before `exp` requires storing revoked token IDs (`jti`) in a fast store (Redis) and checking the blocklist on each token use. For high-traffic APIs this lookup adds latency, which is one reason access tokens are kept short-lived: the window in which a revoked token is still accepted stays small, and blocklist entries can be dropped as soon as the token would have expired anyway.
Use **API keys** for: machine-to-machine integrations where a human user is not involved; partners/third parties accessing your API; scenarios where simplicity matters more than fine-grained claims; when you need easy revocation (just delete the key from the database). Use **JWTs** for: user authentication where the token must carry identity and permission claims; distributed microservices where local signature validation avoids a network call to an auth server on every request; short-lived, scoped access tokens within an OAuth 2.0 flow. JWTs are self-contained but harder to revoke before expiry. API keys are opaque but require a database lookup on every request. A hybrid approach — API keys at the gateway that are exchanged for JWTs internally — combines the simplicity of keys with the performance of JWT validation.
HMAC request signing proves that a request was created by the holder of a secret key and has not been tampered with in transit. Instead of sending the raw secret, the client derives a signature over the canonical request and sends it. AWS Signature V4 process: (1) Build a **canonical request**: method, URI, sorted query string, canonical headers, the list of signed headers, and the SHA-256 hash of the body. (2) Build a **string to sign**: algorithm identifier, timestamp, credential scope, and SHA-256 hash of the canonical request. (3) Derive a **signing key** by HMAC-chaining, where each result becomes the key for the next step: `HMAC(HMAC(HMAC(HMAC("AWS4" + secret, date), region), service), "aws4_request")`. (4) Compute the signature as the HMAC of the string to sign under that key and add it to the `Authorization` header. Signed requests have a short validity window (typically ±5 minutes around the timestamp) to limit replay attacks. This pattern is more secure than bearer tokens for high-value server-to-server calls because the secret never travels in the request.
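Step (3), the key derivation chain, is the part people most often get wrong, so here is a minimal sketch using Node's built-in `crypto` module. It follows AWS's documented procedure, in which the secret is prefixed with the literal string `AWS4` before the first HMAC; the function names are illustrative, and building the canonical request and string to sign is left out.

```javascript
const crypto = require("node:crypto");

// HMAC-SHA256 returning raw bytes, so the output can key the next step.
function hmacSha256(key, data) {
  return crypto.createHmac("sha256", key).update(data, "utf8").digest();
}

// Derive the SigV4 signing key. dateStamp uses the YYYYMMDD format.
function deriveSigningKey(secretAccessKey, dateStamp, region, service) {
  const kDate = hmacSha256("AWS4" + secretAccessKey, dateStamp);
  const kRegion = hmacSha256(kDate, region);
  const kService = hmacSha256(kRegion, service);
  return hmacSha256(kService, "aws4_request");
}

// Sign an already-built string to sign; the result is lowercase hex.
function signString(signingKey, stringToSign) {
  return crypto.createHmac("sha256", signingKey).update(stringToSign, "utf8").digest("hex");
}
```

Because the date, region, and service are mixed into the key, a signing key leaked for one day and service cannot be used to sign requests for another.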
A CORS preflight is an automatic `OPTIONS` request the browser sends before a cross-origin request that uses non-simple methods (PUT, PATCH, DELETE) or custom headers. The browser asks the server "is this request allowed?"

```http
OPTIONS /api/users HTTP/1.1
Origin: https://app.example.com
Access-Control-Request-Method: DELETE
Access-Control-Request-Headers: Authorization
```

The server must respond with the appropriate CORS headers:

```http
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, DELETE
Access-Control-Allow-Headers: Authorization
Access-Control-Max-Age: 86400
```

`Access-Control-Max-Age` tells the browser how long to cache the preflight result, avoiding repeated OPTIONS requests. If the server does not respond correctly, the browser blocks the actual request. A wildcard `*` for `Allow-Origin` cannot be used with `credentials: 'include'`.
**CORS** (Cross-Origin Resource Sharing) is a browser mechanism that lets a server selectively relax the same-origin policy. The same-origin policy is what prevents a script on `evil.com` from reading responses from `bank.com`; CORS response headers are how `bank.com` grants read access to origins it trusts. For simple requests, neither stops the request from being sent: they only stop the response from being read by the attacking script. **CSRF** (Cross-Site Request Forgery) exploits the fact that browsers automatically attach cookies to cross-origin requests. An attacker tricks a logged-in user into visiting a page that submits a forged request to `bank.com` using the user's cookies. CORS does not protect against CSRF for simple requests (GET, POST with `application/x-www-form-urlencoded`). CSRF defenses: double-submit cookie pattern, `SameSite=Strict/Lax` cookie attribute (the modern standard), or CSRF tokens in request bodies or custom headers. REST APIs using Bearer tokens in `Authorization` headers are naturally CSRF-resistant because browsers do not auto-send `Authorization` headers cross-origin.
Content negotiation is the mechanism by which a client and server agree on the format, language, and encoding of a response. The client expresses its preferences via `Accept` headers, and the server picks the best match. Three dimensions: **Format** (`Accept: application/json, application/xml;q=0.8`) — server responds in JSON if it supports it, falls back to XML. **Language** (`Accept-Language: en-US, fr;q=0.7`) — server returns English content preferably, French as fallback. **Encoding** (`Accept-Encoding: gzip, br`) — server compresses the response if it can. The server sets `Vary: Accept, Accept-Language` to tell caches that different clients may receive different responses for the same URL. If no acceptable format is available, the server returns 406 Not Acceptable.
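To make the q-value matching concrete, here is a simplified sketch of server-side format negotiation. It is an assumption-laden illustration: it orders media ranges by `q` only (ignoring the specificity rules and media-type parameters a full implementation would honour), and the function names are invented for this example.

```javascript
// Parse an Accept header into media ranges sorted by q-value, highest first.
function parseAccept(headerValue) {
  return headerValue
    .split(",")
    .map((part, index) => {
      const [type, ...params] = part.trim().split(";").map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith("q="));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1.0;
      return { type, q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((range) => range.type !== "" && range.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index);
}

// Return the first supported type the client accepts, or null (caller sends 406).
function negotiate(headerValue, supportedTypes) {
  for (const range of parseAccept(headerValue)) {
    if (range.type === "*/*") return supportedTypes[0] ?? null;
    const match = supportedTypes.find(
      (t) => t === range.type || (range.type.endsWith("/*") && t.startsWith(range.type.slice(0, -1)))
    );
    if (match) return match;
  }
  return null;
}

console.log(negotiate("application/json, application/xml;q=0.8", ["application/xml", "application/json"]));
// application/json
```

A `null` result corresponds to the 406 Not Acceptable case; whatever the server picks, it should also list the request headers it negotiated on in `Vary`.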
HTTP compression reduces response body size. The client signals supported compression algorithms via `Accept-Encoding: gzip, br, deflate`. The server compresses the body and sets `Content-Encoding: gzip` (or `br` for Brotli). The client decompresses before using the response. Brotli (br) achieves ~20% better compression than gzip for text. Gzip is universal; Brotli requires HTTPS in some clients. **When NOT to compress**: small responses (<1 KB) — the compression header overhead and CPU cost exceed savings. Binary formats (images, video, pre-compressed files like ZIP/GZIP) — compressing already-compressed data can increase size. High-frequency, latency-sensitive API calls — the compression/decompression CPU cost may outweigh bandwidth savings. For large JSON responses (>10 KB), gzip/brotli typically reduces size by 60–80%.
**Long polling**: client sends a request, server holds it open until new data arrives or a timeout, then responds. Client immediately reconnects. Latency: 1 RTT per event. Works over HTTP/1.1 with no special infrastructure but is wasteful for high-frequency updates. **Server-Sent Events (SSE)**: client opens a persistent HTTP GET connection; server pushes `text/event-stream` events indefinitely. Unidirectional (server→client). Automatic reconnection, event IDs, and `Last-Event-ID` header built in. Works through HTTP/2. Best for dashboards, feeds, notification streams. **WebSockets**: full-duplex binary/text channel over a single TCP connection (upgraded from HTTP). Bidirectional. Best for chat, collaborative editing, games, or any scenario requiring high-frequency bidirectional messaging. Has higher infrastructure complexity (sticky sessions or pub/sub needed). SSE is the pragmatic choice for most "push updates to browser" use cases.
SSE uses a plain HTTP response with `Content-Type: text/event-stream`. The server keeps the connection open and writes events in a specific text format:

```
id: 42
event: price-update
data: {"symbol":"AAPL","price":182.50}
```

Each event block is separated by a blank line. The `id` field sets the `Last-Event-ID`. If the connection drops, the browser automatically reconnects (after the delay given by `retry:` in ms; without it the default is browser-defined, typically a few seconds) and sends `Last-Event-ID` in the request header, allowing the server to replay missed events. SSE is limited to UTF-8 text. For binary data, base64-encode it. In HTTP/1.1, browsers allow only 6 connections per origin and SSE occupies one permanently; HTTP/2 multiplexing largely removes this limit. Node.js implementation: keep the response writable, write events, handle the `close` event to clean up.
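The wire format is simple enough to serialise by hand. The sketch below shows one way to format an event and to choose which buffered events to replay on reconnect. The helper names are hypothetical, and it assumes event IDs are increasing integers, which the SSE format itself does not require.

```javascript
// Serialise one SSE event. Multi-line data becomes one `data:` line per line,
// and a trailing blank line terminates the event.
function formatSseEvent({ id, event, data, retryMs }) {
  const lines = [];
  if (retryMs !== undefined) lines.push(`retry: ${retryMs}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  const text = typeof data === "string" ? data : JSON.stringify(data);
  for (const line of text.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}

// Choose buffered events to replay, given the Last-Event-ID request header
// (undefined on a first connection). Assumes numeric, increasing IDs.
function eventsToReplay(bufferedEvents, lastEventId) {
  if (lastEventId === undefined) return [];
  const lastSeen = Number(lastEventId);
  return bufferedEvents.filter((e) => e.id > lastSeen);
}

process.stdout.write(formatSseEvent({ id: 42, event: "price-update", data: { symbol: "AAPL", price: 182.5 } }));
```

In a Node.js handler the returned string would be passed to `res.write()` after sending the `text/event-stream` content type.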
WebSocket upgrades an HTTP/1.1 connection to a persistent, bidirectional TCP socket. The client initiates with a special HTTP Upgrade request:

```http
GET /ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
```

The `Sec-WebSocket-Key` is a random base64 nonce. The server concatenates it with the magic GUID `258EAFA5-E914-47DA-95CA-C5AB0DC85B11`, SHA-1 hashes it, base64-encodes it, and returns it in `Sec-WebSocket-Accept`. The client verifies the accept value:

```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, the connection is no longer HTTP: both sides communicate using WebSocket frames. The handshake prevents non-WebSocket servers from accidentally accepting connections.
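The accept-value computation can be checked directly against the sample handshake above, which uses the example key from RFC 6455. A minimal sketch with Node's `crypto` module (the function name is illustrative):

```javascript
const crypto = require("node:crypto");

// Fixed GUID defined by the WebSocket protocol (RFC 6455).
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Sec-WebSocket-Accept = base64(SHA-1(key + GUID)).
function computeWebSocketAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID, "ascii")
    .digest("base64");
}

console.log(computeWebSocketAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

SHA-1 is used here only as a protocol fingerprint, not for security; it proves the server understood the WebSocket handshake rather than echoing the request.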
REST uses multiple endpoints each returning a fixed shape; GraphQL exposes a single endpoint and the client specifies exactly what data it needs in a typed query. **Over-fetching**: REST returns the full resource object even if the client needs only 2 fields; GraphQL returns only requested fields. **Under-fetching / N+1**: to display a user's posts and their authors in REST requires multiple round trips (`GET /users/1`, `GET /users/1/posts`, `GET /users/2` …); GraphQL resolves this in one query but can still suffer N+1 at the resolver level (solved with DataLoader). **Schema**: GraphQL is strongly typed with an introspectable schema; REST relies on OpenAPI for typing. **Tooling**: GraphQL has excellent dev tools (GraphiQL, code generation); REST has OpenAPI/Swagger. **Choose REST** when: the API is public, resources map naturally to CRUD, HTTP caching at the CDN level matters (GraphQL POST requests are not cacheable). **Choose GraphQL** when: many different clients with different data needs consume the same backend.
gRPC is an RPC framework by Google using Protocol Buffers (protobuf) over HTTP/2. Key differences: **Protocol**: REST uses text-based JSON over HTTP/1.1 or HTTP/2; gRPC uses binary protobuf over HTTP/2, which usually means smaller payloads and faster serialisation (the often-quoted 5–10× size reduction depends heavily on the data; string-heavy messages shrink far less than numeric ones). **Schema**: gRPC requires `.proto` files defining services and messages, so it is strongly typed and schema-first; REST schemas are optional (OpenAPI). **Streaming**: gRPC has first-class support for server, client, and bidirectional streaming; REST requires SSE or WebSockets for streaming. **Browser support**: gRPC is not natively supported in browsers without a gRPC-Web proxy; REST works everywhere. **Tooling**: gRPC auto-generates client code in all major languages. **Choose gRPC** for: internal microservice-to-microservice communication where performance matters, streaming data pipelines, polyglot environments. **Choose REST** for: public APIs, browser clients, simple CRUD services where ecosystem and familiarity outweigh performance.
RFC 7807 (since obsoleted by RFC 9457, which keeps the same core fields) defines a standard JSON format for HTTP API error responses, avoiding inconsistent ad-hoc error schemas across APIs. Fields: `type` (URI identifying the error type, links to docs), `title` (human-readable summary, stable), `status` (HTTP status code), `detail` (human-readable, request-specific explanation), `instance` (URI of the specific occurrence, useful for support).

```json
{
  "type": "https://api.example.com/errors/validation",
  "title": "Validation Failed",
  "status": 422,
  "detail": "The email field must be a valid email address.",
  "instance": "/requests/abc123",
  "errors": [{ "field": "email", "message": "Invalid format" }]
}
```

The response is served with `Content-Type: application/problem+json`. You can extend the schema with custom fields, as the `errors` array does here. This standard makes errors machine-readable and consistent across your entire API surface.
Batch requests allow clients to send multiple operations in a single HTTP request, reducing round trips. Two patterns: (1) **Array body**: the endpoint accepts an array and processes items together:

```http
POST /users/batch-delete

[{ "id": 1 }, { "id": 2 }, { "id": 3 }]
```

(2) **Batch endpoint**: a single `POST /batch` accepts an array of sub-requests, each specifying its own method, path, and body (used by Facebook Graph API, Microsoft 365). The response is an array of individual results with their own status codes. Trade-offs: batching improves throughput but makes error handling complex (partial success: some items may fail). Consider returning a 207 Multi-Status response for partial success. Limit batch size (e.g., max 100 items) to prevent abuse. Batch operations that are logically atomic should run in a database transaction.
Webhooks are server-initiated HTTP POST requests to a URL the client registers, triggered by events. Delivery guarantees: use **at-least-once delivery**. Persist the event to a queue before attempting delivery, and retry with exponential backoff on non-2xx responses or timeouts (e.g., 1 min, 5 min, 30 min, 2 hr, 24 hr). After N failures, mark as failed and alert. Consumers must be **idempotent** because the same event may arrive more than once. Security via HMAC signature verification:

```http
X-Webhook-Signature: sha256=a8b9c...
```

The sender computes `HMAC-SHA256(secret, raw_request_body)` and includes it in this header. The receiver recomputes the HMAC and compares using a constant-time comparison to prevent timing attacks. Include a timestamp in the signed payload and reject events older than 5 minutes to prevent replay attacks. Never compute the HMAC on the parsed JSON: re-serialising can change whitespace or key order, so always use the raw bytes.
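The signing and verification steps can be sketched as follows. This is one possible scheme, not a standard: it assumes the timestamp travels separately from the body (for example in its own header) and is signed together with it as `<timestamp>.<raw body>`, a convention some providers use. The function names and parameter order are illustrative.

```javascript
const crypto = require("node:crypto");

// Sender: sign "<timestamp>.<raw body>" so the timestamp cannot be altered.
function signWebhook(secret, timestampSeconds, rawBody) {
  const mac = crypto.createHmac("sha256", secret);
  mac.update(`${timestampSeconds}.`);
  mac.update(rawBody); // raw bytes or string exactly as sent, never re-serialised JSON
  return "sha256=" + mac.digest("hex");
}

// Receiver: reject stale events, then compare signatures in constant time.
function verifyWebhook(secret, timestampSeconds, rawBody, signatureHeader, nowSeconds, toleranceSeconds = 300) {
  if (Math.abs(nowSeconds - timestampSeconds) > toleranceSeconds) return false;
  const expected = Buffer.from(signWebhook(secret, timestampSeconds, rawBody), "utf8");
  const received = Buffer.from(String(signatureHeader), "utf8");
  // timingSafeEqual throws if lengths differ, so check the length first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

The receiver should run this check before parsing the body, and only then deduplicate on the event ID to get idempotent handling.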
A clear deprecation strategy prevents breaking clients without warning. Steps: (1) Announce deprecation in docs, changelog, and developer blog with a concrete sunset date (minimum 6–12 months for public APIs). (2) Add `Deprecation` and `Sunset` headers to all responses from deprecated endpoints. RFC 9745 defines `Deprecation` as a structured-field date (`@` followed by a Unix timestamp), while `Sunset` (RFC 8594) uses an HTTP-date:

```http
Deprecation: @1751327999
Sunset: Wed, 31 Dec 2025 23:59:59 GMT
Link: <https://api.example.com/v2/users>; rel="successor-version"
```

In this example the deprecation timestamp is 30 Jun 2025 23:59:59 UTC, six months before the sunset. (3) Log usage of deprecated endpoints so you know which clients are still using them, and contact those teams directly. (4) Keep the old version running until usage drops to zero or the sunset date passes. (5) Return 410 Gone (not 404) after sunset to differentiate "never existed" from "intentionally removed". (6) Provide a migration guide with example before/after code.
OpenAPI (formerly Swagger) is a language-agnostic specification for describing HTTP APIs in YAML or JSON. A spec file describes: **paths** (endpoints, HTTP methods, parameters, request bodies, responses with status codes), **components** (reusable schemas, security schemes, parameters, response objects), **security** (OAuth 2.0, API key, Bearer token schemes). Tools built on OpenAPI: Swagger UI (interactive browser), Redoc (documentation), code generators (openapi-generator for TypeScript, Java, Python SDKs), contract testing (Prism mock server, Schemathesis). Maintaining an accurate OpenAPI spec enables auto-generation of client SDKs, documentation, and server validation. Best practice: generate the spec from code annotations rather than maintaining it separately to prevent drift.
An API gateway sits in front of your backend services and provides a unified entry point with cross-cutting concerns: **Routing** — map external routes to internal service URLs. **Authentication/Authorization** — validate API keys, JWTs, and OAuth tokens before requests reach services. **Rate limiting** — enforce per-client request quotas. **SSL termination** — handle TLS at the gateway; internal services communicate over plain HTTP. **Request/response transformation** — add, remove, or modify headers; translate between protocols. **Observability** — log all requests with trace IDs, emit metrics (latency p50/p99, error rates) and traces to monitoring systems. **Caching** — cache upstream responses at the gateway. **Load balancing** — distribute traffic across service instances. Examples: AWS API Gateway, Kong, Traefik, nginx. An API gateway reduces the amount of cross-cutting code each service must implement.
The BFF pattern creates a separate API layer tailored to each client type (web, iOS, Android) rather than exposing a single general-purpose API. Each BFF aggregates calls to multiple downstream microservices, shapes the response to exactly what its client needs, and handles client-specific concerns (auth flows, response formats, caching strategies).

```
Web App → /bff/web → [UserService, OrderService, RecommendationService]
iOS App → /bff/ios → [UserService, OrderService] (leaner payload)
```

Benefits: eliminates over-fetching/under-fetching for each client; teams can optimise independently; adds an isolation layer so internal service changes do not break client contracts. Drawbacks: code duplication across BFFs (mitigated by shared libraries), additional deployment unit to maintain. Popular at organisations with multiple client types with very different data needs (Netflix, SoundCloud).
Consumer-driven contract testing flips the traditional integration testing model: instead of testing against a real running provider, the consumer generates a **pact** — a file describing the interactions it expects. The provider then verifies it can satisfy all those interactions independently, without needing the consumer to run. Pact is the most widely used framework. Workflow: (1) Consumer test generates a pact JSON file specifying requests and expected response shapes. (2) Pact file is published to a Pact Broker. (3) Provider CI pulls the pact and runs `pact:verify` — starting a mock consumer and asserting the provider's real responses match. If verification passes, the provider can deploy safely knowing it won't break the consumer. This enables independent deployment of microservices without a full end-to-end integration test environment. Critical for microservice architectures with many teams.
Idempotent writes prevent duplicate data when retries occur. Three patterns: (1) **Upsert (INSERT ON CONFLICT)**: insert the record; if a unique constraint is violated, update instead. Safe to retry because the end state is the same.

```sql
INSERT INTO users (email, name)
VALUES ($1, $2)
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
```

(2) **Conditional write**: only update if the current version matches: `UPDATE orders SET status=$2 WHERE id=$1 AND version=$3`. Returns 0 rows affected if already updated; the API can return the current state. (3) **Idempotency key table**: store the operation key with its result; check before executing. Use database transactions to ensure the check and the insert are atomic. PostgreSQL advisory locks or `SELECT ... FOR UPDATE` prevent race conditions between concurrent retries hitting the same idempotency key.
Multipart uploads send a body with multiple parts separated by a boundary string, allowing files and JSON fields in the same request.

```http
POST /uploads HTTP/1.1
Content-Type: multipart/form-data; boundary=----boundary

------boundary
Content-Disposition: form-data; name="file"; filename="photo.jpg"
Content-Type: image/jpeg

<binary data>
------boundary--
```

For large files, avoid buffering the entire file in memory and use streaming instead. Node.js: pipe `req` through a streaming multipart parser (busboy, formidable) directly to object storage, or keep the bytes off your server entirely by handing the client an S3 presigned URL to upload to. For very large files (>100 MB), use **chunked/resumable upload**: split the file into chunks client-side; the client uploads each chunk to `/uploads/{uploadId}/parts/{partNumber}`; the server reassembles the parts or uses the S3 Multipart Upload API. AWS S3 multipart upload requires a minimum 5 MB per part (except the last). Return a resumable upload URL so clients can recover from interrupted uploads.
These are standardised JSON formats that implement HATEOAS-style hypermedia. **HAL (Hypertext Application Language)**, `application/hal+json`: resources have `_links` (relationships and URIs) and `_embedded` (included sub-resources).

```json
{
  "id": 1,
  "_links": { "self": { "href": "/orders/1" } },
  "_embedded": { "items": [{ "sku": "ABC" }] }
}
```

**JSON:API**, `application/vnd.api+json`: more opinionated; resources have a `type` and `id`, `attributes` for data, `relationships` for associations, and `included` for sideloaded related resources. It specifies compound document format, sparse fieldsets, filtering, sorting, and pagination conventions. JSON:API eliminates the need to design these conventions yourself. Neither format is dominant in most new APIs; most teams use plain JSON with OpenAPI for documentation. They are most valuable for libraries and frameworks that auto-generate client code from the hypermedia.
Every API request should emit a structured log entry (JSON) and optionally a distributed trace span. Minimum fields per request:

```json
{
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "method": "POST",
  "path": "/orders",
  "status": 201,
  "duration_ms": 43,
  "user_id": "user_42",
  "ip": "203.0.113.5",
  "user_agent": "MyApp/2.0",
  "error": null
}
```

The IDs follow the W3C Trace Context sizes: 32 hex characters for `trace_id` and 16 for `span_id`. Emit metrics derived from logs: request rate, error rate (4xx/5xx), and p50/p95/p99 latency, each broken down per endpoint. Set up dashboards and alerts for error rate spikes and latency degradation. Use distributed tracing (OpenTelemetry) to propagate `traceparent` across service boundaries so you can reconstruct the full call graph. Never log request bodies containing PII or credentials; log only non-sensitive identifiers.
These three terms form a hierarchy for reliability commitments. **SLI (Service Level Indicator)** is a measured metric: the actual value of reliability being tracked. Common API SLIs: availability (successful requests as a percentage of total requests), latency (p99 response time), error rate (5xx rate), and throughput (requests/second). **SLO (Service Level Objective)** is an internal target for an SLI: "p99 latency < 200 ms, error rate < 0.1%". SLOs define what "good" looks like and drive engineering priorities via error budgets: if the SLO is 99.9% availability, you have about 8.76 hours of allowed downtime per year (0.1% of 8,760 hours), and once that budget is spent you slow feature work and focus on reliability. **SLA (Service Level Agreement)** is the contractual, customer-facing commitment with financial penalties for breach: "We guarantee 99.9% uptime or issue a service credit." SLOs should be stricter than SLAs to give a safety buffer.
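The error-budget arithmetic is worth being able to reproduce on a whiteboard. A small sketch (function names are illustrative) that treats availability as time-based, which is a simplification of the request-based SLI described above:

```javascript
// Allowed "bad" minutes for an availability SLO over a window of days.
function errorBudgetMinutes(sloPercent, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - sloPercent / 100);
}

// Fraction of the budget still unspent, given bad minutes observed so far.
// A negative result means the SLO has already been missed for this window.
function budgetRemaining(sloPercent, windowDays, badMinutesSoFar) {
  const budget = errorBudgetMinutes(sloPercent, windowDays);
  return (budget - badMinutesSoFar) / budget;
}

console.log((errorBudgetMinutes(99.9, 365) / 60).toFixed(2)); // 8.76 (hours per year)
console.log(errorBudgetMinutes(99.9, 30).toFixed(1));         // 43.2 (minutes per 30 days)
```

Each extra nine divides the budget by ten: 99.99% over 30 days leaves roughly 4.3 minutes, which is why tighter SLOs usually require automated failover rather than human response.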
Core resources: `users`, `posts`, `comments`, `likes`, `follows`, `feeds`, `notifications`. Model relationships as sub-resources where ownership is clear (`/users/{id}/posts`) and as top-level resources where the relationship itself has attributes (`/follows` with `follower_id`, `followee_id`, `created_at`). For the **feed** — a ranked, personalised stream of posts — do not model it as `/users/{id}/posts` (that is a user's own posts); instead expose `/feeds/home` as a separate resource generated asynchronously by a fan-out service. Avoid deep nesting beyond two levels; use query parameters for cross-cutting filters (`/posts?author_id=42`). **Pagination**: never use offset pagination for feeds — social feeds are append-heavy and offset pagination skips/duplicates items constantly. Use cursor pagination keyed on `(score DESC, post_id DESC)` with a stable composite index. For `likes` and `follow` counts, store denormalised counters (updated via DB triggers or event-driven workers) and return them with the resource to avoid expensive COUNT queries on every request. Version the API from day one (`/v1/`) — social graph APIs are notoriously hard to migrate once clients proliferate.
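The cursor scheme for the feed can be sketched with an in-memory array standing in for the database. In SQL the filter below would correspond to something like `WHERE (score, post_id) < ($1, $2) ORDER BY score DESC, post_id DESC LIMIT $3`; the function names and the cursor encoding (base64url of a JSON pair) are assumptions for illustration.

```javascript
// Encode the sort position of the last item on a page as an opaque cursor.
function encodeCursor(score, postId) {
  return Buffer.from(JSON.stringify([score, postId]), "utf8").toString("base64url");
}

function decodeCursor(cursor) {
  const [score, postId] = JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
  return { score, postId };
}

// Return one page ordered by (score DESC, postId DESC), strictly after the cursor.
function pageAfter(posts, cursor, limit) {
  const sorted = [...posts].sort((a, b) => b.score - a.score || b.postId - a.postId);
  const start = cursor ? decodeCursor(cursor) : null;
  const remaining = start
    ? sorted.filter((p) => p.score < start.score || (p.score === start.score && p.postId < start.postId))
    : sorted;
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  const nextCursor = remaining.length > limit ? encodeCursor(last.score, last.postId) : null;
  return { items, nextCursor };
}
```

Because the cursor names a position in the ordering rather than a row count, a post inserted at the top between two page requests neither shifts nor duplicates the items on the next page.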
Two-phase commit (2PC) across microservices creates tight coupling and availability risk — if the coordinator fails, all participants are blocked. The **saga pattern** splits a distributed transaction into a sequence of local transactions, each publishing events that trigger the next step. Two flavours: **Choreography** — each service listens to events and reacts (OrderService publishes `OrderCreated` → PaymentService processes it → publishes `PaymentProcessed` → InventoryService reserves stock). No central coordinator, but hard to track state and debug. **Orchestration** — a saga orchestrator (a dedicated service or workflow engine like AWS Step Functions / Temporal) explicitly calls each service and tracks state. On failure, the orchestrator issues **compensating transactions** in reverse order (refund payment, release inventory, cancel order). Compensation must be idempotent. Key challenges: (1) compensating transactions may fail — use exponential retry with a dead-letter queue. (2) Intermediate states are visible — design the UI to handle "payment processing" state. (3) Isolation between concurrent sagas requires careful design — consider pessimistic locking or semantic locks for shared resources.
Optimistic concurrency control prevents the "lost update" problem where two clients read the same resource, both modify it, and the second write silently overwrites the first. Full flow: (1) `GET /articles/42` → server returns the article with `ETag: "v5"`. (2) Client A and Client B both read this response. (3) Client A sends `PATCH /articles/42` with `If-Match: "v5"` and a new body. The server checks: the ETag is still "v5", so the update proceeds, the new ETag is `"v6"`, and the response includes `ETag: "v6"`. (4) Client B sends `PATCH /articles/42` with `If-Match: "v5"`. The server checks: the current ETag is "v6", a mismatch, so it returns **412 Precondition Failed**. Client B must re-fetch, merge changes, and retry.

```http
PATCH /articles/42 HTTP/1.1
If-Match: "v5"
Content-Type: application/json

{ "title": "Updated Title" }
```

The ETag must change on every write: use a version counter (`version INT`) or a hash of the serialised resource. Store the version in the same row as the data and update atomically: `UPDATE articles SET title=$1, version=version+1 WHERE id=$2 AND version=$3`.
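The four-step flow above can be simulated with an in-memory map standing in for the `articles` table. This is a sketch under that assumption; the handler name and return shape are invented for the example, and returning 428 Precondition Required when `If-Match` is missing is an optional extra policy, not part of the flow described above.

```javascript
// In-memory stand-in for a table row; `version` plays the role of the ETag.
const articles = new Map([[42, { title: "Original", version: 5 }]]);

const etagOf = (article) => `"v${article.version}"`;

// Apply a PATCH only if the client's If-Match value equals the current ETag.
function patchArticle(id, ifMatch, changes) {
  const current = articles.get(id);
  if (!current) return { status: 404 };
  if (ifMatch === undefined) return { status: 428 }; // Precondition Required
  if (ifMatch !== etagOf(current)) {
    return { status: 412, etag: etagOf(current) }; // Precondition Failed
  }
  // Bump the version last so a client-supplied `version` cannot override it.
  const updated = { ...current, ...changes, version: current.version + 1 };
  articles.set(id, updated);
  return { status: 200, etag: etagOf(updated), body: updated };
}

const resultA = patchArticle(42, '"v5"', { title: "From client A" });
const resultB = patchArticle(42, '"v5"', { title: "From client B" });
console.log(resultA.status, resultA.etag); // 200 "v6"
console.log(resultB.status, resultB.etag); // 412 "v6"
```

In a real service the compare and the write must be a single atomic statement, as in the `UPDATE ... WHERE version=$3` form; a read followed by a separate write reintroduces the race.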
Opaque tokens are random strings that mean nothing on their own — the server must call the auth server's **introspection endpoint** (RFC 7662) on every request to validate them and retrieve claims. This adds 10–50 ms of network latency per request and creates a potential single point of failure. However, they are instantly revocable: delete the token from the database, and the next introspection check fails. JWTs are self-contained — the gateway validates the signature using the issuer's public key (fetched once from the JWKS endpoint and cached). Validation is CPU-bound, not network-bound: ~0.5 ms for RS256 verification. The trade-off: JWTs are not revocable before their `exp` unless the gateway also checks a blocklist (Redis lookup, ~1 ms). Hybrid pattern used at scale: issue short-lived JWTs (15 min) with a refresh token. The gateway validates JWTs locally. Compromised tokens expire quickly; the refresh token is revocable. For extremely sensitive actions (password change, high-value transactions), always introspect or require re-authentication regardless of JWT validity.
Standard Bearer tokens are usable by whoever holds them: anyone who obtains the token can use it. DPoP (RFC 9449) binds an access token to a specific public key the client proves possession of at each use. Flow: (1) The client generates an ephemeral asymmetric key pair. (2) On token request, the client sends a DPoP proof JWT in the `DPoP` header. The proof is signed with the private key, carries the public key in its `jwk` header, and contains `htm` = HTTP method, `htu` = URL, `iat` = timestamp, and `jti` = unique identifier. (3) The auth server issues a DPoP-bound access token containing `cnf.jkt` = SHA-256 thumbprint of the client's public key. (4) On every API request, the client generates a fresh DPoP proof, which now also includes `ath` (a hash of the access token), and sends it alongside the access token using `Authorization: DPoP <token>` rather than the `Bearer` scheme. (5) The resource server verifies: the DPoP signature is valid, `htm`/`htu` match the current request, `iat` is recent and `jti` has not been seen before (prevents replay), `ath` matches the presented token, and `cnf.jkt` in the token matches the public key in the DPoP proof. An attacker who steals the access token cannot use it without the private key. DPoP is increasingly required for high-security OAuth flows (FAPI 2.0 for open banking). The private key should live in a hardware-backed key store (iOS Secure Enclave, Android Keystore).
gRPC defines four call types in a `.proto` service definition. **Unary** (`rpc GetUser(Request) returns (Response)`): standard request/response; use for all simple CRUD operations. **Server-streaming** (`rpc StreamPrices(Request) returns (stream Quote)`): server sends a sequence of messages for one request. Use for: stock tickers, live dashboards, log streaming, large dataset exports where the response would be too large to buffer. **Client-streaming** (`rpc UploadChunks(stream Chunk) returns (UploadResult)`): client sends a sequence of messages, server responds once. Use for: chunked file uploads, batch ingestion of sensor readings, streaming ML inference inputs. **Bidirectional streaming** (`rpc Chat(stream Message) returns (stream Message)`): both sides send independent message sequences simultaneously over a single HTTP/2 stream. Use for: real-time chat, collaborative editing, distributed system coordination (leader election heartbeats), video game state sync. The key constraint: gRPC streaming requires persistent HTTP/2 connections and does not work natively in browsers. Browsers need a gRPC-Web proxy, which supports only unary and server-streaming calls. For browser clients needing bidirectional communication, WebSockets are more practical; SSE covers only the server-to-client direction and must be paired with ordinary HTTP requests for the other.
Protobuf's schema evolution rules allow producers and consumers to evolve their `.proto` files independently without breaking each other, as long as field numbers are never reused. Rules: (1) **Never change a field's number**: field numbers, not names, identify fields in the binary encoding. Renaming a field is safe on the wire (though it changes generated code and the JSON mapping); renumbering is catastrophic. (2) **Never remove a field number without reserving it**: use `reserved 5;` and `reserved "old_field_name";` to prevent future reuse. (3) **Adding new fields** is backward compatible: old parsers skip fields they do not know (proto3 has preserved unknown fields on re-serialisation since version 3.5). (4) **Removing fields** is safe only if the field number and name are reserved. (5) **Changing field types** is only safe for wire-compatible types. For example, `int32`, `int64`, `uint32`, `uint64`, and `bool` all share the varint encoding and are mutually compatible (values may be truncated when narrowing); `string` → `bytes` is safe as long as the bytes are valid UTF-8; `int32` → `string` or `int32` → `sint32` is not, because the wire type or value encoding differs. (6) In proto3, all singular fields are optional with zero-value defaults; there is no `required`. **Forward compatibility** (old code reading new messages): unknown fields are preserved. **Backward compatibility** (new code reading old messages): missing new fields get their default values. These rules make protobuf extremely robust for long-lived APIs.
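A look at the bytes shows why rule (1) holds. The sketch below hand-encodes a single varint field; it is a teaching aid limited to small non-negative integers, not a replacement for a protobuf library, and the function names are illustrative.

```javascript
// Encode a non-negative integer (below 2^31 in this sketch) as a base-128 varint.
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value & 0x7f) | 0x80); // low 7 bits, continuation bit set
    value >>>= 7;
  }
  bytes.push(value);
  return bytes;
}

// A field's key on the wire is (field_number << 3) | wire_type.
// The field NAME never appears in the encoded message.
function encodeVarintField(fieldNumber, value) {
  const WIRE_TYPE_VARINT = 0;
  return [...encodeVarint((fieldNumber << 3) | WIRE_TYPE_VARINT), ...encodeVarint(value)];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(encodeVarintField(1, 150))); // 08 96 01
console.log(toHex(encodeVarintField(2, 150))); // 10 96 01
```

The same value under field number 1 and field number 2 differs only in the first byte, so a reader with a renumbered schema silently assigns the value to the wrong field, while a renamed field decodes identically.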
HTTP/1.1 REST clients work around head-of-line blocking by opening multiple TCP connections (browsers open 6 per origin; HTTP clients like `axios` use connection pools with configurable limits). Each connection has its own TLS handshake overhead. Connection pool exhaustion under load causes queuing. HTTP/2 multiplexes all requests over a **single TCP connection** using independent streams, so there is no head-of-line blocking at the HTTP layer (a lost TCP packet still stalls every stream on that connection). This changes several operational considerations: (1) Connection pool settings for HTTP/2 REST clients need far fewer connections (1–4 vs 20–50 for HTTP/1.1). (2) Server push was designed to proactively send related resources (e.g., CSS/JS with HTML), but it was rarely useful for JSON APIs and major browsers have since dropped support (Chrome disabled it in 2022), so do not design around it. (3) Header compression (HPACK) reduces overhead for APIs with large or repetitive headers (Auth tokens, Content-Type). (4) HTTP/2 server settings like `SETTINGS_MAX_CONCURRENT_STREAMS` (unlimited by default in the spec, but commonly configured to around 100) become the bottleneck instead of connection count. (5) Load balancers must support HTTP/2 end to end or multiplexing is lost at the first hop; ensure your API gateway and upstream connections both speak HTTP/2.
A payment API must never charge a customer twice due to retries. Full design: (1) **Client generates a UUID v4** as an idempotency key per payment attempt and includes it in every request. (2) **Server stores** `(idempotency_key, user_id, created_at, status, response_json)` in a dedicated table with a unique index on `(idempotency_key, user_id)`. (3) On receipt, the server wraps the check-and-insert in a serialisable transaction: if the key already exists, return the stored response immediately without processing. (4) **While processing**, mark the record `status=processing` before calling the payment provider. If the server crashes mid-flight, the next retry finds `status=processing` and must decide whether to probe the payment provider for the outcome (idempotent provider check) or wait and retry later. (5) On completion, store the full response and mark `status=complete`. (6) **TTL**: expire idempotency keys after 24 hours (Stripe) or 7 days; after the TTL, the same key will create a new payment. Communicate the TTL clearly in your docs. (7) **Scope by user_id** to prevent cross-user key collisions. (8) Return the stored response with the same HTTP status and body as the original (201 for a creation, on the first call and on every replay), so a retrying client sees exactly what it would have seen the first time. If clients need to distinguish a replay, signal it out of band with a response header (Stripe sends `Idempotent-Replayed: true`) rather than by changing the status code.
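The core check-then-process logic can be sketched with an in-memory map standing in for the idempotency table. Several choices here are assumptions of this sketch rather than part of the design above: it is synchronous, it fingerprints the request body so a key reused with different parameters is rejected, it returns 409 while the original is still processing, and it replays the stored status unchanged while flagging the replay separately. `charge` is a hypothetical function that calls the payment provider.

```javascript
const crypto = require("node:crypto");

// In-memory stand-in for the idempotency table, keyed by "userId:idempotencyKey".
const records = new Map();

const fingerprint = (body) =>
  crypto.createHash("sha256").update(JSON.stringify(body)).digest("hex");

function handlePayment(userId, idempotencyKey, body, charge) {
  const recordKey = `${userId}:${idempotencyKey}`; // scoped per user
  const existing = records.get(recordKey);

  if (existing) {
    if (existing.fingerprint !== fingerprint(body)) {
      return { status: 422, body: { error: "key reused with different parameters" }, replayed: false };
    }
    if (existing.status === "processing") {
      return { status: 409, body: { error: "original request still in progress" }, replayed: false };
    }
    return { ...existing.response, replayed: true }; // stored status and body
  }

  // Record the attempt BEFORE calling the provider so a crash leaves a trace.
  records.set(recordKey, { status: "processing", fingerprint: fingerprint(body), response: null });
  const response = { status: 201, body: charge(body) };
  records.set(recordKey, { status: "complete", fingerprint: fingerprint(body), response });
  return { ...response, replayed: false };
}
```

With a real database, the lookup and the first `records.set` must happen inside one transaction backed by the unique index, otherwise two concurrent retries can both pass the existence check.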
At scale, two serious strategies exist. **URI versioning** (`/v1/`, `/v2/`) is operationally simple: routes map to different code paths or deployments, and versions are easy to test, cacheable, and discoverable. Downside: it encourages "big bang" version bumps and URI proliferation; once on v3, you carry v1 and v2 for as long as any client still uses them. **Date-based header versioning**: `API-Version: 2024-01-01`. Stripe works this way with its `Stripe-Version` header; Twilio also uses dated versions but puts the date in the URL path. Each breaking change gets a date. The server maintains a **transformation layer**: incoming requests are normalised to the latest internal format, and responses are downgraded for the requested version. Clients opt in to new versions by bumping their API-Version header. This avoids URI explosion but requires maintaining a version transformation matrix. **Migration strategy**: (1) Publish a migration guide weeks before the new version. (2) Use the `Sunset` and `Deprecation` response headers on old version routes. (3) Log per-client API version usage to identify who is on old versions, and contact them. (4) Run old and new versions in parallel for at least 6 months. (5) Set a hard sunset date and enforce it. (6) Have new SDK releases send the latest API-Version by default so new integrators always start on the latest version.
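The transformation layer for date-based versions can be pictured as an ordered list of "downgrade" steps. The sketch below is illustrative only: the version dates, the field renames, and the `renderFor` helper are all assumptions, not taken from any real API.

```typescript
type Json = Record<string, unknown>;

// Each entry describes one breaking change and how to undo it for clients
// pinned to an older version. Ordered newest first. All hypothetical.
const breakingChanges: { version: string; downgrade: (body: Json) => Json }[] = [
  {
    version: "2024-06-01", // renamed `name` to `full_name`
    downgrade: (body) => {
      const older: Json = { ...body, name: body["full_name"] };
      delete older["full_name"];
      return older;
    },
  },
  {
    version: "2024-01-01", // `amount` changed from a decimal string to integer cents
    downgrade: (body) => ({
      ...body,
      amount: (Number(body["amount"]) / 100).toFixed(2),
    }),
  },
];

// Apply every downgrade introduced after the version the client pinned.
// ISO dates compare correctly as plain strings.
function renderFor(clientVersion: string, latestBody: Json): Json {
  return breakingChanges
    .filter((change) => change.version > clientVersion)
    .reduce((body, change) => change.downgrade(body), latestBody);
}

const latestBody: Json = { id: "cus_1", full_name: "Alice", amount: 1999 };
const forOldClient = renderFor("2023-09-01", latestBody); // both downgrades applied
const forMidClient = renderFor("2024-01-01", latestBody); // only the rename undone
const forNewClient = renderFor("2024-06-01", latestBody); // untouched
```

The handler always produces the latest shape; only this layer knows about history, which is what keeps the version matrix manageable.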
In a GraphQL resolver, fetching `posts` and then the `author` of each post naively executes one SQL query for posts and then N queries for authors: the N+1 problem. DataLoader solves this with **batching** and **caching** within a single request's lifecycle. When a resolver calls `userLoader.load(userId)`, DataLoader defers the actual fetch. At the end of the current event loop tick, it collects all pending keys and calls the batch function once with all of them.

```js
const DataLoader = require('dataloader');

// Assumes db.query resolves to an array of row objects.
const userLoader = new DataLoader(async (userIds) => {
  const users = await db.query(
    `SELECT * FROM users WHERE id = ANY($1)`, [userIds]
  );
  // The batch function must return results in the same order as the keys.
  const byId = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => byId.get(id) ?? null);
});
```

This transforms N+1 queries into 2 queries (1 for posts + 1 for all unique authors). The in-request cache means the same userId is only fetched once per request even if referenced multiple times; create a new loader per request so cached rows never leak between users. For deeply nested fields, each level gets batched separately, resulting in O(depth) queries instead of O(N^depth). DataLoader cannot replace JOIN-based fetching for very hot paths: for truly performance-critical list views, custom resolvers with explicit JOINs are more efficient.
OWASP API Security Top 10 (2023): (1) **Broken Object Level Authorization (BOLA/IDOR)**: failing to check that the authenticated user owns the requested resource ID. Most common, most impactful. (2) **Broken Authentication**: weak token generation, missing expiry, insecure storage. (3) **Broken Object Property Level Authorization**: exposing sensitive fields the caller should not see, or letting the caller write fields they should not control (the 2023 list merged the earlier Excessive Data Exposure and Mass Assignment categories into this one). (4) **Unrestricted Resource Consumption**: no rate limits, no max page size, no request body size limit, all of which enable DoS. (5) **Broken Function Level Authorization**: admin endpoints accessible to regular users due to missing role checks. (6) **Unrestricted Access to Sensitive Business Flows**: automation of business processes that should be limited (scalping, credential stuffing). (7) **Server Side Request Forgery (SSRF)**: the API fetches a user-supplied URL and an attacker targets internal metadata APIs. (8) **Security Misconfiguration**: debug endpoints in production, overly permissive CORS, verbose error messages exposing stack traces. (9) **Improper Inventory Management**: old API versions left running, unpatched, and undocumented. (10) **Unsafe Consumption of APIs**: trusting third-party API responses without validation, leading to injection attacks.
Mass assignment occurs when an API binds all incoming request body properties directly to a data model without filtering, allowing an attacker to set fields they should not control.

```js
// VULNERABLE (pseudocode): directly merges the request body into the update
await db.query("UPDATE users SET $1 WHERE id=$2", [req.body, userId]);
// Attacker sends: { "name": "Alice", "role": "admin", "balance": 999999 }
```

Prevention: use explicit **allow-lists**. Define exactly which fields the caller is permitted to set for each endpoint, and only include those in the database update.

```js
const ALLOWED = ['name', 'email', 'bio'];
const safe = pick(req.body, ALLOWED); // lodash pick or manual
// Build "name = $1, email = $2" from the allow-listed fields only.
// Assumes at least one allowed field is present.
const cols = Object.keys(safe);
const sets = cols.map((c, i) => `${c} = $${i + 1}`).join(', ');
await db.query(`UPDATE users SET ${sets} WHERE id = $${cols.length + 1}`,
  [...cols.map(c => safe[c]), userId]);
```

Additionally: validate the schema with a library (Zod, Joi, JSON Schema) that strips or rejects unknown fields (Joi's `stripUnknown: true`; Zod object schemas strip unknown keys by default). Never use ORM "update all properties" methods with raw request input. For admin-only fields, require an explicit privilege check before accepting the field, not just at the route level.
Application-level rate limiting fails in horizontally scaled deployments because each instance has its own in-memory counter. The solution: centralise the counter in Redis and use a **Lua script** to make the check-and-increment atomic (no race condition between two Redis operations).

```lua
local key = KEYS[1]              -- e.g. "ratelimit:user_42:minute:1716912060"
local limit = tonumber(ARGV[1])  -- e.g. 100
local window = tonumber(ARGV[2]) -- 60 seconds
local current = redis.call("INCR", key)
if current == 1 then
  redis.call("EXPIRE", key, window)
end
if current > limit then return 0 else return 1 end
```

This runs atomically in Redis, so no two instances can race. The key encodes the user ID and the current time window (floor to minute). For sliding window rate limiting, use a Redis sorted set: add each request with the current timestamp as the score and a unique member (for example the timestamp plus a request ID, so two requests in the same millisecond do not overwrite each other), trim entries older than the window, and count the set size. Lua keeps the trim and count atomic. To publish rate limit headers, have the script return the current count as well as the allow/deny decision. Rate limit by: user ID (authenticated), IP (unauthenticated), API key, endpoint path, or combinations. At extreme scale (millions of keys), use Redis Cluster, which shards keys across nodes by hash slot.
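The sliding-window variant is easier to reason about with a small model. The sketch below keeps timestamps in a per-key array instead of a Redis sorted set; the class name and method are illustrative, and the comments note which Redis command each step would correspond to.

```typescript
// In-memory model of the sorted-set approach; not a Redis client.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // timestamps in ms, oldest first

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Trim entries that fell out of the window (ZREMRANGEBYSCORE in Redis).
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    // Count what is left (ZCARD) and decide.
    const allowed = recent.length < this.limit;
    // Record this request only if it was admitted (ZADD).
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Two requests per second for one user.
const limiter = new SlidingWindowLimiter(2, 1000);
const decisions = [0, 500, 900, 1001].map((t) => limiter.allow("user_42", t));
// decisions is [true, true, false, true]: the request at t=1001 is admitted
// because the hit at t=0 has slid out of the one-second window.
```

Unlike the fixed-window counter, the window here moves with each request, so a burst straddling a minute boundary cannot double the effective limit.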
A reverse proxy cache (nginx proxy_cache, Varnish) stores upstream responses and serves them directly for subsequent matching requests. The cache key is typically `method + URL`. The **`Vary` response header** tells the cache which request headers affect the response, so it maintains separate cache entries per header value:

```http
Vary: Accept-Encoding, Accept-Language
```

This means `gzip` and `br` compressed versions are stored separately; French and English responses are stored separately. `Vary: *` (varies on everything) effectively disables caching, and varying on high-cardinality headers such as `User-Agent` fragments the cache almost as badly. Cache **invalidation**: (1) TTL expiry via `Cache-Control: max-age`. (2) `PURGE` method (Varnish / nginx `cache_purge` module): the application sends `PURGE /articles/42` to the cache after an update. (3) Surrogate keys (cache tags): tag cache entries with logical keys (`article:42`, `author:5`); on update, purge by tag (Fastly/Varnish). Nginx does not support tag-based invalidation natively, so use a Lua module or an external Redis mapping. Caching at the proxy level dramatically reduces origin load for read-heavy content but requires careful invalidation design to avoid serving stale data after writes.
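How `Vary` extends the cache key can be shown with a small function. This is a simplified sketch: `cacheKey` and `HeaderMap` are hypothetical names, and real caches also normalise header values (for instance collapsing `Accept-Encoding` variants), which is omitted here.

```typescript
type HeaderMap = Record<string, string>;

// Build a cache key from method, URL, and the request headers named in Vary.
// Returns null when the response must not be reused (Vary: *).
function cacheKey(
  method: string,
  url: string,
  requestHeaders: HeaderMap,
  vary: string
): string | null {
  const names = vary
    .split(",")
    .map((n) => n.trim().toLowerCase())
    .filter((n) => n !== "");
  if (names.indexOf("*") !== -1) return null;

  // Header names are case-insensitive, so compare in lower case.
  const lower: HeaderMap = {};
  for (const headerName of Object.keys(requestHeaders)) {
    lower[headerName.toLowerCase()] = requestHeaders[headerName];
  }
  const variant = names
    .sort()
    .map((n) => `${n}=${lower[n] ?? ""}`)
    .join("&");
  return `${method} ${url} [${variant}]`;
}

const varyHeader = "Accept-Encoding, Accept-Language";
const gzipFr = cacheKey("GET", "/articles/42",
  { "Accept-Encoding": "gzip", "Accept-Language": "fr" }, varyHeader);
const brFr = cacheKey("GET", "/articles/42",
  { "Accept-Encoding": "br", "Accept-Language": "fr" }, varyHeader);
const uncacheable = cacheKey("GET", "/articles/42", {}, "*");
// gzipFr is "GET /articles/42 [accept-encoding=gzip&accept-language=fr]"
```

Two requests share a cache entry only when every header listed in `Vary` matches, which is why each extra header in `Vary` multiplies the number of stored variants.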
For datasets too large to buffer in memory (millions of rows, large files): **Chunked transfer encoding**: over HTTP/1.1 the server sets `Transfer-Encoding: chunked` and streams response body chunks as they are generated, because the `Content-Length` is unknown up front (HTTP/2 streams the body in DATA frames instead). Node.js: pipe a readable stream into `res`. This keeps memory bounded on both sides. **NDJSON (Newline-Delimited JSON)**: each JSON object is on its own line, separated by `\n`, enabling the client to parse the stream incrementally without waiting for the full response:

```
{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
```

`Content-Type: application/x-ndjson`. Clients parse line by line. **Range requests** for file downloads: the server advertises `Accept-Ranges: bytes`. The client can request a specific byte range:

```http
GET /exports/large-file.csv
Range: bytes=0-1048575

→ 206 Partial Content
Content-Range: bytes 0-1048575/52428800
```

This enables resumable downloads: if interrupted, the client requests the remaining range. Combine range requests with ETag validation (send the ETag in `If-Range`) to detect whether the file changed mid-download.
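On the client side, incremental NDJSON parsing mostly comes down to buffering the trailing partial line between chunks. A minimal sketch follows; `NdjsonParser` is a hypothetical helper, and the chunk boundaries are chosen to split a record in the middle.

```typescript
// Feed arbitrary text chunks in; get complete JSON values out as soon as
// the line that contains them has been fully received.
class NdjsonParser {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece may be an incomplete line: keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }
}

const ndjson = new NdjsonParser();
// The second record is split across two network chunks.
const firstBatch = ndjson.push('{"id":1,"name":"Alice"}\n{"id":2,');
const secondBatch = ndjson.push('"name":"Bob"}\n');
// firstBatch holds only Alice; Bob appears once his line is complete.
```

Each call returns only the records completed so far, so memory use is bounded by the longest single line rather than by the size of the whole response.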
CloudEvents is a CNCF specification for describing event data in a common, interoperable format, enabling routing, filtering, and tracing across different transports and brokers (HTTP, Kafka, AMQP). Every CloudEvent has the required attributes `id`, `source`, `specversion` ("1.0"), and `type`, plus optional attributes: `datacontenttype`, `subject`, `time`, `dataschema`. There are two HTTP binding content modes. **Structured mode**: all attributes and data are encoded in a single JSON body with `Content-Type: application/cloudevents+json`:

```json
{
  "specversion": "1.0",
  "type": "com.example.order.created",
  "source": "/orders",
  "id": "abc123",
  "time": "2024-01-15T10:00:00Z",
  "data": { "orderId": 42, "total": 99.99 }
}
```

**Binary mode**: event attributes are HTTP headers (`ce-id`, `ce-type`, `ce-source`), and the raw event data is the body with its native `Content-Type`. Binary mode has lower overhead (no JSON wrapper) and works better with binary payloads. Use CloudEvents when building event-driven APIs that need to integrate with multiple consumers across different platforms.
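The relationship between the two content modes can be illustrated by converting a structured-mode event into binary-mode headers and body. This is a simplified sketch: `toBinaryMode` is a hypothetical helper, it assumes a JSON payload, and it skips the percent-encoding the HTTP binding requires for non-ASCII header values.

```typescript
interface CloudEvent {
  specversion: "1.0";
  id: string;
  source: string;
  type: string;
  time?: string;
  subject?: string;
  datacontenttype?: string;
  data?: unknown;
}

// Context attributes become ce-* headers; datacontenttype becomes
// Content-Type; data becomes the HTTP body.
function toBinaryMode(event: CloudEvent): { headers: Record<string, string>; body: string } {
  const { data, datacontenttype, ...attributes } = event;
  const headers: Record<string, string> = {
    "content-type": datacontenttype ?? "application/json",
  };
  const attributeMap = attributes as Record<string, string | undefined>;
  for (const key of Object.keys(attributeMap)) {
    const value = attributeMap[key];
    if (value !== undefined) headers["ce-" + key] = value;
  }
  return { headers, body: JSON.stringify(data ?? null) };
}

const orderCreated: CloudEvent = {
  specversion: "1.0",
  type: "com.example.order.created",
  source: "/orders",
  id: "abc123",
  time: "2024-01-15T10:00:00Z",
  data: { orderId: 42, total: 99.99 },
};
const binary = toBinaryMode(orderCreated);
// binary.headers["ce-type"] is "com.example.order.created"
// binary.body is '{"orderId":42,"total":99.99}'
```

The body in binary mode is exactly the `data` payload, which is why intermediaries can route on headers without parsing the event.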
Pact contract testing has two sides: consumer and provider, run independently in CI. **Consumer side**: write a test that describes what the consumer expects from the provider. Pact records this as an "interaction" in a pact JSON file.

```js
await provider.addInteraction({
  state: 'user 42 exists',
  uponReceiving: 'a request for user 42',
  withRequest: { method: 'GET', path: '/users/42' },
  willRespondWith: { status: 200, body: like({ id: 42, name: string() }) }
});
```

**Provider states** (`state: 'user 42 exists'`) are setup hooks that the provider test runner calls before each interaction to put the database into the required state (insert fixtures, mock dependencies). **Provider side CI**: pull pact files from the Pact Broker and run `pact:verify`, which replays each interaction against the real provider. The provider then publishes the verification results to the broker. **can-i-deploy** is a Pact Broker CLI command that queries those results to check whether the version you are about to deploy has been verified against the versions of its counterparts already in the target environment; deploy only if it reports success. This creates a safety net: any provider change that breaks a consumer pact fails CI before deployment, enabling truly independent service deployment.
A service mesh (Istio, Linkerd) manages service-to-service traffic at the infrastructure level, independent of application code. For canary API releases: (1) Deploy the new API version (`v2`) as a separate deployment with a distinct label. (2) Create an Istio **VirtualService** that routes a small percentage of traffic to v2:

```yaml
http:
  - route:
      - destination: { host: api, subset: v1 }
        weight: 90
      - destination: { host: api, subset: v2 }
        weight: 10
```

(3) Define **DestinationRules** with subsets matching pod labels. (4) Gradually increase the v2 weight based on error rate and latency metrics from the mesh's telemetry (Prometheus + Grafana). (5) Use header-based routing for internal testing: route requests with `X-Canary: true` to v2 without percentage rollout. (6) If the v2 error rate exceeds the SLO, immediately shift its weight back to 0 (automated via Flagger or Argo Rollouts). The API gateway layer can add additional controls: A/B testing by user segment (user ID hash), geographic rollout, or beta-user whitelisting by JWT claim.
Zero-downtime migration of a live API to a new version or data store requires avoiding a hard cutover. Key patterns: **Shadow traffic (dark launch)**: route a copy of live production traffic to the new implementation in parallel, without serving the shadow's response to the client. Compare outputs to detect divergence before committing.

```
Request → Old API → Response to client
   ↓ async copy
New API → log divergence
```

**Dual-write**: during the migration window, write mutations to both old and new stores simultaneously. Reads can start being shifted to the new store once the backfill is complete and consistency is verified. **Traffic shifting strategy**: start at 0% to new, run shadow comparison, verify correctness, then shift 1% → 5% → 25% → 50% → 100% with automated rollback triggers on error rate increase. **Read-your-writes consistency**: if a user writes to the new store but a later read is served from the old store before replication, they see stale data, so use sticky sessions or a migration flag per user. **Cutover**: once 100% of traffic is on the new version, keep the old version running for 24–48 hours as fallback, then decommission. Monitor database connections and cache warming during ramp-up.
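One way to combine percentage-based traffic shifting with a per-user "migration flag" is deterministic bucketing: hash the user ID into a bucket from 0 to 99 and compare it with the current rollout percentage. The sketch below is one possible approach under that assumption; `bucketOf` and `routeTo` are illustrative names, and FNV-1a is chosen only because it is short and deterministic.

```typescript
// FNV-1a hash: small and deterministic, adequate for bucketing (not for security).
function bucketOf(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash % 100; // bucket in the range 0..99
}

// A user goes to the new implementation when their bucket is below the
// current rollout percentage.
function routeTo(userId: string, rolloutPercent: number): "old" | "new" {
  return bucketOf(userId) < rolloutPercent ? "new" : "old";
}

const sampleUsers = ["user_1", "user_2", "user_3", "user_42", "user_1000"];
const rolloutSteps = [0, 1, 5, 25, 50, 100];
// For each user, the routing decision at every rollout step.
const routingTable = sampleUsers.map((u) => rolloutSteps.map((p) => routeTo(u, p)));
```

Because a user's bucket never changes, the same user always lands on the same side at a given percentage, and a user who has moved to the new implementation stays there as the percentage grows. That gives the stickiness needed for read-your-writes without server-side session state.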
Some API operations take seconds to minutes (video transcoding, report generation, ML inference). Three patterns: **(1) 202 Accepted + polling**: the server immediately returns 202 with a `Location: /operations/{opId}` header. The client polls the operation URL until it transitions from `pending` to `complete` or `failed`. Simple for clients, but it wastes bandwidth on polling and adds up to one polling interval of latency.

```http
POST /reports
→ 202 Accepted
Location: /operations/op_abc123

GET /operations/op_abc123
→ { "status": "pending", "progress": 45 }

GET /operations/op_abc123
→ { "status": "complete", "result_url": "..." }
```

**(2) Webhooks**: on completion, the server calls a client-registered callback URL. There is no polling traffic and the client learns of completion immediately. But the client must expose a public HTTPS endpoint, handle retries from the server, and verify HMAC signatures. Harder to implement for browser clients. **(3) SSE**: the client opens an event stream; the server pushes progress updates and the final result. No polling, no webhook infrastructure needed. Works well in browsers. The connection can time out if no data is sent for longer than nginx's `proxy_read_timeout`, so send periodic keep-alive comments during long operations. Best choice by scenario: SSE for browser-initiated operations; webhooks for server-to-server or mobile; polling for the simplest cross-platform SDK compatibility.
A well-designed search API separates filtering (exact match, reduces the candidate set) from relevance scoring (how well documents match the query, reorders the candidate set). Conflating them leads to poor results: a clause placed in filter context contributes nothing to the score, while a hard requirement placed in scoring context merely reorders documents instead of excluding them. Query DSL design:

```json
POST /search
{
  "query": "software engineer",
  "filters": { "location": "remote", "salary_min": 100000 },
  "sort": "relevance",
  "cursor": "eyJzY29yZSI6MC45Miwi...",
  "limit": 20
}
```

Use `query` for full-text search (Elasticsearch `multi_match`, PostgreSQL `tsvector`). Use `filters` for exact-match criteria applied as pre-filters before scoring. **Relevance sorting and cursor pagination**: relevance scores are floating-point values that can tie across pages. Without a tiebreaker, pages are unstable: the same document appears on multiple pages or is skipped. Always add a deterministic secondary sort key (document ID) as a tiebreaker. The cursor encodes `(score, id)` from the last document. Elasticsearch's `search_after` parameter implements this exactly. For keyword-sorted results (alphabetical, date), sort stability is easier: just use the primary key as a tiebreaker in the WHERE clause.
Caching at each layer reduces load on layers below it but introduces consistency challenges. From outermost to innermost: **(1) Client cache**: the HTTP client respects `Cache-Control: max-age`; subsequent identical requests are served from disk/memory without hitting the network, so there is no round trip at all. **(2) CDN** (Cloudflare, Fastly): edge nodes cache responses globally. GET requests to public resources hit the CDN, not the origin. TTL set via `Cache-Control: s-maxage`. Invalidation via tag-based purge APIs or surrogate keys. **(3) API gateway**: request-level caching for authenticated or semi-public responses. Can cache per-user responses if scoped correctly. **(4) Application cache** (Redis, Memcached): caches expensive DB query results, computed aggregates, or external API responses. Invalidate on write via the cache-aside pattern or pub/sub. **(5) Database** (query cache, buffer pool): the DB engine caches recently used index pages and query results. Each layer has its own invalidation strategy. The hardest problem: a write must invalidate entries at all relevant layers (CDN purge + Redis DEL + application cache clear), either atomically or with acceptable inconsistency windows. Design the TTL hierarchy with the worst case in mind: a response can be stale for up to the sum of the TTLs along its path, and the outermost copies (client and CDN) are the hardest to purge, so give them short TTLs or rely on revalidation (`ETag` / `If-None-Match`) rather than long `max-age` values.
The HTTP specification defines three properties for methods:

| Method  | Safe | Idempotent | Cacheable (by default)          |
|---------|------|------------|---------------------------------|
| GET     | ✓    | ✓          | ✓                               |
| HEAD    | ✓    | ✓          | ✓                               |
| OPTIONS | ✓    | ✓          | ✗                               |
| TRACE   | ✓    | ✓          | ✗                               |
| PUT     | ✗    | ✓          | ✗                               |
| DELETE  | ✗    | ✓          | ✗                               |
| POST    | ✗    | ✗          | ✓ (only with explicit headers)  |
| PATCH   | ✗    | ✗          | ✗                               |

Safety means the client does not request a state change, so browsers can prefetch GET requests without risk. Idempotency guarantees that retries are safe: HTTP clients with retry logic should only retry idempotent methods automatically. Cacheability means the response can be stored and reused. Only GET, HEAD, and (with explicit freshness headers) POST have defined caching semantics, and only the status codes 200/203/204/206/300/301/308/404/405/410/414/501 are cacheable by default. Framework support: Express does not enforce any of this; you must implement it. Frameworks like Fastify, Spring, and Django REST Framework match the request method against route definitions and reject unmatched methods, but none of them can verify that a handler is actually safe or idempotent. HTTP clients like `axios-retry` only auto-retry idempotent methods. CDNs only cache GET/HEAD by default. Violating these semantics (e.g., a GET that mutates state) breaks caches, retries, and prefetchers in unpredictable ways.
A well-designed SDK dramatically reduces integration time and support burden. Key elements: **(1) Error hierarchy**: base `ApiError` with sub-classes for each error family. Never expose raw HTTP errors; map them to typed exceptions.

```ts
class ApiError extends Error {
  constructor(message: string, readonly status: number, readonly requestId: string) {
    super(message);
  }
}
class AuthenticationError extends ApiError {}
class RateLimitError extends ApiError {
  retryAfter = 0; // seconds, taken from the Retry-After header
}
class NotFoundError extends ApiError {}
```

**(2) Retry logic**: automatic retry for idempotent methods (such as GET) on 5xx and network errors; retry for 429 with exponential backoff honouring `Retry-After`. Never retry POST automatically (not idempotent); allow opt-in with an idempotency key. **(3) Pagination helpers**: return async iterators or generator functions that transparently fetch the next cursor.

```ts
for await (const job of client.jobs.list({ status: "active" })) {
  console.log(job.title); // SDK handles pagination automatically
}
```

**(4) Versioning alignment**: pin the SDK version to a specific API version (e.g., SDK v2.x always sends `API-Version: 2024-06-01`). Provide a way to override. Publish a changelog mapping SDK versions to API version changes. **(5)** Provide a configurable base URL for sandbox vs production. **(6)** Include an `X-SDK-Version` request header for observability.
OpenTelemetry (OTel) provides vendor-neutral distributed tracing. When service A calls service B, it propagates trace context via the **W3C Trace Context** standard headers:

```http
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Format: `version-traceId-parentSpanId-flags`, where the trace ID is 16 bytes (32 hex characters) and the parent span ID is 8 bytes (16 hex characters). `01` in flags means "sampled". Service B reads `traceparent`, creates a child span with the same `traceId` and its own `spanId`, and records the incoming `parentSpanId` as its parent. This connects the spans in the trace backend (Jaeger, Tempo, Honeycomb) into a single trace tree spanning all services. Implementation in Node.js: `@opentelemetry/api` with auto-instrumentation for Express/Fastify propagates headers automatically. Manual instrumentation:

```js
const { trace, SpanKind } = require("@opentelemetry/api");

const tracer = trace.getTracer("payments"); // tracer name is illustrative
const span = tracer.startSpan("process-payment", {
  kind: SpanKind.SERVER,
  attributes: { "user.id": userId, "payment.amount": amount }
});
// ... do the work, then always end the span so it gets exported
span.end();
```

Always propagate `tracestate` alongside `traceparent`; it carries vendor-specific trace metadata such as sampling details. Application-level key/value context travels separately in the W3C `baggage` header. In API gateways, log the `traceId` in access logs to correlate gateway logs with application traces. Sampling strategy: head-based sampling at the gateway (sample 1% of all traffic) or tail-based sampling (keep 100% of traces containing errors).
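To make the header format concrete, here is a small sketch of what a service does with an incoming `traceparent`: parse it, then build the value it would send downstream. `parseTraceparent` and `childTraceparent` are hypothetical helpers written for illustration; in practice the OpenTelemetry propagators do this for you.

```typescript
interface TraceContext {
  version: string;
  traceId: string; // 32 hex characters
  spanId: string;  // 16 hex characters
  sampled: boolean;
}

function parseTraceparent(header: string): TraceContext | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero trace or span IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// The outgoing header keeps the trace ID and the sampled flag but carries
// this service's own span ID, which becomes the parent for the next hop.
function childTraceparent(incoming: TraceContext, ownSpanId: string): string {
  const flags = incoming.sampled ? "01" : "00";
  return `${incoming.version}-${incoming.traceId}-${ownSpanId}-${flags}`;
}

const incomingContext = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
);
// The span ID below is an arbitrary example value.
const outgoingHeader = incomingContext
  ? childTraceparent(incomingContext, "b7ad6b7169203331")
  : null;
// outgoingHeader is "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
```

The trace ID is the only part that stays constant across every hop, which is why it is the value worth writing into gateway access logs.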
Usage-based API billing requires accurate metering at every request, quota enforcement in real time, and aggregation for billing calculations. Architecture layers: **(1) Metering at the gateway**: increment a Redis counter per API key per billing period on every request, before the request hits the application. This counter is the source of truth for quota enforcement.

```lua
-- Atomic increment and check
-- KEYS[1] = meter key, e.g. "meter:key_abc:2024-01"; ARGV[1] = quota for the period
local count = redis.call("INCR", KEYS[1])
if count == 1 then
  redis.call("EXPIRE", KEYS[1], 2678400) -- 31 days, set once when the key is created
end
if count > tonumber(ARGV[1]) then return 0 end
return 1
```

**(2) Quota enforcement**: soft limits (warn at 80% via email) and hard limits (return 429 at 100%). Expose usage in API headers: `X-RateLimit-Used: 8000`, `X-RateLimit-Limit: 10000`. **(3) Aggregation for billing**: stream meter events (Kafka, Kinesis, or Redis Streams) to a time-series store (ClickHouse, TimescaleDB). Aggregate daily/monthly usage by customer, by endpoint, by tier. **(4) Stripe metering**: use Stripe Billing with usage records, reporting meter events to Stripe's metered billing API at the end of each billing period (or in near-real-time via the Meter Events API). **(5) Idempotent reporting**: meter events must be deduplicated before billing, so use the request's `X-Request-ID` as the idempotency key when reporting to Stripe.
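The deduplication and soft/hard limit rules can be sketched without any infrastructure. The example below is illustrative: `MeterEvent`, `aggregateUsage`, and `quotaState` are hypothetical names, and an in-memory `Set` stands in for whatever store remembers request IDs already billed.

```typescript
interface MeterEvent {
  requestId: string; // e.g. the X-Request-ID of the metered call
  apiKey: string;
  endpoint: string;
}

// Count each request once per API key, even if the event was delivered twice.
function aggregateUsage(events: MeterEvent[]): Map<string, number> {
  const seen = new Set<string>();
  const usage = new Map<string, number>();
  for (const e of events) {
    if (seen.has(e.requestId)) continue; // duplicate delivery: skip
    seen.add(e.requestId);
    usage.set(e.apiKey, (usage.get(e.apiKey) ?? 0) + 1);
  }
  return usage;
}

// Soft limit at 80% of quota, hard limit at 100%.
function quotaState(used: number, limit: number): "ok" | "warn" | "blocked" {
  if (used >= limit) return "blocked"; // respond with 429
  if (used >= limit * 0.8) return "warn"; // notify the customer
  return "ok";
}

const usage = aggregateUsage([
  { requestId: "req_1", apiKey: "key_abc", endpoint: "/jobs" },
  { requestId: "req_2", apiKey: "key_abc", endpoint: "/jobs" },
  { requestId: "req_2", apiKey: "key_abc", endpoint: "/jobs" }, // redelivered
  { requestId: "req_3", apiKey: "key_xyz", endpoint: "/search" },
]);
const states = [quotaState(7999, 10000), quotaState(8000, 10000), quotaState(10000, 10000)];
// usage.get("key_abc") is 2; states is ["ok", "warn", "blocked"]
```

Keeping deduplication separate from the real-time counter means an at-least-once event pipeline can be used for billing without overcharging.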
A well-designed SDK hides HTTP complexity while giving callers enough control. **Error hierarchy**: define a base `APIError` with `statusCode`, `requestId`, and `message`. Subclass into `AuthError` (401/403), `RateLimitError` (429, with `retryAfter`), `ValidationError` (400/422, with `fields`), `NotFoundError` (404), and `ServerError` (5xx). Callers can `catch (e) { if (e instanceof RateLimitError) await sleep(e.retryAfter) }`. **Retry logic**: automatically retry idempotent requests (GET, PUT, DELETE) on 429 and 5xx using exponential backoff with jitter. Never retry POST unless the request carries an idempotency key that the server honours. Cap at 3–5 attempts. **Pagination helpers**: return an async iterator or `AsyncGenerator` from list methods so callers write `for await (const item of client.jobs.list())` without managing cursors. The SDK fetches the next page transparently when the current page is exhausted. **Versioning alignment**: pin the SDK version to the API version it was built against (e.g., SDK `13.x` always sends `api-version: 2024-06-20`). Expose `client.setAPIVersion('2024-11-01')` for users who want to opt into newer versions. Ship changelogs that cross-reference API changelog entries. **Type safety**: auto-generate TypeScript types from the OpenAPI spec at build time with `openapi-typescript` so the SDK types stay in sync with the actual API contract.
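The retry rules above reduce to one decision: given a failed attempt, either return a delay or give up. The following sketch shows one way to express that; `nextDelayMs`, the default cap of 4 attempts, and the 200 ms base are assumptions chosen for illustration, and the random source is injectable so the behaviour can be checked deterministically.

```typescript
const IDEMPOTENT_METHODS = ["GET", "HEAD", "OPTIONS", "PUT", "DELETE"];

interface FailedAttempt {
  method: string;
  status: number;             // HTTP status of the failed response
  attempt: number;            // 1 for the first try
  hasIdempotencyKey: boolean; // true if the request carried an idempotency key
  retryAfterSeconds?: number; // parsed Retry-After header, if present
}

// Returns the delay in milliseconds before the next try, or null to give up.
function nextDelayMs(
  a: FailedAttempt,
  maxAttempts = 4,
  baseMs = 200,
  random: () => number = Math.random
): number | null {
  const retryableStatus = a.status === 429 || a.status >= 500;
  const retryableMethod =
    IDEMPOTENT_METHODS.indexOf(a.method.toUpperCase()) !== -1 || a.hasIdempotencyKey;
  if (!retryableStatus || !retryableMethod || a.attempt >= maxAttempts) return null;

  // The server's hint takes precedence over our own schedule.
  if (a.retryAfterSeconds !== undefined) return a.retryAfterSeconds * 1000;

  // Exponential backoff with "full jitter": uniform in [0, base * 2^(attempt-1)).
  const ceiling = baseMs * Math.pow(2, a.attempt - 1);
  return Math.floor(random() * ceiling);
}

const half = () => 0.5; // fixed "random" value so the examples are repeatable
const getFirst = nextDelayMs({ method: "GET", status: 503, attempt: 1, hasIdempotencyKey: false }, 4, 200, half);
const getThird = nextDelayMs({ method: "GET", status: 503, attempt: 3, hasIdempotencyKey: false }, 4, 200, half);
const postNoKey = nextDelayMs({ method: "POST", status: 503, attempt: 1, hasIdempotencyKey: false }, 4, 200, half);
const postWithKey = nextDelayMs({ method: "POST", status: 503, attempt: 1, hasIdempotencyKey: true }, 4, 200, half);
const throttled = nextDelayMs({ method: "GET", status: 429, attempt: 1, hasIdempotencyKey: false, retryAfterSeconds: 2 }, 4, 200, half);
const exhausted = nextDelayMs({ method: "GET", status: 503, attempt: 4, hasIdempotencyKey: false }, 4, 200, half);
// getFirst = 100, getThird = 400, postNoKey = null, postWithKey = 100,
// throttled = 2000, exhausted = null
```

Jitter matters because many clients that fail at the same moment would otherwise retry at the same moment too.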
**Safe** methods (GET, HEAD, OPTIONS, TRACE) do not request a change to server state, so clients can call them freely. **Idempotent** methods (GET, HEAD, OPTIONS, TRACE, PUT, DELETE) produce the same server state regardless of how many times they are called, so retrying them after a network timeout is safe. POST and PATCH are neither safe nor idempotent. **Cacheable** methods are those whose responses may be stored and reused: GET and HEAD are cacheable by default if the response includes appropriate cache headers; POST responses are cacheable only if `Cache-Control` or `Expires` explicitly allows it. DELETE and PUT are not cacheable. The matrix:

```
Method   Safe   Idempotent   Cacheable
GET      ✓      ✓            ✓
HEAD     ✓      ✓            ✓
POST     ✗      ✗            conditional
PUT      ✗      ✓            ✗
PATCH    ✗      ✗            ✗
DELETE   ✗      ✓            ✗
```

RFC 9110 (HTTP Semantics) formalises these. Frameworks such as Spring WebMVC or ASP.NET Core do not enforce these properties for you, although an API gateway like Kong can be configured to only cache responses to safe methods. The practical implication: design your API so GET/HEAD never perform a state change the client would be held responsible for (incidental effects such as access logging are fine), and ensure DELETE really is idempotent: a second DELETE on a missing resource should return 404 (or 204), not 500.
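The matrix translates directly into a lookup table that a client or gateway can consult before retrying. The sketch below also shows what "idempotent DELETE" means in practice; `METHODS`, `canAutoRetry`, and `deleteJob` are hypothetical names, and an in-memory `Map` stands in for the resource store.

```typescript
const METHODS: Record<string, { safe: boolean; idempotent: boolean }> = {
  GET:     { safe: true,  idempotent: true },
  HEAD:    { safe: true,  idempotent: true },
  OPTIONS: { safe: true,  idempotent: true },
  TRACE:   { safe: true,  idempotent: true },
  PUT:     { safe: false, idempotent: true },
  DELETE:  { safe: false, idempotent: true },
  POST:    { safe: false, idempotent: false },
  PATCH:   { safe: false, idempotent: false },
};

// Automatic retries are only safe for idempotent methods.
function canAutoRetry(method: string): boolean {
  const props = METHODS[method.toUpperCase()];
  return props !== undefined && props.idempotent;
}

// Idempotent DELETE: the status code may differ between calls,
// but the resulting server state is the same.
const jobs = new Map<string, string>([["42", "Backend Developer"]]);
function deleteJob(id: string): number {
  return jobs.delete(id) ? 204 : 404;
}

const firstDelete = deleteJob("42");  // 204: the resource existed and was removed
const secondDelete = deleteJob("42"); // 404: already gone, state unchanged
```

Idempotency is a statement about state, not about responses: both calls leave the store without job 42, even though the status codes differ.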
A search API must balance expressiveness, performance, and predictable pagination. **Query DSL design**: separate *filters* (exact/range matches that do not affect relevance) from *query* (free-text that scores results). Use a flat structure for simple use cases and a nested boolean DSL for complex ones:

```json
{
  "query": "senior engineer",
  "filters": { "location": "remote", "salary_min": 120000 },
  "sort": [
    { "field": "relevance", "order": "desc" },
    { "field": "created_at", "order": "desc" }
  ]
}
```

**Relevance vs filtering separation**: filters run as fast no-score queries (bit sets in Lucene/Elasticsearch) and should not affect scoring. The relevance score comes from BM25 over the `query` field. Mixing them (e.g., boosting by recency) is a deliberate tuning decision, not a default. **Cursor pagination with sort stability**: offset pagination (`?page=3`) breaks when new results are inserted, because items shift between pages. Cursor-based pagination encodes the last-seen sort key as an opaque token. For multi-field sorts (`relevance DESC, created_at DESC`), the cursor must encode both values and the WHERE clause becomes `WHERE (relevance, created_at) < (cursor.relevance, cursor.created_at)` using tuple comparison. Encode as base64 JSON to keep the API surface clean. **Sort stability requirement**: if two documents have identical sort keys, a tiebreaker (document `id`) must be the final sort field to guarantee the cursor never produces duplicate or skipped pages. **Response shape**: include `next_cursor`, `has_more`, `total_count` (if cheap), and `took_ms` for client-side debugging.
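The tuple comparison and the tiebreaker requirement can be demonstrated on an in-memory list. This is a sketch under simplifying assumptions: the sort is `(score DESC, id DESC)`, the cursor is URL-encoded JSON rather than base64 to stay dependency-free, and `SearchDoc`, `isAfter`, and `fetchPage` are hypothetical names.

```typescript
interface SearchDoc { id: number; score: number; title: string }
type Cursor = { score: number; id: number };

const encodeCursor = (c: Cursor): string => encodeURIComponent(JSON.stringify(c));
const decodeCursor = (s: string): Cursor => JSON.parse(decodeURIComponent(s));

// Sort by score DESC, then id DESC as the deterministic tiebreaker.
const byRank = (a: SearchDoc, b: SearchDoc): number => b.score - a.score || b.id - a.id;

// In-memory equivalent of WHERE (score, id) < (cursor.score, cursor.id).
const isAfter = (d: SearchDoc, c: Cursor): boolean =>
  d.score < c.score || (d.score === c.score && d.id < c.id);

function fetchPage(docs: SearchDoc[], limit: number, cursor?: string) {
  const sorted = [...docs].sort(byRank);
  const position = cursor ? decodeCursor(cursor) : null;
  const remaining = position ? sorted.filter((d) => isAfter(d, position)) : sorted;
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  const has_more = remaining.length > limit;
  const next_cursor = has_more && last ? encodeCursor({ score: last.score, id: last.id }) : null;
  return { items, has_more, next_cursor };
}

// Three documents tie on score 0.9 and two tie on 0.5.
const corpus: SearchDoc[] = [
  { id: 1, score: 0.9, title: "A" }, { id: 2, score: 0.9, title: "B" },
  { id: 3, score: 0.9, title: "C" }, { id: 4, score: 0.5, title: "D" },
  { id: 5, score: 0.5, title: "E" },
];

const seenIds: number[] = [];
let nextCursor: string | undefined = undefined;
for (;;) {
  const result = fetchPage(corpus, 2, nextCursor);
  for (const d of result.items) seenIds.push(d.id);
  if (result.next_cursor === null) break;
  nextCursor = result.next_cursor;
}
// seenIds is [3, 2, 1, 5, 4]: every document exactly once despite the ties.
```

Without the `id` component in the cursor, the page boundary inside the group of 0.9 scores would be ambiguous, and documents would be repeated or skipped.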

Frequently Asked Questions

REST vs GraphQL vs gRPC?

REST is simplest and most compatible. GraphQL fits flexible client queries. gRPC is efficient for service-to-service. Choose based on client and performance needs.

How do you version APIs?

URL versioning (/v1) is simplest; header versioning is cleaner but harder to debug. Either way, deprecate gracefully with long notice.

What about idempotency?

Critical for safe retries. POST can be idempotent via an Idempotency-Key header. PUT and DELETE are idempotent by spec.

Which auth scheme to use?

OAuth 2.0 + JWT for user-facing APIs, mutual TLS or signed requests for service-to-service.

How do you document APIs?

OpenAPI (Swagger) is the industry standard. Many frameworks auto-generate it from code annotations.
