agentzh's Nginx Tutorials (version 2020.03.19)


Foreword

I've been doing a lot of work in the Nginx world over the last few years, and I've also been thinking about writing a series of tutorial-like articles to explain to more people what I've done and what I've learned in this area. I have now finally decided to post a series of articles, in Chinese, to the Sina Blog http://blog.sina.com.cn/openresty. Every article will roughly cover a single topic and will be in a rather casual style. But at some point in the future I may restructure the articles and their style in order to turn them into a "real" book.

The articles are divided into series. For example, the first series is "Nginx Variables". Each series can be thought of as mapping to a chapter in the Nginx book that I may publish in the future.

The articles are intended for Nginx users of all experience levels, including users with extensive Apache and Lighttpd experience who may have never used Nginx before.

The examples in the articles are at least compatible with Nginx 0.8.54. Do not try the examples with older versions of Nginx. The latest stable version of Nginx as of this writing is 1.7.9.

All of the Nginx modules referenced in the articles are production-ready. I will not be covering any Nginx core modules that are either experimental or buggy. Additionally, I will be making extensive use of 3rd-party Nginx modules in the examples. If it's inconvenient for you to download and install the individual modules one at a time, then I highly recommend that you download and install the ngx_openresty software bundle that I maintain.

http://openresty.org/

All of the modules referenced in the articles, including the core Nginx modules that are new (but stable), are included in the OpenResty bundle.

A principle that I will be trying to adhere to is to use small concise examples to explain and validate the concepts and behaviors being described. My hope is that it will help the reader to develop the good habit of not accepting others' viewpoints or statements at face value without testing them first. This approach may have something to do with my QA background. In fact, I keep tweaking and correcting the articles based on the results of running the examples while writing.

The examples in the articles fall into one of two categories, good and problematic. The purpose of the problematic examples is to highlight potential pitfalls and other areas where Nginx or its modules behave in ways that readers may not expect. Problematic examples are easy to identify because each line of text in the example will be prefixed with a question mark, i.e., "?". Here is an example:

? server {
?     listen 8080;
?
?     location /bad {
?         echo $foo;
?     }
? }

Do not reproduce these articles without explicit permission from us. Copyright reserved.

I encourage readers to send feedback (agentzh@gmail.com), especially constructive criticism.

The source for all the articles is on GitHub:

http://github.com/agentzh/nginx-tutorials/

The source files are under the en/ directory. I write these articles in a little markup language that is a mixture of Wiki and POD; the sources are the .tut files. You are welcome to create forks and/or provide patches.

E-book files suitable for cellphones, Kindle, iPad/iPhone, Sony Readers, and other devices can be downloaded from here:

http://openresty.org/#eBooks

Special thanks go to Kai Wu (kai10k), who kindly translates these articles into English.

agentzh, at home in Fuzhou

October 30, 2011

Writing Plan for the Tutorials

Listed here are the tutorial series that have already been published or are yet to be published.

The series names roughly correspond to the chapter names in my final Nginx book, but they are unlikely to stay exactly the same. The actual series names may change, and so may the relative order of the series.

The list above will be constantly updated to always reflect the latest plan.

Nginx Variables (01)

Variables as Value Containers

Nginx's configuration files use a micro programming language. Many real-world Nginx configuration files are essentially small programs. As far as I can see, the design of this language is heavily influenced by Perl and the Bourne Shell, even though it may not be Turing-complete and is declarative in many places. This is a distinguishing feature of Nginx compared to other web servers like Apache or Lighttpd. Since it is a programming language, "variables" are a natural part of it (exceptions do exist, of course, as in pure functional languages like Haskell).

Variables are value containers

In imperative languages like Perl, Bourne Shell, and C/C++, variables are just containers holding various values, and "values" can be numbers like 3.14, strings like hello world, or even complicated things like references to arrays or hash tables. In the Nginx configuration language, however, variables can hold only one type of value: strings (there is an interesting exception: the 3rd-party module ngx_array_var extends Nginx variables to hold arrays, but it does so by encoding a C pointer as a binary string value behind the scenes).

Variable Syntax and Interpolation

Let's say our nginx.conf configuration file has the following line:

set $a "hello world";

We assign a value to the variable $a via the set configuration directive coming from the standard ngx_rewrite module. In particular, we assign the string value hello world to $a.

We can see that the Nginx variable name takes a dollar sign ($) in front of it. This is required by the language syntax: whenever we want to reference an Nginx variable in the configuration file, we must add a $ prefix. This will look familiar to Perl and PHP programmers.

Such variable prefixes may make some Java and C# programmers uncomfortable, but this notation does have an obvious advantage: variables can be embedded directly into a string literal:

set $a hello;
set $b "$a, $a";

Here we use the value of the existing Nginx variable $a to construct the value for the variable $b. So after these two directives complete execution, the value of $a is hello, and $b is hello, hello. This technique is called "variable interpolation" in the Perl world, and it makes dedicated string concatenation operators largely unnecessary. Let's use the same term for the Nginx world from now on.

Let's see another complete example:

server {
    listen 8080;

    location /test {
        set $foo hello;
        echo "foo: $foo";
    }
}

This example omits the http and events configuration blocks in the outermost scope for brevity. If we request this /test interface with curl, a command-line HTTP client, we get

$ curl 'http://localhost:8080/test'
foo: hello

Here we use the echo directive of the 3rd-party module ngx_echo to print out the value of the $foo variable as the HTTP response.

Evidently the arguments of the echo directive do support "variable interpolation", but we cannot take this for granted for other directives: not all configuration directives support "variable interpolation", and whether a directive does is up to the module that implements it. Always check the documentation to be sure.

Escaping "$"

We've already learned that the $ character is special and serves as the variable name prefix, but now suppose we want to output a literal $ character via the echo directive. The following naive example does not work at all:

? location /t {
?     echo "$";
? }

We will get the following error message while loading this configuration:

[emerg] invalid variable name in ...

Evidently Nginx tries to parse $" as a variable name. Is there a way to escape $ in a string literal? The answer is "no" (this is still the case in the latest stable Nginx release, 1.2.7), though I have been hoping that we could write something like $$ to obtain a literal $.

Luckily, workarounds do exist and here is one proposed by Maxim Dounin: first we assign to a variable a literal string containing a dollar sign character via a configuration directive that does not support "variable interpolation" (remember that not all the directives support "variable interpolation"?), and then reference this variable later whenever we need a dollar sign. Here is such an example to demonstrate the idea:

geo $dollar {
    default "$";
}

server {
    listen 8080;

    location /test {
        echo "This is a dollar sign: $dollar";
    }
}

Let's test it out:

$ curl 'http://localhost:8080/test'
This is a dollar sign: $

Here we use the geo directive of the standard module ngx_geo to initialize the $dollar variable with the string "$"; thereafter the variable $dollar can be used wherever we need a dollar sign. This works because the geo directive does not support "variable interpolation" at all. However, the ngx_geo module was originally designed to set an Nginx variable to different values according to the remote client address; in this example, we simply abuse it to initialize the $dollar variable with the string "$" unconditionally.

Disambiguating Variable Names

A special case of "variable interpolation" arises when the variable name is followed directly by characters allowed in variable names (like letters, digits, and underscores). In such cases, we can use a special notation to separate the variable name from the subsequent literal characters, for instance,

server {
    listen 8080;

    location /test {
        set $first "hello ";
        echo "${first}world";
    }
}

Here the variable $first is concatenated with the literal string world. If it were written directly as "$firstworld", Nginx's "variable interpolation" engine (also known as the "script engine") would try to access the variable $firstworld instead of $first. To resolve the ambiguity here, curly braces must be used around the variable name (excluding the $ prefix), as in ${first}. Let's test this sample:

$ curl 'http://localhost:8080/test'
hello world
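To make the parsing rule concrete, here is a minimal C++ sketch of how interpolation might expand $name and ${name}. It is not how Nginx's script engine is actually implemented (Nginx parses such strings once, while loading the configuration); the function name interpolate and the use of a std::map as the variable table are assumptions for illustration. The bare form consumes letters, digits, and underscores greedily, which is why "$firstworld" names a different variable, and a $ followed by no valid name is rejected, much like the "invalid variable name" error for "$" shown earlier.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch of "variable interpolation": expand $name and ${name}
// in a template string using a table of variable values.
std::string interpolate(const std::string& tmpl,
                        const std::map<std::string, std::string>& vars) {
    auto is_name_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };
    std::string out;
    std::size_t i = 0;
    while (i < tmpl.size()) {
        if (tmpl[i] != '$') {          // ordinary character: copy it
            out += tmpl[i++];
            continue;
        }
        ++i;                           // skip the '$' prefix
        std::string name;
        if (i < tmpl.size() && tmpl[i] == '{') {
            // ${name}: the name ends at the closing brace
            std::size_t close = tmpl.find('}', i + 1);
            if (close == std::string::npos)
                throw std::invalid_argument("missing closing brace");
            name = tmpl.substr(i + 1, close - i - 1);
            i = close + 1;
        } else {
            // $name: consume letters, digits and underscores greedily
            while (i < tmpl.size() && is_name_char(tmpl[i]))
                name += tmpl[i++];
        }
        if (name.empty())              // e.g. a lone "$"
            throw std::invalid_argument("invalid variable name");
        auto it = vars.find(name);
        if (it == vars.end())          // like Nginx's "unknown variable" error
            throw std::invalid_argument("unknown \"" + name + "\" variable");
        out += it->second;
    }
    return out;
}
```

For example, with a bound to hello and first bound to "hello ", the template "$a, $a" expands to hello, hello and "${first}world" expands to hello world, while "$firstworld" throws because no variable named firstworld exists.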

Variable Declaration and Creation

In languages like C/C++, variables must be declared (or created) before they can be used so that the compiler can allocate storage and perform type checking at compile-time. Similarly, Nginx creates all the Nginx variables while loading the configuration file (or in other words, at "configuration time"), therefore Nginx variables are also required to be declared somehow.

Fortunately the set directive and the geo directive mentioned above do have the side effect of declaring or creating Nginx variables that they will assign values to later at "request time". If we do not declare a variable this way and use it directly in, say, the echo directive, we will get an error. For example,

? server {
?     listen 8080;
?
?     location /bad {
?         echo $foo;
?     }
? }

Here we do not declare the $foo variable and access its value directly in echo. Nginx simply refuses to load this configuration:

[emerg] unknown "foo" variable

Yes, we cannot even start the server!

Nginx variable creation and assignment happen at completely different phases along the time-line. Variable creation only occurs when Nginx loads its configuration. On the other hand, variable assignment occurs when requests are actually being served. This also means that we can never create new Nginx variables at "request time".
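The "configuration time" half of this rule can be modeled in a few lines of C++. The sketch below is hypothetical (the class VariableRegistry and its methods are invented, and real Nginx keeps far more state): set-like directives declare names, echo-like directives record references, and loading fails if any reference matches no declaration. It assumes, as a simplification, that references are checked only after the whole file has been parsed.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of "configuration time" only: no request exists yet.
class VariableRegistry {
public:
    // A set-like directive creates the variable (creating it twice is harmless).
    void declare(const std::string& name) { declared_.insert(name); }

    // An echo-like directive merely records that it will read the variable.
    void reference(const std::string& name) { referenced_.push_back(name); }

    // After the whole file is parsed, every reference must match some
    // declaration, wherever it appeared; otherwise loading fails.
    void finalize() const {
        for (const auto& name : referenced_)
            if (declared_.count(name) == 0)
                throw std::runtime_error("unknown \"" + name + "\" variable");
    }

private:
    std::set<std::string> declared_;
    std::vector<std::string> referenced_;
};
```

Deferring the check until the end matters because, as the next example shows, a variable may be read in one location that appears before the location declaring it.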

Variable Scope

Once an Nginx variable is created, it is visible to the entire configuration, even across different virtual server configuration blocks, regardless of where it is declared. Here is an example:

server {
    listen 8080;

    location /foo {
        echo "foo = [$foo]";
    }

    location /bar {
        set $foo 32;
        echo "foo = [$foo]";
    }
}

Here the variable $foo is created by the set directive within location /bar, and this variable is visible to the entire configuration, therefore we can reference it in location /foo without worries. Below is the result of testing these two interfaces via the curl tool.

$ curl 'http://localhost:8080/foo'
foo = []

$ curl 'http://localhost:8080/bar'
foo = [32]

$ curl 'http://localhost:8080/foo'
foo = []

We can see that the assignment operation is only performed in requests that access location /bar, since the corresponding set directive is only used in that location. When requesting the /foo interface, we always get an empty value for the $foo variable because that is what we get when accessing an uninitialized variable.

Another important characteristic that we can observe from this example is that even though the scope of Nginx variables is the entire configuration, each request does have its own version of all those variables' containers. Requests do not interfere with each other even if they are referencing a variable with the same name. This is very much like local variables in C/C++ function bodies. Each invocation of the C/C++ function does use its own version of those local variables (on the stack).

For instance, in this sample, we request /bar and the variable $foo gets the value 32, which does not affect the value of $foo in subsequent requests to /foo (it is still uninitialized!), because they correspond to different value containers.

One common mistake among Nginx newcomers is to regard Nginx variables as something shared among all the requests. Even though the scope of Nginx variable names goes across configuration blocks at "configuration time", the scope of their value containers never goes beyond request boundaries at "request time". Essentially, we have two different kinds of scope here.
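The two kinds of scope can be pictured with a hypothetical C++ sketch: the name $foo is turned into a fixed index once, at configuration time, while each request carries its own array of values. The constants kFoo and kNumVars, the Request struct, and the handle_* functions are invented for this illustration, and an empty string stands in for an uninitialized variable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Names are resolved to indexes once, when the configuration is loaded.
constexpr std::size_t kFoo = 0;      // index assigned to $foo
constexpr std::size_t kNumVars = 1;  // total number of variables

// Every request gets its own fresh set of value containers.
struct Request {
    std::vector<std::string> vars = std::vector<std::string>(kNumVars);
};

// location /bar { set $foo 32; echo "foo = [$foo]"; }
std::string handle_bar(Request& r) {
    r.vars[kFoo] = "32";
    return "foo = [" + r.vars[kFoo] + "]";
}

// location /foo { echo "foo = [$foo]"; }
std::string handle_foo(Request& r) {
    return "foo = [" + r.vars[kFoo] + "]";  // "" if never assigned
}
```

Calling handle_foo on a fresh Request yields "foo = []" even after handle_bar has stored "32" in another Request, matching the curl session above.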

Nginx Variables (02)

Variable Lifetime & Internal Redirection

We already know that Nginx variables are bound to each request handled by Nginx; for this reason, they have exactly the same lifetime as the corresponding request.

There is another common misunderstanding here though: some newcomers tend to assume that the lifetime of Nginx variables is bound to the location configuration block. Let's consider the following counterexample:

server {
    listen 8080;

    location /foo {
        set $a hello;
        echo_exec /bar;
    }

    location /bar {
        echo "a = [$a]";
    }
}

Here in location /foo we use the echo_exec directive (provided by the 3rd-party module ngx_echo) to initiate an "internal redirection" to location /bar. An "internal redirection" is an operation that makes Nginx jump from one location to another while processing a request. This "jumping" happens completely within the server itself. This differs from "external redirections" based on HTTP 301 and 302 responses, because the latter are carried out externally, by the HTTP client. Also, with "external redirections", the end user can usually observe the URL change in her web browser's address bar, while this is not the case for internal ones. "Internal redirections" are very similar to the exec command in the Bourne Shell: it is a "one-way trip" and never returns. Another similar example is the goto statement in the C language.

Being an "internal redirection", the request after the redirection remains the original one. It is just the current location that is changed, so we are still using the original copy of the Nginx variable containers. Back to our example, the whole process looks like this: Nginx first assigns to the $a variable the string value hello via the set directive in location /foo, and then it issues an internal redirection via the echo_exec directive, thus leaving location /foo and entering location /bar, and finally it outputs the value of $a. Because the value container of $a remains untouched, we can expect the response output to be hello. The test result confirms this:

$ curl localhost:8080/foo
a = [hello]

But when accessing /bar directly from the client side, we will get an empty value for the $a variable, since this variable relies on location /foo to get initialized.
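A rough C++ model of this behavior, under the assumption that a request is represented by one mutable object passed from location to location: the redirection only changes which handler runs next, so the variable containers travel with the request. Request, dispatch, and the location_* functions are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// One object per request; its variable containers live as long as it does.
struct Request {
    std::string uri;
    std::map<std::string, std::string> vars;
};

std::string dispatch(Request& r);  // picks a location by r.uri

// location /bar { echo "a = [$a]"; }
std::string location_bar(Request& r) {
    return "a = [" + r.vars["a"] + "]";  // "" if $a was never set
}

// location /foo { set $a hello; echo_exec /bar; }
std::string location_foo(Request& r) {
    r.vars["a"] = "hello";
    r.uri = "/bar";       // internal redirection: same request, new location
    return dispatch(r);   // one-way trip, control never comes back here
}

std::string dispatch(Request& r) {
    if (r.uri == "/foo") return location_foo(r);
    if (r.uri == "/bar") return location_bar(r);
    return "404";
}
```

Dispatching a request for /foo yields "a = [hello]", while a request that starts at /bar yields "a = []".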

It can be observed that during a request's lifetime, the copy of Nginx variable containers does not change at all even when Nginx goes across different location configuration blocks. Here we also encounter the concept of "internal redirections" for the first time and it's worth mentioning that the rewrite directive of the ngx_rewrite module can also be used to initiate "internal redirections". For instance, we can rewrite the example above with the rewrite directive as follows:

server {
    listen 8080;

    location /foo {
        set $a hello;
        rewrite ^ /bar;
    }

    location /bar {
        echo "a = [$a]";
    }
}

It's functionally equivalent to echo_exec. We will discuss the rewrite directive in more depth in later chapters, including how to use it to initiate "external redirections" such as 301 and 302.

To conclude, the lifetime of Nginx variable containers is indeed bound to the request being processed and has nothing to do with location blocks.

Nginx Built-in Variables

The Nginx variables we have seen so far are all (implicitly) created by directives like set. We usually call such variables "user-defined variables", or simply "user variables". There is also another kind of Nginx variable that is pre-defined by either the Nginx core or Nginx modules. Let's call this kind of variable "built-in variables".

$uri & $request_uri

One common use of Nginx built-in variables is to retrieve various types of information about the current request or response. For instance, the built-in variable $uri provided by ngx_http_core is used to fetch the (decoded) URI of the current request, excluding any query string arguments. Another example is the $request_uri variable provided by the same module, which is used to fetch the raw, non-decoded form of the URI, including any query string. Let's look at the following example.

location /test {
    echo "uri = $uri";
    echo "request_uri = $request_uri";
}

We omit the server configuration block here for brevity. As in the samples above, we still listen on local port 8080. In this example, we output both $uri and $request_uri in the response body. Below are the results of testing this /test interface with different requests:

$ curl 'http://localhost:8080/test'
uri = /test
request_uri = /test

$ curl 'http://localhost:8080/test?a=3&b=4'
uri = /test
request_uri = /test?a=3&b=4

$ curl 'http://localhost:8080/test/hello%20world?a=3&b=4'
uri = /test/hello world
request_uri = /test/hello%20world?a=3&b=4
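The relationship between the two variables in these results can be sketched as follows: drop everything from the first ? onward, then decode %XX escapes. This is a simplified, hypothetical model (percent_decode and uri_from_request_uri are invented names); real Nginx performs further path normalization that this sketch ignores.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decode %XX escapes; anything malformed is copied through unchanged.
std::string percent_decode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size() &&
            std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits
        } else {
            out += s[i];
        }
    }
    return out;
}

// $request_uri is the raw target; $uri keeps only the decoded path part.
std::string uri_from_request_uri(const std::string& request_uri) {
    std::string path = request_uri.substr(0, request_uri.find('?'));
    return percent_decode(path);
}
```

Applied to /test/hello%20world?a=3&b=4, the function returns /test/hello world, the same value $uri showed in the last test above.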

Variables with Infinite Names

There is another very common built-in variable that does not have a fixed name. Instead, it has infinitely many variations: all the variables whose names have the prefix arg_, like $arg_foo and $arg_bar. Let's just call it the $arg_XXX "variable group". For example, the $arg_name variable evaluates to the value of the name URI argument of the current request. Note that the URI argument's value obtained here is not decoded, so it may still contain %XX sequences. Let's check out a complete example:

location /test {
    echo "name: $arg_name";
    echo "class: $arg_class";
}

Then we test this interface with various different URI argument combinations:

$ curl 'http://localhost:8080/test'
name:
class:

$ curl 'http://localhost:8080/test?name=Tom&class=3'
name: Tom
class: 3

$ curl 'http://localhost:8080/test?name=hello%20world&class=9'
name: hello%20world
class: 9

In fact, $arg_name matches not only an argument named name, but also NAME or even Name. That is, letter case does not matter here:

$ curl 'http://localhost:8080/test?NAME=Marry'
name: Marry
class:

$ curl 'http://localhost:8080/test?Name=Jimmy'
name: Jimmy
class:

Behind the scenes, Nginx just converts the URI argument names to lower case before matching them against the name specified by $arg_XXX.
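The following hypothetical C++ function sketches what such a lookup might do: it scans the raw query string on demand, compares argument names case-insensitively, and returns the value without decoding it. The name get_arg and the use of std::optional (C++17) to distinguish a missing argument from an empty one are assumptions, not Nginx's code.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Scan "k1=v1&k2=v2..." for `name`, ignoring letter case in argument names.
// The value is returned raw: %XX sequences are left undecoded.
std::optional<std::string> get_arg(const std::string& args,
                                   const std::string& name) {
    auto lower = [](std::string s) {
        for (char& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    };
    const std::string wanted = lower(name);
    std::size_t pos = 0;
    while (pos <= args.size()) {
        std::size_t amp = args.find('&', pos);
        if (amp == std::string::npos) amp = args.size();
        std::string pair = args.substr(pos, amp - pos);
        std::size_t eq = pair.find('=');
        if (lower(pair.substr(0, eq)) == wanted)
            return eq == std::string::npos ? std::string() : pair.substr(eq + 1);
        pos = amp + 1;  // move past the '&'
    }
    return std::nullopt;  // argument not present
}
```

With the query strings from the tests above, get_arg returns Tom for name=Tom&class=3, Marry for NAME=Marry, and hello%20world (still encoded) for name=hello%20world&class=9.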

If you want to decode the special sequences like %20 in the URI argument values, then you could use the set_unescape_uri directive provided by the 3rd-party module ngx_set_misc.

location /test {
    set_unescape_uri $name $arg_name;
    set_unescape_uri $class $arg_class;

    echo "name: $name";
    echo "class: $class";
}

Let's check out the actual effect:

$ curl 'http://localhost:8080/test?name=hello%20world&class=9'
name: hello world
class: 9

The space has indeed been decoded!

Another thing that we can observe from this example is that the set_unescape_uri directive can also implicitly create Nginx user-defined variables, just like the set directive. We will discuss the ngx_set_misc module in more detail in future chapters.

Variables like $arg_XXX have an infinite number of possible names, so they do not correspond to any value containers. Furthermore, such variables are handled in a very specific way within the Nginx core, so it is not possible for 3rd-party modules to introduce such magical built-in variables of their own.

The Nginx core offers a lot of such built-in variables in addition to $arg_XXX, like the $cookie_XXX variable group for fetching HTTP cookie values, the $http_XXX variable group for fetching request headers, as well as the $sent_http_XXX variable group for retrieving response headers. We will not go into the details for each of them here. Interested readers can refer to the official documentation for the ngx_http_core module.

Read-only Built-in Variables

All the user-defined variables are writable. In fact, the way we have declared or created such variables so far is with a configuration directive, like set, that performs value assignment at request time. But this is not necessarily the case for built-in variables.

Most of the built-in variables are effectively read-only, like the $uri and $request_uri variables that we just introduced earlier. Assignments to such read-only variables must always be avoided. Otherwise it will lead to unexpected consequences, for example,

? location /bad {
?     set $uri /blah;
?     echo $uri;
? }

This problematic configuration just triggers a confusing error message when Nginx is started:

[emerg] the duplicate "uri" variable in ...

Attempts to write to some other read-only built-in variables, like $arg_XXX, can even crash the server in some particular Nginx versions.

Nginx Variables (03)

Writable Built-in Variable $args

Some built-in variables are writable as well. For instance, when reading the built-in variable $args, we get the URL query string of the current request, but when writing to it, we are effectively modifying the query string. Here is such an example:

location /test {
    set $orig_args $args;
    set $args "a=3&b=4";

    echo "original args: $orig_args";
    echo "args: $args";
}

Here we first save the original URL query string into our own variable $orig_args, then modify the current query string by overriding the $args variable, and finally output the variables $orig_args and $args, respectively, with the echo directive. Let's test it like this:

$ curl 'http://localhost:8080/test'
original args:
args: a=3&b=4

$ curl 'http://localhost:8080/test?a=0&b=1&c=2'
original args: a=0&b=1&c=2
args: a=3&b=4

In the first test, we did not provide any URL query string, hence the empty output for the $orig_args variable. And in both tests, the current query string was forcibly overridden to the new value a=3&b=4, regardless of the presence of a query string in the original request.

Note that, just like $arg_XXX, the $args variable here does not own a value container the way user variables do. When reading $args, Nginx executes a special piece of code that fetches data from a particular place where the Nginx core stores the URL query string for the current request. On the other hand, when we overwrite $args, Nginx executes another special piece of code that stores the new value in that same place in the core. Other parts of Nginx also read from that place whenever the query string is needed, so our modification to $args immediately affects the behavior of all those other parts later on. Let's see an example for this:

location /test {
    set $orig_a $arg_a;
    set $args "a=5";
    echo "original a: $orig_a";
    echo "a: $arg_a";
}

Here we first save the value of the built-in variable $arg_a, i.e., the value of the original request's URL argument a, into our user variable $orig_a, then change the URL query string to a=5 by assigning the new value to the built-in variable $args, and finally output the variables $orig_a and $arg_a, respectively. Because modifications to $args effectively change the URL query string of the current request for the whole server, the value of the built-in variable $arg_XXX should also change accordingly. The test result verifies this:

$ curl 'http://localhost:8080/test?a=3'
original a: 3
a: 5

We can see that the initial value of $arg_a is 3 since the URL query string of the original request is a=3. But the final value of $arg_a automatically becomes 5 after we modify $args with the value a=5.
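A hypothetical C++ model of why this works: $args has no value container of its own, so its get and set handlers simply read and write the single query-string field kept in the request, and the $arg_a lookup reads that same field. The Request struct and the get_args, set_args, get_arg_a, and location_test functions are invented for illustration; get_arg_a is deliberately simplified to an exact-case lookup of a.

```cpp
#include <cassert>
#include <string>

// The one place where the "core" keeps the current query string.
struct Request {
    std::string args;
};

// $args: get and set handlers operate directly on the shared field.
std::string get_args(const Request& r) { return r.args; }
void set_args(Request& r, const std::string& v) { r.args = v; }

// Simplified $arg_a get handler: scans the same field on every read.
std::string get_arg_a(const Request& r) {
    const std::string& q = r.args;
    for (std::size_t pos = 0; pos <= q.size();) {
        std::size_t amp = q.find('&', pos);
        if (amp == std::string::npos) amp = q.size();
        if (q.compare(pos, 2, "a=") == 0)
            return q.substr(pos + 2, amp - pos - 2);
        pos = amp + 1;
    }
    return "";
}

// location /test { set $orig_a $arg_a; set $args "a=5"; echo ...; }
std::string location_test(Request& r) {
    std::string orig_a = get_arg_a(r);  // user variable: a copied value
    set_args(r, "a=5");                 // $args set handler
    return "original a: " + orig_a + "\na: " + get_arg_a(r) + "\n";
}
```

Because nothing caches the old query string, the second read of $arg_a sees a=5; in this model, any other component that reads r.args (as ngx_proxy does in the next example) would see the new value too.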

Below is another example to demonstrate that assignments to $args also affect the HTTP proxy module ngx_proxy.

server {
    listen 8080;

    location /test {
        set $args "foo=1&bar=2";
        proxy_pass http://127.0.0.1:8081/args;
    }
}

server {
    listen 8081;

    location /args {
        echo "args: $args";
    }
}

Two virtual servers are defined here in the http configuration block (omitted for brevity).

The first virtual server is listening at the local port 8080. Its /test location first updates the current URL query string to the value foo=1&bar=2 by writing to $args, then sets up an HTTP reverse proxy via the proxy_pass directive of the ngx_proxy module, targeting the HTTP service /args on the local port 8081. By default the ngx_proxy module automatically forwards the current URL query string to the remote HTTP service.

The "remote HTTP service" on the local port 8081 is provided by the second virtual server defined by ourselves, where we output the current URL query string via the echo directive in location /args. By doing this, we can investigate the actual URL query string forwarded by the ngx_proxy module from the first virtual server.

Let's access the /test interface exposed by the first virtual server.

$ curl 'http://localhost:8080/test?blah=7'
args: foo=1&bar=2

We can see that the URL query string is first rewritten to foo=1&bar=2 even though the original request takes the value blah=7, then it is forwarded to the /args interface of the second virtual server via the proxy_pass directive, and finally its value is output to the client.

To summarize, the assignment to $args also successfully influences the behavior of the ngx_proxy module.

Variable "Get Handlers" and "Set Handlers"

We have already learned in previous sections that when reading the built-in variable $args, Nginx executes a special piece of code to obtain a value on-the-fly and when writing to this variable, Nginx executes another special piece of code to propagate the change. In Nginx's terminology, the special code executed for reading the variable is called "get handler" and the code for writing to the variable is called "set handler". Different Nginx modules usually prepare different "get handlers" and "set handlers" for their own variables, which effectively put magic into these variables' behavior.

Such techniques are not uncommon in the computing world. For example, in object-oriented programming (OOP), the class designer usually does not expose the member variable of the class directly to the user programmer, but instead provides two methods for reading from and writing to the member variable, respectively. Such class methods are often called "accessors". Below is an example in the C++ programming language:

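#include <string>

class Person {
public:
    // "get accessor": read-only access to the private member
    const std::string& get_name() const {
        return m_name;
    }

    // "set accessor": the only way to modify the private member
    void set_name(const std::string& name) {
        m_name = name;
    }

private:
    std::string m_name;
};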

In this C++ class Person, we provide two public methods, get_name and set_name, to serve as the "accessors" for the private member variable m_name.

The benefits of such design are obvious. The class designer can execute arbitrary code in the "accessors", to implement any extra business logic or useful side effects, like automatically updating other member variables depending on the current member, or updating the corresponding field in a database associated with the current object. For the latter case, it is possible that the member variable does not exist at all, or that the member variable just serves as a data cache to mitigate the pressure on the back-end database.

Corresponding to the concept of "accessors" in OOP, Nginx variables also support binding custom "get handlers" and "set handlers". Additionally, not all Nginx variables own a container to hold values. Some variables without a container behave like a magical container by means of their fancy "get handler" and "set handler". In fact, when a variable is being created at "configuration time", the creating Nginx module must decide whether to allocate a value container for it and whether to attach a custom "get handler" and/or "set handler" to it.

Those variables owning a value container are called "indexed variables" in Nginx's terminology. Otherwise, they are said to be not indexed.
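A hypothetical C++ sketch of the decision described above. The Variable struct is loosely modeled on this description and is not Nginx's real variable definition: a module fills in an optional get handler, an optional set handler, and a flag saying whether a per-request container is allocated. $args is modeled as a non-indexed variable with both handlers, and a set-created user variable as an indexed one with neither.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct Request {
    std::string args;  // query string kept by the "core"
};

// What a module decides when it creates a variable at configuration time.
struct Variable {
    std::string name;
    bool indexed = false;                                  // owns a container?
    std::function<std::string(const Request&)> get;        // may be empty
    std::function<void(Request&, const std::string&)> set; // may be empty
};

// Not indexed: all behavior comes from the handlers.
Variable make_args_variable() {
    Variable v;
    v.name = "args";
    v.get = [](const Request& r) { return r.args; };
    v.set = [](Request& r, const std::string& s) { r.args = s; };
    return v;
}

// Indexed: the value lives in the request's container, no handlers needed.
Variable make_user_variable(const std::string& name) {
    Variable v;
    v.name = name;
    v.indexed = true;
    return v;
}
```

In this model, reading or writing $args goes through v.get and v.set, while a user variable's value would be stored in the per-request container selected by its index.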

We already know that the "variable groups" like $arg_XXX discussed in earlier sections do not have a value container and thus are not indexed. When reading $arg_XXX, it is its "get handler" at work, that is, its "get handler" scans the current URL query string on-the-fly, extracting the value of the specified URL argument. Many beginners misunderstand the way $arg_XXX is implemented; they assume that Nginx will parse all the URL arguments in advance and prepare the values for all those non-empty $arg_XXX variables before they are actually read. This is not true, however. Nginx never tries to parse all the URL arguments beforehand, but rather scans the whole URL query string for a particular argument in a "get handler" every time that argument is requested by reading the corresponding $arg_XXX variable. Similarly, when reading the built-in variable $cookie_XXX, its "get handler" just scans the Cookie request headers for the cookie name specified.

Nginx Variables (04)

Value Containers for Caching & ngx_map

Some Nginx variables use their value containers as a data cache when a "get handler" is configured. In this setting, the "get handler" runs only once, i.e., the first time the variable is read, which reduces overhead when the variable is read multiple times during its lifetime. Let's see an example for this.

map $args $foo {
    default     0;
    debug       1;
}

server {
    listen 8080;

    location /test {
        set $orig_foo $foo;
        set $args debug;

        echo "original foo: $orig_foo";
        echo "foo: $foo";
    }
}

Here we use the map directive from the standard module ngx_map for the first time, which deserves some introduction. The word map here means mapping or correspondence. For example, functions in mathematics are a kind of "mapping". Nginx's map directive is used to define a "mapping" relationship between two Nginx variables, or in other words, a "function relationship". Back to this example, we use the map directive to define the "mapping" relationship between the user variable $foo and the built-in variable $args. Using the mathematical function notation y = f(x), our $args variable is effectively the "independent variable" x, while $foo is the "dependent variable" y. That is, the value of $foo depends on the value of $args, or rather, we map the value of $args onto the $foo variable (in some way).

Now let's look at the exact mapping rule defined by the map directive in this example.

map $args $foo {
    default     0;
    debug       1;
}

The first line within the curly braces is a special rule condition: it holds if and only if all the other conditions fail. When this "default" condition holds, the "dependent variable" $foo is assigned the value 0. The second line within the curly braces means that $foo is assigned the value 1 if the "independent variable" $args matches the string value debug. Combining these two lines, we obtain the complete mapping rule: if the value of $args is debug, the variable $foo gets the value 1; otherwise $foo gets the value 0. So essentially, this is a conditional assignment to the variable $foo.

Now that we understand what the map directive does, let's look at the definition of location /test. We first save the value of $foo into another user variable $orig_foo, then overwrite the value of $args to debug, and finally output the values of $orig_foo and $foo, respectively.

Intuitively, after we overwrite the value of $args with debug, the value of $foo should automatically be adjusted to 1 according to the mapping rule defined earlier, regardless of the original value of $foo. But the test result says otherwise.

$ curl 'http://localhost:8080/test'
original foo: 0
foo: 0

The first output line indicates that the value of $orig_foo is 0, which is exactly what we expected: the original request does not carry a URL query string, so the initial value of $args is empty, and according to the "default" condition in our mapping rule, $foo initially evaluates to 0.

But surprisingly, the second output line indicates that the final value of $foo is still 0, even after we overwrite $args to the value debug. This apparently violates our mapping rule because when $args takes the value debug, the value of $foo should really be 1. So what is happening here?

Actually the reason is pretty simple: the first time the variable $foo is read, the value computed by ngx_map's "get handler" is cached in its value container. We already learned earlier that Nginx modules may choose to use the value containers of the variables they create as a data cache for their "get handlers". Obviously, the ngx_map module considers the mapping computation between variables expensive enough to cache the result automatically, so that the next time the same variable is read within the lifetime of the current request, Nginx can just return the cached result without invoking the "get handler" again.

To verify this further, we can try specifying the URL query string as debug in the original request.

$ curl 'http://localhost:8080/test?debug'
original foo: 1
foo: 1

We can see that the value of $orig_foo becomes 1, complying with our mapping rule. And subsequent reads of $foo always yield the same cached result, 1, regardless of any new value assigned to $args later on.
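To make the "cached at first read" behavior even more visible, here is a small variation. It is a sketch that assumes the same http-level map $args $foo block shown above, and the location name /test2 is just an illustrative choice. This time nothing reads $foo before $args is overwritten, so map's "get handler" runs only after the change and should see the new value:

```nginx
# assumes the same "map $args $foo { default 0; debug 1; }" at the http level
location /test2 {
    # overwrite $args BEFORE anything reads $foo
    set $args debug;

    # first read of $foo: map's "get handler" runs now and sees $args = debug
    echo "foo: $foo";
}
```

Under the caching behavior described above, requesting /test2 without any query string should print foo: 1, since the computed value is only frozen at the moment of the first read.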

The map directive is actually a unique example, because it not only registers a "get handler" for the user variable, but also allows the user to define the computing rule in the "get handler" directly in the Nginx configuration file. Of course, the rule that can be defined here is limited to simple mapping relations with another variable. Meanwhile, it must be made clear that not all the variables using a "get handler" will cache the result. For instance, we have already seen earlier that the $arg_XXX variable does not use its value container at all.

Similar to the ngx_map module, the standard module ngx_geo that we encountered earlier also enables value caching for the variables created by its geo directive.

A Side Note on Use Contexts of Directives

In the previous example, we should also note that the map directive is put outside the server configuration block, that is, it is defined directly within the outermost http configuration block. Some readers may be curious about this setting, since we only use it in location /test after all. If we try putting the map statement within the location block, however, we will get the following error while starting Nginx:

[emerg] "map" directive is not allowed here in ...

So it is explicitly prohibited. In fact, the map directive is only allowed in the http block. Every configuration directive has a predefined set of use contexts in the configuration file. When in doubt, always refer to the corresponding documentation for the exact use contexts of a particular directive.
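For reference, here is a minimal sketch of where each piece of the earlier example is allowed to live. Only the http part of nginx.conf is shown; other top-level blocks (such as events) are assumed and omitted:

```nginx
http {
    # map is only allowed here, at the http level, outside any server block
    map $args $foo {
        default     0;
        debug       1;
    }

    server {
        listen 8080;

        location /test {
            # ...but the variable it defines can be read in any location
            echo "foo: $foo";
        }
    }
}
```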

Lazy Evaluation of Variable Values

Many Nginx newcomers worry that using the map directive in the global scope (i.e., the http block) will lead to unnecessary variable value computation and assignment for all the locations in all the virtual servers, even if only one location block actually uses it. Fortunately, this is not what happens. We have already learned how the map directive works: it is the "get handler" (registered by the ngx_map module) that performs the value computation and the related assignment, and the "get handler" does not run at all unless the corresponding user variable is actually read. Therefore, requests that never access that variable cannot incur any (useless) computation.

The technique of postponing value computation to the point where the value is actually needed is called "lazy evaluation" in the computing world. Programming languages that natively offer "lazy evaluation" are not very common, though. The most famous example is the Haskell programming language, where lazy evaluation is the default semantics. In contrast with "lazy evaluation", it is much more common to see "eager evaluation". We are lucky to see lazy evaluation in the ngx_map module, but "eager evaluation" is also far more common in the Nginx world. Consider the following set statement, which could hardly be simpler:

set $b "$a,$a";

When running the set directive, Nginx eagerly computes and assigns the new value for the variable $b without postponing to the point when $b is actually read later on. Similarly, the set_unescape_uri directive also evaluates eagerly.
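A quick way to observe this eager evaluation is to change $a after $b has been assigned. In this sketch (the location name /eager is just illustrative), $b is computed the moment its set statement runs, so the later assignment to $a does not affect it, and requesting /eager should print hello,hello:

```nginx
location /eager {
    set $a hello;

    # evaluated right here: $b becomes "hello,hello" immediately
    set $b "$a,$a";

    # changing $a afterwards does not recompute $b
    set $a world;

    echo $b;
}
```

Contrast this with the map example earlier, where the value of $foo is not computed until something actually reads it.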

Nginx Variables (05)

Variables in Subrequests

A Detour to Subrequests

We have seen earlier that the lifetime of variable containers is bound to the request, but I owe you a formal definition of "requests" there. You might have assumed that the "requests" in that context are just those HTTP requests initiated from the client side. In fact, there are two kinds of "requests" in the Nginx world. One is called "main requests", and the other is called "subrequests".

Main requests are those initiated externally by HTTP clients. All the examples that we have seen so far involve main requests only, including those doing "internal redirections" via the echo_exec or rewrite directive.

Subrequests, on the other hand, are a special kind of request initiated from within the Nginx core. But please do not confuse subrequests with the HTTP requests created by the ngx_proxy module! Although subrequests may look very much like HTTP requests, their implementation has nothing to do with either the HTTP protocol or any kind of socket communication. A subrequest is an abstract invocation for decomposing the task of the main request into smaller "internal requests" that can be served independently by multiple different location blocks, either in series or in parallel. "Subrequests" can also be recursive: any subrequest can initiate more sub-subrequests, targeting other location blocks or even the current location itself. In Nginx's terminology, if request A initiates a subrequest B, then A is called the "parent request" of B. It is worth mentioning that the Apache web server has long had the concept of subrequests too, so readers coming from that world should find it familiar.

Let's check out an example using subrequests:

location /main {
    echo_location /foo;
    echo_location /bar;
}

location /foo {
    echo foo;
}

location /bar {
    echo bar;
}

Here in location /main, we use the echo_location directive from the ngx_echo module to initiate two GET subrequests targeting /foo and /bar, respectively. The subrequests initiated by echo_location always run sequentially, in their literal order in the configuration file. Therefore, the second /bar request will not be fired until the first /foo request finishes processing. The response bodies of these two subrequests are concatenated in their running order to form the final response body of their parent request (for /main):

$ curl 'http://localhost:8080/main'
foo
bar

It should be noted that communication between location blocks via subrequests is limited to the same server block (i.e., the same virtual server configuration), so when the Nginx core processes a subrequest, it just calls a few C functions behind the scenes, without any kind of network or UNIX domain socket communication. For this reason, subrequests are extremely efficient.
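The echo_location directive covers the "in series" case. For the "in parallel" case mentioned earlier, the ngx_echo module also provides echo_location_async, which fires a subrequest without waiting for it to complete before firing the next one. A minimal sketch, reusing the /foo and /bar locations from the example above:

```nginx
location /main {
    # both subrequests are fired without waiting for each other to finish
    echo_location_async /foo;
    echo_location_async /bar;
}
```

Even though the two subrequests may now run concurrently, Nginx should still emit their response bodies in the order the subrequests were issued, so the output is expected to be the same foo and bar lines as before.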

Independent Variable Containers in Subrequests

Returning to our earlier discussion of the lifetime of Nginx variable containers, we can still state that the lifetime is bound to the current request, and that every request has its own copy of all the variable containers. It is just that the "request" here can be either a main request or a subrequest. Variables with the same name in a parent request and a subrequest generally do not interfere with each other. Let's do a small experiment to confirm this:

location /main {
    set $var main;

    echo_location /foo;
    echo_location /bar;

    echo "main: $var";
}

location /foo {
    set $var foo;
    echo "foo: $var";
}

location /bar {
    set $var bar;
    echo "bar: $var";
}

In this sample, we assign different values to the variable $var in three location blocks, /main, /foo, and /bar, and output the value of $var in all these locations. In particular, we intentionally output the value of $var in location /main after calling the two subrequests, so if value changes of $var in the subrequests can affect their parent request, we should see a new value output in location /main. The result of requesting /main is as follows:

$ curl 'http://localhost:8080/main'
foo: foo
bar: bar
main: main

Apparently, the assignments to the variable $var in those two subrequests do not affect the main request /main at all. This verifies that the main request and its subrequests each own a separate copy of the variable containers.

Shared Variable Containers among Requests

Unfortunately, subrequests initiated by certain Nginx modules do share variable containers with their parent requests, like those initiated by the 3rd-party module ngx_auth_request. Below is such an example:

location /main {
    set $var main;
    auth_request /sub;
    echo "main: $var";
}

location /sub {
    set $var sub;
    echo "sub: $var";
}

Here in location /main, we first assign the initial value main to variable $var, then fire a subrequest to /sub via the auth_request directive from the ngx_auth_request module, and finally output the value of $var. Note that in location /sub we intentionally overwrite the value of $var to sub. When accessing /main, we get

$ curl 'http://localhost:8080/main'
main: sub

Obviously, the value change of $var in the subrequest to /sub does affect the main request to /main. Thus the variable container of $var is indeed shared between the main request and the subrequest created by the ngx_auth_request module.

For the previous example, some readers might ask: "Why doesn't the response body of the subrequest appear in the final output?" The answer is simple: the auth_request directive discards the response body of the subrequest it manages and only checks the subrequest's response status code. When the status code looks good, like 200, auth_request simply lets Nginx continue processing the main request; otherwise it immediately aborts the main request by returning an error page, such as a 403 page. In our example, the subrequest to /sub just returns a 200 response, implicitly created by the echo directive in location /sub.

Even though sharing variable containers between the main request and all its subrequests can make bidirectional data exchange easier, it can also lead to subtle, unexpected issues that are hard to debug in real-world configurations. Users often forget that a variable with the same name is already used in some deeply nested subrequest and use it for something else in the main request, so the variable gets unexpectedly modified during processing. Such side effects are why many 3rd-party modules, like ngx_echo, ngx_lua and ngx_srcache, disable variable sharing for subrequests by default.

Nginx Variables (06)

Built-in Variables in Subrequests

There are some subtleties involved in using Nginx built-in variables in the context of a subrequest. We will discuss the details in this section.

Built-in Variables Sensitive to the Subrequest Context

We already know that most built-in variables are not simple value containers. They behave differently from user variables by registering "get handlers" and/or "set handlers". Even when they do own a value container, they usually just use it as a result cache for their "get handlers". The $args variable we discussed earlier, for example, just uses its "get handler" to return the URL query string of the current request. The current request here can also be a subrequest, so when $args is read in a subrequest, its "get handler" should naturally return the query string of the subrequest. Let's see such an example:

location /main {
    echo "main args: $args";
    echo_location /sub "a=1&b=2";
}

location /sub {
    echo "sub args: $args";
}

Here in the /main interface, we first echo out the value of $args for the current request, and then use echo_location to initiate a subrequest to /sub. It should be noted that here we give a second argument to the echo_location directive, to specify the URL query string for the subrequest being fired (the first argument is the URI for the subrequest, as we already know). Finally, we define the /sub interface and print out the value of $args in there. Querying the /main interface gives

$ curl 'http://localhost:8080/main?c=3'
main args: c=3
sub args: a=1&b=2

It is clear that when $args is read in the main request (to /main), its value is the URL query string of the main request; whereas in the subrequest (to /sub), it is the query string of the subrequest, a=1&b=2. This behavior indeed matches our intuition.

Just like $args, when the built-in variable $uri is used in a subrequest, its "get handler" also returns the (decoded) URI of the current subrequest:

location /main {
    echo "main uri: $uri";
    echo_location /sub;
}

location /sub {
    echo "sub uri: $uri";
}

Below is the result of querying /main:

$ curl 'http://localhost:8080/main'
main uri: /main
sub uri: /sub

The output is what we would expect.

Built-in Variables for Main Requests Only

Unfortunately, not all built-in variables are sensitive to the context of subrequests. Several built-in variables always act on the main request even when they are used in a subrequest. The built-in variable $request_method is such an exception.

Whenever $request_method is read, we always get the request method name (such as GET and POST) for the main request, no matter whether the current request is a subrequest or not. Let's test it out:

location /main {
    echo "main method: $request_method";
    echo_location /sub;
}

location /sub {
    echo "sub method: $request_method";
}

In this example, the /main and /sub interfaces both output the value of $request_method. Meanwhile, we initiate a GET subrequest to /sub via the echo_location directive in /main. Now let's do a POST request to /main:

$ curl --data hello 'http://localhost:8080/main'
main method: POST
sub method: POST

Here we use the --data option of the curl utility to specify our POST request body; this option also makes curl use the POST method for the request. The test result turns out as we predicted: the variable $request_method evaluates to the main request's method name, POST, despite being used in a GET subrequest.

Some readers might challenge our conclusion by pointing out that we did not rule out the possibility that the value of $request_method was cached when it was first read in the main request, so that what we saw in the subrequest was actually the cached value computed earlier in the main request. This concern is unnecessary, however, because we have also learned that the variable container used for data caching (if any) is always bound to the current request, and the subrequests initiated by the ngx_echo module always disable variable container sharing with their parent requests. So in the previous example, even if the built-in variable $request_method used its value container as a data cache in the main request (actually it does not), that could not affect the subrequest in any way.

To further address the concern of these readers, let's slightly modify the previous example by putting the echo statement for $request_method in /main after the echo_location directive that runs the subrequest:

location /main {
    echo_location /sub;
    echo "main method: $request_method";
}

location /sub {
    echo "sub method: $request_method";
}

Let's test it again:

$ curl --data hello 'http://localhost:8080/main'
sub method: POST
main method: POST

No change in the output can be observed, except that the two output lines are now in reverse order (since we swapped the order of the two ngx_echo directives).

Consequently, we cannot obtain the method name of a subrequest by reading the $request_method variable. This is a common pitfall for newcomers dealing with the method names of subrequests. To overcome this limitation, we need to turn to the built-in variable $echo_request_method provided by the ngx_echo module:

location /main {
    echo "main method: $echo_request_method";
    echo_location /sub;
}

location /sub {
    echo "sub method: $echo_request_method";
}

We are finally getting what we want:

$ curl --data hello 'http://localhost:8080/main'
main method: POST
sub method: GET

Now within the subrequest, we get its own method name, GET, as expected, and the main request method remains POST.

Similar to $request_method, the built-in variable $request_uri always returns the raw (undecoded) URI of the main request. This is more understandable, however, because subrequests are essentially fake requests inside Nginx and do not have a raw, undecoded URI of their own.
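To see this side by side with $uri, here is a sketch in the same style as the examples above. Following the behavior just described, requesting /main?a=1 should report /main?a=1 as $request_uri in both the main request and the subrequest, while $uri in the subrequest should be /sub:

```nginx
location /main {
    echo "main request_uri: $request_uri";
    echo_location /sub;
}

location /sub {
    # $uri follows the current (sub)request...
    echo "sub uri: $uri";

    # ...while $request_uri still reports the main request's raw URI
    echo "sub request_uri: $request_uri";
}
```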

Variable Container Sharing and Value Caching Together

In the previous section, some readers were worried about what would happen if variable container sharing in subrequests and value caching by "get handlers" worked together. If that were indeed the case, it would be a nightmare, because it would be really hard to predict what is going on just by looking at the configuration file. In previous sections, we already learned that the subrequests initiated by the ngx_auth_request module share the same variable containers with their parents, so we can deliberately construct such a horrible example:

map $uri $tag {
    default     0;
    /main       1;
    /sub        2;
}

server {
    listen 8080;

    location /main {
        auth_request /sub;
        echo "main tag: $tag";
    }

    location /sub {
        echo "sub tag: $tag";
    }
}

Here we use our old friend, the map directive, to map the value of the built-in variable $uri to our user variable $tag. When $uri takes the value /main, the value 1 is assigned to $tag; when $uri takes the value /sub, the value 2 is assigned instead to $tag; under all the other conditions, 0 is assigned. Next, in /main, we first initiate a subrequest to /sub by using the auth_request directive, and then output the value of $tag. And within /sub, we directly output the value of $tag. Guess what we will get when we access /main?

$ curl 'http://localhost:8080/main'
main tag: 2

Ouch! Didn't we map the value /main to 1? Why is the output for /main the value 2, which belongs to /sub? What is going on here?

Actually it worked like this: our $tag variable was first read in the subrequest to /sub, and the "get handler" registered by map computed the value 2 for $tag in that context (because $uri was /sub in the subrequest) and the value 2 got cached in the value container of $tag from then on. Because the parent request shared the same container as the subrequest created by auth_request, when the parent request read $tag later (after the subrequest was finished), the cached value 2 was directly returned! Such results can indeed be very surprising at first glance.

From this example, we can conclude again that it can hardly be a good idea to enable variable container sharing in subrequests.

Nginx Variables (07)

Special Values "Invalid" and "Not Found"

We have mentioned that the values of Nginx variables can only be of one single type, that is, the string type, but variables could also have no meaningful values at all. Variables without any meaningful values still take a special value though. There are two possible special values: "invalid" and "not found".

For example, when a user variable $foo is created but not assigned yet, $foo takes the special value of "invalid". And when the current URL query string does not have the XXX argument at all, the built-in variable $arg_XXX takes the special value of "not found".

Both "invalid" and "not found" are special values, completely different from an empty string value (""). This is very similar to the distinct special values in some dynamic programming languages, like undef in Perl, nil in Lua, and null in JavaScript.

We have seen earlier that an uninitialized variable evaluates to an empty string when used in an interpolated string; its real value, however, is not an empty string at all. It is the "get handler" registered by the set directive that automatically converts the "invalid" special value into an empty string. To verify this, let's return to an example we discussed before:

location /foo {
    echo "foo = [$foo]";
}

location /bar {
    set $foo 32;
    echo "foo = [$foo]";
}

When accessing /foo, the user variable $foo is uninitialized when used in the interpolated string for the echo directive. The output shows that the variable is evaluated to an empty string:

$ curl 'http://localhost:8080/foo'
foo = []

From the output, the uninitialized $foo variable behaves just like taking an empty string value. But careful readers should have already noticed that, for the request above, there is a warning in the Nginx error log file (which is logs/error.log by default):

[warn] 5765#0: *1 using uninitialized "foo" variable, ...

Who on earth generates this warning? The answer is the "get handler" of $foo, registered by the set directive. When $foo is read, Nginx first checks the value in its container but sees the "invalid" special value, then Nginx decides to continue running $foo's "get handler", which first prints the warning (as shown above) and then returns an empty string value, which thereafter gets cached in $foo's value container.

Careful readers should have identified that this process for user variables is exactly the same as the mechanism we discussed earlier for built-in variables involving "get handlers" and result caching in value containers. Yes, it is the same mechanism in action. It is also worth noting that only the "invalid" special value will trigger the "get handler" invocation in the Nginx core while "not found" will not.

The warning message above usually indicates a typo in the variable name or misuse of uninitialized variables, not necessarily in the context of an interpolated string. Because of the existence of value caching in the variable container, this warning will not get printed multiple times in the lifetime of the current request. Also, the ngx_rewrite module provides the uninitialized_variable_warn directive for disabling this warning altogether.
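For example, to silence the warning only for the earlier /foo location, the directive can be placed in that location (it is also accepted at the http, server and if levels). Note that /bar still has to declare $foo with set, otherwise Nginx refuses to start:

```nginx
location /foo {
    # suppress "using uninitialized variable" warnings in this location only
    uninitialized_variable_warn off;
    echo "foo = [$foo]";
}

location /bar {
    set $foo 32;
    echo "foo = [$foo]";
}
```

Turning the warning off globally also hides genuine typos in variable names, so scoping it narrowly like this is usually the safer choice.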

Testing Special Values of Nginx Variables in Lua

As we have just mentioned, the built-in variable $arg_XXX takes the special value "not found" when the URL argument XXX does not exist, but unfortunately, it is not easy to distinguish it from the empty string value directly in the Nginx configuration file, for example:

location /test {
    echo "name: [$arg_name]";
}

Here we intentionally omit the URL argument name in our request:

$ curl 'http://localhost:8080/test'
name: []

We can see that we are still getting an empty string value, because this time it is the Nginx "script engine" that automatically converts the "not found" special value to an empty string when performing variable interpolation.

Then how can we test the special value "not found"? Or in other words, how can we distinguish it from normal empty string values? Obviously, in the following example, the URL argument name does take an ordinary value, which is a true empty string:

$ curl 'http://localhost:8080/test?name='
name: []

But we cannot really differentiate this from the earlier case that does not mention the name argument at all.

Luckily, we can easily achieve this in Lua by means of the 3rd-party module ngx_lua. Please look at the following example:

location /test {
    content_by_lua '
        if ngx.var.arg_name == nil then
            ngx.say("name: missing")
        else
            ngx.say("name: [", ngx.var.arg_name, "]")
        end
    ';
}

This example is very close to the previous one in terms of functionality. We use the content_by_lua directive from the ngx_lua module to embed a small piece of our own Lua code that tests against the special value of the Nginx variable $arg_name. When $arg_name takes a special value (either "not found" or "invalid"), we get the following output when requesting /test:

$ curl 'http://localhost:8080/test'
name: missing

This is our first time meeting the ngx_lua module, which deserves a brief introduction. This module embeds the Lua language interpreter (or LuaJIT's Just-in-Time compiler) into the Nginx core, allowing Nginx users to run their own Lua programs directly inside the server. Users can choose to insert their Lua code into different running phases of the server to fulfill different requirements. Such Lua code is either specified directly as literal strings in the Nginx configuration file, or resides in external .lua source files (or Lua bytecode files) whose paths are specified in the Nginx configuration.

Back to our example: we cannot directly write something like $arg_name in our Lua code. Instead, we reference Nginx variables in Lua by means of the ngx.var API provided by the ngx_lua module. For example, to reference the Nginx variable $VARIABLE in Lua, we just write ngx.var.VARIABLE. When the Nginx variable $arg_name takes the special value "not found" (or "invalid"), ngx.var.arg_name evaluates to the nil value in the Lua world. It is also worth noting that we use the Lua function ngx.say to print out the response body contents, which is functionally equivalent to the echo directive we are already very familiar with.

If we provide a name URI argument that takes an empty value in the request, the output is now very different:

$ curl 'http://localhost:8080/test?name='
name: []

In this test, the value of the Nginx variable $arg_name is a true empty string, neither "not found" nor "invalid". So in Lua, the expression ngx.var.arg_name evaluates to the Lua empty string (""), clearly distinguished from the Lua nil value in the previous test.

This differentiation is important in certain application scenarios. For instance, some web services decide whether to filter the data set by a column value by checking whether the corresponding URI argument exists. For these services, when the name URI argument is absent, the whole data set is returned; when the name argument takes an empty value, however, only the records whose value is empty are returned.

It is worth mentioning a few limitations in the standard $arg_XXX variable. Consider using the following request to test /test in our previous example using Lua:

$ curl 'http://localhost:8080/test?name'
name: missing

Now the $arg_name variable still reads the "not found" special value, which is apparently counter-intuitive. Additionally, when multiple URI arguments with the same name are specified in the request, $arg_XXX just returns the first value of the argument, discarding other values silently:

$ curl 'http://localhost:8080/test?name=Tom&name=Jim&name=Bob'
name: [Tom]

To solve these problems, we can use the Lua function ngx.req.get_uri_args provided by the ngx_lua module instead.
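As a sketch of that approach, assuming ngx_lua's documented return conventions for ngx.req.get_uri_args (a missing argument is nil, a bare ?name with no value is the Lua value true, and a repeated argument becomes a Lua table of values), the earlier /test location could be rewritten like this:

```nginx
location /test {
    content_by_lua '
        local args = ngx.req.get_uri_args()
        local name = args.name

        if name == nil then
            -- no "name" argument at all
            ngx.say("name: missing")
        elseif name == true then
            -- "name" given without "=" and without a value
            ngx.say("name: present, no value")
        elseif type(name) == "table" then
            -- "name" repeated: collect every occurrence
            local parts = {}
            for i, v in ipairs(name) do
                parts[i] = tostring(v)
            end
            ngx.say("name: [", table.concat(parts, ", "), "]")
        else
            -- a single (possibly empty) string value
            ngx.say("name: [", name, "]")
        end
    ';
}
```

With this version, ?name should now be reported as present, and ?name=Tom&name=Jim&name=Bob should print name: [Tom, Jim, Bob] instead of silently keeping only Tom.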

Nginx Variables (08)

In (02) we mentioned that another category of built-in variables, $cookie_XXX, works like $arg_XXX. Similarly, when no cookie named XXX exists, the corresponding Nginx variable $cookie_XXX takes the special value "not found". We can check this in Lua:

location /test {
    content_by_lua '
        if ngx.var.cookie_user == nil then
            ngx.say("cookie user: missing")
        else
            ngx.say("cookie user: [", ngx.var.cookie_user, "]")
        end
    ';
}

The curl utility offers the --cookie name=value option, which designates name=value as a cookie of its request (by adding the Cookie header). Let's test a few cases containing cookies.

$ curl --cookie user=agentzh 'http://localhost:8080/test'
cookie user: [agentzh]

$ curl --cookie user= 'http://localhost:8080/test'
cookie user: []

$ curl 'http://localhost:8080/test'
cookie user: missing

As expected, when the cookie user does not exist, the Lua expression ngx.var.cookie_user evaluates to nil. So we have successfully distinguished the empty-string case from the "not found" case.

A nice bonus of the ngx_lua module is that when Lua code references an undeclared Nginx variable, the value is simply nil, and Nginx does not refuse to load its configuration as it did in our earlier examples with undeclared variables.

location /test {
    content_by_lua '
        ngx.say("$blah = ", ngx.var.blah)
    ';
}

The user variable $blah is never declared in the Nginx configuration file nginx.conf, but it is referenced as ngx.var.blah in the Lua code. Nginx can still start, because the Lua code is not executed when Nginx loads its configuration, so Nginx has no idea that a variable named $blah is referenced. When the Lua code is run at request time by the content_by_lua directive, ngx.var.blah evaluates to nil. The ngx.say function from the ngx_lua module converts the Lua nil value into the string "nil" before printing it, so the output is:

$ curl 'http://localhost:8080/test'
$blah = nil

This is indeed what we want.

We should also notice that when the argument of the content_by_lua directive contains $blah, it is never evaluated by "variable interpolation" (otherwise Nginx would complain that the variable $blah is not declared). This is because the content_by_lua directive does not support "variable interpolation" at all. As we said earlier in (01), Nginx directives do not necessarily support "variable interpolation"; it is entirely up to the module implementation.

It is actually difficult to get the special value "invalid" back in Lua. As we learned in (07), variables that are declared by set but not yet assigned take the special value "invalid". However, as soon as such a variable is read, its "get handler" runs and computes an empty string, which is then cached, so in the end an empty string is returned rather than "invalid". The following Lua code proves this:

location /foo {
    content_by_lua '
        if ngx.var.foo == nil then
            ngx.say("$foo is nil")
        else
            ngx.say("$foo = [", ngx.var.foo, "]")
        end
    ';
}

location /bar {
    set $foo 32;
    echo "foo = [$foo]";
}

Requesting location /foo gives:

$ curl 'http://localhost:8080/foo'
$foo = []

As we can see, when Lua reads the uninitialized Nginx variable $foo, it gets an empty string.

Last but not least, we should point out that although Nginx variables can only hold strings as valid values, the 3rd-party module ngx_array_var supports array-like operations on Nginx variables. Here is an example:

location /test {
    array_split "," $arg_names to=$array;
    array_map "[$array_it]" $array;
    array_join " " $array to=$res;

    echo $res;
}

The ngx_array_var module provides the directives array_split, array_map and array_join, whose semantics are quite close to the built-in functions split, map and join in Perl (other languages offer similar functionality too). Now let's check what happens when location /test is requested:

$ curl 'http://localhost:8080/test?names=Tom,Jim,Bob'
[Tom] [Jim] [Bob]
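To make the Perl-like semantics concrete, here is a rough Python sketch of the same split/map/join pipeline. The function names simply mirror the Nginx commands; they are hypothetical stand-ins, not the module's implementation, and edge cases (empty input, escaping) may behave differently in ngx_array_var.

```python
def array_split(sep, value):
    """Like array_split "," $arg_names to=$array: split a string into elements."""
    return value.split(sep)


def array_map(template, items):
    """Like array_map "[$array_it]" $array: $array_it stands for each element in turn."""
    return [template.replace("$array_it", item) for item in items]


def array_join(sep, items):
    """Like array_join " " $array to=$res: glue the elements back into one string."""
    return sep.join(items)


names = "Tom,Jim,Bob"  # the value of $arg_names for ?names=Tom,Jim,Bob
res = array_join(" ", array_map("[$array_it]", array_split(",", names)))
print(res)  # [Tom] [Jim] [Bob]
```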

Clearly, module ngx_array_var makes it easier to handle inputs of variable length, such as the URL parameter names above, which consists of multiple comma-delimited names. Still, we must emphasize that module ngx_lua is usually a much better choice for this kind of complicated task, since it is more flexible and maintainable.

This concludes the tutorial series on Nginx variables. Along the way we have discussed many builtin and 3rd party Nginx modules, which helped us understand the features and internals of Nginx variables by composing various small constructs. Later tutorials will cover these modules in more detail.

These examples should make it clear that Nginx variables play a key role in the Nginx mini language: variables are the means by which Nginx modules communicate internally, they carry all the needed information (including the request information), and they are the cornerstone that bridges all the other Nginx modules. Nginx variables appear everywhere in the coming tutorials, so understanding them is absolutely necessary.

In the coming series "Nginx Directive Execution Order", we will discuss in detail the Nginx execution order and the phases every request goes through. Understanding them is indispensable, since in the Nginx mini language the order in which directives are written can be dramatically different from the order in which they are executed. This often confuses Nginx users.

Nginx directive execution order (01)

When there are multiple Nginx module commands in a location, their execution order can differ from what you expect. Busy Nginx users who configure Nginx by "trial and error" may be very confused by this behavior. This series aims to uncover the mysteries and help you better understand the execution order behind the scenes.

We start with a confusing example:

? location /test {
?     set $a 32;
?     echo $a;
?
?     set $a 56;
?     echo $a;
? }

Intuitively, we would expect the output to be 32 followed by 56, since $a is only reset after the first echo command "is executed". Really? Here is the reality:

$ curl 'http://localhost:8080/test'
56
56

Wow, statement set $a 56 must have been executed before the first echo $a command. But why? Is it an Nginx bug?

No, this is not an Nginx bug. Nginx handles every request in a sequence of predefined phases.

Altogether there can be 11 phases when Nginx handles a request. Let's start with the three most common ones: rewrite, access and content (the other phases will be addressed later).

Usually an Nginx module and its commands register their execution in only one of those phases. For example, command set runs in the rewrite phase, and command echo runs in the content phase. Since the rewrite phase always comes before the content phase when a request is processed, its commands are executed earlier as well. Therefore, command set always gets executed before command echo within one location, regardless of their order in the configuration.

Back to our example:

set $a 32;
echo $a;

set $a 56;
echo $a;

The actual execution ordering is:

set $a 32;
set $a 56;
echo $a;
echo $a;

Now it's clear: the two set commands are executed in the rewrite phase, and the two echo commands are executed afterwards in the content phase. Commands from different phases are never interleaved.
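The reordering can be mimicked with a tiny Python model: tag each command with its phase, then sort stably by phase before running. This is only a toy, assuming that commands within the same phase keep their writing order (a guarantee this section states for ngx_rewrite and ngx_echo); it ignores everything else Nginx does.

```python
PHASE_ORDER = ["rewrite", "access", "content"]
PHASE_OF = {"set": "rewrite", "echo": "content"}  # as documented for each command


def run(directives):
    """Execute (command, args...) tuples phase by phase and collect echo output."""
    variables, output = {}, []
    # sorted() is stable: commands of the same phase keep their writing order
    ordered = sorted(directives, key=lambda d: PHASE_ORDER.index(PHASE_OF[d[0]]))
    for command, *args in ordered:
        if command == "set":
            name, value = args
            variables[name] = value
        elif command == "echo":
            (name,) = args
            output.append(variables[name])
    return output


# The configuration from the example, in writing order
config = [("set", "a", "32"), ("echo", "a"), ("set", "a", "56"), ("echo", "a")]
print(run(config))  # ['56', '56']
```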

To prove this, we can enable Nginx's "debug log".

If you have not worked with the Nginx "debug log" before, here is a brief introduction. The "debug log" is disabled by default because it degrades performance. To enable it, you must reconfigure and recompile Nginx, passing the --with-debug option to its ./configure script. For example, when building from source under Linux or Mac OS X:

tar xvf nginx-1.0.10.tar.gz
cd nginx-1.0.10/
./configure --with-debug
make
sudo make install

If you are using the ngx_openresty bundle, the --with-debug option can be passed to its ./configure script as well.

After we rebuild the Nginx binary with the --with-debug option, we still need to explicitly set the debug log level (the lowest level) for command error_log in the Nginx configuration:

error_log logs/error.log debug;

The second parameter of command error_log, debug, is crucial. Its first parameter is the error log's file path, logs/error.log. We can certainly use another file path, but do remember the location because we will need to check its content shortly.

Now let's restart Nginx (note that a reload is not enough: because we have replaced the Nginx binary, the server must be stopped and started again). Then we can send the request again:

$ curl 'http://localhost:8080/test'
56
56

Now it's time to check Nginx's error log, which has become a lot more verbose (more than 700 lines for this request in my setup). So let's use the grep command to filter out the lines we are interested in:

grep -E 'http (output filter|script (set|value))' logs/error.log

The output looks roughly like this (for clarity, I have edited it and removed the timestamps and the like):

[debug] 5363#0: *1 http script value: "32"
[debug] 5363#0: *1 http script set $a
[debug] 5363#0: *1 http script value: "56"
[debug] 5363#0: *1 http script set $a
[debug] 5363#0: *1 http output filter "/test?"
[debug] 5363#0: *1 http output filter "/test?"
[debug] 5363#0: *1 http output filter "/test?"
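If you prefer to do this filtering in a script (for example, to post-process the log further), a Python equivalent of the grep command above might look like this. The regular expression is copied verbatim from the grep -E call; the third sample line is a made-up placeholder standing in for the hundreds of unrelated debug lines.

```python
import re

# The same extended regular expression that was passed to grep -E
PATTERN = re.compile(r"http (output filter|script (set|value))")


def filter_debug_log(lines):
    """Keep only the lines the grep command above would print."""
    return [line for line in lines if PATTERN.search(line)]


sample = [
    '[debug] 5363#0: *1 http script value: "32"',
    '[debug] 5363#0: *1 http script set $a',
    '[debug] 5363#0: *1 some other debug message',  # hypothetical noise line
    '[debug] 5363#0: *1 http output filter "/test?"',
]
for line in filter_debug_log(sample):
    print(line)
```

To scan the real file, pass the open file object (for example open("logs/error.log")) instead of sample; each line then keeps its trailing newline, which does not affect the match.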

It barely makes any sense, does it? So let me interpret. Command set dumps two lines of debug info, both starting with http script: the first line shows the value command set is about to assign, and the second line shows the name of the variable being assigned. So the first two filtered lines:

[debug] 5363#0: *1 http script value: "32"
[debug] 5363#0: *1 http script set $a

These two lines are generated by this statement:

set $a 32;

And for the following filtered log:

[debug] 5363#0: *1 http script value: "56"
[debug] 5363#0: *1 http script set $a

They are generated by this statement:

set $a 56;

Besides, whenever Nginx outputs its response, its "output filter" is executed, and our favorite command echo is no exception. Each time Nginx's "output filter" is executed, it generates a debug log line like this:

[debug] 5363#0: *1 http output filter "/test?"

Of course your debug log may not show "/test?", since this part corresponds to the actual request URI. Putting everything together, we can finally conclude that the two set commands are indeed executed before the two echo commands.

Attentive readers may have noticed that there are three lines of http output filter debug log, although we have only two echo output commands. In fact, only the first two debug lines are generated by the two echo statements. The last one is added by module ngx_echo because it needs to flag the end of output, and the flag operation itself causes Nginx's "output filter" to be executed again. Many modules, including ngx_proxy, behave similarly when they output data.

All right, there is nothing surprising about those duplicated 56 outputs: we are simply not given a chance to execute echo before the second set command. Luckily, we can still get the intended output with a small trick:

location /test {
    set $a 32;
    set $saved_a $a;
    set $a 56;

    echo $saved_a;
    echo $a;
}

Now we get what we wanted:

$ curl 'http://localhost:8080/test'
32
56

With the help of another user variable $saved_a, the value of $a is saved before it is overwritten. Note that module ngx_rewrite guarantees that multiple set commands are executed in the order in which they are written. Similarly, module ngx_echo ensures that multiple echo commands are executed in their writing order.

This technique was applied extensively in the examples of the Nginx Variables series. It works around the execution order difficulties introduced by Nginx's phased processing.

You might ask: "How would I know which phase an Nginx command belongs to?" The answer is RTFD (of course, advanced developers can also read the C source code directly). Many modules explicitly state the applicable phase in their documentation. For example, the documentation of command echo says:

phase: content

This says the command is executed in the content phase. If you encounter a module whose documentation does not mention the applicable phase, you can write to its authors and ask for it. However, keep in mind that not every command has an applicable phase. Examples are command geo, introduced in Nginx Variables (01), and command map, introduced in Nginx Variables (04). Such commands, which have no explicit applicable phase, are declarative and unrelated to the concept of execution order. Igor Sysoev, the author of Nginx, has stated publicly a few times that the Nginx mini language in its configuration is "declarative", not "procedural".

Nginx directive execution order (02)

We've just learnt that all set commands within a location are executed in the rewrite phase. In fact, almost all commands implemented by module ngx_rewrite are executed in the rewrite phase when used inside a location. Command rewrite, introduced in Nginx Variables (02), is one of them. However, we should point out that when these commands appear in a server block, they are executed in an earlier phase we have not addressed yet: the server rewrite phase.

Command set_unescape_uri, introduced in Nginx Variables (02), is also executed in the rewrite phase. In fact, commands implemented by module ngx_set_misc can be mixed with commands implemented by module ngx_rewrite, and their writing order is preserved. Let's check an example:

location /test {
    set $a "hello%20world";
    set_unescape_uri $b $a;
    set $c "$b!";

    echo $c;
}

Sending a request to it gives:

$ curl 'http://localhost:8080/test'
hello world!

Apparently, the set_unescape_uri command and its neighboring set commands are all executed in the order of their writing.

To further verify our claim, let's check the Nginx "debug log" again (if you are unsure how to enable and check the "debug log", please refer to the steps in (01)):

grep -E 'http script (value|copy|set)' logs/error.log

The debug logs are filtered as:

[debug] 11167#0: *1 http script value: "hello%20world"
[debug] 11167#0: *1 http script set $a
[debug] 11167#0: *1 http script value (post filter): "hello world"
[debug] 11167#0: *1 http script set $b
[debug] 11167#0: *1 http script copy: "!"
[debug] 11167#0: *1 http script set $c

The leading two lines:

[debug] 11167#0: *1 http script value: "hello%20world"
[debug] 11167#0: *1 http script set $a

They correspond to the command

set $a "hello%20world";

The following two lines:

[debug] 11167#0: *1 http script value (post filter): "hello world"
[debug] 11167#0: *1 http script set $b

They are generated by command

set_unescape_uri $b $a;

Compared with the logs generated by command set, the first line differs slightly: it contains an extra "(post filter)" label. At the end of the line we can see that URL decoding worked as we wished: "hello%20world" has been decoded into "hello world".

The last two lines of debug log:

[debug] 11167#0: *1 http script copy: "!"
[debug] 11167#0: *1 http script set $c

They are generated by the last set command

set $c "$b!";

As you might have noticed, because the value of $c is built by "variable interpolation", its debug log starts with http script copy instead of http script value. At the end of that line is the string constant "!" being appended.

With the log information, it's fairly easy to tell the command execution ordering:

set $a "hello%20world";
set_unescape_uri $b $a;
set $c "$b!";

This matches the writing order exactly.
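For readers who want to follow the data flow outside Nginx, the three statements can be paralleled in Python, with urllib.parse.unquote standing in for set_unescape_uri. This is an approximation: the two decoders are not guaranteed to treat every input (such as "+" or malformed escapes) in the same way.

```python
from urllib.parse import unquote

variables = {}
variables["a"] = "hello%20world"            # set $a "hello%20world";
variables["b"] = unquote(variables["a"])    # set_unescape_uri $b $a; (approximation)
variables["c"] = variables["b"] + "!"       # set $c "$b!";
print(variables["c"])  # hello world!
```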

Just like the commands of module ngx_set_misc, command set_by_lua, implemented by 3rd party module ngx_lua, can be mixed with commands of module ngx_rewrite as well. As introduced in Nginx Variables (07), command set_by_lua computes a value with the given Lua code and assigns the result to an Nginx variable. Like command set, command set_by_lua declares the Nginx variable before initializing it, if the variable does not exist yet.

Let's check a mixed example that combines commands set_by_lua and set:

location /test {
    set $a 32;
    set $b 56;
    set_by_lua $c "return ngx.var.a + ngx.var.b";
    set $equation "$a + $b = $c";

    echo $equation;
}

Variables $a and $b are initialized with the numerical values 32 and 56 respectively; then command set_by_lua runs the given Lua code to compute the sum of $a and $b, and variable $c is initialized with the computed value. Finally, variables $a, $b and $c are combined by "variable interpolation", the result is assigned to variable $equation, and command echo prints it.

A few points in the example deserve attention. Firstly, Nginx variable $VARIABLE is referenced as ngx.var.VARIABLE in Lua code. Secondly, since Nginx variables hold strings, the values of ngx.var.a and ngx.var.b are actually the strings "32" and "56"; Lua automatically converts them to numbers in the addition. Thirdly, the Lua code returns the computed sum to Nginx variable $c with a return statement. Finally, when the Lua code returns, the numerical value is converted back to a string, because a string is the only valid value of an Nginx variable.
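The string-number round trip described above can be mirrored in Python. Unlike Lua, Python does not coerce "32" + "56" into a number (it would concatenate them), so the conversions have to be written out with int() and str(). The dict ngx_var is a hypothetical stand-in for Nginx's variable storage, not ngx_lua's actual ngx.var object.

```python
# Nginx variables only hold strings, so model them as a dict of str -> str
ngx_var = {"a": "32", "b": "56"}


def set_by_lua_sum(var):
    # Lua coerces numeric strings in arithmetic; Python needs explicit int()
    total = int(var["a"]) + int(var["b"])
    # The returned number ends up stored as a string again
    return str(total)


ngx_var["c"] = set_by_lua_sum(ngx_var)
ngx_var["equation"] = f"{ngx_var['a']} + {ngx_var['b']} = {ngx_var['c']}"
print(ngx_var["equation"])  # 32 + 56 = 88
```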

The actual output meets our expectation:

$ curl 'http://localhost:8080/test'
32 + 56 = 88

This confirms that command set_by_lua can be mixed with commands implemented by module ngx_rewrite, such as set.

Many other 3rd party modules can be mixed with module ngx_rewrite as well. Examples include module ngx_array_var, discussed in Nginx Variables (08), and module ngx_encrypted_session, which encrypts sessions. The latter will be studied in detail shortly.

Since builtin module ngx_rewrite is virtually indispensable, being mixable with it is a great advantage for a 3rd party module. In fact, all of those 3rd party modules use a special technique that "injects" their execution into the commands of module ngx_rewrite (with the help of the 3rd party module ngx_devel_kit developed by Marcus Clyne). Other ordinary 3rd party modules that register their execution in the rewrite phase are executed separately from module ngx_rewrite at runtime. In fact, the execution order of commands from different modules is hard to tell precisely (strictly speaking, they are usually executed in the order in which the modules are loaded, but exceptions exist). For example, if modules A and B both register their commands in the rewrite phase, then either the commands of A are executed before those of B, or the other way around. Unless it is explicitly documented, we cannot rely on this uncertain order in our configurations.

Nginx directive execution order (03)

As discussed earlier, unless it uses special techniques as module ngx_set_misc does, a module cannot mix its commands with those of ngx_rewrite and expect the writing order to be respected, even if its commands are registered in the rewrite phase as well. Let's demonstrate this with some examples.

3rd party module ngx_headers_more provides a few commands that deal with the headers of the current request and response. One of them is more_set_input_headers. It modifies a given request header in the rewrite phase (or adds the header if it is not present in the current request). As described in its documentation, the command always executes at the end of the rewrite phase:

phase: rewrite tail

Terse as it is, rewrite tail means the end of the rewrite phase.

Since it executes at the end of the rewrite phase, it always runs after the commands implemented by module ngx_rewrite, even if it is written before them:

? location /test {
?     set $value dog;
?     more_set_input_headers "X-Species: $value";
?     set $value cat;
?
?     echo "X-Species: $http_x_species";
? }

As briefly introduced in Nginx Variables (02), builtin variable $http_XXX holds the value of header XXX of the current request. We must be careful, though: variable $http_XXX matches the normalized request header name, i.e. capital letters are lowercased and minus signs - are turned into underscores _. Therefore variable $http_x_species successfully picks up the request header X-Species set by command more_set_input_headers.
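The normalization rule is simple enough to state as a one-line Python function. This is only a sketch of the naming rule described above; Nginx performs the actual lookup internally in C.

```python
def http_var_name(header):
    """Name of the $http_XXX variable that exposes a given request header."""
    # lowercase the whole name and turn every "-" into "_"
    return "http_" + header.lower().replace("-", "_")


print(http_var_name("X-Species"))   # http_x_species
print(http_var_name("User-Agent"))  # http_user_agent
```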

Judging from the statement order, we might mistakenly conclude that header X-Species has the value dog when /test is requested. But the actual result is different:

$ curl 'http://localhost:8080/test'
X-Species: cat

Clearly, statement set $value cat is executed earlier than more_set_input_headers, although it is written afterwards.

This example tells us that commands of different modules are executed independently of each other, even if they are all registered in the same processing phase (unless the module is implemented like ngx_set_misc, whose commands are specifically designed to work with module ngx_rewrite). In other words, each processing phase is further divided into sub-phases by Nginx modules.

Similar to more_set_input_headers, command rewrite_by_lua provided by 3rd party module ngx_lua executes at the end of the rewrite phase as well. We can verify this:

? location /test {
?     set $a 1;
?     rewrite_by_lua "ngx.var.a = ngx.var.a + 1";
?     set $a 56;
?
?     echo $a;
? }

The Lua code specified by command rewrite_by_lua increments Nginx variable $a by 1. Looking at the writing order, we might expect the result to be 56. The actual result is 57, because command rewrite_by_lua is always executed after all the set statements.

$ curl 'http://localhost:8080/test'
57

Evidently, command rewrite_by_lua behaves differently from command set_by_lua, which, as discussed in (02), can be mixed with ngx_rewrite commands in their writing order.

Out of sheer curiosity, we may immediately ask: what is the execution order between more_set_input_headers and rewrite_by_lua, since they both run at rewrite tail? The answer is: undefined. We must avoid any configuration that relies on their relative order.

The rewrite phase is a rather early processing phase. Commands registered in it usually perform various rewrite tasks on the request (for example, rewriting the URL or its parameters); they may also declare and initialize Nginx variables needed in subsequent handling. Of course, nothing stops people from doing more complicated things here, such as reading the request body or querying a database. After all, a command like rewrite_by_lua lets you put arbitrarily complex Lua code in this phase.

After the rewrite phase, Nginx has another phase called access. The commands provided by 3rd party module ngx_auth_request, discussed in Nginx Variables (05), execute in the access phase. Commands registered in the access phase mostly carry out access control (ACL) functions, such as checking user permissions, checking where users come from, and validating source IP addresses.

For example, commands allow and deny, provided by builtin module ngx_access, control which IP addresses are allowed access and which are rejected:

location /hello {
    allow 127.0.0.1;
    deny all;

    echo "hello world";
}

Location /hello allows access from localhost (IP address 127.0.0.1) and rejects requests from all other IP addresses (returning HTTP error 403). The rules defined by ngx_access commands are checked in their writing order. Once a rule matches, checking stops and all the remaining allow or deny commands are ignored. If the matched rule is allow, handling continues with the subsequent steps; if it is deny, handling is aborted and error 403 is returned immediately. If no rule matches, handling also continues. In our example, a request from localhost matches the rule allow 127.0.0.1 and handling continues, whereas a request from any other IP address matches the rule deny all, so handling is aborted and error 403 is returned.
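The first-match logic can be sketched in Python with the standard ipaddress module. check_access is a hypothetical helper that models only the rule evaluation described above, not how Nginx parses or stores the directives; 192.168.1.7 is just an arbitrary example client address.

```python
import ipaddress

# Rules in writing order, as in location /hello above
RULES = [("allow", "127.0.0.1"), ("deny", "all")]


def check_access(client_ip, rules):
    """Return 403 when the first matching rule is deny, else "continue"."""
    ip = ipaddress.ip_address(client_ip)
    for action, target in rules:
        if target == "all" or ip in ipaddress.ip_network(target, strict=False):
            # first match wins; the remaining rules are ignored
            return 403 if action == "deny" else "continue"
    return "continue"  # no rule matched: handling continues as well


print(check_access("127.0.0.1", RULES))    # continue
print(check_access("192.168.1.7", RULES))  # 403
```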

Let's test it by sending a request from localhost:

$ curl 'http://localhost:8080/hello'
hello world

If the request is sent from another machine (suppose Nginx runs on IP 192.168.1.101), we get:

$ curl 'http://192.168.1.101:8080/hello'
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>

By the way, module ngx_access supports "CIDR notation" for designating a sub-network. For example, 169.200.179.4/24 represents the sub-network with the routing prefix 169.200.179.0 (i.e. subnet mask 255.255.255.0).
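To check what a CIDR block covers before putting it into an allow or deny rule, Python's ipaddress module is handy. The two client addresses tested below are arbitrary examples.

```python
import ipaddress

# strict=False: the host bits (.4) are set, so Python would otherwise reject it
net = ipaddress.ip_network("169.200.179.4/24", strict=False)

print(net.network_address)                            # 169.200.179.0
print(net.netmask)                                    # 255.255.255.0
print(ipaddress.ip_address("169.200.179.77") in net)  # True
print(ipaddress.ip_address("169.200.180.1") in net)   # False
```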

Commands of module ngx_access execute in the access phase, which comes after the rewrite phase. So all the rewrite-phase commands we have been discussing always execute before allow and deny, regardless of their writing order. With this in mind, we should try our best to keep the writing order and the execution order consistent.

Nginx directive execution order (04)

Module ngx_lua implements another command, access_by_lua. It allows Lua code to be executed at the end of the access phase, which means it always executes after allow and deny, even though they belong to the same phase. In many cases, we check the request's source IP address with ngx_access, and use command access_by_lua to perform more complicated verification with Lua, for example examining the current user's identity and privileges by querying a database or other backend services.

Let's look at a simple example that uses command access_by_lua to implement the IP filtering functionality of module ngx_access:

location /hello {
    access_by_lua '
        if ngx.var.remote_addr == "127.0.0.1" then
            return
        end

        ngx.exit(403)
    ';

    echo "hello world";
}

Nginx's builtin variable $remote_addr is referenced in Lua to get the client's IP address. Then a Lua if statement checks whether the address equals 127.0.0.1. If it does, the Lua code simply returns and Nginx continues the subsequent handling (including the content phase, where command echo applies). If it is not the localhost address, the current handling is aborted with the ngx_lua module's Lua function ngx.exit, and the client gets an HTTP 403 error.

Functionally, the example is equivalent to the ngx_access example discussed in (03):

location /hello {
    allow 127.0.0.1;
    deny all;

    echo "hello world";
}

However, we should point out that the two still differ in performance. Module ngx_access performs better because it is implemented specifically as an Nginx module in C.

Let's measure the performance difference between the two. After all, performance is what we are after when using Nginx. Moreover, it's absolutely necessary to master measuring techniques, because only actual data distinguishes amateurs from professionals. Both ngx_lua and ngx_access actually perform pretty well for IP filtering, so to minimize measuring errors we should directly measure the time elapsed in the access phase. Traditionally, this means hacking the Nginx source code to add timing and statistics code, or recompiling the Nginx binary so that it can be monitored by profiling tools like GNU gprof.

Luckily, current releases of Solaris, Mac OS X and FreeBSD offer a system utility, dtrace, which allows fine-grained monitoring of user processes in terms of performance (and functionality as well). It spares us from hacking the source code or recompiling with profiling support. Let's demonstrate the measuring procedure on a MacBook Air, since dtrace has been available since Mac OS X 10.5.

First, open the Terminal application of Mac OS X, change to a directory of your choice, and create a file named nginx-access-time.d with the following content:

#!/usr/bin/env dtrace -s

pid$1::ngx_http_handler:entry
{
    elapsed = 0;
}

pid$1::ngx_http_core_access_phase:entry
{
    begin = timestamp;
}

pid$1::ngx_http_core_access_phase:return
/begin > 0/
{
    elapsed += timestamp - begin;
    begin = 0;
}

pid$1::ngx_http_finalize_request:return
/elapsed > 0/
{
    @elapsed = avg(elapsed);
    elapsed = 0;
}

Save the file and make it executable.

$ chmod +x ./nginx-access-time.d

The .d file contains code written in the D language offered by dtrace (note: this D language is not the other D language advocated by Walter Bright as a better C++). We cannot explain the code in detail here because that requires a thorough understanding of Nginx internals. Its purpose, however, should be clear: it traces the requests handled by a specific Nginx worker process and calculates the average time elapsed in the access phase.

Now we can run the D script. It takes one command line parameter: the process id (pid) of the Nginx worker. Since Nginx supports multiple worker processes and a request can be handled by any one of them, we configure nginx.conf to start only one worker process:

worker_processes 1;

After restarting Nginx, we can obtain the worker process id with command ps:

$ ps ax|grep nginx|grep worker|grep -v grep

Typically we have:

10975   ??  S      0:34.28 nginx: worker process

10975 is my Nginx worker pid. If you see multiple lines, you have probably started multiple Nginx server instances, or the current Nginx server has started multiple worker processes.

Then run script nginx-access-time.d as root, passing it the worker pid:

$ sudo ./nginx-access-time.d 10975

If everything goes well, we get a single output message:

dtrace: script './nginx-access-time.d' matched 4 probes

The message says that our D script has successfully installed 4 probes in the target process. The script is now ready to trace process 10975 continuously.

Let's open another Terminal and send multiple requests to the monitored process with curl:

$ curl 'http://localhost:8080/hello'
hello world

$ curl 'http://localhost:8080/hello'
hello world

Back in the Terminal where the D script is running, press Ctrl-C to interrupt it. When the script exits, it prints the statistical result on the console. For example, my console shows the following:

$ sudo ./nginx-access-time.d 10975
dtrace: script './nginx-access-time.d' matched 4 probes
^C
       19219

The final number, 19219, is the average time elapsed in the access phase, in nanoseconds (1 second = 1,000,000,000 nanoseconds).

That's all for the steps. Now we can run the nginx-access-time.d script to calculate the average time elapsed in the access phase for three different Nginx setups: IP filtering with module ngx_access, IP filtering with command access_by_lua, and no filtering in the access phase at all. The last result helps eliminate the overhead caused by the probes and other "systematic errors". In addition, we can use a load generating tool such as ab to send 100,000 requests to minimize "random errors", as below:

$ ab -k -c1 -n100000 'http://127.0.0.1:8080/hello'

This way, the statistical result of the D script is as close as possible to the "actual" time.

On Mac OS X, a typical run gives the following results (in nanoseconds):

ngx_access                   18146
access_by_lua                35011
no filtering                 15887

Subtracting the last value from the first two gives:

ngx_access                2259
access_by_lua            19124

Well, module ngx_access outperforms command access_by_lua by nearly an order of magnitude, as we might have expected. Still, the absolute difference is tiny: on my Intel Core 2 Duo 1.86 GHz CPU, it is less than 20 microseconds.
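The subtraction, and the ratio and absolute gap between the two net results, can be double-checked with a few lines of Python using the numbers from this run (nothing here is newly measured).

```python
# Average access-phase times in nanoseconds, taken from the table above
ngx_access, access_by_lua, no_filtering = 18146, 35011, 15887

net_access = ngx_access - no_filtering   # cost attributable to ngx_access
net_lua = access_by_lua - no_filtering   # cost attributable to access_by_lua

print(net_access, net_lua)               # 2259 19124
print(round(net_lua / net_access, 1))    # 8.5 (times slower)
print((net_lua - net_access) / 1000)     # 16.865 (absolute gap in microseconds)
```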

In fact, the access_by_lua example can be further optimized by using builtin variable $binary_remote_addr. This variable holds the IP address in binary form, whereas variable $remote_addr holds it as a longer text string. Shorter strings can be compared faster when Lua executes its string operations.
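To get a feel for the size difference, Python's ipaddress module produces the same kind of packed representation (4 bytes for an IPv4 address, 16 for IPv6). This only illustrates the two formats; it says nothing about how fast Lua compares them.

```python
import ipaddress

remote_addr = "127.0.0.1"                                      # text form, like $remote_addr
binary_remote_addr = ipaddress.ip_address(remote_addr).packed  # raw bytes, network order

print(len(remote_addr), len(binary_remote_addr))  # 9 4
print(binary_remote_addr)                         # b'\x7f\x00\x00\x01'
```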

Be careful: if the "debug log" is enabled as introduced in (01), the measured elapsed time will increase dramatically, because the "debug log" has a huge overhead.

Nginx directive execution order (05)

The content phase is by all means the most significant phase in Nginx's request handling, because commands running in it are responsible for generating the "content" and outputting the HTTP response. Because of its importance, Nginx has a rich set of commands running in it, including echo, echo_exec, proxy_pass, echo_location and content_by_lua, which were discussed in Nginx Variables (02), Nginx Variables (03), Nginx Variables (05) and Nginx Variables (07).

The content phase runs later than rewrite and access. Therefore, when its commands are used together with commands of the rewrite and access phases, they always execute last:

location /test {
    # rewrite phase
    set $age 1;
    rewrite_by_lua "ngx.var.age = ngx.var.age + 1";

    # access phase
    deny 10.32.168.49;
    access_by_lua "ngx.var.age = ngx.var.age * 3";

    # content phase
    echo "age = $age";
}

In this ideal example, the commands are executed exactly in the order in which they are written. The test result matches our expectation too:

$ curl 'http://localhost:8080/test'
age = 6

In fact, the writing order of these commands can be completely shuffled without any impact on their execution order. Command set, implemented by module ngx_rewrite, executes in the rewrite phase. Command rewrite_by_lua from module ngx_lua executes at the end of the rewrite phase. Command deny from module ngx_access executes in the access phase. Command access_by_lua from module ngx_lua executes at the end of the access phase. Finally, our favorite command echo, implemented by module ngx_echo, executes in the content phase.
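The claim that shuffling makes no difference can be checked with a toy Python model that sorts commands by (sub-)phase before running them. The ranks, the lambdas and the use of Python integers (Nginx would store strings) are all simplifying assumptions; the model captures only the phase ordering discussed here.

```python
import random

# Assumed rank of each command's (sub-)phase, following the text:
# rewrite < rewrite tail < access < access tail < content
RANK = {"set": 0, "rewrite_by_lua": 1, "deny": 2, "access_by_lua": 3, "echo": 4}

config = [
    ("set", lambda s: s.update(age=1)),                        # set $age 1;
    ("rewrite_by_lua", lambda s: s.update(age=s["age"] + 1)),  # age + 1
    ("deny", lambda s: None),  # client is not 10.32.168.49, so no effect
    ("access_by_lua", lambda s: s.update(age=s["age"] * 3)),   # age * 3
    ("echo", lambda s: s.update(output=f"age = {s['age']}")),  # echo "age = $age";
]


def run(directives):
    """Run each command in (sub-)phase order, ignoring the writing order."""
    state = {}
    for _, action in sorted(directives, key=lambda d: RANK[d[0]]):
        action(state)
    return state["output"]


for _ in range(3):
    random.shuffle(config)  # scramble the writing order
    print(run(config))      # age = 6 every time
```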

The example also demonstrates how commands running in different Nginx phases collaborate. In the process, Nginx variables are the data carriers interconnecting commands and modules. The execution order of these commands is largely determined by the phase each of them belongs to.

As a matter of fact, multiple commands from different modules can coexist in the rewrite and access phases. As the example shows, commands set and rewrite_by_lua both belong to the rewrite phase, and commands deny and access_by_lua both belong to the access phase. However, it's a different story for the content phase.

When most modules implement commands for the content phase, they actually register a "content handler" for the current location, and there can be one and only one "content handler" per location. So when multiple modules contend for that role, only one of them wins. Consider the following problematic example:

? location /test {
?     echo hello;
?     content_by_lua 'ngx.say("world")';
? }

Command echo from module ngx_echo and command content_by_lua from module ngx_lua both execute in the content phase, but only one of them can become the "content handler":

$ curl 'http://localhost:8080/test'
world

Our test indicates that the winner is content_by_lua, even though it is written after echo, and command echo never gets a chance to run. We cannot be sure which module wins in such circumstances. For example, if we swap the content_by_lua and echo statements, module ngx_echo wins and the output becomes hello. So we should avoid using multiple content-phase commands implemented by different modules in one location.

We can fix the example by replacing command content_by_lua with command echo, and we get what we need:

location /test {
    echo hello;
    echo world;
}

Testing confirms it:

$ curl 'http://localhost:8080/test'
hello
world

We can use multiple echo commands without problems, because they all belong to module ngx_echo, which regulates their execution order. Be careful, though: not every module supports its commands being used multiple times within one location. Command content_by_lua, for instance, can be used only once, so the following example is incorrect:

? location /test {
?     content_by_lua 'ngx.say("hello")';
?     content_by_lua 'ngx.say("world")';
? }

Nginx reports an error for this configuration:

[emerg] "content_by_lua" directive is duplicate ...

The correct way of doing it is:

location /test {
    content_by_lua 'ngx.say("hello") ngx.say("world")';
}

Instead of using the content_by_lua command twice in the location, we call the function ngx.say twice in the single piece of Lua code executed by command content_by_lua.
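For longer Lua code, the same single-handler approach is usually spread over several lines inside one quoted string. The sketch below assumes the same ngx_lua setup as the examples above; newer ngx_lua releases also offer content_by_lua_block, which avoids the quoting, but whether it is available depends on your ngx_lua version:

location /test {
    content_by_lua '
        -- a single content handler may call ngx.say as often as needed
        ngx.say("hello")
        ngx.say("world")
    ';
}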

Similarly, command proxy_pass from module ngx_proxy cannot coexist with command echo within one location, because they both execute in the content phase. Many Nginx newbies make the following mistake:

? location /test {
?     echo "before...";
?     proxy_pass http://127.0.0.1:8080/foo;
?     echo "after...";
? }
?
? location /foo {
?     echo "contents to be proxied";
? }

The example tries to output the strings "before..." and "after..." with command echo before and after the content returned by module ngx_proxy. However, only one module can act as the content handler. The test shows that module ngx_proxy wins and command echo from module ngx_echo never runs:

$ curl 'http://localhost:8080/test'
contents to be proxied

To achieve what the example intended, we can use two other commands provided by module ngx_echo, echo_before_body and echo_after_body:

location /test {
    echo_before_body "before...";
    proxy_pass http://127.0.0.1:8080/foo;
    echo_after_body "after...";
}

location /foo {
    echo "contents to be proxied";
}

The test shows that it works:

$ curl 'http://localhost:8080/test'
before...
contents to be proxied
after...

Commands echo_before_body and echo_after_body can coexist with other modules in the content phase because they are not "content handlers" but Nginx "output filters". Back in (01), when we examined the "debug log" generated by command echo, we learned that Nginx calls its "output filters" whenever it outputs data. Module ngx_echo takes advantage of this to modify the content generated by module ngx_proxy, adding content before and after it. Note, though, that the "output filter" is not one of the 11 phases mentioned in (01); many phases can trigger the "output filter" when they output data. Still, it is perfectly all right to document commands echo_before_body and echo_after_body as follows:

phase: output filter

This means that the command executes in the "output filter".
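Since they are output filters rather than content handlers, echo_before_body and echo_after_body should be able to wrap the output of other content handlers as well, not just ngx_proxy. The following is a sketch under that assumption, with module ngx_lua providing the content handler; location /wrapped is a made-up name, so please verify the output with your own build:

location /wrapped {
    # output filters provided by ngx_echo
    echo_before_body "before...";
    echo_after_body "after...";

    # the one and only content handler, provided by ngx_lua here
    content_by_lua 'ngx.say("generated by Lua")';
}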

Nginx directive execution order (06)

We learned in (05) that when a command executes in the content phase of a specific location, it usually means its Nginx module registers a "content handler" for that location. But what happens if no module registers a "content handler" for the content phase? Who then takes on the job of generating content and sending the response? The answer is the static resource modules, which map the request URI to the file system. The static resource modules come into play only when there is no "content handler"; otherwise they leave the duty to the "content handler".

Typically Nginx has three static resource modules for the content phase (unless one or more of them are explicitly disabled, or some conflicting modules are enabled when Nginx is built). In their execution order, the three modules are ngx_index, ngx_autoindex and ngx_static. Let's discuss them one by one.

Modules ngx_index and ngx_autoindex only apply to request URIs that end with /. For request URIs that do not end with /, both modules ignore the request and let the next content-phase module handle it. Module ngx_static, however, has exactly the opposite strategy: it ignores request URIs that end with / and handles the rest.

Module ngx_index mainly looks for a specified home page file, such as index.html or index.htm, in the file system. For example:

location / {
    root /var/www/;
    index index.htm index.html;
}

When / is requested, Nginx looks for the files index.htm and index.html (in this order) under the path specified by command root. If file index.htm exists, Nginx jumps internally to location /index.htm; if it does not exist but file index.html does, Nginx jumps internally to location /index.html. If index.html does not exist either, handling is passed on to the next module that executes its commands in the content phase.

We learned in Nginx Variables (02) that commands echo_exec and rewrite can trigger "internal redirects" as well. Such a jump modifies the request URI and looks up the corresponding location directive for subsequent handling; in the process, the rewrite, access and content phases run again for that location. An "internal redirect" differs from an "external redirect" signaled by HTTP status codes 301 and 302 in that the client's browser never sees it and does not update the URI in its address bar. Therefore, once module ngx_index finds a file specified by command index and triggers the internal jump, the net effect is as if the client had requested that file's URI in the first place.

The following example lets us witness the "internal redirect" that module ngx_index triggers when it finds the file it needs.

location / {
    root /var/www/;
    index index.html;
}

location /index.html {
    set $a 32;
    echo "a = $a";
}

We need to create an empty file index.html under /var/www/ and make sure it is readable by the Nginx worker processes. Then we send a request to /:

$ curl 'http://localhost:8080/'
a = 32

What happened? Why is the output not the content of file index.html, which should be empty? First, Nginx uses location / to handle the original GET / request. Then module ngx_index executes in the content phase and finds file index.html under /var/www/. At this moment, it triggers an "internal redirect" to /index.html.

So far so good. But here comes the surprise! When Nginx looks for the location directive matching /index.html, location /index.html has a higher priority than location /, because Nginx uses "longest matching prefix" semantics to match location directives against the request URI. Once this location is chosen, the rewrite, access and content phases run again, and eventually it outputs a = 32.

What if we remove file /var/www/index.html in the example and request / again? We get error 403 Forbidden. Why? When module ngx_index cannot find the file specified by command index (index.html here), it passes the handling on to the next module that executes in the content phase. But none of those following modules can fulfill the request, so Nginx bails out with an error and also logs it in the Nginx error log:

[error] 28789#0: *1 directory index of "/var/www/" is forbidden

A "directory index" here means an index of the directory's contents, which usually implies a web page listing every file and subdirectory under /var/www/. If we enable module ngx_autoindex, which runs right after ngx_index, it can generate exactly such a page. Let's modify the example a little:

location / {
    root /var/www/;
    index index.html;
    autoindex on;
}

Now we request / again while /var/www/index.html is still missing, and a nice HTML page is generated:

$ curl 'http://localhost:8080/'
<html>
<head><title>Index of /</title></head>
<body bgcolor="white">
<h1>Index of /</h1><hr><pre><a href="../">../</a>
<a href="cgi-bin/">cgi-bin/</a>  08-Mar-2010 19:36   -
<a href="error/">error/</a>      08-Mar-2010 19:36   -
<a href="htdocs/">htdocs/</a>    05-Apr-2010 03:55   -
<a href="icons/">icons/</a>      08-Mar-2010 19:36   -
</pre><hr></body>
</html>

The page shows that there are a few subdirectories under my /var/www/: cgi-bin/, error/, htdocs/ and icons/. Your output will probably differ if you try it yourself.

Again, if file /var/www/index.html does exist, module ngx_index triggers an "internal redirect" and module ngx_autoindex never gets a chance to execute. You may test this yourself too.

The "goal keeper" module of the content phase is ngx_static, which is also heavily used. It serves static files, including the static resources of a web site such as .html, .css and .js files and images. Although ngx_index can trigger an "internal redirect" to the specified home page, the actual output task (sending the file content as the response and setting the corresponding response headers) is carried out by module ngx_static.

Nginx directive execution order (07)

Let's look at an example in which module ngx_static serves disk files, using the following configuration snippet:

location / {
    root /var/www/;
}

Meanwhile, we create two files under /var/www/: index.html, containing the single line of text this is my home, and hello.html, containing the single line of text hello world. Again, mind the file permissions and make sure the files are readable by the Nginx worker processes.

Now we send requests to the files' corresponding URIs:

$ curl 'http://localhost:8080/index.html'
this is my home

$ curl 'http://localhost:8080/hello.html'
hello world

As we can see, the contents of the files we created are returned as the responses.

Let's examine what is happening here. Location / does not have any command executing in the content phase, so no module has registered a "content handler" in this location. Handling therefore falls to the three static resource modules, the last resort of the content phase. The first two, ngx_index and ngx_autoindex, notice that the request URI does not end with / and hand off immediately to module ngx_static, which runs last. Based on the "document root" specified by command root, module ngx_static maps the request URIs /index.html and /hello.html to the disk files /var/www/index.html and /var/www/hello.html respectively. As both files can be found, their contents are sent as responses, with the response headers Content-Type, Content-Length and Last-Modified set accordingly.

To verify that module ngx_static has executed, we can enable the "debug log" introduced in (01). Again we send a request to /index.html, and the Nginx error log will contain the following debug information:

[debug] 3033#0: *1 http static fd: 8

This line is generated by module ngx_static. It means "outputting the static resource whose file descriptor is 8". Of course, the file descriptor number may differ from request to request, and the line is only a typical output from my setup. Note that the built-in module ngx_gzip_static can generate the same debug message as well; it is not enabled by default, though, and will be discussed later.

Command root only declares a "document root"; it does not enable the ngx_static module. As a matter of fact, the module is always enabled, but it may not get a chance to execute. That is entirely up to the other modules that execute earlier in the content phase: module ngx_static executes only when all of them have "given up". To prove this, check the following blank location definition:

location / {
}

Because there is no root command, Nginx computes a default "document root" when the location is requested. The default is the html/ subdirectory under the "configure prefix". For example, if our "configure prefix" is /foo/bar/, the default "document root" is /foo/bar/html/.

So who decides the "configure prefix"? By default, it is the Nginx root directory chosen at installation time (that is, the value of the --prefix option of script ./configure when Nginx is built). If Nginx is installed into /usr/local/nginx/, the "configure prefix" is /usr/local/nginx/ and the default "document root" is therefore /usr/local/nginx/html/. The command-line option -p can also be given when Nginx is started to change the "configure prefix" (so that we can easily test multiple setups). Suppose Nginx is started as follows:

nginx -p /home/agentzh/test/

For this server, the "configure prefix" becomes /home/agentzh/test/ and the "document root" becomes /home/agentzh/test/html/. The "configure prefix" not only determines the "document root"; it actually determines how many relative paths in the Nginx configuration are resolved to absolute paths. We will encounter many examples that refer to the "configure prefix".
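As an illustration of relative paths, the sketch below assumes Nginx is started with nginx -p /home/agentzh/test/ as above; the absolute paths in the comments are what the relative paths should resolve to under that assumption:

server {
    listen 8080;

    # resolved against the prefix: /home/agentzh/test/logs/error.log
    error_log logs/error.log;

    location / {
        # resolved against the prefix: /home/agentzh/test/html/
        root html;
    }
}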

In fact, there is a simple way to find out the current "document root": request a nonexistent file, such as:

$ curl 'http://localhost:8080/blah-blah.txt'
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Naturally, a 404 error page is returned. When we check the Nginx error log, we find the following error message:

[error] 9364#0: *1 open() "/home/agentzh/test/html/blah-blah.txt" failed (2: No such file or directory)

The error message is printed by module ngx_static, since it cannot find file blah-blah.txt at the corresponding path. Because the message contains the absolute path that ngx_static attempted to open, it's quite obvious that the current "document root" is /home/agentzh/test/html/.

Many newbies take it for granted that a 404 error means the requested location does not exist. The previous example shows that a 404 error can be returned even when a matching location is configured and matched. This is because error 404 means the non-existence of an abstract "resource", not of a specific location.

Another frequent mistake is omitting the content-phase command when the default static modules are not meant to come into play at all, for example:

location /auth {
    access_by_lua '
        -- a lot of Lua code omitted here...
    ';
}

Here, only a command for the access phase, access_by_lua, is given for /auth, and there is no command for the content phase. So when /auth is requested, the Lua code specified for the access phase executes, and then module ngx_static tries to serve a static resource in the content phase. Since it looks for a file named auth on disk, it normally returns a 404 error, unless we happen to be lucky and a file auth exists at the corresponding path. So here is a rule of thumb: when a 404 error shows up where no static resource is involved, first check whether the location has properly configured its content-phase commands, such as content_by_lua, echo or proxy_pass. In fact, the Nginx error log error.log can only give a rather confusing message in this case, such as the one below, which is found for the above example:

[error] 9364#0: *1 open() "/home/agentzh/test/html/auth" failed (2: No such file or directory)
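To keep the static modules out of the picture, the fix is simply to give /auth an explicit content-phase command. A minimal sketch, where the echoed text is a placeholder and content_by_lua or proxy_pass would do equally well:

location /auth {
    access_by_lua '
        -- a lot of Lua code omitted here...
    ';

    # explicit content handler, so ngx_static never looks for a file
    echo "access granted";
}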

Nginx directive execution order (08)

So far we have covered the rewrite, access and content phases in detail; they are also the most frequently encountered phases in Nginx request processing. We have learned many Nginx modules and commands that execute in those phases, and it is clear that the commands' execution order is directly determined by the phase they run in. Understanding the phases is the key to correct configurations that orchestrate various Nginx modules. So let's cover the remaining phases we have not met yet.

As mentioned in (01), altogether there can be 11 phases when Nginx handles a request. In their execution order the phases are post-read, server-rewrite, find-config, rewrite, post-rewrite, preaccess, access, post-access, try-files, content, and finally log.
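To help keep the phases straight, here is an annotated sketch placing commands from this series into the phases they run in. Location /phases and variables $x and $y are made up for illustration, ngx_echo is assumed to be compiled in, and the preaccess, try-files and log phases are left out:

server {
    listen 8080;

    # post-read: ngx_realip commands at the server level
    set_real_ip_from 127.0.0.1;
    real_ip_header   X-Real-IP;

    # server-rewrite: ngx_rewrite commands at the server level
    set $x "server";

    # find-config: the Nginx core picks a location below

    location /phases {
        # rewrite: ngx_rewrite commands inside a location
        set $y "location";

        # post-rewrite: any internal redirect requested above happens here

        # access: ngx_access
        allow 127.0.0.1;
        deny  all;

        # post-access: satisfy (default "all") combines access results

        # content: exactly one content handler
        echo "$x $y $remote_addr";
    }
}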

Phase post-read is the very first one; handlers registered in this phase execute right after Nginx has read and processed the request headers. Like the rewrite phase we learned earlier, post-read allows Nginx modules to register handlers. Built-in module ngx_realip is an example: it registers its handler in the post-read phase and forcefully rewrites the request's original address to the value of a specified request header. The following example illustrates module ngx_realip and its commands set_real_ip_from and real_ip_header.

server {
    listen 8080;

    set_real_ip_from 127.0.0.1;
    real_ip_header   X-My-IP;

    location /test {
        set $addr $remote_addr;
        echo "from: $addr";
    }
}

The configuration tells Nginx to forcefully replace the original address of every request coming from 127.0.0.1 with the value of request header X-My-IP. Meanwhile, it uses the built-in variable $remote_addr to output the request's original address, so that we can see whether the rewrite succeeded.

First we send a request to /test from localhost:

$ curl -H 'X-My-IP: 1.2.3.4' localhost:8080/test
from: 1.2.3.4

The test uses curl's -H option to add an extra HTTP header, X-My-IP: 1.2.3.4, to the request. As we can tell, variable $remote_addr has already become 1.2.3.4 in the rewrite phase, and this value comes from request header X-My-IP. So when does Nginx rewrite the request's original address? Yes, in the post-read phase. Since the rewrite phase comes long after post-read, by the time command set reads variable $remote_addr, its value has already been rewritten.

If, however, the request sent from localhost to /test does not have an X-My-IP header, or the header value is not a valid IP address, Nginx does not modify the original address. For example:

$ curl localhost:8080/test
from: 127.0.0.1

$ curl -H 'X-My-IP: abc' localhost:8080/test
from: 127.0.0.1

If a request is sent to /test from another machine, its original address won't be overwritten by Nginx either, even if it carries a perfectly valid X-My-IP header. This is because our previous example uses command set_real_ip_from to state explicitly that the rewriting only applies to requests coming from 127.0.0.1. This filtering mechanism protects Nginx from malicious requests sent by untrusted sources. As you might expect, command set_real_ip_from can also designate an IP subnet (using the CIDR notation introduced earlier in (03)). Besides, command set_real_ip_from can be used multiple times to set up multiple trusted sources. Below is an example:

set_real_ip_from 10.32.10.5;
set_real_ip_from 127.0.0.0/24;

You might be asking: what benefit does module ngx_realip bring us? Why would we rewrite a request's original address? The answer is that the module becomes very handy when the request has passed through one or more HTTP proxies. When a request is forwarded by a proxy, its original address becomes the proxy server's IP address, so Nginx and the services running on it no longer see the actual source. However, we can let the proxy server record the original address in a specific header (such as X-My-IP) and recover it in Nginx, so that subsequent processing (and the services running behind Nginx) treats the request as if it came straight from its original address, with the proxies in between made transparent. For exactly this purpose, module ngx_realip needs to register its handler in the very first phase, post-read, so that the rewriting occurs as early as possible.
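On the proxy side, recording the original address usually takes a single proxy_set_header line. The sketch below assumes a front proxy at 10.32.10.5 (the trusted address from the previous snippet) forwarding to our Nginx at a hypothetical 10.32.10.1:8080; the two servers run on different machines and are shown together only for brevity:

# on the front proxy (assumed to be 10.32.10.5)
server {
    listen 80;

    location / {
        # pass the client's address along in a custom header
        proxy_set_header X-My-IP $remote_addr;
        proxy_pass http://10.32.10.1:8080;
    }
}

# on our Nginx (hypothetical 10.32.10.1)
server {
    listen 8080;

    # trust X-My-IP only when it comes from the front proxy
    set_real_ip_from 10.32.10.5;
    real_ip_header   X-My-IP;

    location /test {
        echo "from: $remote_addr";
    }
}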

After post-read comes the server-rewrite phase. We briefly mentioned in (02) that when commands of module ngx_rewrite are configured directly in the server directive, they execute in the server-rewrite phase. Below is an example:

server {
    listen 8080;

    location /test {
        set $b "$a, world";
        echo $b;
    }

    set $a hello;
}

Note that the set $a hello statement is placed in the server directive, so it runs in the server-rewrite phase, which runs earlier than the rewrite phase. Therefore statement set $b "$a, world" in the location directive executes afterwards and obtains the correct value of $a:

$ curl localhost:8080/test
hello, world

Since the server-rewrite phase executes after the post-read phase, command set in the server directive always runs after module ngx_realip has rewritten the request's original address. For example:

server {
    listen 8080;

    set $addr $remote_addr;

    set_real_ip_from 127.0.0.1;
    real_ip_header   X-Real-IP;

    location /test {
        echo "from: $addr";
    }
}

Sending a request to /test, we get:

$ curl -H 'X-Real-IP: 1.2.3.4' localhost:8080/test
from: 1.2.3.4

Again, although command set is written before the commands of ngx_realip, it actually executes after them. So when command set assigns variable $addr in the server-rewrite phase, variable $remote_addr has already been overwritten.

Nginx directive execution order (09)

Right after server-rewrite comes the find-config phase. This phase does not allow Nginx modules to register handlers; instead, it is where the Nginx core matches the current request against the location directives. This means a request is not associated with any location directive until it reaches find-config. Consequently, in phases like post-read and server-rewrite, the only effective commands are those specified in server directives and their enclosing directives, because these two phases execute before find-config. This explains why commands of module ngx_rewrite execute in the server-rewrite phase only when they are written directly within a server directive. Similarly, the earlier examples configure the commands of module ngx_realip in the server directive to make sure the handler registered in the post-read phase functions correctly.

As soon as Nginx matches a location directive in the find-config phase, it writes a debug message to the error log file. Let's check the following example:

location /hello {
    echo "hello world";
}

If the Nginx "debug log" is enabled, the following debug message can be found in file error.log whenever interface /hello is requested:

$ grep 'using config' logs/error.log
[debug] 84579#0: *1 using configuration "/hello"

For the purpose of convenience, the log's time stamp has been omitted.
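For reference, the "debug log" is, as far as we know, only available when Nginx was built with the --with-debug option of ./configure (nginx -V lists the options used at build time), and it is switched on by the debug level of command error_log. A minimal sketch, assuming such a build and a logs/ directory under the "configure prefix":

# in the main (top-level) context of nginx.conf
error_log logs/error.log debug;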

After the find-config phase comes our old buddy, the rewrite phase. Since Nginx has already matched the request to a specific location directive by now, commands written within location directives take effect starting from this phase. As illustrated earlier, commands of module ngx_rewrite execute in the rewrite phase when written in location directives. Likewise, commands of module ngx_set_misc and of module ngx_lua (set_by_lua and rewrite_by_lua) also execute in the rewrite phase.

After rewrite comes the post-rewrite phase. Like find-config, this phase does not allow Nginx modules to register handlers; instead, the Nginx core carries out any "internal redirect" requested in the rewrite phase. We covered the "internal jump" concept in (02) and demonstrated how to issue an "internal redirect" with command echo_exec or command rewrite. Here we focus on command rewrite, because it executes in the rewrite phase, whereas command echo_exec executes in the content phase and is therefore irrelevant to post-rewrite. Back to our example in (02):

server {
    listen 8080;

    location /foo {
        set $a hello;
        rewrite ^ /bar;
    }

    location /bar {
        echo "a = [$a]";
    }
}

Command rewrite in location /foo unconditionally rewrites the URI of the current request to /bar and issues an "internal redirect", so execution continues in location /bar. What intrigues us most is how the "internal redirect" mechanism works: it effectively rewinds the processing of the current request back to the find-config phase, so that the location directives are matched again against the request URI, which has usually been rewritten. In our example, the URI is rewritten to /bar by command rewrite, so location /bar is matched and execution repeats the rewrite phase and the phases after it.

What might not be obvious is that the actual rewinding to find-config does not occur in the rewrite phase but in the following post-rewrite phase. Command rewrite in the former example simply asks Nginx to perform an "internal redirect" in the post-rewrite phase. Nginx beginners often question this design and suggest that command rewrite should perform the "internal jump" directly. The answer, however, is fairly simple: the design allows the URI to be rewritten multiple times within the location directive that was matched in the first place, for example:

location /foo {
    rewrite ^ /bar;
    rewrite ^ /baz;

    echo foo;
}

location /bar {
    echo bar;
}

location /baz {
    echo baz;
}

The request URI is rewritten twice in location /foo: first to /bar, then to /baz. The net effect of both rewrite statements is a single "internal redirect" in the post-rewrite phase. If the "internal redirect" had been executed at the first URI rewrite, the second rewrite would never have run, since processing would already have left the current location directive. To prove this, we send a request to /foo:

$ curl localhost:8080/foo
baz

The output confirms that the actual jump goes from /foo to /baz. We can further verify this by enabling the Nginx "debug log" and checking the messages generated in the find-config phase:

$ grep 'using config' logs/error.log
[debug] 89449#0: *1 using configuration "/foo"
[debug] 89449#0: *1 using configuration "/baz"

Clearly, for this request, Nginx matches only two location directives, /foo and /baz, and the "internal jump" occurs only once.
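Command rewrite also accepts flags that change this behavior. The sketch below is our own variation, meant to sit next to locations /bar and /baz from the example above; the expected results in the comments follow the documented meaning of the flags and have not been verified here. With last, ngx_rewrite skips its remaining commands and the location search is redone; with break, the URI changes but no new location search takes place:

location /last {
    # stop here and redo the location search with /bar,
    # so location /bar should answer; the second rewrite never runs
    rewrite ^ /bar last;
    rewrite ^ /baz;

    echo last;
}

location /break {
    # URI becomes /bar, but no internal redirect happens,
    # so the echo in this location should answer
    rewrite ^ /bar break;

    echo break;
}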

Quite obviously, if command rewrite of module ngx_rewrite is used to rewrite the request URI in the server directive, there won't be any "internal redirect". This is because the URI rewrite happens in the server-rewrite phase, which executes before the find-config phase where location directives are matched. Check the example below:

server {
    listen 8080;

    rewrite ^/foo /bar;

    location /foo {
        echo foo;
    }

    location /bar {
        echo bar;
    }
}

In this example, every request whose URI starts with /foo has its URI rewritten to /bar. The rewriting occurs in the server-rewrite phase, before the request has been matched to any location directive; Nginx only performs the matching afterwards, in the find-config phase. So if we send a request to /foo, location /foo never gets matched, because by the time matching occurs in find-config, the request URI has already been rewritten to /bar. Location /bar is the one and only matched directive. The actual output illustrates this:

$ curl localhost:8080/foo
bar

Again let's check Nginx "debug log":

$ grep 'using config' logs/error.log
[debug] 92693#0: *1 using configuration "/bar"

As we can tell, Nginx performs the location match only once, and there is no "internal redirect".

Nginx directive execution order (10)

After post-rewrite comes the preaccess phase. As its name implies, the phase is called preaccess simply because it executes right before the access phase.

Built-in modules ngx_limit_req and ngx_limit_zone execute in this phase. The former limits the number of requests per second or per minute, and the latter limits the number of simultaneous connections. We will discuss them more thoroughly later.
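A minimal sketch of both kinds of limits follows. It assumes Nginx 1.1.8 or later, where the limit_conn_zone and limit_conn commands of module ngx_limit_conn supersede ngx_limit_zone (whose limit_zone command was removed in 1.7.6); zone names and numbers are arbitrary:

http {
    # shared memory zones keyed by the client address
    limit_req_zone  $binary_remote_addr zone=req_one:10m rate=10r/m;
    limit_conn_zone $binary_remote_addr zone=conn_one:10m;

    server {
        listen 8080;

        location /test {
            # both are enforced in the preaccess phase
            limit_req  zone=req_one burst=5;
            limit_conn conn_one 1;

            echo "hello";
        }
    }
}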

Actually, built-in module ngx_realip registers a handler in preaccess as well. You might ask: "Why do it again? Didn't it already register a handler in the post-read phase?" Before uncovering the answer, let's study the following example:

server {
    listen 8080;

    location /test {
        set_real_ip_from 127.0.0.1;
        real_ip_header X-Real-IP;

        echo "from: $remote_addr";
    }
}

Compared to the earlier example, the major difference is that the commands of module ngx_realip are written in a specific location directive. As we have learned, Nginx matches location directives in the find-config phase, which comes long after post-read, so commands written in any location directive cannot affect the request in the post-read phase. Our example is exactly such a case: the commands are written in a location directive, so module ngx_realip does not rewrite the remote address in the post-read phase, because nothing instructs it to do so at that point.

What if we do need the rewrite? To resolve this, module ngx_realip registers a handler in the preaccess phase as well, giving its commands a chance to take effect inside a location directive. Now the example works as expected:

$ curl -H 'X-Real-IP: 1.2.3.4' localhost:8080/test
from: 1.2.3.4

Be very careful, though: module ngx_realip can easily be misused, as the following example illustrates:

server {
    listen 8080;

    location /test {
        set_real_ip_from 127.0.0.1;
        real_ip_header X-Real-IP;

        set $addr $remote_addr;
        echo "from: $addr";
    }
}

In this example, we introduce a variable $addr, to which the value of $remote_addr is saved in the rewrite phase; the variable is then used in the output. Slow down here and you may notice the issue: the rewrite phase occurs earlier than preaccess, so the variable assignment happens before module ngx_realip has a chance to rewrite the remote address in the preaccess phase. The output confirms our observation:

$ curl -H 'X-Real-IP: 1.2.3.4' localhost:8080/test
from: 127.0.0.1

The output shows the actual remote address, not the rewritten one. Again, the Nginx "debug log" helps confirm this:

$ grep -E 'http script (var|set)|realip' logs/error.log
[debug] 32488#0: *1 http script var: "127.0.0.1"
[debug] 32488#0: *1 http script set $addr
[debug] 32488#0: *1 realip: "1.2.3.4"
[debug] 32488#0: *1 realip: 0100007F FFFFFFFF 0100007F
[debug] 32488#0: *1 http script var: "127.0.0.1"

Among these log lines, the first one reads:

[debug] 32488#0: *1 http script var: "127.0.0.1"

This message is generated when variable $remote_addr is read by command set; string "127.0.0.1" is the value read.

The second line reads:

[debug] 32488#0: *1 http script set $addr

It indicates that Nginx assigns the value to variable $addr.

The next two lines:

[debug] 32488#0: *1 realip: "1.2.3.4"
[debug] 32488#0: *1 realip: 0100007F FFFFFFFF 0100007F

They are generated when module ngx_realip rewrites the remote address in the preaccess phase. As we can tell, the new address becomes 1.2.3.4 as expected, but only after the variable assignment, which is already too late.

Now the last line:

[debug] 32488#0: *1 http script var: "127.0.0.1"

It is generated when command echo outputs variable $addr; clearly, the value is the original remote address, not the rewritten one.

Some people might immediately come up with a solution: "What if module ngx_realip registers its handler in the rewrite phase instead of the preaccess phase?" That solution, however, is not necessarily correct. Module ngx_rewrite registers its handler in the rewrite phase too, and we learned in (02) that the execution order of handlers from different modules in the same phase cannot be guaranteed in such circumstances, so there is a good chance that module ngx_realip would still execute after command set.

A safer option is always available: instead of relying on the preaccess phase, use the commands of module ngx_realip in the server directive, which avoids the troublesome situations described above.
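Applied to the misused example above, this advice would look like the following sketch: the ngx_realip commands move up to the server level, so the address is rewritten in post-read, well before command set reads $remote_addr in the rewrite phase:

server {
    listen 8080;

    # post-read: the remote address is rewritten before any location runs
    set_real_ip_from 127.0.0.1;
    real_ip_header   X-Real-IP;

    location /test {
        # rewrite phase: $remote_addr already holds the rewritten address
        set $addr $remote_addr;
        echo "from: $addr";
    }
}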

After preaccess comes another old friend, the access phase. As we've learned, built-in module ngx_access, 3rd-party module ngx_auth_request and 3rd-party module ngx_lua (access_by_lua) execute their commands in this phase.

After access comes the post-access phase. Again, as the name implies, this phase executes right after the access phase. Like post-rewrite, it does not allow Nginx modules to register handlers; instead, the Nginx core performs a few tasks here, chief among them the satisfy functionality provided by module ngx_http_core.

When multiple Nginx modules execute their commands in the access phase, command satisfy controls how their results are combined. For example, if both module A and module B register access control handlers in the access phase, there are two possible working modes: one allows access only when both A and B pass, the other allows access when either A or B passes. The first is called all mode (an "AND" relation) and the second any mode (an "OR" relation). By default, Nginx uses all mode. Below is an example:

location /test {
    satisfy all;

    deny all;
    access_by_lua 'ngx.exit(ngx.OK)';

    echo something important;
}

In location /test, both ngx_access and ngx_lua are used, so two modules control access in the access phase. Specifically, statement deny all tells module ngx_access to reject all access, whereas statement access_by_lua 'ngx.exit(ngx.OK)' allows all access. In all mode of command satisfy, access is granted only if every module allows it. Since module ngx_access always rejects in our case, the request is rejected:

$ curl localhost:8080/test
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>

Careful readers may find the following message in the Nginx error log file:

[error] 6549#0: *1 access forbidden by rule

If, however, we change the satisfy all statement to satisfy any:

location /test {
    satisfy any;

    deny all;
    access_by_lua 'ngx.exit(ngx.OK)';

    echo something important;
}

The outcome is completely different:

$ curl localhost:8080/test
something important

This time access is granted, because in any mode overall access is allowed as soon as one module lets the request pass. In our example, module ngx_lua and its command access_by_lua always allow access.

Of course, if every module rejects access under satisfy any, the request is rejected:

location /test {
    satisfy any;

    deny all;
    access_by_lua 'ngx.exit(ngx.HTTP_FORBIDDEN)';

    echo something important;
}

Now a request to /test gets the 403 Forbidden error page. Throughout this process, the "OR" relation among the access-control modules is implemented in the post-access phase.

Please note that this example requires ngx_lua 0.5.0rc19 or later; earlier versions do not work with the satisfy any statement.
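A more conventional use of satisfy any combines ngx_access with HTTP Basic authentication, following the pattern shown in the official documentation of satisfy: clients from a trusted subnet get in directly, while everyone else must supply a password. The subnet and the conf/htpasswd file are placeholders:

location /test {
    satisfy any;

    # ngx_access: the trusted subnet passes
    allow 192.168.1.0/24;
    deny  all;

    # ngx_auth_basic: everyone else needs valid credentials
    auth_basic           "closed site";
    auth_basic_user_file conf/htpasswd;

    echo something important;
}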