Thursday, November 3, 2011

Reverse Proxy


In computer networks, a reverse proxy is a type of proxy server that retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client as though they originated from the reverse proxy itself. A reverse proxy is usually situated closer to the server(s) and will only return a configured set of resources.
Unlike an ordinary forward proxy, which each client must be configured to use, a reverse proxy appears to the client just like an ordinary web server. No special configuration on the client is necessary. The client makes ordinary requests for content in the name-space of the reverse proxy. The reverse proxy then decides where to send those requests and returns the content as if it were itself the origin.
A typical use of a reverse proxy is to give Internet users access to a server that is behind a firewall. Reverse proxies can also be used to balance load among several back-end servers or to provide caching for a slower back-end server. In addition, reverse proxies can be used simply to bring several servers into the same URL space.
Example
A = Client Machine
B = The Reverse proxy machine
C = The Origin server

Normally, one would connect directly from A --> C

However, in some scenarios it is better for the administrator of C to disallow direct access and force visitors to go through B first. B then retrieves data from C on behalf of A, so the chain becomes A --> B --> C.
The difference is that user A does not know he is actually being served by C.
A reverse proxy requires no configuration or special knowledge on the part of the client, A.
Client A probably thinks he is connecting directly (A --> C), but in reality B is the invisible go-between (A --> B --> C).
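A minimal sketch of this A --> B --> C arrangement in Apache terms. It assumes B runs Apache 2.4 with mod_proxy and mod_proxy_http loaded and that C is also Apache 2.4. The host name c.internal.example.com and the address 192.0.2.10 (standing in for B) are placeholders, not part of the original example.

```apache
# --- On B, the reverse proxy ---
# Never act as a forward proxy for arbitrary URLs
ProxyRequests Off
# Forward everything A asks for to C, and fix C's redirects on the way back
ProxyPass        / http://c.internal.example.com/
ProxyPassReverse / http://c.internal.example.com/

# --- On C, the origin server (Apache 2.4 syntax) ---
# Only accept requests coming from B (placeholder address), so visitors
# cannot bypass the proxy and reach C directly
<Location />
    Require ip 192.0.2.10
</Location>
```

On Apache 2.2 the restriction on C would instead be written with "Order deny,allow", "Deny from all" and "Allow from 192.0.2.10". A firewall rule on C that only admits B's address achieves the same thing at the network level.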

Reasons to set up a reverse proxy server
  1. Reverse proxies can hide the existence and characteristics of the origin server(s).
  2. Application firewall features can protect against common web-based attacks. Without a reverse proxy, removing malware or initiating takedowns, for example, can become difficult.
  3. In the case of secure websites, the SSL encryption is sometimes not performed by the web server itself, but is instead offloaded to a reverse proxy that may be equipped with SSL acceleration hardware.
  4. A reverse proxy can distribute the load from incoming requests to several servers, with each server serving its own application area. When proxying in front of web servers, the reverse proxy may have to rewrite the URL in each incoming request to match the internal location of the requested resource.
  5. A reverse proxy can reduce load on its origin servers by caching static content, as well as dynamic content. Proxy caches of this sort can often satisfy a considerable number of website requests, greatly reducing the load on the origin server(s). Another term for this is web accelerator.
  6. A reverse proxy can optimize content by compressing it in order to speed up loading times.
  7. In a technique known as "spoon feeding", a dynamically generated page can be produced all at once and served to the reverse proxy, which can then return it to the client a little at a time. The program that generates the page is not forced to remain open, tying up server resources, during the possibly extended time the client needs to complete the transfer.
  8. Reverse proxies can be used whenever multiple web servers must be accessible via a single public IP address. The web servers listen on different ports of the same machine with the same local IP address or, possibly, on different machines with different local IP addresses altogether. The reverse proxy analyses each incoming request and delivers it to the right server within the local area network.
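Several of these reasons map directly onto Apache directives. The following is a hedged sketch for Apache 2.4 (not taken from the original configuration) covering reasons 3, 4, 6 and 8. It assumes mod_ssl, mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_lbmethod_byrequests, mod_slotmem_shm, mod_headers, mod_filter and mod_deflate are loaded and that Listen 443 is already configured; every host name, address and certificate path is a placeholder.

```apache
<VirtualHost *:443>
    ServerName www.example.com

    # Reason 3: TLS terminates here; the back ends only ever see plain HTTP
    SSLEngine on
    SSLCertificateFile    "/etc/pki/tls/certs/www.example.com.crt"
    SSLCertificateKeyFile "/etc/pki/tls/private/www.example.com.key"
    # Tell the application that the original request used HTTPS
    RequestHeader set X-Forwarded-Proto "https"

    # Reason 4: spread requests over a pool of two back ends
    <Proxy "balancer://appcluster">
        BalancerMember "http://10.0.0.11:8080"
        BalancerMember "http://10.0.0.12:8080"
    </Proxy>

    # Reason 6: compress text responses before they go to the client
    AddOutputFilterByType DEFLATE text/html text/css application/javascript

    ProxyRequests Off
    ProxyPass        "/" "balancer://appcluster/"
    ProxyPassReverse "/" "balancer://appcluster/"
</VirtualHost>

# Reason 8: a second site on the same public IP, selected by the Host header
<VirtualHost *:80>
    ServerName intranet.example.com
    ProxyPass        "/" "http://10.0.0.21:9000/"
    ProxyPassReverse "/" "http://10.0.0.21:9000/"
</VirtualHost>
```

On Apache 2.2, name-based virtual hosts on port 80 additionally need a "NameVirtualHost *:80" line, and the lbmethod and slotmem modules do not exist as separate modules.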

How to configure a reverse proxy in Apache
The following configuration enables reverse proxying in Apache for two internal servers, published under /app1/ and /app2/:
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule headers_module    modules/mod_headers.so
LoadFile   /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule xml2enc_module modules/mod_xml2enc.so
 
ProxyRequests off
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2
 
<Location /app1/>
        ProxyPassReverse /
        ProxyHTMLEnable On
        ProxyHTMLURLMap  /      /app1/
        RequestHeader    unset  Accept-Encoding
</Location>
 
<Location /app2/>
        ProxyPassReverse /
        ProxyHTMLEnable On
        ProxyHTMLURLMap /       /app2/
        RequestHeader   unset   Accept-Encoding
</Location>
Explanation of the above configuration
1).Loading Modules
The Apache module mod_proxy implements a proxy for Apache. In keeping with Apache's modular architecture, mod_proxy is itself modular, and a typical proxy server will need to enable several modules, for example:
a.    mod_proxy: The core module; it provides the proxy infrastructure and configuration and manages proxy requests.
b.    mod_proxy_http: This handles fetching documents with HTTP and HTTPS.
c.    mod_proxy_ftp: This handles fetching documents with FTP.
d.    mod_proxy_connect: This handles the CONNECT method for secure (SSL) tunneling.
e.    mod_proxy_ajp: This handles the AJP protocol for Tomcat and similar back-end servers.
f.    mod_proxy_balancer: This implements clustering and load balancing over multiple back ends.
g.    mod_cache, mod_disk_cache, mod_mem_cache: These manage a document cache. Caching requires mod_cache plus one or both of mod_disk_cache and mod_mem_cache.
h.    mod_proxy_html: This rewrites HTML links into a proxy's address space.
i.    mod_xml2enc: This supports internationalization (i18n) on behalf of mod_proxy_html and other markup-filtering modules.
j.    mod_headers: This modifies HTTP request and response headers.
k.    mod_deflate: This negotiates compression with clients and back ends.

As with any Apache module, the first thing to do is load these in httpd.conf (you may not need all of them):
LoadModule proxy_module       modules/mod_proxy.so
LoadModule proxy_http_module  modules/mod_proxy_http.so
LoadModule headers_module     modules/mod_headers.so
LoadModule deflate_module     modules/mod_deflate.so
# On Windows, LoadFile libxml2.dll, iconv.dll and zlib.dll instead
LoadFile   /usr/lib/libxml2.so
LoadModule xml2enc_module     modules/mod_xml2enc.so
LoadModule proxy_html_module  modules/mod_proxy_html.so
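The load list above does not include the caching modules from item g. If you also want the proxy to cache back-end responses, a minimal disk-cache sketch looks roughly like this. It assumes Apache 2.2 module names to match the list above (in Apache 2.4 the disk module is mod_cache_disk, loaded as cache_disk_module, and mod_mem_cache no longer exists); the cache directory is a placeholder and must be writable by the httpd user.

```apache
LoadModule cache_module      modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

# Cache responses for everything proxied under /app1/ on disk
CacheEnable disk /app1/
# Where cached entries are stored (placeholder path)
CacheRoot   /var/cache/httpd/proxy
```

mod_cache follows the usual HTTP caching rules, so the back end's Cache-Control and Expires headers largely decide what actually gets stored.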


2).Basic Configuration
When using ProxyPass, the ProxyRequests directive should be set to Off. Do not set "ProxyRequests On": that turns your server into an open (forward) proxy that anyone can use.
                       ProxyRequests off
3). ProxyPass
The fundamental configuration directive to set up a reverse proxy is ProxyPass.
ProxyPass       /app1/ http://internal1.example.com/
ProxyPass       /app2/  http://internal2.example.com/
This directive maps remote servers into the space of the local server; the local server does not act as a proxy in the conventional sense but appears to be a mirror of the remote server. The syntax is ProxyPass path url, where path is the name of a local virtual path and url is a partial URL for the remote server, which cannot include a query string.
Suppose the local server has the address http://example.com/; then
ProxyPass /mirror/foo/  http://backend.example.com/
will cause a local request for http://example.com/mirror/foo/bar to be internally converted into a proxy request to http://backend.example.com/bar.
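Because the mapping is a prefix substitution, the trailing slashes on the two arguments matter. The Apache documentation recommends that if the first argument ends with a slash, the second should too, and vice versa. A short sketch using the same hypothetical hosts:

```apache
# Good: both arguments end with "/"
#   /mirror/foo/bar  ->  http://backend.example.com/bar
ProxyPass /mirror/foo/ http://backend.example.com/

# A request for /mirror/foo (no trailing slash) does not match the rule
# above, so it is handled locally instead of being proxied.

# Avoid: mismatched slashes can produce malformed back-end URLs
# ProxyPass /mirror/foo/ http://backend.example.com
```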
Using ! in place of a URL is useful in situations where you don't want to reverse-proxy a subdirectory, e.g.
ProxyPass /mirror/foo/i !
ProxyPass /mirror/foo http://backend.example.com
will proxy all requests for /mirror/foo to backend.example.com except requests made to /mirror/foo/i.
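Order matters here. ProxyPass rules are checked in the order they appear and the first match wins, so the exclusion has to come before the broader rule. A sketch of the wrong and the right ordering, reusing the hypothetical backend.example.com:

```apache
# Wrong: the general rule matches /mirror/foo/i first, so "!" is never reached
# ProxyPass /mirror/foo   http://backend.example.com
# ProxyPass /mirror/foo/i !

# Right: the most specific path comes first
ProxyPass /mirror/foo/i !
ProxyPass /mirror/foo   http://backend.example.com
```

The Apache documentation's general advice is the same: sort conflicting ProxyPass rules starting with the longest URL paths.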
ProxyPass just sends traffic straight through. So when the application servers generate references to themselves (or to other internal addresses), those references are passed straight through to the outside world, where they won't work.
For example, an HTTP redirect often occurs when a user (or author) forgets the trailing slash in a URL. A request for http://www.example.com/app1/foo is proxied to http://internal1.example.com/foo, which responds with:
        HTTP/1.1 302 Found
        Location: http://internal1.example.com/foo/
        (etc)
The browser then follows that Location header straight to the internal server, which it usually cannot reach.

4).ProxyPassReverse

The directive that rewrites such URLs in the HTTP response headers is ProxyPassReverse. The Apache documentation suggests the form:

            ProxyPassReverse /app1/ http://internal1.example.com/
            ProxyPassReverse /app2/ http://internal2.example.com/
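To make the effect concrete, here is the trailing-slash redirect before and after ProxyPassReverse. This assumes the client reached the proxy as www.example.com; Apache builds the rewritten URL from the proxy's own server name, so the exact host can depend on ServerName and UseCanonicalName. ProxyPassReverse only touches the Location, Content-Location and URI response headers. Cookies set by the back end need the separate ProxyPassReverseCookieDomain and ProxyPassReverseCookiePath directives, shown here as an optional extra.

```apache
ProxyPass        /app1/ http://internal1.example.com/
ProxyPassReverse /app1/ http://internal1.example.com/

# Back-end reply for http://internal1.example.com/foo :
#   HTTP/1.1 302 Found
#   Location: http://internal1.example.com/foo/
# What the client receives after ProxyPassReverse:
#   HTTP/1.1 302 Found
#   Location: http://www.example.com/app1/foo/

# Optional: also rewrite the domain and path of Set-Cookie headers
ProxyPassReverseCookieDomain internal1.example.com www.example.com
ProxyPassReverseCookiePath   /                     /app1/
```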

5).ProxyHTMLURLMap

Mod_proxy_html is based on a SAX parser: specifically the HTMLparser module from libxml2 running in SAX mode (any other parse mode would of course be very much slower, especially for larger documents). It has full knowledge of all URI attributes that can occur in HTML 4 and XHTML 1. Whenever a URL is encountered, it is matched against applicable ProxyHTMLURLMap directives. If it starts with any from-pattern, that will be rewritten to the to-pattern. Rules are applied in the reverse order to their appearance in httpd.conf, and matching stops as soon as a match is found.
Here's how we set up a reverse proxy for HTML. Firstly, full links to the internal servers should be rewritten regardless of where they arise, so we have:
 
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2

Note that in this instance we omitted the trailing slash. Since the matching logic is starts-with, we use the minimal matching pattern. This globally fixes absolute links that point at the internal servers.
Links that omit the hostname, such as /foo/bar.html, require a little more care. Because such a link doesn't include the hostname, the rewrite rule must be context-sensitive. As with ProxyPassReverse above, we deal with that using <Location>:
 
<Location /app1/>
        ProxyHTMLURLMap / /app1/
</Location>
<Location /app2/>
        ProxyHTMLURLMap / /app2/
</Location>
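Putting the two kinds of rule together, this is roughly what they do to a page served from internal1.example.com through /app1/. The directives are the ones from the configuration at the top of the post; the HTML fragments in the comments are hypothetical. The Accept-Encoding header is unset so that the back end sends uncompressed HTML that mod_proxy_html can parse. Newer versions of mod_proxy_html (including the one shipped with Apache 2.4) also need ProxyHTMLEnable On to switch the filter on.

```apache
ProxyPass       /app1/ http://internal1.example.com/
ProxyHTMLURLMap http://internal1.example.com /app1

<Location /app1/>
    ProxyPassReverse /
    ProxyHTMLEnable  On
    ProxyHTMLURLMap  / /app1/
    RequestHeader    unset Accept-Encoding
</Location>

# Back end:  <a href="/docs/intro.html">
# Client:    <a href="/app1/docs/intro.html">
#
# Back end:  <img src="http://internal1.example.com/logo.png">
# Client:    <img src="/app1/logo.png">
#
# Back end:  <a href="http://www.apache.org/">
# Client:    unchanged (no from-pattern matches)
```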
 

1 comment:

  1. Are there any security issues with setting up a reverse proxy?
