Content Filtering
Content filtering enables the operator to deny the transit of web pages containing textual content that matches operator-specified filters.
The content filtering mechanism allows the operator to specify independent content filtering policies for different groups. The intention is to allow the operator to generate direct revenue either from the application of filtering or from the sale of unfettered Internet access.
For example, residential customers may be offered a porn-blocking premium service while business customers receive unfettered access. The converse may also be useful, where advertising-supported hotspot access is restricted from all known objectionable content and a paid upgrade enables unfettered access to the Internet.
In order for content filtering to operate, the rXg must have access to the web page that the end-user requests. This is accomplished by enabling the transparent web proxy, which intercepts end-user HTTP requests. If locally cached copies of the content are expired or out of date, a new copy is requested from the origin server. The rXg content filtering mechanism then operates on the local copy of the web page requested by the end-user. If the requested web page does not match the filters configured by the operator, an HTTP response containing the requested page is transferred to the end-user.
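The following is a minimal sketch of the request-handling flow described above, written in Python purely for illustration; the helper names and deny patterns are hypothetical stand-ins and do not correspond to actual rXg internals.

    # Illustrative sketch only; these helpers are hypothetical stand-ins, not rXg APIs.
    import re

    DENY_PATTERNS = [re.compile(r"(^|\.)blocked-site\.example$"),   # domain match
                     re.compile(r"example\.com/prohibited/")]       # URL match

    def fetch_from_cache_or_origin(url):
        # Stand-in for the transparent proxy's cache lookup / origin fetch.
        return "<html>...page body...</html>"

    def handle_request(host, url):
        page = fetch_from_cache_or_origin(url)
        for pattern in DENY_PATTERNS:
            if pattern.search(host) or pattern.search(url):
                # Prohibited content: redirect to the /content_filter portal view.
                return ("redirect", "/content_filter")
        # No filter matched: deliver the requested page to the end-user.
        return ("deliver", page)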
Content Not Permitted
When an end-user attempts to access prohibited content, they are redirected to the /content_filter view of the captive portal. By default, this view displays a denied graphic along with the desired URL, the reason for the denial and the categories that were matched. This view of the portal may be customized to any format that the operator desires.
The /content_filter view of the portal is an integral part of revenue generation through content filtering. For example, if the content filter is being applied to limit access for end-users in an advertising-supported group, the /content_filter view should be customized to advertise the availability of unfettered access for a fee. If the content filter is enabled as part of a threat management package, affiliate program links to security software downloads (e.g., McAfee, Norton, etc.) would be appropriate.
Content Filters
The content filters scaffold presents the fields necessary to configure the rXg policy enforcement engine to deny access to HTTP responses containing operator specified classes of content.
The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.
The note field is a place for the administrator to enter a comment. This field is purely informational and has no bearing on the configuration settings.
The filtering section configures the global behavior and filtering sources for this content filter.
The lists and remote lists fields specify the lists of URLs and domains that will be passed or denied by this content filter. The lists and remote lists are configured using the scaffolds below.
The blanket block checkbox is used to place the content filter into whitelist mode. All web pages except those specified in the whitelists are denied when this mode is enabled.
The denied portal action specifies the action on the captive portal that will be called when prohibited content is requested.
The WAN targets field limits the effect of the content filtering settings defined by this record to traffic that is destined to the IP addresses or DNS names listed in the selected WAN targets. By default, a content filter record affects all HTTP traffic matched by the policy regardless of WAN origin / destination. Setting a WAN target causes the breadth of the content filter to be limited to the specified hosts in the manner specified by the WAN target mode.
The content filter only manipulates HTTP traffic by default. Setting the intercept SSL/TLS checkbox enables the content filter to manipulate encrypted HTTPS traffic in the same manner as if it were regular HTTP traffic.
The safesearch checkbox causes the content filter to always enable safe-search mode in search engines. This feature requires the intercept SSL/TLS feature to be enabled because almost all search engines run over HTTPS.
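As a rough illustration of one way safe-search enforcement can be implemented, the sketch below rewrites search-engine query strings to append commonly documented safe-search parameters (safe=active for Google, adlt=strict for Bing). These parameter names are assumptions used for illustration; the actual rXg mechanism is not described here.

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    # Commonly documented safe-search parameters; treat these as assumptions.
    SAFE_PARAMS = {"google.": ("safe", "active"), "bing.": ("adlt", "strict")}

    def force_safesearch(url):
        parts = urlsplit(url)
        for host_fragment, (key, value) in SAFE_PARAMS.items():
            if host_fragment in parts.netloc:
                query = dict(parse_qsl(parts.query))
                query[key] = value                    # force the safe-search flag
                return urlunsplit(parts._replace(query=urlencode(query)))
        return url                                    # not a recognized search engine

    # force_safesearch("https://www.google.com/search?q=test")
    # -> "https://www.google.com/search?q=test&safe=active"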
The YouTube EDU ID field configures the content filter to add the specified data to all traffic to/from YouTube. This mechanism enables server-side content filtering for YouTube. Use of this feature requires educational facility registration with Google / YouTube.
The tunnel detection field configures the content filter to look for IP tunneling over HTTPS. Detection may be configured to run in real-time (higher overhead) or in the background (higher performance). The operator may also choose to log the presence of HTTPS tunneling rather than denying this behavior outright.
The enhanced HTTPS security checkbox configures the content filter to block access to HTTPS sites that fail to present an SSL certificate signed by a trusted third party.
The policy field relates this record to a set of groups through a policy record.
Remote Content Filter Lists
Entries in the remote content filter lists scaffold are used to configure the parameters needed to periodically download third party maintained lists. The de facto standard list format is a compressed archive (.tar.gz) file that extracts into a series of subdirectories named by blocking category. Each entry in this scaffold configures the periodic download of a compressed archive (.tar.gz) file. Multiple archive files may be used in a single content filter policy enforcement.
The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.
The categories field allows the operator to choose one or more groups of URLs and/or domains to include in this remote content filter list definition. This field is only useful when the URL of this remote content filter list refers to a gzipped tarball archive. The names of subdirectories present in the archive are presumed to be the names of blocking categories and appear in this field.
When a remote content filter list is created, this field will be empty. Once the list is downloaded and extracted for the first time, the subdirectories will then appear in this field. After creating a remote content filter list, it is necessary to edit the record and select the desired categories in order for proper operation. Initial download time (and hence, population of the categories field) depends upon the size of the archive file as well as the speed of the network connection to the server addressed by the URL.
When Custom is selected in the Provider field, the operator may specify a custom URL pointing to a list maintained by the operator or another source.
The URL field contains the URL of the file that the rXg will download. The target file is expected to be in one of two formats: a .tar.gz archive or a .txt plain text file. If the file is a gzipped tarball archive (.tar.gz), it is expected to extract into multiple subdirectories named by category, each of which contains a "urls" file containing a list of specific URLs to include, and/or a "domains" file containing a list of domains (i.e., entire sites) to include. If the file is plain text, it is expected to be a list of domains and URLs to include. The rXg supports lists formatted for ufdbGuard, DansGuardian and SquidGuard. Well known providers include URLfilterDB, Shalla List and Squidblacklist.org.
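For illustration, a minimal custom plain-text list might look like the following (all hostnames are placeholders); the first two entries match entire domains while the last matches a single page.

    ads.example.net
    tracker.example.org
    www.example.com/promo/prohibited-page.html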
The frequency defines the periodicity with which the rXg will download the list archive file from the specified URL. The request to download a new remote content filter list occurs immediately after the create button is clicked. The periodicity of subsequent downloads is determined by the value specified in this field. Subsequent updates will always be downloaded between 4 and 5 AM local time.
The note field is a place for the administrator to enter a comment. This field is purely informational and has no bearing on the configuration settings.
Example Content Filter Configuration
In this example we will configure a content filter that blocks adult content. Navigate to Policies::Content Filtering and create a new Remote Content Filter List.
Give it a name and set the type to Blacklist. For this example we will use the UT1 provider. Set the Frequency to control how often the list is downloaded. UT1 does not require a username/password. Click create.
The system will download the remote list. After waiting about a minute, edit the Remote Content Filter List that was just created. The Categories section should now list the categories that can be filtered. For this example, adult, dating, and lingerie are selected. Click update.
Next create a new Content Filter.
Give it a name and leave Filter DNS checked. The behavior for blocked lookups can be changed via the response setting; for this example it is left on Block w/ NXDOMAIN (return name does not exist). Under Content Lists, verify that the remote list created in the previous step is checked. Lastly, select the policies the filter should apply to, and click create.
A custom Content Filter list can be hosted on any accessible web server. The rXg can be configured to use this custom Content Filter (plain-text or tar/gzip) file by creating a new Remote Content Filter List, selecting Custom for the Provider, and populating the URL with the direct link to the file. The content filter can be set to synchronize on a daily, weekly, or monthly recurrence.
To host the Custom Content Filter on an rXg device, the filter list can be uploaded to the /space/rxg/console/public folder on the host.
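As a sketch of how an operator might package such a list, the following Python snippet builds a .tar.gz archive in the category/domains/urls layout described in the next paragraph; the category names, entries, and output path are illustrative assumptions.

    import os, tarfile, tempfile

    def build_list_archive(categories, output="custom_filter.tar.gz"):
        # categories example: {"adult": {"domains": [...], "urls": [...]}}
        with tempfile.TemporaryDirectory() as tmp:
            for category, entries in categories.items():
                cat_dir = os.path.join(tmp, category)
                os.makedirs(cat_dir)
                for kind in ("domains", "urls"):
                    with open(os.path.join(cat_dir, kind), "w") as f:
                        f.write("\n".join(entries.get(kind, [])) + "\n")
            with tarfile.open(output, "w:gz") as tar:
                for category in categories:
                    # Store each category as a top-level directory in the archive.
                    tar.add(os.path.join(tmp, category), arcname=category)
        return output

    # Example usage; the result could then be copied to /space/rxg/console/public
    # (or any web server) and referenced by a Remote Content Filter List URL.
    build_list_archive({"adult": {"domains": ["adult-site.example"],
                                  "urls": ["www.example.com/adult/page.html"]}})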
A content filter list is contained within a tar.gz file. The root of the archive contains a set of directories which serve as categories for the filter list. The categories are used to select specific types of content to filter from the broader list. Inside each directory there are one or more extensionless text files which contain lists of domains and URLs. The domains file consists of fully qualified domain names for entire websites to be included in a content filter list. The urls file consists of full paths to specific pages to be included in the content filter list.
File Structure:
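An illustrative layout (category names are placeholders):

    custom_filter.tar.gz
        adult/
            domains
            urls
        dating/
            domains
            urls
        gambling/
            domains
            urls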
Domain List:
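An example domains file (placeholder entries), one fully qualified domain name per line:

    adult-site.example
    dating-site.example
    casino.example.net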
URL List:
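An example urls file (placeholder entries), one full path per line:

    www.example.com/forums/adult
    www.example.org/shop/lingerie/catalog.html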