National Geothermal Data System: CKAN Node Software Installation Instructions

Arizona Geological Survey and Siemens Corp.

version 1.03; 3/26/2014

This work is licensed under a Creative Commons Attribution 3.0 Unported License. Copyright © Arizona Geological Survey, 2014

 

Edit History

Version | Author | Date | Details

0.1 | Roberto Silva Filho | 05/28/2013 | Initial Draft Created

0.2 | Monica McKenna | 06/11/2013-07/24/2013 | Minor updates, Combining comments from a few people, Added appendix with summary of development.ini changes, A little re-organization, more hints, and added gdal, Updating with feedback

0.7 | Christoph Kuhmuench | 12/26/2013 | Updating to latest installer.

0.8 | Jordan Matti | 2/7/2014 | Many changes

0.9 | Christy Caudill | 2/19/2014 | Moved Windows OS/Oracle VM install to Appendix A.

0.91 | Jordan Matti | 2/20/2014 | Minor changes (formatting, headers, comments)

1.03 | Christy Caudill | 2/21/2014-3/26/2014 | Formatting changes, minor edits and figure updates, Edits from test install with VM, Serious revisions, updating the installer script used, Incorporating changes per Matt MacKenzie, Update to installer script, edits per Matt MacKenzie. Added Section 5, added Section 6.

     | Stephen Richard | 4/2/2014 | Editing, formatting, review


Contents

1        Introduction

1.1         Purpose and Audience

1.2         Document Roadmap

1.3         System Scope and Background

1.4         The NGDS Software Stack

2        Install the NGDS Software Stack

2.1         Install Git

2.2         Obtain the NGDS Software Stack Installation Files

2.3         Set Installation Parameters

2.4         Run the Installation Script

2.5         Final Steps

3        Troubleshooting your NGDS Installation

4        Short tutorial on using a publisher node installation

4.1         Tiers of NGDS data delivery

4.1.1          Tier 1

4.1.2          Tier 2

4.1.3          Tier 3

4.2         How to upload tier 1 and 2 data

4.3         How to upload tier 3 data and publish web services

4.4         How to bulk upload metadata

4.5         Register your NIAB with the main NGDS aggregator

5        Tips for using an aggregator node installation


1         Introduction

The National Geothermal Data System (NGDS) was initiated as a Department of Energy-funded effort to facilitate public access to information about geothermal resources from public and private sources. NGDS data is available through a distributed, scalable network of data providers. One of the goals of the NGDS is to provide an open-source software stack for releasing open data on the World Wide Web that is sustainable and cost-effective for data producers. With this documentation, system administrators will be able to quickly understand the system and deploy a working node in the NGDS.

 

1.1       Purpose and Audience

This document is a step-by-step tutorial for setting up an instance of the NGDS CKAN Software Stack as an NGDS node, known as a Node-In-A-Box (NIAB). It should also give system administrators a more thorough understanding of the components and architecture of the NIAB.

This document is intended for NGDS System Administrators to understand the broad concepts and methods of installation. A more technical discussion targeted at Software Architects and Software Developers is available in a series of wiki pages at: https://github.com/ngds/ckanext-ngds/wiki/_pages.

Two particularly useful pages on the wiki are recommended for those interested in technical details:

https://github.com/ngds/ckanext-ngds/wiki/The-NGDS-Package-and-Resource-Schema

https://github.com/ngds/ckanext-ngds/wiki/Configuration-Parameters-for-NGDS

For documentation regarding the CKAN API (application programming interface), see the following website: http://docs.ckan.org/en/ckan-2.1/api.html
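As a quick check that a node's API responds, CKAN 2.x serves its action API under /api/3/action/. The shell sketch below only builds such URLs; ckan_action_url is a hypothetical helper, not part of CKAN or NGDS, and the curl call in the comment assumes a node running at http://127.0.0.1 with curl installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a CKAN action API (version 3) URL for a node.

# ckan_action_url BASE_URL ACTION
ckan_action_url() {
  local base=${1%/}   # drop one trailing slash, if present
  local action=$2     # e.g. package_list, package_show, package_search
  printf '%s/api/3/action/%s\n' "$base" "$action"
}

# Example against a running node (requires curl):
#   curl -s "$(ckan_action_url http://127.0.0.1 package_list)"
```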

1.2       Document Roadmap

This document outlines the architecture of NGDS and is structured in the following way:

·         Section 1: Background, the NGDS Software Stack, and installation prerequisites

·         Section 2: Installing the NGDS Software Stack on an Ubuntu Linux operating system: the components that are installed, and step-by-step installation instructions

·         Section 3: NGDS Software Stack installation troubleshooting

·         Section 4: Short tutorial on using a publisher node installation

·         Section 5: Tips for using an aggregator node installation

·         Appendix A: Installation guide for an Ubuntu Linux virtual machine in VirtualBox

·         Appendix B: NGDS architecture and diagrams and notes

1.3       System Scope and Background

The National Geothermal Data System (NGDS) is a distributed data-sharing network. NGDS data providers (publishers) host data using their own computing resources and present web-accessible metadata describing their data holdings for harvest by aggregating catalog nodes (aggregators) in the data network.

Metadata presented by registered publisher nodes is regularly harvested into NGDS aggregator catalogs. The aggregator node(s) host web sites from which users can search the aggregated metadata catalog for datasets, documents and services. Thus, the aggregator node becomes the one-stop search interface for the entirety of the system. The central NGDS aggregator node can be accessed at http://geothermaldata.org.

1.4       The NGDS Software Stack

The NGDS Software Stack is a collection of applications that support release of data for the NGDS, creation and publication of metadata records, and search of the metadata catalog hosted by an NGDS node.

When installed, the NGDS Software Stack allows the computer on which it is installed to become an NGDS node. There are two types of NGDS nodes:

·         Publisher nodes: When installed on a server and configured to act as a publisher node, the NGDS Software Stack provides a web-accessible interface that can be used to create and manage metadata records. Metadata held by a publisher node that has been registered with an NGDS aggregator node will be harvested by the aggregator node at regular intervals.

·         Aggregator nodes: When installed on a server and configured to act as an aggregator node, the NGDS Software Stack provides a web-accessible metadata catalog that can be configured to harvest metadata from NGDS publisher nodes. An NGDS aggregator node harvests metadata from any NGDS publisher node or Open Geospatial Consortium Catalog Service for the Web (CSW) with metadata that conforms to the USGIN ISO19115 profile. The same installation can produce either a publisher or an aggregator, but this document focuses on the installation of a publisher node.

 

Note that the NGDS Software Stack can also be installed in two modes:

·         Production mode: a deployment that supports day-to-day operation with minimal disruption and maximal performance.

·         Development mode: used by software developers to update software components in the stack; generally installed in a development framework that enables debugging at the cost of slower performance.

 

Installation prerequisites

Installing and configuring the individual components utilized by the NGDS Software Stack requires a physical or virtual computer with the following properties:

·         Internet access

·         A clean, properly configured installation of Ubuntu Linux 12.04 or higher (example: Xubuntu 13.04 desktop-i386.iso)

·         A user account with Super-User (Administrator) privileges

·         At least 1024 megabytes of RAM; a physical computer that will be used to host a virtual machine should have sufficient RAM to allocate at least 1024 MB of RAM to the virtual machine running the NGDS software.
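A quick way to confirm the operating system and memory prerequisites is a small shell check. This is a minimal sketch, assuming Bash and GNU sort (both standard on Ubuntu); meets_prereqs and version_at_least are hypothetical helper names, and the thresholds are the ones listed above. The kernel reports slightly less memory than is physically installed, so a machine with exactly 1024 MB may fall a few megabytes short.

```shell
#!/usr/bin/env bash
# Minimal prerequisite check (hypothetical helpers). Thresholds come from the
# list above: Ubuntu 12.04 or higher and at least 1024 MB of RAM.

# version_at_least CURRENT MINIMUM: succeed if CURRENT >= MINIMUM (e.g. 13.04, 12.04)
version_at_least() {
  # sort -V orders version strings; MINIMUM must sort first (or tie)
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# meets_prereqs UBUNTU_VERSION RAM_MB: succeed if both requirements are met
meets_prereqs() {
  local version=$1 ram_mb=$2
  if ! version_at_least "$version" 12.04; then
    echo "Ubuntu $version is older than 12.04" >&2
    return 1
  fi
  if [ "$ram_mb" -lt 1024 ]; then
    echo "Only $ram_mb MB of RAM; at least 1024 MB is required" >&2
    return 1
  fi
}

# On the target machine (lsb_release and /proc/meminfo are standard on Ubuntu):
#   meets_prereqs "$(lsb_release -rs)" "$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"
```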

 

The remainder of this document describes the steps necessary to install the NGDS Software Stack as a publisher node or an aggregator node. For those using only Windows, Appendix A describes the preliminary steps to create a virtual machine and install Ubuntu Linux on it using Oracle VM VirtualBox (free download).

2         Install the NGDS Software Stack

The NGDS Software Stack depends on a number of software components, all of which will be installed automatically by the NGDS Software Stack installation script (with the exception of Git; see Section 2.1 below). For the script to run successfully, the computer must have access to the Internet. These components include:

Java Development Kit (JDK)

Git

Apache SOLR

PostgreSQL database

PostgreSQL extensions for Geographic Information Systems (PostGIS)

GeoServer

Apache Tomcat

CKAN

Python extensions

GDAL (Geospatial Data Abstraction Library)

 

 

Figure 1 provides a visual representation of how these components interact. Components higher in the figure depend on the components beneath them. For instance, Apache SOLR is a Java web application hosted by the Apache Tomcat servlet container, which requires a Java environment to run. The Upstart service (part of the Ubuntu operating system) monitors processes and restarts them if they crash, and that service, like many others, depends on the core functionality of the Ubuntu (Linux) operating system for file access, user interaction, and many other functions.

 


2.1       Install Git

To install Git, make sure you are logged in as ngds. Open an Ubuntu Linux terminal and execute the following commands:

%  cd ~ngds

%  sudo apt-get install git git-core

 

The sudo command runs the command that follows it as the super user; you will be asked to enter the password for your ngds login.

2.2       Obtain the NGDS Software Stack Installation Files

To obtain the installation files for the NGDS Software Stack, create a tmp directory in the ~ngds directory (the home directory for the ngds user) and clone the git repository:

%  mkdir tmp

%  cd tmp

%  git clone https://github.com/ngds/install-and-run.git

 

2.3       Set Installation Parameters

Before running the NGDS Software Stack installation script, you need to set several required installation parameters. Navigate to the installation directory and open the script in a text editor:

%  cd install-and-run/installation

%  sudo nano install-ngds.sh

 

The most important variables to specify are:

·         deployment_type: choose between publisher and aggregator mode:

"node": sets up the installation as a publisher node

"central": sets up the installation as an aggregator node

·         site_url='http://myservername_IPname'. This can be left at the default value (http://127.0.0.1), which is the IP address for 'localhost', i.e. the local address of the machine on which the install is being done.

·         SERVER_NAME = Should be the same as the 'site_url' (ex. http://127.0.0.1)

·         SMTP_SERVER = A server that receives outgoing email messages and routes them to recipients (ex. smtp.gmail.com:587)

·         SMTP_STARTTLS = Whether to use STARTTLS, an extension that upgrades a plain-text connection to an encrypted one (ex. True)

·         SMTP_USER = An email address that will send automated emails from CKAN, through the SMTP server (ex. email@gmail.com)

·         SMTP_PASSWORD = The password associated with the above email address

·         GEOSERVER_REST_URL = The connection parameters for GeoServer, in the form:

"geoserver://{username}:{password}@{geoserver_rest_url}"

·         # Customize CSW: in this section of the script, the user may enter metadata for the custom CSW that accompanies the NIAB installation.

The site_url should give the exact web-facing server name or IP address at which this NGDS CKAN node will be accessible; it must be a publicly accessible address if users other than the installing user are to reach the site. Other variables, such as the Apache Tomcat home directory, can also be configured in this file. Do not change anything beyond line 95, which reads “DO NOT CHANGE ANYTHING BELOW THIS POINT”.

The server name or IP address used in site_url must also be used in the GEOSERVER_REST_URL variable, so that web services published through the installed GeoServer instance are web-accessible. Remember the username and password, as they will be needed after installation to set up the GeoServer account that CKAN uses:

GEOSERVER_REST_URL="geoserver://admin:geoserver@myservername_IPname:8080/geoserver/rest"
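The two rules above (the documented URL form, and the same host in site_url and GEOSERVER_REST_URL) can be checked before running the installer. The helpers below are a hypothetical sketch that works on strings only and does not edit install-ngds.sh; url_host assumes the password contains no '/' character.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking installer parameters; they only build and
# compare strings and do not edit install-ngds.sh.

# geoserver_rest_url USER PASSWORD HOST[:PORT]: value in the documented form
geoserver_rest_url() {
  printf 'geoserver://%s:%s@%s/geoserver/rest\n' "$1" "$2" "$3"
}

# url_host URL: host name or IP address, without scheme, credentials, port or path
url_host() {
  local rest=${1#*://}          # drop "http://" or "geoserver://"
  rest=${rest%%/*}              # drop the path (assumes no '/' in the password)
  rest=${rest##*@}              # drop "user:password@"
  printf '%s\n' "${rest%%:*}"   # drop ":port"
}

# same_host SITE_URL GEOSERVER_REST_URL: succeed if both use the same host
same_host() {
  [ "$(url_host "$1")" = "$(url_host "$2")" ]
}
```

For example, geoserver_rest_url admin geoserver 127.0.0.1:8080 prints geoserver://admin:geoserver@127.0.0.1:8080/geoserver/rest, and same_host "$site_url" "$GEOSERVER_REST_URL" fails if the two values name different hosts.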

 

2.4       Run the Installation Script

In an Ubuntu Linux terminal, in the installation directory, execute the following command:

%  sudo ./install-ngds.sh

 

The script takes some time to download, install, and configure all of the components.

Watch the console output during this process for errors. If a message to "See the log file… (file location path)…to fix the issue and try again" appears near the bottom of the console output, before the Ubuntu command-line prompt returns, check that log file, fix the problem, and run the script again. If no such message appears, the NGDS Software Stack has been installed; follow the steps below to complete the configuration of your new node.
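Keeping a copy of the console output makes the error message easier to find later. The sketch below assumes the installer's message begins with the words "See the log file", as quoted above; installer_reported_error and installer_error_context are hypothetical helpers, and the tee command in the comment is one way to capture the output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning a saved copy of the installer's console output.

# installer_reported_error LOGFILE: succeed if the "See the log file" message appears
installer_reported_error() {
  grep -qi 'See the log file' "$1"
}

# installer_error_context LOGFILE: print the message with its line number,
# plus the two lines that follow it
installer_error_context() {
  grep -ni -A 2 'See the log file' "$1"
}

# Usage, from the installation directory:
#   sudo ./install-ngds.sh 2>&1 | tee ~/install-ngds.console.log
#   installer_reported_error ~/install-ngds.console.log &&
#     installer_error_context ~/install-ngds.console.log
```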

2.5       Final Steps

If the installation was performed correctly, the web-accessible interface provided by the NGDS Software Stack can be reached at: http://127.0.0.1/

Navigate to the above address and finish the installation:

1.       Log in with the following credentials:

·         Username: admin

·         Password: admin

2.    Navigate to http://127.0.0.1/organization.

3.       Create a new organization named 'Public'.

4.       Navigate to the installed GeoServer

Change the master and administrator passwords in GeoServer for security, using the same username and password as in the GEOSERVER_REST_URL parameter entered in Section 2.3:

·         http://127.0.0.1:8080/geoserver/web/

·         Log in with the following credentials:

·         Username: admin

·         Password: geoserver

·         Follow the Change it instructions on the GeoServer home page and enter the username and password set in the installer script for GEOSERVER_REST_URL.

Congratulations! The system has now been configured and is ready for use.

3         Troubleshooting your NGDS Installation

If the installation seems to stall, check the output of the installation script for error messages. If the site http://127.0.0.1/ is slow or does not load correctly, give it a few extra minutes and try again. You may need to restart the Apache web server and wait a few more minutes:

%  sudo service apache2 restart

You can also try disabling Apache's default site and reloading Apache (on Ubuntu 14.04 and later the default site is named 000-default):

%  sudo a2dissite default

%  sudo service apache2 reload
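Because the site can take several minutes to respond after installation or a restart, polling it is easier than repeatedly reloading a browser. This is a minimal sketch, assuming curl is installed; wait_for_url is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers without an HTTP error.

# wait_for_url URL TRIES DELAY_SECONDS
wait_for_url() {
  local url=$1 tries=$2 delay=$3 i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors (4xx/5xx) as failures; -sS: quiet, but show errors
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "$url is up"
      return 0
    fi
    sleep "$delay"
  done
  echo "$url did not respond after $tries attempts" >&2
  return 1
}

# Usage:
#   wait_for_url http://127.0.0.1/ 30 10
#   wait_for_url http://127.0.0.1:8080/geoserver/web/ 30 10
```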

 

If install-ngds.sh is not recognized as an executable file, make it executable for its owner:

%  sudo chmod u+x install-ngds.sh

 

The most common errors are:

Typos: typos in commands or paths can be very difficult to spot and can lead to unclear error messages. Check your text and paths carefully. Linux commands, file names, and Bash scripts are case-sensitive, so a lower-case or upper-case letter in the wrong place can cause problems.

Permission Errors: permission errors occur when you try to perform an action that requires super-user privileges. If you notice permission errors, run the command with sudo (“super-user do”) or correct the permissions on the directories involved.

Helpful error log and other file locations:

CKAN log directory: /var/log/apache2/

Source code: /opt/ngds/bin/default

CKAN error log: /var/log/apache2/ckan_default.error.log

CKAN custom log: /var/log/apache2/ckan_default.custom.log

Harvest gather queue log: /var/log/ckan-central-gather.log

Harvest fetch queue log: /var/log/ckan-central-fetch.log

Harvest run log: /var/log/ckan-central-run-harvest.log

Celery log: /var/log/ckan-node-celery.log

Geoserver and SOLR run on top of Tomcat:

Tomcat log: /var/log/ckan-tomcat.log
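To review these logs in one pass, a loop over the paths above can print the end of each one that exists; not every log exists on every node type. The sketch assumes Bash; show_log_tails and NGDS_LOGS are hypothetical names, and some logs may be readable only by root.

```shell
#!/usr/bin/env bash
# Log paths as listed in this section; adjust them if your installation differs.
NGDS_LOGS=(
  /var/log/apache2/ckan_default.error.log
  /var/log/apache2/ckan_default.custom.log
  /var/log/ckan-central-gather.log
  /var/log/ckan-central-fetch.log
  /var/log/ckan-central-run-harvest.log
  /var/log/ckan-node-celery.log
  /var/log/ckan-tomcat.log
)

# show_log_tails LINES FILE...: print the last LINES lines of each readable file
show_log_tails() {
  local lines=$1 f
  shift
  for f in "$@"; do
    [ -r "$f" ] || continue   # skip logs that are missing or unreadable
    printf '==> %s <==\n' "$f"
    tail -n "$lines" "$f"
  done
}

# Usage (from a shell with permission to read the logs):
#   show_log_tails 20 "${NGDS_LOGS[@]}"
```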

 

After evaluating the output of the installation script and fixing any errors you find, re-run the installation script.

4         Short tutorial on using a publisher node installation

This section is a brief introduction to uploading data to a new publisher node installation, aimed more at database managers than at technical staff or software developers. Additional information can be found at the NGDS help site http://geothermaldata.org/ngds/data.

4.1       Tiers of NGDS data delivery

4.1.1       Tier 1

The simplest and most common access to resources is provided by simple Web links that result in a file download. Information contained in files can be accessed by users who have software that can recognize and open these files. This is the standard model for files accessible on the web, supported by HTTP servers and desktop web browser software.

Unstructured data requires user interpretation before it can be used for analysis. Users can utilize the information if they can understand the encoding and language, but the system provides no support for this understanding, and little or no automation is possible. Audio files must be transcribed; text files must be parsed and mined for data that is then broken down and structured in ways that can be processed by computers; images must be scanned, interpreted, and often georeferenced. Preparing Tier 1 data for analysis can be a painstaking and time-consuming process.

4.1.2       Tier 2

Tier 2 interoperability indicates that information content is structured (consistently organized) in a spreadsheet or database file that is amenable to computer processing. However, Tier 2 data does not use a shared, documented data structure. Data in this tier must be transformed by the data consumer on a case-by-case basis for integration with other datasets, which requires studying each new data source to figure out how to extract the needed information. Obtaining data in a structured format is a step towards interoperability, because once the format is understood, computer programs can be instructed to extract the desired information.

4.1.3       Tier 3

Tier 3 data is structured data that conforms to an NGDS information exchange. Data that is published according to the exchange specification (content model, interchange format, service protocol) is interoperable with any other data published using that exchange. This is referred to as Tier 3 interoperability. This is the most valuable data in the system, as it allows end users (researchers or computer programs) to retrieve and manipulate data from any source in a predictable and expedient way. When a data file (CSV or XML, for example) is “schema-valid”, it means that the names of fields in the data structure and the data type of the content in those fields conform to an information exchange specification. Excel workbook files are available for each information exchange at http://schemas.usgin.org/models/; each contains a worksheet with a template table for the information exchange. When the field headings, data types, and data conform to these specifications, the file can be validated at http://schemas.usgin.org/validate/cm before uploading it to the node as a Tier 3 structured dataset (see Section 4.3).
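Before using the USGIN validator, a quick local check can catch misspelled, missing, or reordered headings. This sketch compares only the header row of a CSV file with a list of expected field names; it does not check data types and is no substitute for http://schemas.usgin.org/validate/cm. check_csv_header is a hypothetical helper, it assumes unquoted, comma-separated headings, and the field names in the usage comment are placeholders rather than a real content model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a CSV header row with the expected field names.

# check_csv_header FILE FIELD...: succeed if the first line is exactly FIELD,FIELD,...
check_csv_header() {
  local file=$1
  shift
  local expected actual
  expected=$(IFS=,; printf '%s' "$*")          # join expected names with commas
  actual=$(head -n 1 "$file" | tr -d '\r')     # tolerate Windows line endings
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  printf 'expected: %s\nfound:    %s\n' "$expected" "$actual" >&2
  return 1
}

# Usage (placeholder field names):
#   check_csv_header mydata.csv FieldA FieldB FieldC
```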

4.2       How to upload tier 1 and 2 data

If a structured flat file (such as a CSV or Excel file) is uploaded, users can preview the data table. Any uploaded file will be available for users to download after the following steps:

·         Go to the Contribute page.

·         Fill in Title for the resource, keywords (Tags), and other metadata. Click Next.

·         Click Upload a file and navigate to a file or other resource.

·         Choose Unstructured or Offline Resource and enter all metadata. Click Add.

·         Enter all metadata. Click Finish.

4.3       How to upload tier 3 data and publish web services

This upload requires schema-valid CSV data files (see Section 4.1.3). Uploading structured data files and publishing them as web services creates point features served as OGC Web Feature Service (WFS) and Web Map Service (WMS) layers, which are the most useful form of data within the system.

·         Go to the Contribute page.

·         Title will be the title of the web service. Find the required names of services at https://github.com/ngds/system-design/wiki/_pages.  Fill in keywords (Tags) and other metadata. Click Next.

·         Click Upload a file and navigate to the schema-valid CSV file.

·         Choose Structured Resource and enter in all the metadata.

·         Choose the appropriate content model from the drop-down list. Always use the latest version. The associated layer name will then be entered automatically, or you will be prompted to enter it (again, check https://github.com/ngds/system-design/wiki/_pages if unsure which layer name to select). Click Next.

·         Fill out all metadata here, click Finish.

·         The metadata for that dataset is now published, but the web service is not yet published. To publish the web service, click Publish as OGC.

4.4       How to bulk upload metadata

Version 1 of NGDS CKAN does not include functionality to upload tabular metadata compilations (like Tier 3 data). See Section 4.2 instead for how to register resources and create metadata in the system.

4.5       Register your NIAB with the main NGDS aggregator

To make your NIAB installation a contributing node in NGDS, register your installation URL with the NGDS system administrators by emailing metadata@usgin.org. Your installation must be on a web-facing server. The metadata provided by your NIAB will then be available from the central site at geothermaldata.org. Discuss with the system administrators whether you prefer one-time or interval harvests; interval harvesting gives a data provider complete control over their contributions, because at each harvest the data from previous harvests is removed before the new harvest begins.

5         Tips for using an aggregator node installation

Whereas the publisher node is used to upload and manage individual resources or collections of resources, the aggregator node is used as a harvester. As such, its Contribute page has different functionality from that of a publisher node: there, the user interface is used to manage harvest sources. The following are a few helpful hints for using the version 1 aggregator node as a harvester.

·         Version 1 of this installation will support harvesting from:

o   an OGC CSW endpoint with ISO-standards-compliant metadata (http://usgin.org/specifications; http://www.opengeospatial.org/standards/cat )

o   publisher nodes using the CKAN Harvester (ckan_harvester plugin, which is CKAN’s API for harvesting between CKAN instances)

·         Harvest using the NGDS CSW Harvester

·         To harvest a CSW, use only the string before the '?' in the CSW base URL, for example:
http://catalog.usgin.org/geothermal/csw

·         Version 1 of this installation does not support implementation of a CSW endpoint from an aggregator node.

·         Common Harvesting Issues

o   Harvest requests timing out: check whether the server hosting the CSW has enough capacity to support the requests made by the NGDS harvester.

o   Seeing fewer or more records than expected: depending on the catalog server used, updates and changes to resources within a catalog can temporarily affect the indexing for the CSW. If the issue does not resolve itself within 24 hours, there may be pagination issues with the CSW.

o   ISO validation errors: use the unique ID given in the validation error to build a GetRecordById request and check the XML of the invalid record.

o   Harvester failing on sections, schemas, or parameters in the capabilities request: compare the OWSLib code (https://github.com/geopython/OWSLib/blob/master/owslib/csw.py) against the CSW capabilities document you are implementing. The NGDS harvester relies on OWSLib as part of the validation of CSW requests, so reading the code can show what may be missing from the capabilities document if errors occur.
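The URL handling in these hints can be sketched with shell parameter expansion. This assumes a CSW 2.0.2 endpoint that returns ISO 19139 records for outputSchema=http://www.isotc211.org/2005/gmd, and a record ID that needs no percent-encoding; the function names are hypothetical, and the NGDS harvester itself is not involved.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building CSW 2.0.2 request URLs.

# csw_base_url URL: keep only the part before the first '?'
csw_base_url() {
  printf '%s\n' "${1%%\?*}"
}

# csw_capabilities_url URL: GetCapabilities request for the endpoint
csw_capabilities_url() {
  printf '%s?service=CSW&version=2.0.2&request=GetCapabilities\n' "$(csw_base_url "$1")"
}

# csw_record_url URL ID: GetRecordById request for one record, ISO (gmd) output
csw_record_url() {
  printf '%s?service=CSW&version=2.0.2&request=GetRecordById&id=%s&elementSetName=full&outputSchema=http://www.isotc211.org/2005/gmd\n' \
    "$(csw_base_url "$1")" "$2"
}

# Usage:
#   csw_base_url 'http://catalog.usgin.org/geothermal/csw?service=CSW&request=GetCapabilities'
#   curl -s "$(csw_record_url http://catalog.usgin.org/geothermal/csw RECORD_ID)"
```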

 

Appendix A   Installing a Virtual Machine

If you are working with an operating system other than Linux (e.g., Windows), you will need to install the Ubuntu Linux operating system on a virtual machine. A virtual machine is a software application that emulates a complete computer, so that an operating system (e.g. Linux, Windows XP, MacOS) can run as if it were an application (with its own windows) on a computer running a different, host operating system. A virtual machine therefore runs entirely within the host machine. Virtual machines are supported by virtualization software that provides an abstract hardware representation emulating real host hardware. Virtualization allows the installation of a full operating system within a host OS.

Although the NGDS CKAN software and its various underlying dependencies can be installed in a wide variety of operating system environments, the instructions in this tutorial are specific to an Ubuntu Linux environment. A virtual environment installed inside your current operating system, whatever that is, can provide an identical environment and make this tutorial easier to follow.

You will need to choose appropriate virtualization software. Two free virtual environment managers are VMware Player and Oracle VirtualBox. They can be downloaded from the links below:

·         VMWare Player: http://www.vmware.com/products/player/

·         Oracle VM VirtualBox: https://www.virtualbox.org/wiki/Downloads

This tutorial was developed using VirtualBox version 4.2.10 for Windows. Here, we install Linux Ubuntu 12.04 LTS from Canonical.

A.1      Creating an Ubuntu Linux Virtual Machine using VirtualBox

The steps in Appendix A of this document describe the installation of Ubuntu Linux (or Xubuntu) on a virtual machine supported by version 4.2.10 of Oracle VM VirtualBox under the Windows operating system. Newer versions of VirtualBox can also be utilized (tested with VirtualBox 4.3.8 as well). The VirtualBox software is also available for Apple OS X, various other Linux variants, and for Sun Solaris hosts; follow the VirtualBox instructions to get the virtualization environment set up on these other operating systems.

A.1.1        Install Oracle VM VirtualBox Manager

1.       Download the VirtualBox software from: https://www.virtualbox.org/wiki/Downloads

2.       Run the installer and follow the on-screen instructions to install VirtualBox.

3.       Run the VirtualBox application installed previously and use it to create a new virtual machine.

4.       Specify the following (Figure 2):

Name: NGDS

Type: Linux

Version: Ubuntu

Memory: the maximum memory configured for Java is 2048 MB, so allocate at least 3072 MB of RAM to your virtual machine

5.       Create a hard drive for your new virtual machine (Figure 3):

·         Specify the type of hard drive used by your virtual machine (Figure 4); the drive type you select determines which virtualization software the virtual hard disk you create is compatible with

·         Specify disk space allocation (Figure 5); dynamic allocation allows your virtualization software to allocate more hard drive space from the virtualization platform to this virtual hard drive as needed.

·         Allocate disk space to your virtual hard drive (Figure 6); this allocates a specified amount of hard drive space from the virtualization platform to the virtual machine. Ubuntu 12.04 systems will function with 8 GB, but if you are using more recent Ubuntu builds (e.g. the 13 or 14 series), allocate more disk space, e.g. 12 GB.

Figure 2: Create a new Linux virtual machine
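The same virtual machine can also be created from the command line with VBoxManage, which is installed with VirtualBox. This is a sketch under stated assumptions: the function name, the SATA controller name, and the disk file location in the current directory are choices made here, not requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an Ubuntu VM with a virtual hard disk using VBoxManage.

# create_ngds_vm NAME RAM_MB DISK_MB
create_ngds_vm() {
  local name=$1 ram_mb=$2 disk_mb=$3
  local disk="$PWD/$name.vdi"   # absolute path for the virtual hard disk file

  VBoxManage createvm --name "$name" --ostype Ubuntu --register || return
  VBoxManage modifyvm "$name" --memory "$ram_mb" || return
  # VDI disks are dynamically allocated by default: the file grows as needed up to DISK_MB
  VBoxManage createhd --filename "$disk" --size "$disk_mb" --format VDI || return
  VBoxManage storagectl "$name" --name SATA --add sata || return
  VBoxManage storageattach "$name" --storagectl SATA --port 0 --device 0 \
    --type hdd --medium "$disk"
}

# Usage: 3072 MB of RAM and a 12 GB (12288 MB) disk
#   create_ngds_vm NGDS 3072 12288
```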


A.1.2       Configure your Virtual Machine

Open the Oracle VM VirtualBox Manager (Figure 7); select your virtual machine and click Settings.

1.       First, enable the Shared Clipboard:

·         Select General settings

·         Select the Advanced tab (Figure 8)

·         Click the Shared Clipboard dropdown menu

·         Select Bidirectional

This will enable a virtual machine user to copy and paste between the virtual machine and the host computer. The virtual machine is distinct from the host computer and does not share the same clipboard by default.
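The Shared Clipboard setting can also be applied with VBoxManage while the virtual machine is powered off. The option is spelled --clipboard in VirtualBox 4.x, the versions this appendix was tested with; newer releases document it as --clipboard-mode. The clipboard only works once Guest Additions are installed in the guest (Section A.1.4). enable_shared_clipboard is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable a bidirectional shared clipboard (VirtualBox 4.x syntax).
# Run it while the virtual machine is powered off.

# enable_shared_clipboard NAME
enable_shared_clipboard() {
  VBoxManage modifyvm "$1" --clipboard bidirectional
}

# Usage:
#   enable_shared_clipboard NGDS
```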

The virtual machine is now created and configured; next, install the Linux operating system.
A.1.3       Linux Installation

The operating system will be installed from a file called an ISO image, which contains a copy of a CD-ROM file system in the ISO 9660 format (ISO is the International Organization for Standardization). This image file can be loaded and read by the virtual CD drive that is part of the virtual machine you have just created. Virtual CD drives are software applications that emulate a CD-ROM drive in much the same way that an entire computer can be emulated by virtualization software.

To install Ubuntu on a virtual machine, you will need an ISO image of an Ubuntu installation disc, available at: http://releases.ubuntu.com/12.04/. This tutorial uses the Long Term Support (LTS) version of Ubuntu, which is supported for five years. The download site provides files for various machine configurations. Use http://releases.ubuntu.com/12.04/ubuntu-12.04.4-desktop-i386.iso, which is built for 32-bit Intel-compatible (x86) processors. After downloading the ISO image, mount it within the VirtualBox environment and use it to install the Ubuntu operating system on your virtual machine.

1.       In the Oracle VM VirtualBox Manager, select the virtual machine you created in section A.1 and click Settings.

2.       In the Settings window, click Storage (Figure 9).

3.       In the Storage panel under Attributes, click the CD icon next to the CD/DVD Drive drop-down menu on the far right.

4.       Navigate to the ISO image file you downloaded and select it.

5.       In the Storage panel, click OK to mount the image.
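Attaching the ISO can likewise be scripted. This sketch assumes the virtual machine already has a storage controller with a free port; the controller name and port are assumptions to check with VBoxManage showvminfo, and attach_iso is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach an ISO image to a virtual DVD drive.
# CONTROLLER and PORT are assumptions; list your VM's controllers with
#   VBoxManage showvminfo NGDS

# attach_iso NAME ISO_PATH CONTROLLER PORT
attach_iso() {
  VBoxManage storageattach "$1" --storagectl "$3" --port "$4" --device 0 \
    --type dvddrive --medium "$2"
}

# Usage, for a VM created in the GUI (whose DVD drive is usually on IDE port 1):
#   attach_iso NGDS ~/Downloads/ubuntu-12.04.4-desktop-i386.iso IDE 1
```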


A.1.4       Install Ubuntu Linux

In the Oracle VM VirtualBox Manager, select your virtual machine and click Start. When started, your virtual machine will prompt you to install the operating system loaded in the image in much the same manner as you would on a physical computer.

1.       Click Install Ubuntu to begin; follow the on-screen instructions (Figure 10).

2.       When prompted, create a user named ngds: enter ngds for Your name as well as for Pick a username, and specify a password of your choice. This user will be created as the administrator (super user).

When the installation is complete, you will be prompted to restart. While the virtual machine is shutting down, press Enter when prompted. When your machine has restarted, log in with the username ngds and the password you specified during the Ubuntu installation process.

In addition, it is recommended that you install the Guest Additions module. Open the Devices menu at the top left, choose Install Guest Additions, and follow the installation steps.

If the Devices menu is not visible, open a terminal window and, from the command line, run:

%  sudo apt-get install virtualbox-guest-additions-iso

Note: on Ubuntu 13.10 you must first run the following command, because of some missing dependencies:

%  sudo apt-get install virtualbox-guest-dkms

The sudo command runs the command that follows it as the super user and will ask for your ngds user password; use the password you created in step 2 above.

A.1.5       Take a Snapshot

Take a Snapshot of your virtual machine before continuing. A Snapshot is a record of the virtual machine that can be used to restore it to its condition at the time the Snapshot was taken. Snapshots are typically used as precautions against failure at a later date.

A Snapshot can be taken via the VirtualBox Manager or from the Machine drop-down on the top left.
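Snapshots can also be taken from the host's command line with VBoxManage. This is a minimal sketch; the helper names snapshot_name and take_snapshot are hypothetical. The snapshot name gets a timestamp so that repeated snapshots do not collide.

```shell
#!/bin/sh
# Hypothetical helper: build a timestamped snapshot name,
# e.g. pre-ngds-20140326-1530.
snapshot_name() {
  printf 'pre-ngds-%s\n' "$(date +%Y%m%d-%H%M)"
}

# Hypothetical helper: take a snapshot of the named VM.
# VBoxManage must be on the PATH.
take_snapshot() {
  vm="$1"
  VBoxManage snapshot "$vm" take "$(snapshot_name)" \
    --description "Before NGDS software stack installation"
}
```

For a virtual machine named NGDS (a hypothetical name), run take_snapshot NGDS on the host. To confirm the snapshot was recorded, run:
VBoxManage snapshot NGDS list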

When the virtual machine setup is complete, return to Section 2, 'Install the NGDS Software Stack', to continue installing the NGDS node. Many online resources explain how to get things done in Ubuntu; if you run into problems, we recommend searching the web.

A.2      Accommodating a corporate firewall (OPTIONAL)

If the computer you are using to host your virtual machine is behind a corporate firewall, your virtual machine may not have immediate Internet access. Internet connectivity is required to install the NGDS Software Stack components on your virtual machine (see Section 2).

A.2.1        Install and Configure CNTLM (OPTIONAL)

CNTLM is a local proxy that authenticates with your corporate proxy on your behalf using your login and password, a typical requirement for corporate firewalls. If you are not behind a firewall that requires authentication, you can skip this step.

CNTLM is available at: http://cntlm.sourceforge.net/

After installing CNTLM on your host machine, use a text editor to modify the cntlm.ini file, specifying the credentials your host machine uses to authenticate with your corporate proxy. An example appears in Table 1 below:

Username    yourcorporateproxyusernamehere

Domain      us008

Password    yourpasswordhere

# List of corporate proxies

Proxy       proxyfarm-us.3dns.netz.sbs.de:84

Proxy       129.73.8.72:8080

Proxy       129.73.11.208:3128

NoProxy     localhost, 127.0.0.*, 10.*, 192.168.*

# local port used by CNTLM

Listen      3128

 

In the example above, lines beginning with a hash symbol (#) are comments for human readers; CNTLM ignores them when it reads the cntlm.ini file.

When configuring CNTLM, be sure to specify a NoProxy entry listing the local addresses that should be contacted directly rather than through the corporate proxy, and a Listen port. The default CNTLM port is 3128. Asterisks (*) are wildcards that match any sequence of characters, so 10.* matches any address beginning with "10.", such as 10.0.2.15 or 10.255.1.1.
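To illustrate the matching rule, the sketch below expresses the example NoProxy list as shell glob patterns. It assumes that CNTLM's * wildcard behaves like a shell glob, matching any run of characters including dots. The function name bypasses_proxy is hypothetical, and the code only illustrates the rule; it is not part of CNTLM.

```shell
#!/bin/sh
# Hypothetical illustration: return 0 (true) if a host matches one of the
# NoProxy patterns from the example cntlm.ini, i.e. it would be contacted
# directly. Assumes CNTLM's * behaves like a shell glob wildcard.
bypasses_proxy() {
  case "$1" in
    localhost | 127.0.0.* | 10.* | 192.168.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these patterns, 10.0.2.15 and 10.255.1.1 both bypass the proxy, while an address such as 129.73.8.72 is sent through it.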

To use CNTLM, make sure CNTLM is running on your host machine whenever you run the virtual machine you created previously. If CNTLM is not running on the host machine, your virtual machine will be unable to establish an Internet connection.

CNTLM can be run from a command prompt or installed as a Windows service. Running it from a command prompt is useful in a development environment because you can restart CNTLM manually if it freezes or crashes.

A.2.2         Configure CNTLM proxy (OPTIONAL)

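Log in to your virtual machine and, as the super user (for example, with sudo), use a text editor to edit the /etc/environment file. Add proxy entries that point to CNTLM on the host machine. With VirtualBox's default NAT networking, the guest reaches the host at 10.0.2.2, so with CNTLM listening on port 3128 the entries look like this: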

http_proxy=http://10.0.2.2:3128/

https_proxy=http://10.0.2.2:3128/

ftp_proxy=http://10.0.2.2:3128/

no_proxy="localhost,127.0.0.1,192.168.50.1,192.168.50.2"

HTTP_PROXY=http://10.0.2.2:3128/

HTTPS_PROXY=http://10.0.2.2:3128/

FTP_PROXY=http://10.0.2.2:3128/

NO_PROXY="localhost,127.0.0.1,192.168.50.1,192.168.50.2"
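Each variable appears in both lowercase and uppercase form, because different programs read different spellings. Generating the lines can therefore be less error-prone than typing them. This is a minimal sketch; the function name proxy_env_lines is hypothetical. With the arguments shown in the usage line below, it reproduces the example above.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/environment proxy lines, in lowercase and
# uppercase, pointing at a CNTLM instance listening on HOST:PORT.
proxy_env_lines() {
  host="$1"
  port="$2"
  bypass="$3"   # comma-separated hosts that skip the proxy
  url="http://$host:$port/"
  for var in http_proxy https_proxy ftp_proxy; do
    echo "$var=$url"
  done
  echo "no_proxy=\"$bypass\""
  for var in HTTP_PROXY HTTPS_PROXY FTP_PROXY; do
    echo "$var=$url"
  done
  echo "NO_PROXY=\"$bypass\""
}
```

To append the lines to the file, run the following once; running it again appends duplicate lines:
proxy_env_lines 10.0.2.2 3128 "localhost,127.0.0.1,192.168.50.1,192.168.50.2" | sudo tee -a /etc/environment

The new variables take effect at your next login.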

 


Alternatively, you can use the Ubuntu Network Configuration application to manually specify the desired proxies (Figure 11).

A.2.3         Problems with CNTLM

If possible, finish the installation on a virtual machine connected directly to the Internet rather than through a local intranet. If this is not possible, configure your virtual machine so that the apt-get command can reach the package repositories through the proxy; getting through an intranet may also require installing CNTLM inside the virtual machine.
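One way to let apt-get through the proxy is apt's own Acquire proxy setting, which apt-get reads from its configuration files. This is a minimal sketch. It assumes CNTLM is reachable at 10.0.2.2:3128, as in Section A.2.2. The function name apt_proxy_conf and the file name 95proxy are arbitrary, hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: print an apt configuration fragment that routes
# apt-get's HTTP and HTTPS traffic through the proxy at HOST:PORT.
apt_proxy_conf() {
  host="$1"
  port="$2"
  printf 'Acquire::http::Proxy "http://%s:%s/";\n' "$host" "$port"
  printf 'Acquire::https::Proxy "http://%s:%s/";\n' "$host" "$port"
}
```

To write the fragment into apt's configuration directory, run:
apt_proxy_conf 10.0.2.2 3128 | sudo tee /etc/apt/apt.conf.d/95proxy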

If the software installs successfully but CNTLM then causes problems when you try to open the locally hosted web sites, set up port forwarding in your virtual machine's network settings to forward the ports of interest (e.g., 5000) to your physical machine, and browse the web sites from your physical machine. At CT RTC, at least, this resolves the proxy issues.
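With VirtualBox NAT networking, the forwarding rule can also be added from the host with VBoxManage. This is a minimal sketch; the helper names natpf_rule and forward_port are hypothetical. The rule uses VBoxManage's name,protocol,host-ip,host-port,guest-ip,guest-port format, with both IP fields left empty and the same port number on host and guest.

```shell
#!/bin/sh
# Hypothetical helper: build a NAT port-forwarding rule mapping the same TCP
# port on host and guest (fields: name,tcp,hostip,hostport,guestip,guestport).
natpf_rule() {
  name="$1"
  port="$2"
  printf '%s,tcp,,%s,,%s\n' "$name" "$port" "$port"
}

# Hypothetical helper: add the rule to network adapter 1 of a powered-off VM.
forward_port() {
  vm="$1"
  port="$2"
  VBoxManage modifyvm "$vm" --natpf1 "$(natpf_rule "ckan$port" "$port")"
}
```

For a powered-off virtual machine named NGDS (a hypothetical name), run forward_port NGDS 5000. Then start the virtual machine and browse to http://localhost:5000 on the physical machine. For a virtual machine that is already running, VBoxManage controlvm NGDS natpf1 followed by the same rule string adds the rule without shutting it down.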

Appendix B   Architectural and Deployment Diagrams

 

Figure 12: A diagram of NGDS

 

 


A.3      What is CKAN?

CKAN stands for Comprehensive Knowledge Archive Network.

CKAN is modular free-and-open-source data portal software. When properly installed on a server, CKAN provides a web-accessible interface by which users can submit and manage metadata records. The CKAN user interface also allows users to configure automated metadata harvesting from registered CKAN instances (an instance is a specific installation of the CKAN software); metadata harvested in this way is used to generate a web-accessible catalog. These traits are well-suited to the requirements of NGDS.

A CKAN extension is a plugin that adds to or modifies the functionality of the CKAN software. The NGDS CKAN Extension is a CKAN extension designed to work with NGDS data, metadata, and interchange formats. See Figure 13 for an overview of the components of CKAN as developed for use in NGDS.

 

 

 

Figure 13: NGDS High-level Components

 

A.4      Domain Model

The Domain Model of NGDS can be represented as a class diagram (Figure 14), which shows the relationships among the entities that make up the system. The boxes on the left and bottom represent end users who access the system to discover datasets, OGC-compliant web services, and other resources.


Figure 14: NGDS Domain Model as a Class Diagram
 

 


A.5      Additional Notes on CKAN in Production Mode

When running CKAN in production mode, consider the following:

·         The celeryd daemon runs as the ngds-celeryd service; control it with the following command, choosing one of start, stop, restart, or status:
sudo service ngds-celeryd start|stop|restart|status

·         If Tomcat needs to be started manually, do so with the following command:
cd /opt/ngds/tomcat/bin; ./catalina.sh run

·         CKAN log files are written to the following directory:
/var/log/apache2/

·         Source code is installed at the following location:
/opt/ngds/bin/default/
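The exact CKAN log file name depends on the Apache site configuration. A quick way to find the file to inspect is to look for the most recently modified file in the log directory. This is a minimal sketch; the function name latest_log is hypothetical. It defaults to /var/log/apache2, which you may need sudo or adm group membership to read.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the most recently modified file in
# a directory (default /var/log/apache2). Returns 1 if nothing is listed.
latest_log() {
  dir="${1:-/var/log/apache2}"
  newest="$(ls -t -- "$dir" | head -n 1)"
  # Fail if the directory is empty or unreadable.
  [ -n "$newest" ] || return 1
  printf '%s/%s\n' "$dir" "$newest"
}
```

To view the last 50 lines of the most recent log, run:
tail -n 50 "$(latest_log)"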