Mangrove Documentation, Release 0.1
Jeff Wishnie
December 15, 2011
CONTENTS
1 Project Organization
1.1 Project Governance
1.2 Faces of Mangrove
1.3 Developer Practices
1.4 Setting Up the development environment
2 Design and Technical Documents
2.1 Mangrove Tutorial
2.2 APIs
2.3 Datastore document structure
2.4 System Concepts & Terminology
2.5 API SPEC EXAMPLES
2.6 Data Dictionary Expected API
2.7 Querying couch by hierarchy and time
2.8 Setting up the ‘DataWinners’ Web App
2.9 Setting Up PostGIS and importing location data
3 Indices and tables
Mangrove is an open source platform for exploring data with location and time information.
Project Home: http://mangroveorg.github.com
Email: http://groups.google.com/group/mangroveorg
IRC chat: #mangrove on freenode.net
Source: https://github.com/mangroveorg/mangrove
Jenkins CI: http://178.79.163.33:8080/
CHAPTER
ONE
PROJECT ORGANIZATION
1.1 Project Governance
[coming soon!]
1.2 Faces of Mangrove
1.2.1 Kick-off team as of Feb-2011
Akshay Naval
organization ThoughtWorks
location Pune, India
Alan Viars
organization Videntity
location Baltimore, MD, US
Alex Dorey
organization Columbia University
location New York, US
Andrew Marder
organization Earth Institute/Columbia University
location New York, US
Aroj George
organization ThoughtWorks
location Pune, India
Asif Momin
organization ThoughtWorks
location Pune, India
David McAfee
organization HNI
location Antananarivo, Madagascar
Diptanu Choudhury
organization ThoughtWorks
location Bangalore, India
Jeff Wishnie
organization ThoughtWorks
location Portland, OR, US
Kedar Bapat
organization ThoughtWorks
location Pune, India
Kevin Samuel
organization
location Nice, France
Mamy Dafy
organization HNI
location Antananarivo, Madagascar
Matt Berg
organization Columbia University
location New York, US
Shweta Shetty
organization ThoughtWorks
location Pune, India
Ravi Kumar
organization ThoughtWorks
location Pune, India
Simon de Haan
organization Praekelt Foundation
location Cape Town, South Africa
Photos hosted here on Flickr
1.3 Developer Practices
• We are using Git as the Source Control Manager (SCM)
• We are using GitFlow for better version control and branching
• Our Documentation style is restructuredtext (RST)
– We can try out RST text with the online tool reSTrenderer
• Our python coding style guide is PEP8
– 4 spaces per indentation level
– Soft tabs (indentation is with spaces only)
• We have a continuous integration server set up using Jenkins. It can be viewed at http://178.79.163.33:8080/
• We have detailed test reports and code coverage for every build
• We are using nose to write unit tests. Please maintain the unit test suite for all code you check in, and make sure that test coverage stays high :)
• Our functional tests are written in WebDriver (Selenium 2.0b2)
• We are using fabric for automatic deployment
• We use virtualenv and pip to set up our python environment
1.3.1 Other important links
• Our transport layer is managed by VUMI
• Django 1.3 is our web framework
1.4 Setting Up the development environment
To get your machine set up to start contributing to Mangrove, this is what you need to do:
• Install chef client
• Clone the chef repository from mangrove
• Run chef solo
1.4.1 Details
The steps above are detailed below for Ubuntu 10.10. For any other OS, please look through the Chef documentation.
• Install ruby:
$ sudo apt-get install ruby-full
• Install rubygems:
$ cd /tmp
$ wget http://rubyforge.org/frs/download.php/70696/rubygems-1.3.7.tgz
$ tar zxf rubygems-1.3.7.tgz
$ cd rubygems-1.3.7
$ sudo ruby setup.rb
You can verify your installation was successful with
$ gem -v
1.3.7
• Install chef client:
$ sudo gem install chef
You can verify your installation was successful with
$ chef-client -v
Chef: 0.9.0
• Install git:
$ sudo apt-get install git
• Clone the chef repository:
$ git clone git://github.com/mangroveorg/chef-repo.git
• Make sure your system is updated and upgraded before you run the chef script:
$ sudo apt-get update
$ sudo apt-get upgrade
• Create a user mangrover and give it sudo rights:
$ sudo useradd mangrover
$ sudo passwd mangrover
$ sudo usermod -aG sudo mangrover
• Run chef solo (as mangrover):
$ cd chef-repo
$ sudo chef-solo -c chef-solo/solo.rb -j chef-solo/node.json
CHAPTER
TWO
DESIGN AND TECHNICAL DOCUMENTS
2.1 Mangrove Tutorial
2.1.1 Introduction
The following are the main concepts in Mangrove.
2.1.2 Entity Type:
create entity type:
# An entity type is a hierarchy, e.g. "Education", "School".
entity_type = ["HealthFacility", "Clinic"]
define_type(self.dbm, entity_type)
2.1.3 Entity:
create entity
# An entity type is a hierarchy, e.g. "Education", "School".
entity_type = ["HealthFacility", "Clinic"]
create_entity(self.dbm, entity_type=entity_type, short_code="1")
2.1.4 Data Record:
get datarecord:
DataRecord.get(self.dbm,data_record_id)
2.1.5 Form Model:
Create a Form:
default_ddtype = DataDictType(self.dbm, name='Default String Datadict Type',
                              slug='string_default', primitive_type='string')
default_ddtype.save()
question1 = TextField(name="Q1", code="ID", label="What is the reporter ID?",
                      language="eng", entity_question_flag=True, ddtype=default_ddtype)
question2 = TextField(name="Q2", code="DATE", label="What month and year are you reporting for?",
                      language="eng", entity_question_flag=False, ddtype=default_ddtype)
question3 = TextField(name="Q3", code="NETS", label="How many mosquito nets did you distribute?",
                      language="eng", entity_question_flag=False, ddtype=default_ddtype)
form_model = FormModel(dbm, entity_type=["Reporter"], name="Mosquito Net Distribution Survey",
                       label="Mosquito Net Distribution Survey", form_code="MNET",
                       type='survey', fields=[question1, question2, question3])
form_model.save()
2.1.6 Data Submission:
Submit data to the form directly
values = {"ID": "rep45", "DATE": "10.2010", "NETS": "50"}
form = get_form_model_by_code(dbm, "MNET")
form_submission = form.submit(dbm, values, submission_id)
Submit data to the player
text = "MNET .ID rep45 .DATE 10.2010 .NETS 50"
transport_info = TransportInfo(transport="sms", source="9923712345", destination="5678")
sms_player = SMSPlayer(dbm)
response = sms_player.accept(Request(transportInfo=transport_info, message=text))
The player will also log the submission for you in Mangrove.
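To make the message format concrete, here is a minimal sketch of how a text like the one above could be split into a form code and answer values. It assumes the format is a form code followed by `.CODE value` pairs, as in the example. `parse_sms` is a hypothetical helper written for this illustration, not part of the Mangrove API, and the real SMSPlayer may parse messages differently.

```python
def parse_sms(text):
    """Split 'FORM .CODE value .CODE value' into (form_code, values)."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty message")
    form_code = tokens[0]
    values = {}
    current = None
    for token in tokens[1:]:
        # A token such as ".ID" (dot followed by a letter) opens a new answer.
        if token.startswith(".") and len(token) > 1 and token[1].isalpha():
            current = token[1:]
            values[current] = []
        elif current is None:
            raise ValueError("value %r appears before any question code" % token)
        else:
            values[current].append(token)
    # Multi-word answers are joined back together with single spaces.
    return form_code, {code: " ".join(parts) for code, parts in values.items()}


form_code, values = parse_sms("MNET .ID rep45 .DATE 10.2010 .NETS 50")
print(form_code, values)
```

The resulting dictionary has the same shape as the `values` passed to `form.submit` in the direct submission example.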
Load all submissions for the form
get_submissions_made_for_form()
2.1.7 Aggregation:
Monthly Aggregate on all data records for a field per entity for the form code
values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Month(2, 2010))
Returns one row per entity, with the aggregated values for each field.
{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}
Weekly Aggregate on all data records for a field per entity for the form code
values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Week(52, 2009))
52 is the week number and 2009 is the year.
Returns one row per entity, with the aggregated values for each field.
{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}
Yearly Aggregate on all data records for a field per entity for the form code
values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Year(2010))
2010 is the year.
Returns one row per entity, with the aggregated values for each field.
{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}
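As an illustration of what these aggregates compute, the sketch below reproduces per-entity sum, min, max and latest semantics over in-memory records in plain Python. It does not call Mangrove: the record layout (entity id, event time, values) and the `aggregate_per_entity` helper are assumptions made for this example only.

```python
from datetime import datetime

# Hypothetical records: (entity_id, event_time, submitted values)
records = [
    ("clinic1", datetime(2010, 2, 3), {"patients": 4, "meds": 30, "beds": 250, "director": "Dr. B"}),
    ("clinic1", datetime(2010, 2, 20), {"patients": 6, "meds": 20, "beds": 300, "director": "Dr. A"}),
    ("clinic1", datetime(2010, 3, 1), {"patients": 9, "meds": 5, "beds": 310, "director": "Dr. C"}),
]


def latest(values):
    """Values arrive in time order, so the last one is the latest."""
    return values[-1]


def aggregate_per_entity(records, aggregates, year, month):
    """aggregates maps a field name to a function over its time-ordered values."""
    grouped = {}
    for entity_id, when, values in sorted(records, key=lambda r: r[1]):
        if (when.year, when.month) != (year, month):
            continue  # outside the requested month
        for field in aggregates:
            if field in values:
                grouped.setdefault(entity_id, {}).setdefault(field, []).append(values[field])
    return {
        entity_id: {field: aggregates[field](vals) for field, vals in fields.items()}
        for entity_id, fields in grouped.items()
    }


result = aggregate_per_entity(
    records,
    {"patients": sum, "meds": min, "beds": max, "director": latest},
    year=2010, month=2,
)
print(result)
```

With these sample records the February 2010 row for `clinic1` has the same shape as the rows described above: one entry per entity, one aggregated value per field.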
2.2 APIs
To get a quick idea of the current state of the mangrove.datastore API, we have included the API for all the files in mangrove/datastore/.
Here’s what’s in mangrove.datastore:
2.2.1 aggregationtree
2.2.2 config
2.2.3 database
2.2.4 datadict
2.2.5 data
2.2.6 datarecord
2.2.7 documents
2.2.8 entity
2.2.9 initializer
This module tries to import validate.py, but I can’t get it to work on my machine.
2.2.10 reporter
This module tries to import validate.py, but I can’t get it to work on my machine.
2.2.11 settings
2.3 Datastore document structure
Entities:
Reporter
{
    "_id": "a676766fe45440f48ff4e9a0ce58b329",
    "_rev": "1-f83b88a382c14b5ade660710adde0d9e",
    "name": "reporter1",
    "entity_type": "reporter",
    "last_updated_on": null,
    "created_on": "2011-03-24T07:32:15Z",
    "aggregation_trees": {
        "org_chart": ["Country Manager", "Field Manager", "Field Agent"]
    },
    "attributes": {
        "age": 25,
        "entity_type": "reporter"
    },
    "document_type": "Entity"
}
Clinic
{
    "_id": "961cefb2a0324878bed06ab736e5dc09",
    "_rev": "1-c4d02559a13e7698437aae58f35e8440",
    "name": "Clinic 1",
    "entity_type": "clinic",
    "last_updated_on": null,
    "created_on": "2011-03-24T07:32:15Z",
    "aggregation_trees": {
        "location": ["India", "Maharashtra", "Pune"]
    },
    "attributes": {
        "entity_type": "clinic"
    },
    "document_type": "Entity"
}
Data Records:
Data Record
{
    "_id": "e4d5cb3e76ca40a78088c7bfe5d0cf03",
    "_rev": "1-7ca2ee8ad10a444eb6f3a8bad19ff957",
    "reporter_backing_field": {
        "_id": "a676766fe45440f48ff4e9a0ce58b329",
        "name": "reporter1",
        "entity_type": "reporter",
        "_rev": "1-f83b88a382c14b5ade660710adde0d9e",
        "last_updated_on": null,
        "created_on": "2011-03-24T07:32:15Z",
        "aggregation_trees": {
            "org_chart": ["Country Manager", "Field Manager", "Field Agent"]
        },
        "attributes": {
            "age": 25,
            "entity_type": "reporter"
        },
        "document_type": "Entity"
    },
    "last_updated_on": null,
    "source": {
        "report": "hn1.2424",
        "phone": "1234"
    },
    "created_on": "2011-03-24T07:32:15Z",
    "attributes": {
        "beds": "100",
        "event_time": "2011-02-01 00:00:00",
        "arv": "200"
    },
    "document_type": "DataRecord",
    "entity_backing_field": {
        "_id": "880552a483594ca9af07508e379f4520",
        "name": "Clinic 2",
        "entity_type": "clinic",
        "_rev": "1-99c4e6ebd76bba417dcd034f935d7483",
        "last_updated_on": null,
        "created_on": "2011-03-24T07:32:15Z",
        "aggregation_trees": {
            "location": ["India", "Karnataka", "Bangalore"]
        },
        "attributes": {
            "entity_type": "clinic"
        },
        "document_type": "Entity"
    }
}
2.4 System Concepts & Terminology
2.4.1 System Overview
2.4.2 Terminology
• SMS - Short Message Service (limited to 160 char), message string that is sent over mobile phones
• ICT - Information & Communication Technology
• ICT4D - ICT for Development
• M4D - Mobile for Development
• ARV - Anti-retroviral medication (against HIV Virus)
• IVR - Interactive Voice Response
• HCN - Host Country National
• USSD - Unstructured Supplementary Service Data
• MDG - Millennium Development Goals
• Indicator - a field that is computed or used in visualization of associated data
• M&E - Monitoring and Evaluation
• MNO - Mobile Network Operators
• WASP - Wireless Access Service Provider
• SMSC - Short Message Service Center (component that sends the SMS messages to the recipients)
• MSL - Master Story List
2.4.3 Core Datastore Overview
Introduction
The high-level goal of the Mangrove datastore is to allow the free submission of data about a known set of entities and the quick and easy retrieval of data aggregated across time and hierarchy without requiring any upfront definition of schemas or entity structure.
The key goals can be summarized as:
• Support Schema-less submission of arbitrary data
This is motivated by expected usage patterns where an organization will frequently modify the data collected based on actual usage. Because no a-priori definition of data-sets is required, users are given full flexibility to adjust data collected on-the-fly.
For example, a health NGO operating rural clinics might begin by simply collecting a monthly report of how many patients were seen in that month. As they get more sophisticated they may start collecting separate values for men, women, girls and boys. This transition should not require any datastore restructuring.
• Support aggregation of data across time and hierarchy (geographic as special case)
Time-based aggregations include queries such as “Average number of patients seen in 2011” or more complex segmented time aggregations such as “Average number of patients seen each month in 2011”
The key hierarchical aggregation is by geographic administrative boundaries. For example: “Total number of patients seen in 2011 for all clinics in San Francisco (or California or United States)”
Non-geographic arbitrary aggregation trees are supported as well. For example, aggregation by organization chart: “Patients seen at clinics managed by the Child Protection group”
• Provide data consistency on a field level via ‘Data Dictionary’
To make it easy for users to aggregate data collected for a given entity via unstructured data submissions, the core datastore will include a ‘Data Dictionary’ where semantic types are defined and stored. These types are then applied to submitted data fields, allowing aggregation across different submissions and encouraging data consistency.
For example, our health NGO now wishes to collect data on each patient who receives an HIV test, so they submit data for each patient test in the form (name, age-in-years, test-administered).
Later they start recording patients who receive family-planning counseling and collect: (name, age-in-years, counseling-program-attended)
When they want to get the average age of patients who received HIV Tests or Family Planning Counseling, the system can aggregate values of ‘age-in-years’ from both submissions even though the structure of each submission is different.
And later, when they want to start registering infants seen, they can define a more useful ‘Age in Months’ field (with values ranging from 0-60) and still run aggregations of the form “Average age of patients seen” by multiplying any aggregated “Age in Years” values by 12 before averaging with “Age in Months” fields.
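The unit conversion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the type names, the `TO_MONTHS` table and the `average_age_in_months` helper are hypothetical and do not come from the Mangrove codebase.

```python
# Hypothetical submissions; each value carries the data dictionary type of its field.
submissions = [
    {"type": "age-in-years", "value": 30},
    {"type": "age-in-years", "value": 2},
    {"type": "age-in-months", "value": 6},
    {"type": "age-in-months", "value": 18},
]

# Conversion factor from each type to months (assumed for this example).
TO_MONTHS = {"age-in-years": 12, "age-in-months": 1}


def average_age_in_months(submissions):
    """Convert every value to months, then average across both field types."""
    ages = [s["value"] * TO_MONTHS[s["type"]] for s in submissions]
    return sum(ages) / len(ages)


print(average_age_in_months(submissions))
```

Keeping the conversion factor next to the type definition is what lets the aggregation work without knowing which form a value came from.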
• Provide simple Python and RESTful APIs for accessing data and standard aggregation queries
The datastore is agnostic as to both the sources and consumers of data. These APIs will allow data sources ranging from SMS engines, to XForms clients and Web applications to submit data.
On the visualization and reporting side, charting, plotting, graphing, and geographic visualization clients may access data series suitable for visualization pre-aggregated across time and hierarchy.
Core Structures
The logical architecture as envisioned has very few structures:
• Entity
An ‘entity’ is anything that users may want to report on. For example: a patient, a clinic, a waterpoint, etc. Entities are typed (e.g. ‘Clinic’, ‘Waterpoint’) and uniquely identified.
Entities contain no data beyond UID and TYPE.
Entities must be registered in the system before data can be collected on them. Registration is nothing more than the process of assigning a UID to the entity and does not have to be a distinct user action; the datastore can register an entity as part of the process of recording the first submission of data on the entity.
• Data Record
Every time data is submitted to the datastore it is saved as an independent time-stamped data record.
Each data record is associated with a single Entity. The set of data records for a given Entity comprises all the values/data known about that Entity.
For example, if a user submits a report that 10 patients were seen in May at Clinic1, and another user submits a report that Clinic1 had stock of 20 bednets in May, the set of information known about Clinic1 is that in May 10 patients were seen and 20 bednets are in stock.
• Fields and Values
Each data record contains an arbitrary set of field/value tuples with fields optionally typed from the Data Dictionary.
• Data Dictionary Types
These are definitions of types which can be associated with fields in a data record. Defined types may contain the following:
– Type name
– Base type (numeric, string, choice, geocode etc...)
– User readable description
– Validation constraints
Questions we want to ask the Data Store
Rather than set out specific technical proposals, or get caught in the argument over what should be done in the DB vs. in application logic, here I try to categorize the different kinds of questions we want to be able to ask the data store.
For the examples, assume the datastore is holding information for an NGO that operates health clinics throughout the United States.
Basic Retrieval
Question Retrieve all the Entities of a specific type.
Example Show a list of all health clinics.
Question Retrieve specific entity by a unique id.
Example Show health clinic with ID Clinic001:
Question Retrieve specific entity by a semi-unique id. This may return a list if there are multiple matches.
Example Show health clinic with “Free Clinic” in its name.
State Queries
Question Retrieve an Entity (or set of Entities) with a specific set of values.
Example Show a list of all health clinics and include with each clinic:
• Geographic location
• Clinic Directors Name
• Current stock of Cipro (an antibiotic)
Question Return an Entity (or set of Entities) with all the latest values associated with it.
Example Show the latest information for Clinic001. This should include the latest reported value of every field ever reported on this clinic.
Question Retrieve an Entity (or set of Entities) with a set of values as of a given date
Example Show all the latest information on Clinic001 as of Jan 15, 2010
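A state query of this kind can be answered by replaying an entity's data records in time order and letting later values overwrite earlier ones. The sketch below assumes records are simple (date, values) pairs; the field names and the `state_as_of` helper are made up for illustration and are not Mangrove functions.

```python
from datetime import date

# Hypothetical data records for Clinic001: (report date, field/value pairs)
records = [
    (date(2009, 11, 2), {"director": "Dr. A", "cipro_stock": 40}),
    (date(2010, 1, 10), {"cipro_stock": 25}),
    (date(2010, 2, 1), {"director": "Dr. B", "cipro_stock": 60}),
]


def state_as_of(records, as_of):
    """Latest reported value of every field, ignoring records after as_of."""
    state = {}
    for reported_on, values in sorted(records, key=lambda r: r[0]):
        if reported_on <= as_of:
            state.update(values)  # newer values replace older ones
    return state


print(state_as_of(records, date(2010, 1, 15)))
```

Passing today's date as `as_of` gives the "latest values" query; passing an earlier date gives the historical state.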
Time Aggregated Queries
Question Retrieve an Entity (or set of Entities) with a specific set of values aggregated by a function such as sum() or avg() over a given time range.
Example Show a list of all health clinics and include with each clinic:
• Total number of patients seen in 2011
Question Retrieve an Entity (or set of Entities) with a specific set of values aggregated by a function such as sum() or avg() over a given time range with a given periodicity.
Example Show a list of all health clinics and include with each clinic:
• Average number of patients seen each month for each month in 2011
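The difference between a plain time-range aggregate and one with a periodicity is that the latter returns one value per period. A minimal sketch, assuming reports are (date, patients) pairs for a single clinic; `monthly_average` is a hypothetical helper:

```python
from collections import defaultdict
from datetime import date

# Hypothetical reports for one clinic: (report date, patients seen)
reports = [
    (date(2011, 1, 5), 10),
    (date(2011, 1, 19), 20),
    (date(2011, 2, 7), 8),
    (date(2010, 12, 30), 99),  # outside the requested year
]


def monthly_average(reports, year):
    """Average value per month of the given year; months without data are omitted."""
    by_month = defaultdict(list)
    for reported_on, patients in reports:
        if reported_on.year == year:
            by_month[reported_on.month].append(patients)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


print(monthly_average(reports, 2011))
```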
Selection Queries
Question Retrieve all Entities which have a specific value.
Example Show all health clinics where “Population Served” > 1000
Question Retrieve all Entities which have a specific aggregated value.
Example Show all health clinics where “Total Patients Seen” > 1000
Question Retrieve all Entities which have a specific aggregated value over time.
Example Show all health clinics where “Total Patients Seen in 2011” > 1000
Hierarchy Aggregated Queries
Note: These queries don’t return entities; they return values aggregated by a hierarchy node (e.g. ‘California’ or ‘San Francisco’), which suggests that maybe Matt Berg is right and hierarchy nodes should be considered ‘Entities’, or ‘Generated Entities’...
Question Retrieve a set of Values aggregated by a given node in a hierarchy.
Example From the set of all clinics in California show:
• Total number of patients seen in 2011 (in California)
• Average number of patients seen in 2011 (in California)
Question Retrieve a set of Values aggregated by a given level in a hierarchy.
Example From each State in the United States show:
• Total number of patients seen in clinics in that state in 2011
• Average number of patients seen in clinics in that state in 2011
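Both hierarchy questions can be sketched by treating each clinic's location as a path and grouping on a prefix of that path. The data, the path layout (country, state, city) and the two helper functions below are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical clinics: location path (country, state, city) and patients seen in 2011
clinics = [
    (("United States", "California", "San Francisco"), 120),
    (("United States", "California", "Los Angeles"), 80),
    (("United States", "Oregon", "Portland"), 50),
]


def aggregate_by_node(clinics, node_path):
    """Total and average over all clinics whose path starts with node_path."""
    node_path = tuple(node_path)
    values = [v for path, v in clinics if path[:len(node_path)] == node_path]
    if not values:
        return None
    return {"total": sum(values), "average": sum(values) / len(values)}


def aggregate_by_level(clinics, level):
    """Totals and averages for every node at the given depth (0 = country)."""
    groups = defaultdict(list)
    for path, value in clinics:
        groups[path[:level + 1]].append(value)
    return {node: {"total": sum(v), "average": sum(v) / len(v)} for node, v in groups.items()}


print(aggregate_by_node(clinics, ("United States", "California")))
print(aggregate_by_level(clinics, 1))
```

The result keys are hierarchy nodes rather than entities, which is the point raised in the note above.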
2.4.4 Data Dictionary Concept and usages
The data dictionary is a standalone service that hosts data type definitions in order to allow users to share them.
This project is part of the mangrove project.
It provides
• A simple format to define any kind of data by providing a basic type, tags and constraints.
• A unique ID to reference a definition in external services.
• HTTP API to get the definition from external service.
• A python wrapper around the HTTP API that provides constraint checking and type casting.
• A replication system to synchronize several data dictionaries together.
• A versioning system ensuring that updating definitions doesn’t break others’ references.
Use cases
• You share common types among several systems and keep them up to date and in sync.
• You expect users to define data themselves.
• You store data in a schemaless database and want to attach a type, a meaning or constraints to it.
What it’s not
• A semantic database. No complex definition nor RDF/SPARQL magic in there.
• A user interface to let the user enter data types. You have to provide it.
• A bulletproof solution for anything. This is a work in progress, designed to solve a very specific problem, not a standard or anything to solve anybody’s data type problems.
The data dictionary in the Mangrove project
We need to store data in a schemaless database.
The data is going to be defined by the user and organisations will want to share their definitions.
The data is going to be input from field agents about any subjects they are studying, so the system cannot know the data type in advance. For example, one NGO would be studying school results in India and another would be studying school attendance in Africa. They would all want data definitions for schools and students. The data dictionary is designed to hold these definitions so that they are reusable across organisations.
Current implementation
• CouchDB database
• couchdb-python
2.5 API SPEC EXAMPLES
2.5.1 Top Level Format:
Each API response shall have a dictionary with 4 items. When responding via HTTP the status field shall match the HTTP response status code.
FIELD        TYPE  DESCRIPTION
status       int   HTTP status 2xx, 3xx, 4xx, 5xx
message      str   A string containing a message about the response
num_results  int   The number of results (>= 0)
results      list  A list of results of length num_results. If 0, then empty list
An example of an error response in JSON.
{
    "status": 401,
    "message": "Not Authorized",
    "num_results": 0,
    "results": []
}
An example of a successful created response in JSON.
{
    "status": 201,
    "message": "Data Record Created",
    "num_results": 1,
    "results": [
        {
            "_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b",
            "field1": "foo",
            "field2": "bar"
        }
    ]
}
An example of a successful search response in JSON.
{
    "status": 200,
    "message": "Search results successful",
    "num_results": 2,
    "results": [
        {
            "_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b",
            "field1": "foo",
            "field2": "bar"
        },
        {
            "_id": "4d4eb8de-3955-412c-a078-3e846182380b",
            "field": "milli",
            "field2": "vanilli"
        }
    ]
}
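One way to keep responses consistent with the top-level format is to build them through a single helper that derives num_results from the results list. The `make_response` function below is a hypothetical sketch, not an existing Mangrove function; it only uses the four fields named in the specification.

```python
import json


def make_response(status, message, results=None):
    """Build the four-item envelope; num_results is derived from results."""
    results = list(results or [])
    return {
        "status": status,
        "message": message,
        "num_results": len(results),
        "results": results,
    }


error = make_response(401, "Not Authorized")
created = make_response(201, "Data Record Created", [
    {"_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b", "field1": "foo", "field2": "bar"},
])
print(json.dumps(error))
print(json.dumps(created, indent=2))
```

Deriving the count instead of passing it in means num_results can never disagree with the length of results.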
2.6 Data Dictionary Expected API
2.6.1 How it is stored in nosql database.
In the mangrove system, this is termed as data dict storage:
{
    "_id": "",
    "primitive type": "int",
    "name": "Malaria pills stock",
    "description": "Description of this drug and the stock itself.",
    "version": "2010-10-10 07:06:45.45646",
    "tags": ["health", "medicine", "drug", "malaria", "pill"],
    "constraints": {
        "gt": "0",
        "lt": "10"
    }
}
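To show how such a stored definition could drive validation, here is a minimal sketch that casts a raw value to the primitive type and checks the gt/lt constraints. The `CASTS` and `CHECKS` tables and the `validate` function are assumptions for this example; they are not the data dictionary's actual implementation, and only the two constraint names shown in the document are handled.

```python
# Hypothetical subset of a data dictionary entry, mirroring the document above.
malaria_pills = {
    "primitive type": "int",
    "name": "Malaria pills stock",
    "constraints": {"gt": "0", "lt": "10"},
}

CASTS = {"int": int, "float": float, "str": str}
CHECKS = {
    "gt": lambda value, bound: value > bound,
    "lt": lambda value, bound: value < bound,
}


def validate(definition, raw_value):
    """Cast raw_value to the primitive type and collect constraint errors."""
    cast = CASTS[definition["primitive type"]]
    value = cast(raw_value)
    errors = []
    for name, bound in definition.get("constraints", {}).items():
        # Bounds are stored as strings, so they are cast the same way as the value.
        if not CHECKS[name](value, cast(bound)):
            errors.append("%r violates %s %s" % (value, name, bound))
    return value, errors


print(validate(malaria_pills, "5"))
print(validate(malaria_pills, "12"))
```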
2.6.2 How it is referenced in external service.
In the Mangrove system, the data is stored in the datastore. Each data instance holds a reference to its type:
type {
    "uuid": "b4cd35d9f04887da905c051b894568",
    "version": "2010-10-10 07:06:45.45646"
}
2.6.3 Python wrapper API
For querying the data dict, you can use a Python restful wrapper that provides filtering, type casting and constraint validations.
For example:
In settings.py:
DATABASE_NAME = 'datadict'
SERVER_ADDRESS = 'http://localhost'
SERVER_PORT = '5984'
Create a data type:
dt = DataType(name="test", contraints={'gt': 4}, tags=['foo', 'bar'],
              type="int", description="Super dupper type")
dt.save()
or:
DataType.create(name="test", contraints={'gt': 4}, tags=['foo', 'bar'],
                type="int", description="Super a type")
Searching for datatype with tags:
DataType.with_tags('foo', 'bar')
Getting datatype:
dt = DataType.get(id, version)
validating data:
try:
    dt.validate(value)
except dt.ValidationError as e:
    for error in e.errors:
        print error
casting data:
dt.to_python(value)
dt.to_json(value)
dt.to_xform(value)
2.7 Querying couch by hierarchy and time
2.7.1 The Problem :
The problem is doing aggregation by hierarchy and time in CouchDB. Our aim is to be able to give results for the following types of queries:
1) Total population country/state/city wise for all months.
2) Monthly population country/state/city wise. For example, total population in the state of Maharashtra in March.
2.7.2 The Data:
The population would be stored as a CouchDB document in the following format. (It is simplified for the purpose of the illustration.) The basic document structure is as follows:
{
    "_id": "Entity name",
    "path": ["India", "MH", "Pune"],
    "population": 20,
    "month": "feb"
}
MH (Maharashtra) is a state; Pune is a city.
We have written a map-reduce function to aggregate data by multilevel location hierarchy and time. The “path” field indicates the location hierarchy tree for the entity. Month is the time value. It will be a proper date; we have taken month for the purpose of the spike.
2.7.3 The Map-Reduce:
The map function is as follows:
function(doc) {
    // Emit one row per level of the path: [depth, label, month] -> population
    for (var i in doc.path) {
        emit([i, doc.path[i], doc.month], doc.population);
    }
}
The reduce function is _sum
2.7.4 The Output:
The sample output will be as follows (when reduced to level 2 in CouchDB):
{
["2", "Pune", 7] : 150
["2", "Pune", 3] : 80
["2", "Pune", 2] : 100
["1", "TN", 2] : 120
["1", "MH", 7] : 150
["1", "MH", 2] : 100
["0", "India", 7] : 150
["0", "India", 2] : 220
}
TN (Tamil Nadu) is a state. This gives month-wise aggregates. The first key element is the depth in the hierarchy, the second is the label at that depth (country, state or city), and the third is the month (7 is July, 2 is February, etc.).
At level 1 - it gives totals for all months:
{
["2", "Pune"] : 330
["1", "TN"] : 320
["1", "MH"] : 330
["0", "India"] : 650
}
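The behaviour of the view can be reproduced outside CouchDB with a small simulation, which is handy for checking expected totals. The sketch below is plain Python, not CouchDB code: the sample documents (including the Chennai entry) are invented, and the `group_level` argument follows CouchDB's convention of keeping the first N elements of each key, so month-wise totals use 3 and all-month totals use 2.

```python
from collections import defaultdict

# Hypothetical documents in the format described above (month as a number).
docs = [
    {"_id": "e1", "path": ["India", "MH", "Pune"], "population": 100, "month": 2},
    {"_id": "e2", "path": ["India", "MH", "Pune"], "population": 150, "month": 7},
    {"_id": "e3", "path": ["India", "TN", "Chennai"], "population": 120, "month": 2},
]


def map_doc(doc):
    """Mirror of the map function: one row per level of the path."""
    for depth, label in enumerate(doc["path"]):
        yield (str(depth), label, doc["month"]), doc["population"]


def reduce_sum(docs, group_level):
    """Sum values after truncating each key to its first group_level elements."""
    totals = defaultdict(int)
    for doc in docs:
        for key, value in map_doc(doc):
            totals[key[:group_level]] += value
    return dict(totals)


by_month = reduce_sum(docs, group_level=3)
all_months = reduce_sum(docs, group_level=2)
print(by_month[("0", "India", 2)])
print(all_months[("1", "MH")])
```

Because every document emits one row per hierarchy level, a single view answers country, state and city queries; the price is that each document is indexed once per level.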
2.8 Setting up the ‘DataWinners’ Web App
• Pre-requisites:
1. Install python 2.7: apt-get install python2.7
2. Install couchdb: apt-get install couchdb
3. Install python2.7-dev: apt-get install python2.7-dev
4. Install subversion (SVN): apt-get install subversion
5. Install virtualenv: apt-get install virtualenv
6. Install python-setuptools: apt-get install python-setuptools
• Environment Setup:
1. Create virtual environment: virtualenv --no-site-packages --python=python2.7 <foldername>
2. Go inside folder <foldername>: cd <foldername>
3. Clone git repository: git clone https://github.com/mangroveorg/mangrove.git
4. Go to mangrove folder: cd mangrove
5. Switch to develop branch: git checkout develop
6. Check the status: git status
7. Go out of folder <foldername>: cd ../..
8. Run requirement.pip file: pip install -E <foldername> -r <foldername>/mangrove/requirements.pip
• Execute Environment:
1. Activate virtual environment: source <foldername>/bin/activate
2. Run server: python <foldername>/mangrove/src/web/manage.py runserver
• Access URLs:
1. Website URL: http://localhost:8000/login
2. Couchdb URL: http://localhost:5984/_utils
2.9 Setting Up PostGIS and importing location data
In order to get PostGIS configured and import location data - this is what you need to do:
• Install Spatial Database PostgreSQL 8.4 (with PostGIS 1.5)
• Install Geospatial Libraries
• Create Spatial Database Template for PostGIS
• Create spatial database using the template
• Import the shape files in the spatial database
2.9.1 Details
• Install Spatial Database PostgreSQL (with PostGIS) and Geospatial Libraries
$ sudo apt-get install postgresql
$ sudo apt-get install binutils gdal-bin postgresql-8.4-postgis \
    postgresql-server-dev-8.4 python-psycopg2 python-setuptools
For details, visit https://help.ubuntu.com/community/PostgreSQL
• Create Spatial Database Template for PostGIS:
Follow the instructions at https://docs.djangoproject.com/en/dev/ref/contrib/gis/install/#spatialdb-template
On Debian/Ubuntu, use create_template_postgis-debian.sh
• Create spatial database using the template:
Follow the instructions at https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#setting-up
$ createdb -T template_postgis geodjango
• Import the shape files in the spatial database:
Clone the [email protected]:mangroveorg/shape_files.git to a folder which is at the same level as the mangrove repository, e.g. /home/user/code/mangrove and /home/user/code/shape_files
Then run python manage.py loadshapes
CHAPTER
THREE
INDICES AND TABLES
• genindex
• modindex
• search