Optimal insert methods of geographical information to Spatio-temporal DB

Optimal insert methods of Optimal insert methods of geographical information to geographical information to Spatio-temporal DBSpatio-temporal DB

Final Presentation

Industrial Project 234313

June 17,2012

Students: Michael Tsalenko & Belal KabatSupervisor: Flora Gilboa

IntroductionIntroductionResearch ways of inserting and

querying geographical information related to video represented as polygons or circles into geographic DB.

GoalsGoalsFinding the optimal way of

inserting Geo-data by determining the best representation of the captured polygon.

What representation is the optimal when querying the Data Base.

Determine whether amount of vertices of the polygon affect the querying efficiency.

Methodology: Main StepsMethodology: Main StepsData creationReading data into memory (from file

system).Measuring insertion times.Measuring querying times.Adding tables with index using R-

tree.Parallelizing insertion and querying.Statistical examination of the results.

Methodology-Methodology-Data creation Data creation (1):(1): We chose 10 different locations, and

randomly created 10,000 quad polygons around

each location. Our data setconsists of 100,000 polygons in total.

Methodology-Methodology-Data creation Data creation (2):(2):

Snapshot of the random data

A closer look at Haifa

Methodology-Methodology-Data creation Data creation (3):(3): Intuitively circle data manipulation

should be more efficient than quad data as it has less fields (3 vs. 8).

So we bounded each quad of the 100,000 with a circle and later improved it to a minimal bounding circle (to compare between different data representations of same locations).

(32.75762277268422, 34.97107551062562),(32.660576514078066, 34.97748364184535),(32.828346159124294, 35.086360632091974),(32.91352284093535, 35.023992366269674)

((32.78704967750671, 35.028718071358796),(0.13898966103824473))

Methodology- Reading into Methodology- Reading into memorymemoryIn order to improve the insertion

time we have created a pool which included all the data, so when we insert the data to the data-base we read immediately from the pool and not from the files, to make as less operations as possible after the connection to the server was made.

Methodology- Measuring Methodology- Measuring insertion time.insertion time.We have created 2 tables, one

for quad polygons and one for circles.For instance, the circles table:

id Asset name

Area

1

2

3

Circle#1

Circle#2

Circle#3

((32.7870496671, 35.02871358796), (0.138989664473))((29.927900734077, 31.2508816016), (0.27799441977))

((33.87026129199, 35.5400182072), (0.2372748702324))

Methodology- Measuring Methodology- Measuring insertion and querying time.insertion and querying time.We started to insert 100,000 objects

to each table, but after few hours of running we stopped the process (single thread), and reduced the data to 10,000 objects, the difference between the times of both insertion and querying was not significant, so we decided to check with indexed tables* for both circles and polygons.

(and started to plan parallelizing the research due to the 100,000 objects insertion inefficiency)

Methodology- Measuring Methodology- Measuring insertion time.insertion time. Indexed tables:

Are tables which are built using R-tree Data structure, which is a tree data structure used for spatial access methods, similar to the B-tree, R-tree is also a balanced search tree, so allleaf nodes at thesame height.

Methodology- Measuring Methodology- Measuring insertion time.insertion time.The difference between the results of the

querying and insertion to indexed and non-indexed tables were not significant:

(*) Hybrid circle in polygon means the query was to select a circle intersections in polygons table, and Hybrid polygon in circle means the opposite.

Parallelizing inserting.Parallelizing inserting.

We created 10 threads, each thread inserted 10,000 objects to each one of the 2 indexed tables (quads, circles).

This parallelizing enabled us to insert 100,000 objects in a reasonable time (~15mins instead of more than 2 hours when single thread).

We stopped inserting into the non-indexed tables since we saw that indexed tables give better results.

We can see that the circles insertion is better than quads insertion by ~13%.

Parallelizing querying.Parallelizing querying.We created 10 threads, each thread

inserted 10,000 object to each one of the 4 tables (quads, circles)

The difference between the results of the tables were not significant.

Methodology- vertices Methodology- vertices amountamountIn order to check if the number of vertices in a

polygon makes a difference, we have converted each polygon to an octagon in the following way:We added a vertex in the middle of each edge of original quadrate and added a factor of 0.001, which will prevent DB vendor optimizations

Methodology- vertices Methodology- vertices amountamount

Insertion results:

We can see that the insertion to octagon is worse than to the quad polygons by ~7% and to the cycles by ~18%

Methodology- vertices Methodology- vertices amountamountQueries results:

The results show that there is no significant difference between the querying to the tables. But it may be expected that on average intersecting a circle with quad polygons geo-data yields the best results.

Achievements & Achievements & Conclusions:Conclusions:We helped the IBM researchers to

decide that the best way to represent the geo-data (for both insertion and querying) is just to keep its original representation (no mistake areas) – as long as it is up to 8 vertices (or a circle), and if it has more than 8 vertices – further research is needed.

There is no need to look for other heuristics for data manipulation (bounding or bounded circles, dividing into several geo-objects, etc…).

Achievements & Achievements & Conclusions:Conclusions:We learned about:

◦The importance of making a good work plan – founding the basics, ensure all works and building on them the needed extensions.

◦How the indexed tables work◦Using JDBC with geodetic extension ◦Statistical tools and analyzing results◦It’s very helpful to have a supervisor

who is an expert in the project field and can direct us properly and save lots of time.

Optimal insert methods of geographical information to Spatio-temporal DB

Documents

Transcript of Optimal insert methods of geographical information to Spatio-temporal DB