Final project - SJTU
Final project
Group members: Xinzhe Cao, Fan Zhou, Zheyu Hua, Hanwen Liu
June 23, 2018
Contents
1 Brief introduction of our website
1.1 The composition of the website
1.2 The main functions of the website
2 Main algorithm of the website
2.1 Search engine design
2.2 Recommendation algorithm
3 Data Visualization
3.1 Force-Directed-Graph
3.1.1 3-step Process
4 The database improvements
4.1 Elasticsearch
4.1.1 Installation
4.1.2 Searching
4.1.3 Request Body
4.1.4 Space
4.1.5 Codes
4.1.6 Comparison
4.1.7 Kibana
4.2 The improvement of our sql database
5 UI design
5.1 Home page design
5.2 Navigation parts
5.2.1 Top navigation
5.2.2 Side navigation
5.3 Other pages
1 Brief introduction of our website
1.1 The composition of the website
Home page
– Provides a search engine for titles, authors and venues. Each search leads to one of these pages:
∗ Result page for author
∗ Result page for title
∗ Conference page
Author page
– Provides two kinds of information about the target author:
∗ Papers of the author
∗ Relationship graph of the author
Paper page
– Includes two kinds of information:
∗ Detailed information about the paper
∗ Recommendations related to the paper
Conference page
– Includes two kinds of information:
∗ Brief introduction of the conference
∗ All the papers of the conference
Reference page
– Includes two kinds of information:
∗ Detailed information about the paper
∗ Papers that cite the target paper
1.2 The main functions of the website
1 A search engine for authors, papers and conferences.
– On the home page, a search box is provided for searching the different kinds of information.
– We tried several ways to improve the response speed of the website. By applying Elasticsearch as well as some SQL improvements, we succeeded in limiting the response time to about 1 s or less for each page.
2 Complete information and hyperlinks between pages for easier information lookup.
– We present the detailed information of each paper, including its title, year, conference, affiliation, and all authors ordered by author sequence. The title, authors and conference can be clicked to reach the corresponding pages.
3 Visualization of data to present the author's information more usefully.
– A force-directed graph is provided on the author page to present the relationship between the author and his or her collaborators. We use softmax regression to predict the relationship between these collaborators, and draw the graph with d3.js.
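As a rough illustration of the softmax regression step mentioned in the last item, here is a minimal Python sketch. The features, weights and class labels are all hypothetical, since the report does not describe the actual model; only the softmax computation itself is standard.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features, weights, labels):
    """Score each class as a dot product, then pick the most probable label."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical example: two features per co-author pair, three made-up classes.
labels = ["advisor", "student", "peer"]
weights = [[1.0, -0.5], [-1.0, 0.5], [0.0, 0.0]]
label, probs = predict([2.0, 1.0], weights, labels)
```

In a real model the weights would be learned from labeled co-author pairs rather than written by hand.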
2 Main algorithm of the website
2.1 Search engine design
To support the different kinds of search, I use a select element with one option per search type.
    <select name="catid" id="catid_1" class="selectpicker form-control" onchange="show_or_hide(this.value)">
      <option value="result.php">Author</option>
      <option value="title.php">Title</option>
      <option value="conference.php">Conference</option>
    </select>
Here I use the onchange handler to read the selected option in real time. The handler can change the link of the form and set up a different autocomplete function for each search type.
    <script type="text/javascript">
    function show_or_hide(v) {
      if (v == "result.php") {
        // Author search: attach autocomplete to the search box.
        $(function() {
          $("#key").autocomplete({
            source: "search.php",
            minLength: 2,
            autoFill: true
          });
        });
      }
      // ....
    }
    </script>
We also attach a function selectAction() to the search button. In this function we read the value of the selected option in real time and change the form's link accordingly, so that the type of search follows our selection.
HTML:
    <button class="btn btn-info" style="background-color: white" aria-label="Left Align" onclick="selectAction();">
JavaScript:
    function selectAction() {
      var url = "http://localhost/begin/";
      var selector = document.getElementById("catid_1");
      var theForm = document.getElementById("head_sh");
      // Value of the currently selected option, e.g. "title.php".
      var checkValue = selector.options[selector.selectedIndex].value;

      theForm.action = url + checkValue;
      theForm.submit();
    }
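The dispatch logic of selectAction() is small enough to restate outside the browser. The following Python sketch only mirrors how the form target is composed from the selected option; the names SEARCH_PAGES and form_action are hypothetical, while the base URL and page names come from the listings above.

```python
BASE_URL = "http://localhost/begin/"  # same base as in selectAction()

# Option labels and values taken from the <select> element in this section.
SEARCH_PAGES = {
    "Author": "result.php",
    "Title": "title.php",
    "Conference": "conference.php",
}

def form_action(search_type):
    """Return the URL the form would be submitted to for a given search type."""
    return BASE_URL + SEARCH_PAGES[search_type]

target = form_action("Title")
```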
2.2 Recommendation algorithm
Another function of our page is to recommend papers. Google Scholar and other scholarly websites commonly have such a recommendation section, so we also added this function to our website. The key to solving this problem is an algorithm that recommends related papers according to the paper the user is viewing.
Table 1: Recommended papers in Baidu Scholar
We decided that our page would mainly use 3 types of papers:
• Other papers by the same author
• Other papers that are cited
• Other papers with similar names
We can then sort these papers in order to show the most influential ones. Here we use the number of times each paper is cited, and we use Python to do this part of the job.
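    # f is the open tab-separated paper file; db and cursor are an open
    # database connection and its cursor. Placeholders use the %s style of
    # MySQL drivers such as MySQLdb or PyMySQL (an assumption; the original
    # does not name the driver).
    for line in f:
        line = line.strip('\n')
        if not line:
            break  # stop at the first empty line, as in the original loop
        fields = line.split('\t')
        # Count the rows recorded for this paper in paper_reference.
        cursor.execute(
            "select PaperID, count(*) from paper_reference "
            "where paper_reference.PaperID=%s group by PaperID", (fields[0],))
        a = cursor.fetchall()
        if not a:
            continue  # no rows for this paper, nothing to update
        # Store the count in the papers table.
        cursor.execute(
            "update papers set papers.ReferenceTime=%s where papers.PaperID=%s",
            (a[0][1], a[0][0]))
        db.commit()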
We then use either SQL or Elasticsearch. On the paper page, we search according to the PaperID, locate the top 5 or 10 papers in the result list, and output their information on the page. This part is very similar to the UI and page design.
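The selection step can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: the function name recommend, the paper IDs and the citation counts are hypothetical, and the three input lists stand for the three types of papers listed above.

```python
def recommend(current_id, same_author, cited, similar_title, cited_times, k=5):
    """Merge the three candidate lists and keep the k most cited papers."""
    candidates = set(same_author) | set(cited) | set(similar_title)
    candidates.discard(current_id)  # never recommend the paper being viewed
    # Sort by citation count (descending); break ties by ID for a stable order.
    ranked = sorted(candidates, key=lambda pid: (-cited_times.get(pid, 0), pid))
    return ranked[:k]

# Hypothetical citation counts keyed by paper ID.
cited_times = {"P1": 12, "P2": 40, "P3": 3, "P4": 40, "P5": 0}
top = recommend("P0", ["P1", "P2"], ["P2", "P3", "P0"], ["P4", "P5"], cited_times, k=3)
```

Using a set for the candidates removes papers that appear in more than one list, so a paper that is both cited and written by the same author is shown only once.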
3 Data Visualization
3.1 Force-Directed-Graph
A Force-Directed-Graph layout is an algorithm whose purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible. It does so by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.
Here are some examples of the Force-Directed-Graph.
Table 2: result page1
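To make the idea of assigning forces concrete, here is a minimal Python sketch in the style of the Fruchterman-Reingold layout: every pair of nodes repels with magnitude k^2/d, and every edge pulls its endpoints together with magnitude d^2/k. It only illustrates the principle; d3.js runs its own simulation, and the parameter values here (200 iterations, starting temperature 0.1) are arbitrary choices.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Place nodes in 2D: all pairs repel, linked pairs attract like springs."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for it in range(iterations):
        temp = 0.1 * (1 - it / iterations)  # cap on movement, cools to zero
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: magnitude k^2 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along every edge: magnitude distance^2 / k.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0:
                scale = min(length, temp) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return pos

layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

For two nodes joined by one edge the two forces balance when the distance equals k, which is why linked nodes end up at roughly equal edge lengths.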
3.1.1 3-step Process
In our project, the FDG is used to show the relationship between various academic collaborators on the author page. We want users to see clearly the relationship between the author and his/her collaborators.
To establish a Force-Directed-Graph, there are mainly 3 steps.
• Feed data to nodes and lines in JSON form
• Decide the canvas size and other basic constants
• Add more detail to optimize the visualization
In our page, the relationship is predicted on the back end. We already store this data using a Python machine learning method, so we can set this aside here.
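The first of the three steps, feeding the data in JSON form, might look like the following on the Python side. This is a sketch under assumptions: the function build_graph and the sample names are hypothetical, while the field names (id, group, source, target, value) are the ones the d3.js code in this section reads. Using author names as source and target assumes the link force is configured to look nodes up by id.

```python
import json

def build_graph(author, coauthors):
    """coauthors: list of (name, relationship_group, joint_paper_count)."""
    nodes = [{"id": author, "group": 0}]
    links = []
    for name, group, count in coauthors:
        nodes.append({"id": name, "group": group})
        links.append({"source": author, "target": name, "value": count})
    return {"nodes": nodes, "links": links}

# Hypothetical author with two collaborators in different relationship groups.
graph = build_graph("Author A", [("Author B", 1, 4), ("Author C", 2, 1)])
payload = json.dumps(graph)
```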
The graph mainly consists of nodes and lines, so in the PHP page we define these two kinds of elements.
    // One <line> per link; thicker lines for larger values.
    var link = svg.append("g")
        .attr("class", "links")
      .selectAll("line")
      .data(graph.links)
      .enter().append("line")
        .attr("stroke-width", function(d) { return Math.sqrt(d.value); });

    // One <circle> per node, colored by group and draggable.
    var node = svg.append("g")
        .attr("class", "nodes")
      .selectAll("circle")
      .data(graph.nodes)
      .enter().append("circle")
        .attr("r", 5)
        .attr("fill", function(d) { return color(d.group); })
        .call(d3.drag()
            .on("start", dragstarted)
            .on("drag", dragged)
            .on("end", dragended));
Next we set up the rest of the graph: the tooltips, the simulation with its tick handler, and the drag handlers.
    function isConnected(a, b) {
      return linkedByIndex[a.index + "," + b.index] || linkedByIndex[b.index + "," + a.index] || a.index == b.index;
    }

    // Tooltip showing the node id.
    node.append("title")
        .text(function(d) { return d.id; });

    simulation
        .nodes(graph.nodes)
        .on("tick", ticked);

    simulation.force("link")
        .links(graph.links);

    // Copy the simulated positions onto the SVG elements on every tick.
    function ticked() {
      link
          .attr("x1", function(d) { return d.source.x; })
          .attr("y1", function(d) { return d.source.y; })
          .attr("x2", function(d) { return d.target.x; })
          .attr("y2", function(d) { return d.target.y; });
      node
          .attr("cx", function(d) { return d.x; })
          .attr("cy", function(d) { return d.y; });
    }

    // End of the enclosing callbacks (their opening lines are not shown in this excerpt).
    }
    });

    function dragstarted(d) {
      if (!d3.event.active) simulation.alphaTarget(0.3).restart();
      d.fx = d.x;
      d.fy = d.y;
    }
    // Older d3 v3-style drag helper.
    function drag() {
      return force.drag()
          .on("dragstart", function(d) {
            d3.event.sourceEvent.stopPropagation();
            d.fixed = true;
          });
    }
    function dragged(d) {
      d.fx = d3.event.x;
      d.fy = d3.event.y;
    }

    function dragended(d) {
      if (!d3.event.active) simulation.alphaTarget(0);
      d.fx = null;
      d.fy = null;
    }
The code above is enough to establish an FDG. But the graph is still very plain; to show the types of collaborators and their relationships more clearly, we can add more style to the nodes and define some extra functions for the graph.
    .on("mouseover", function(d, i) {
      // Thicken and darken the links that touch the hovered node.
      link.style("stroke-width", function(edge) {
        if (edge.source === d || edge.target === d) { return "2px"; }
        else { return "0.5px"; }
      })
      .style("stroke", function(edge) {
        if (edge.source === d || edge.target === d) {
          return "#000";
        }
      });

      node.append("title").text(function(d) { return d.group; });
    })

    .on("mouseout", function(d, i) {
      // Restore the default link style.
      link.style("stroke-width", function(edge) {
        if (edge.source === d || edge.target === d) { return "2px"; }
        else { return "2px"; }
      }).style("stroke", function(edge) {
        if (edge.source === d || edge.target === d) { return d.value; }
      })
    });
The above is part of the code that realizes the following behavior: when the user moves the mouse over a node, its links are emphasized and the others fade, so that the current node stands out. Our final Force-Directed-Graph looks like the picture shown below, and we can drag it to analyze its structure.
Table 3: Force-Directed-Graph results
4 The database improvements
4.1 Elasticsearch
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.
4.1.1 Installation
I first installed the following tools to build the environment:
Java, Elasticsearch, Composer and curl.
Then we can go and search.
4.1.2 Searching
Elasticsearch provides two ways to send a search: one is the request body, the other is the request URI.
4.1.3 Request Body
Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries.
    Qbody = {
        "query": {
            "match": {
                "Title": "home"  # match "home" in the Title field
            }
        }
    }
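A request body like the one above can be built and serialized with the standard library alone. The sketch below is an illustration under assumptions: the helper name title_match_body is hypothetical, and the top-level size field is added here to cap the number of hits in the same way the size parameter does in a request URI.

```python
import json

def title_match_body(text, size=100):
    """Build a request body that matches `text` against the Title field."""
    return {"query": {"match": {"Title": text}}, "size": size}

# JSON string ready to be sent as the body of a _search request.
body = json.dumps(title_match_body("home"))
```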
• Request URI: This way is brief and suitable for our needs.
    $value = "localhost:9200/hwtry/paper/_search?q=Title:*home*&size=100";
    $value = "localhost:9200/hwtry/paper/_search?q=Title:a%20a&size=100";
• Fuzzy search: We use * in place of the missing letters to achieve fuzzy search.
• More conditions: We use & to add further parameters to the request. In my code, the request fetches the data and limits it to at most 100 results (size=100), which improves efficiency and speed.
4.1.4 Space
One thing haunted me for several days: how to search with a space.
• Inspiration: Fortunately, when I was surfing the internet, Zhihu inspired me. You can see the %20 in the picture.
• Found: Besides Zhihu, I found that many other websites like Baidu and Wikipedia also use Elasticsearch.
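The %20 encoding does not have to be written by hand. As a minimal sketch, Python's urllib.parse.quote turns spaces into %20, and passing safe="*" keeps the wildcard used for fuzzy search. The helper name build_search_uri is hypothetical; the host, index and type default to the values used in this section.

```python
from urllib.parse import quote

def build_search_uri(title_query, size=100, host="localhost:9200",
                     index="hwtry", doc_type="paper"):
    """Build a URI search request; spaces become %20, wildcards (*) are kept."""
    encoded = quote(title_query, safe="*")
    return f"{host}/{index}/{doc_type}/_search?q=Title:{encoded}&size={size}"

uri = build_search_uri("a a")
```

Calling build_search_uri("*home*") yields the wildcard form of the request with the asterisks left untouched.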
4.1.5 Codes
• Python, create the database: I use Python to create the database.
    action = {
        "_index": "hwtry",
        "_type": "paper",
        "_id": i,
        "_source": {
            "paperID": paper[0],
            "Title": paper[1:-2][0],
            "PaperPublishYear": paper[-2].rstrip("\n"),
            "ConferenceID": paper[-1].rstrip("\n")
        }
    }
Building the index this way takes only 7 seconds.
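How such actions might be produced for a whole file can be sketched as below. This assumes a tab-separated input with the paper ID, title, publish year and conference ID on each line, which is inferred from the indices used in the action above; the generator name make_actions and the sample line are hypothetical. An iterable of dictionaries in this shape is what the bulk helper of the Python Elasticsearch client accepts, though the report does not say which call was actually used.

```python
def make_actions(lines, index="hwtry", doc_type="paper"):
    """Yield one index action per tab-separated line: ID, title, year, conference."""
    for i, line in enumerate(lines):
        paper = line.rstrip("\n").split("\t")
        if len(paper) < 4:
            continue  # skip blank or malformed lines
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": i,
            "_source": {
                "paperID": paper[0],
                "Title": paper[1],
                "PaperPublishYear": paper[-2],
                "ConferenceID": paper[-1],
            },
        }

# Hypothetical sample input: one valid line and one blank line.
sample = ["P1\tA home network\t2015\tC7\n", "\n"]
actions = list(make_actions(sample))
```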
• PHP, search:
    $client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost:9200'])->build();

    $params['index'] = 'hwtry';
    $params['type'] = 'paper';
    $params['body']['query']['match']['Title'] = 'based on';
    $results = $client->search($params);
Besides the data we need, the response also carries other details such as the ones listed below. I want to emphasize the shards in particular, which are one of the great features of Elasticsearch.
    # took
    # timed_out
    # _shards
    # hits.total
    # hits.max_score
4.1.6 Comparison
As you can see, the original database needs 3.8 seconds, while the new one cuts the time down to 0.56 seconds. With Elasticsearch the time is 0.08 seconds, about 2 percent of the original.
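The ratios behind these figures can be checked with a few lines of Python. The three timings are the ones quoted above; the variable names are only for illustration.

```python
original, improved_sql, elastic = 3.8, 0.56, 0.08  # seconds, from the comparison above

sql_speedup = original / improved_sql   # how many times faster the new SQL tables are
es_speedup = original / elastic         # how many times faster Elasticsearch is
es_share = elastic / original * 100     # Elasticsearch time as a percentage of the original

print(f"SQL: {sql_speedup:.1f}x, Elasticsearch: {es_speedup:.1f}x, {es_share:.1f}% of original")
```

This gives a speedup of roughly 6.8 times for the improved SQL tables and 47.5 times for Elasticsearch, whose time is about 2.1 percent of the original.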
4.1.7 Kibana
The last tool I want to introduce to you is Kibana, which enhanced our interaction with Elasticsearch.
4.2 The improvement of our sql database
To meet the response-speed requirement, our group tried a new database layout to store the data.
old table                    new table
affiliations                 new new paper
authors                      new paper
conference                   new author
paper author affiliation
paper reference
papers
In our new database we need only three tables. The convenience is that we no longer need join and union operations, which greatly simplifies the program.
Then we introduce these tables.
1 new new paper
– To satisfy the requirements of the paper page, this table has the fields PaperID, Title, CitedNum, AffiliationName, PublishYear, ConferenceID, ConferenceName. It satisfies three needs:
1 Fuzzy search on the title.
2 Detailed information about a given paper.
3 The citation count of the paper.
2 new paper
– This table is derived from the new new paper table and contains all the authors of each paper, so we can quickly get the result and list it by author sequence.
3 new author
– This table is designed for fuzzy search on authors. It contains the fields AuthorID, AuthorName, PaperNum, MainAffiliation, so it easily satisfies the needs of our author search.
Finally, with these improved tables our website responds within 1 second, which is quite a big improvement.
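Why the wide table removes the need for joins can be seen in a small sketch. It uses Python's built-in sqlite3 with an in-memory database purely for illustration; the project's actual SQL server, the table name new_new_paper (written with underscores here) and the sample rows are all assumptions, while the column names are the fields listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table new_new_paper (
    PaperID text primary key, Title text, CitedNum integer, AffiliationName text,
    PublishYear integer, ConferenceID text, ConferenceName text)""")
# Made-up sample rows.
conn.executemany(
    "insert into new_new_paper values (?, ?, ?, ?, ?, ?, ?)",
    [("P1", "Smart home sensing", 12, "SJTU", 2015, "C1", "ConfA"),
     ("P2", "Graph search", 40, "SJTU", 2016, "C2", "ConfB"),
     ("P3", "Home energy models", 30, "SJTU", 2014, "C1", "ConfA")])

def search_titles(keyword, limit=100):
    """Fuzzy title search on the single wide table, most cited first, no joins."""
    return conn.execute(
        "select PaperID, Title, CitedNum, ConferenceName from new_new_paper "
        "where Title like ? order by CitedNum desc limit ?",
        (f"%{keyword}%", limit)).fetchall()

rows = search_titles("home")
```

Because the conference name and citation count already sit next to the title, one select on one table returns everything the result page needs.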
5 UI design
5.1 Home page design
I laid a background picture under the search bar. Since scandals of academic cheating have been springing up these days, I finally chose a picture of mountains. In Chinese culture (and maybe many other cultures), mountains can hardly be moved and therefore they represent permanence. I intended to convey through the page that scholars should stick to their hearts and not be moved by the outer environment. As for us students, the background picture is also a reminder that we should stay righteous and honest in our daily studies.
Below the search bar is some information about IEEE and ACM, the two largest associations in computer science and electronic engineering. The brief introduction adds to the practicality of our page. At the bottom of the home page is a footer, where we have put our logo. If possible, we can also add more information such as names and contact details of our team. There is also a side bar which can be activated by the icon in the top right corner of the home page. You can easily navigate to the rest of the pages through the bar.
Home
5.2 Navigation parts
For a website, good navigation helps visitors get to the place they are looking for.
5.2.1 Top navigation
So I designed a navigation bar at the top of the website.
As it shows, it contains several hyperlinks to different pages, along with a search bar for us to search for the target information.
5.2.2 Side navigation
Here I use the Bootstrap plug-in affix.js to make a side navigation.
We can use this navigation to tell visitors what information the website offers, and it also helps them locate the target information more swiftly. In the future we may add the titles of the papers or their key words, which may give visitors more help.
5.3 Other pages
For the other pages, I kept the idea of being simple, highly efficient and easy to use, so I designed them to be relatively simple but with enough information.
Result of the Author search
Author
Paper
Conference