Titan DB setup with Cassandra and Elasticsearch

What is Titan DB?

Titan DB is a scalable, transactional graph database. It is optimized for storing and querying graph data through complex graph traversals, and it can serve thousands of concurrent users.
It uses Apache Cassandra, Apache HBase or Oracle Berkeley DB as its storage backend.

Titan DB features:
  1. Elastic and linear scalability for a growing data and user base
  2. Data distribution and replication for performance and fault tolerance
  3. Multi-datacenter support for high data availability
  4. Graph data analytics through integration with other technologies such as Hadoop
  5. Full-text, numeric-range and geo search through Elasticsearch, Solr or Lucene
  6. Support for both ACID and eventual consistency

Use the steps below to install and set up Titan with Cassandra.

Step 1:

       Download Apache Cassandra: Apache Cassandra

       Download Titan DB: titan-db

Both downloads are compressed archives (Titan as a .zip file, Cassandra as a .tar.gz file) and need to be extracted into one location.

Step 2:

     Extract both downloads into one location, for example /home/ramakavanan/titan-db and /home/ramakavanan/cassandra.
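If you prefer to script Steps 1 and 2, here is a minimal Python sketch (our own helper, not part of Titan or Cassandra; the name extract_downloads is hypothetical). It unpacks each download into its own folder. shutil.unpack_archive picks the archive format from the file extension, so the same call handles .zip as well as .tar.gz files.

```python
import shutil
from pathlib import Path


def extract_downloads(archives, target_root):
    """Unpack each archive into target_root/<name> and return those folders.

    archives maps a folder name (e.g. "cassandra") to an archive path.
    shutil.unpack_archive detects .zip / .tar.gz from the extension.
    """
    root = Path(target_root)
    extracted = {}
    for name, archive in archives.items():
        dest = root / name
        dest.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        shutil.unpack_archive(str(archive), str(dest))
        extracted[name] = dest
    return extracted
```

For example, extract_downloads({"cassandra": "apache-cassandra-bin.tar.gz", "titan-db": "titan-1.0.0-hadoop1.zip"}, "/home/ramakavanan"), where both archive file names are placeholders. Each archive usually contains its own top-level folder, so you may still need to move its contents up one level.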

Step 3:

   Set up Cassandra:

           If you installed Cassandra from an RPM or DEB package, it already has the correct permissions to write to its log and data directories. Otherwise, you must set these permissions manually.

           The conf folder under the Cassandra folder (/home/ramakavanan/cassandra/conf) contains the cassandra.yaml file, which configures Cassandra's data, commit log and saved cache directories. Change these settings to suit your needs, and make sure the directories exist and are writable:


data_file_directories:
    - /home/ramakavanan/cassandra/data
commitlog_directory: /home/ramakavanan/cassandra/commitlog
saved_caches_directory: /home/ramakavanan/cassandra/saved_caches
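A minimal Python sketch of this check, assuming the example layout under /home/ramakavanan/cassandra (prepare_dirs, CASSANDRA_HOME and REQUIRED_DIRS are hypothetical names). It creates any missing directory and reports every directory the current user cannot write to. The logs directory discussed next is included in the list.

```python
import os
from pathlib import Path

# Example layout used in this post; adjust to your own cassandra.yaml.
CASSANDRA_HOME = Path("/home/ramakavanan/cassandra")
REQUIRED_DIRS = ["data", "commitlog", "saved_caches", "logs"]


def prepare_dirs(home, names, create=True):
    """Return the directories under home that are missing or not writable.

    With create=True, missing directories are created first (this raises
    PermissionError if home itself is not writable).
    """
    problems = []
    for name in names:
        path = Path(home) / name
        if create:
            path.mkdir(parents=True, exist_ok=True)
        # W_OK to create files, X_OK to enter the directory.
        if not path.is_dir() or not os.access(path, os.W_OK | os.X_OK):
            problems.append(path)
    return problems
```

prepare_dirs(CASSANDRA_HOME, REQUIRED_DIRS) returns an empty list when all four directories are ready. Fix the ownership or permissions of anything it returns before you start Cassandra.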

By default, Cassandra writes its logs to ${cassandra.logdir}, which here resolves to /home/ramakavanan/cassandra/logs. Make sure this directory exists and is writable, or (in Cassandra versions before 2.1) change this line in conf/log4j-server.properties:

log4j.appender.R.File=/var/log/cassandra/logs/system.log

Note: in Cassandra 2.1 and later, the logger is logback, so change the logging directory in conf/logback.xml instead, for example:


<file>/var/log/cassandra/system.log</file>

JVM-level settings such as heap size can be set in conf/cassandra-env.sh.

   Start Cassandra:
           Start Cassandra by invoking bin/cassandra -f from the command line. The -f flag keeps the service in the foreground, where it logs verbosely to the console. If you see no messages with words like “error” or “fatal”, and nothing that looks like a Java stack trace, everything should be working.

Press Control-C to stop Cassandra.
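The "look for scary words" check can be approximated in code. The Python sketch below is a heuristic of our own, not a Cassandra feature, and suspicious_lines is a hypothetical name. It flags lines that contain ERROR or FATAL, exception headers such as java.io.IOException:, "Caused by:" lines and Java stack-trace lines.

```python
import re

# Heuristic patterns: log level words, stack-trace frames, exception headers.
_TROUBLE = re.compile(
    r"\b(ERROR|FATAL)\b"          # log level markers
    r"|^\s+at \S+\(.*\)$"         # "\tat org.foo.Bar.baz(Bar.java:12)"
    r"|^Caused by:"               # chained exception header
    r"|\w+(Exception|Error):"     # "java.io.IOException: ..."
)


def suspicious_lines(lines):
    """Return the startup log lines that deserve a closer look."""
    return [line for line in lines if _TROUBLE.search(line)]
```

For example, you can pass the lines of logs/system.log to suspicious_lines. An empty result is a good sign, but it does not guarantee that Cassandra is healthy.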

Step 4: Set up Titan DB
    There are two ways to set up Cassandra with Titan.

    1. Use the Cassandra bundled with Titan. By default, Titan ships with Cassandra (used through the cassandrathrift backend) and Elasticsearch (the index storage backend), and its startup script runs nodetool and starts Elasticsearch automatically.

    With the default Titan configuration, start Titan by invoking bin/titan.sh start from the command line. The script forks Cassandra, checks it with nodetool, and starts the Elasticsearch server and Gremlin Server:

Forking Cassandra
Running `nodetool statusthrift`. OK (returned exit status 0 and printed string running).
Forking Elasticsearch
Connecting to Elasticsearch (127.0.0.1:9300). OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server
Connecting to Gremlin-Server (127.0.0.1:8182). OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

   Then open the Gremlin Console by invoking bin/gremlin.sh:



         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
11:05:19 INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph   HADOOP_GREMLIN_LIBS is set to: /var/lib/titan/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph

gremlin>

Note: run the above commands from the Titan root folder (for example /home/ramakavanan/titan-db).
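titan.sh confirms each service by connecting to its port, as shown by the lines "Connecting to Elasticsearch (127.0.0.1:9300)" and "Connecting to Gremlin-Server (127.0.0.1:8182)". Below is a similar check as a Python sketch. The function names are hypothetical, and the ports are copied from that output.

```python
import socket

# Ports copied from the titan.sh output shown above.
SERVICES = {"Elasticsearch": 9300, "Gremlin-Server": 8182}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(host="127.0.0.1", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}
```

check_services() should return True for both services after bin/titan.sh start has finished.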

  2. Configure Titan to use a separately installed Cassandra:

              Go to the Titan conf folder, open conf/titan-cassandra-es.properties and edit the following properties:



gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
# Other values: cassandrathrift, astyanax (synonym: cassandra), embeddedcassandra, inmemory
storage.backend=cassandra
storage.hostname=127.0.0.1

# Index backend
index.search.backend=elasticsearch
index.search.directory=/tmp/ramakavanan/es
index.search.elasticsearch.local-mode=true
index.search.elasticsearch.client-only=false
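Before you open the graph, you can sanity-check this file. The Python sketch below uses a simplified parser of our own: it skips Java .properties features such as ":" separators, line continuations and escapes. It checks only the keys shown above. The accepted storage.backend values come from the comment in the file, and parse_properties and check_titan_config are hypothetical names.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without "=" in this simplified parser
            props[key.strip()] = value.strip()
    return props


# Values listed in the comment above storage.backend, plus "cassandra".
ALLOWED_BACKENDS = {"cassandrathrift", "astyanax", "cassandra",
                    "embeddedcassandra", "inmemory"}


def check_titan_config(props):
    """Return a list of problems with the backend settings used in this post."""
    problems = []
    for key in ("storage.backend", "storage.hostname", "index.search.backend"):
        if key not in props:
            problems.append(f"missing {key}")
    backend = props.get("storage.backend")
    if backend is not None and backend not in ALLOWED_BACKENDS:
        problems.append(f"unexpected storage.backend={backend}")
    return problems
```

For example, check_titan_config(parse_properties(open("conf/titan-cassandra-es.properties").read())) should return an empty list for the settings above.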

    Then start Cassandra:
         go to the Cassandra bin folder and run cassandra -f.
   Then check Thrift with nodetool:
        from the same bin folder, run nodetool statusthrift. It prints whether Cassandra's Thrift server is running (Titan's cassandra/astyanax backend connects over Thrift). If it prints not running, start the server with nodetool enablethrift.
  Then start an Elasticsearch server, if your setup requires one.
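You can script the Thrift check with Python's subprocess module. This sketch assumes a Cassandra version that still ships Thrift (2.x or 3.x) and that nodetool is on your PATH. It relies on statusthrift printing either running or not running. run_nodetool and ensure_thrift are hypothetical helper names.

```python
import subprocess


def run_nodetool(*args, nodetool="nodetool"):
    """Run a nodetool subcommand and return its stripped stdout."""
    result = subprocess.run(
        [nodetool, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def ensure_thrift(run=run_nodetool):
    """Issue `nodetool enablethrift` unless statusthrift prints "running".

    Returns True if enablethrift was issued. `run` is injectable for testing.
    """
    if run("statusthrift") == "running":
        return False
    run("enablethrift")
    return True
```

If nodetool is not on your PATH, pass its full path, for example ensure_thrift(functools.partial(run_nodetool, nodetool="/home/ramakavanan/cassandra/bin/nodetool")).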

Note: you do not need to start the Titan server (bin/titan.sh) for this setup.

Check whether everything works from the Gremlin Console.

Go to the Titan root folder and run bin/gremlin.sh:


         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
11:05:19 INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph   HADOOP_GREMLIN_LIBS is set to: /var/lib/titan/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph

gremlin>

Then open the graph database with the Titan properties file:

gremlin> g = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandra:[127.0.0.1]]
Titan is now connected to Cassandra. Next, create some vertices in the Gremlin Console and check whether they are stored.

The commands below create vertices and an edge in Titan DB:


gremlin> g1 = g.traversal()
==>graphtraversalsource[standardtitangraph[cassandra:[127.0.0.1]], standard]
gremlin> g1.V()
gremlin> g1.V().count()
==>0
Creating vertices:
gremlin> v1 = g.addVertex(T.label, 'person', 'name', 'rama', 'age', 25)
gremlin> v2 = g.addVertex(T.label, 'software', 'name', 'lop', 'lang', 'java')

Creating an edge between the two vertices above:
gremlin> v1.addEdge('created', v2, 'weight', 63)
==>e[2rl-360-4r9-38g][4104-created->4192]

Committing the transaction so the changes are persisted to Cassandra:
gremlin> g.tx().commit()

gremlin> g1.V().has('name','rama').values('name')
==>rama
gremlin> g1.V().has('name','rama')
==>v[4104]
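To show what these property-graph calls mean, here is a toy in-memory Python model that mirrors the session above. It is not Titan's API, and Titan does not store data this way. Vertices and edges each carry a label and a dictionary of properties, and has/values filter and project those properties. All class and method names are hypothetical.

```python
from itertools import count


class Vertex:
    def __init__(self, vid, label, props):
        self.id, self.label, self.props = vid, label, props
        self.out_edges = []

    def add_edge(self, label, other, **props):
        """Like v1.addEdge('created', v2, 'weight', 63) in Gremlin."""
        edge = Edge(label, self, other, props)
        self.out_edges.append(edge)
        return edge


class Edge:
    def __init__(self, label, out_v, in_v, props):
        self.label, self.out_v, self.in_v, self.props = label, out_v, in_v, props

    def __repr__(self):
        # Mimics the [outV-label->inV] part of Gremlin's edge display.
        return f"e[{self.out_v.id}-{self.label}->{self.in_v.id}]"


class ToyGraph:
    def __init__(self):
        self.vertices = []
        self._next_id = count(1)  # toy ids; Titan allocates its own

    def add_vertex(self, label, **props):
        """Like g.addVertex(T.label, 'person', 'name', 'rama', 'age', 25)."""
        v = Vertex(next(self._next_id), label, props)
        self.vertices.append(v)
        return v

    def has(self, key, value):
        """Like g1.V().has(key, value): vertices whose property equals value."""
        return [v for v in self.vertices if v.props.get(key) == value]

    def values(self, vertices, key):
        """Like .values(key): that property's values on the given vertices."""
        return [v.props[key] for v in vertices if key in v.props]
```

With the data from the session, graph.values(graph.has("name", "rama"), "name") returns ["rama"], matching the ==>rama line.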

Note: Gremlin can also serialize graph data as JSON (GraphSON) or XML (GraphML).

Checking in Cassandra:

To check that these changes reached Cassandra, look at the keyspace that Titan creates (named titan by default).
Open cqlsh, Cassandra's CQL shell (bin/cqlsh), run the following commands and look at the output.



cqlsh> describe keyspaces;

titan          system_auth  mykeyspace          system_traces
system_schema  system       system_distributed

cqlsh> use titan;
cqlsh:titan> describe tables;

cqlsh:titan> select * from titan_ids;

key                | column1                                                                                              | value
-------------------+------------------------------------------------------------------------------------------------------+-------
0x0000000000000003 | 0xfffffffffffec77f000535233381db083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x6000000000000003 | 0xfffffffffffec77f0005352337d4aac83766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x6000000000000000 | 0xffffffffffffd8ef0005352337cf82783766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x0000000000000004 | 0xffffffffffffff9b00053523337cc2583766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x0000000000000004 | 0xffffffffffffffcd0005352333779a083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x0800000000000000 | 0xffffffffffffd8ef000535233387b7083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
0x0800000000000003 | 0xfffffffffffec77f00053523338cd7883766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 |    0x
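Most of this output is binary. To our understanding, titan_ids is the table where Titan records ID-block allocations, while the vertex and edge data lives in the edgestore table. As an illustration (a Python sketch of our own, not a Titan or Cassandra tool, with the hypothetical name printable_runs), the helper below pulls runs of printable ASCII out of a blob value as cqlsh prints it.

```python
def printable_runs(blob_hex, min_len=4):
    """Return runs of printable ASCII (0x20-0x7e) found in a cqlsh blob value.

    blob_hex is written the way cqlsh prints blobs, e.g. "0x41424344".
    Runs shorter than min_len are ignored as noise.
    """
    hex_digits = blob_hex[2:] if blob_hex.startswith("0x") else blob_hex
    data = bytes.fromhex(hex_digits)
    runs, current = [], bytearray()
    for b in data:
        if 0x20 <= b <= 0x7E:
            current.append(b)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current.clear()
    if len(current) >= min_len:  # run that reaches the end of the blob
        runs.append(current.decode("ascii"))
    return runs
```

For the column1 value in the first row above, this returns ['7f0001017307-knoldus-Vostro-35582']. That string appears to identify the Titan instance that wrote the row, which is our reading of the bytes and not something Titan documents here.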

Note: alongside these technologies, a Redis cache can be added to make the application faster.

          
