Distributed Architecture
Horizontal scaling through the NoSQL service
Horizontal scaling is a strategy CYBERQUEST users can apply to enhance server performance by adding more server instances to the existing pool so the load can be distributed evenly. In horizontal scaling, the capacity of each individual server is not changed, but the load on each server is decreased. Horizontal scalability is achieved with the help of a distributed file system, clustering, and load balancing. Common reasons why businesses choose to scale horizontally include increasing I/O concurrency, reducing the load on existing nodes, and expanding disk capacity. Horizontal scaling is comparatively easy, as you can simply add more machines to the existing pool. It relies on partitioning the data so that each node contains only one part of it. For the NoSQL service that provides horizontal scaling, CYBERQUEST uses ElasticSearch.
Introduction to ElasticSearch Clustering
ElasticSearch is built to be always available and to scale with your needs. Scaling can come from deploying more powerful servers, particularly in terms of CPU and memory (vertical scaling, or scaling up), or from deploying more servers (horizontal scaling, or scaling out). While ElasticSearch can benefit from more powerful hardware, vertical scaling has its limits. Real scalability comes from horizontal scaling: the ability to add more nodes to the cluster and to spread load and reliability between them. With most databases, scaling horizontally usually requires a major overhaul of your application to take advantage of the extra machines. By contrast, ElasticSearch is distributed by nature: it knows how to manage multiple nodes to provide scale and high availability, which means your application does not need to be aware of it. A node is a running instance of ElasticSearch, while a cluster consists of one or more nodes with the same cluster.name that work together to share data and workload. As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.
Advantages of ElasticSearch clusters are:
- Distributed data: in a cluster, data is distributed and replicated to other servers, so in case of a node failure the data can be restored from a replica node. This avoids a single point of failure.
- Dedicated node roles: every node has a dedicated role assigned to it, which ensures role-based load distribution and therefore increases performance. Two important node roles (a configuration sketch follows this list) are:
  - Data node: these nodes store the data and perform data-related operations such as search and data manipulation.
  - Master node: the master of all nodes, it is responsible for the overall cluster: adding and removing nodes from the cluster, keeping track of live nodes, and master re-election when appropriate.
- Scalability: the cluster model easily scales out to multiple nodes, increasing the performance and reliability of ElasticSearch.
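As an illustration, node roles can be pinned in the elasticsearch.yml configuration file. The following minimal sketch uses the legacy role settings consistent with the discovery.zen configuration shown later in this section; the node layout itself is only an assumption, not a required setup:
# elasticsearch.yml - dedicated master-eligible node (illustrative)
node.master: true    # eligible for master election
node.data: false     # does not store index data
# elasticsearch.yml - dedicated data node (illustrative)
node.master: false   # never elected master
node.data: true      # stores data and serves search and data manipulation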
One node in the cluster is elected to be the master node, in charge of managing cluster-wide changes such as creating or deleting an index, or adding and removing a node from the cluster. The master node does not need to be involved in document-level changes or searches, which means that having just one master node will not become a bottleneck as traffic grows. Any node can become the master.
Every node knows where each document lives and can forward a request directly to the nodes that hold the data of interest. Whichever node is queried manages the process of gathering a response from the node or nodes holding the data and returning the final response to the client. All of this is managed transparently by ElasticSearch.
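As a sketch of this transparent routing, the same search can be sent to any node of the cluster and returns the same, cluster-wide result; the hostnames are illustrative:
curl -X GET "http://es-node-1:9200/_search?q=*&pretty"
curl -X GET "http://es-node-2:9200/_search?q=*&pretty"
Both requests return documents gathered from all nodes holding relevant shards.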
CYBERQUEST takes advantage of this technology, so whether the underlying database is a clustered or a single-node deployment, no additional configuration of CYBERQUEST is required.
Checking cluster health
CYBERQUEST comes installed with the ElasticSearch Cerebro service, which provides a visual representation of the database; it is reached by accessing the CYBERQUEST IP address on port 9000 via a web browser.
Cerebro then displays the cluster topology: a single node for a standalone installation, or all member nodes for a deployment of two or more clustered nodes.
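The same information is available from the command line through ElasticSearch's standard cluster health API; the host placeholder below is an assumption and 9200 is the default ElasticSearch port:
curl -X GET "http://<CYBERQUEST-IP>:9200/_cluster/health?pretty"
In the response, status reports green, yellow, or red, and number_of_nodes shows how many nodes have joined the cluster.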
Adding cluster nodes
Out of the box, ElasticSearch is configured to use unicast discovery to prevent nodes from accidentally joining a cluster. Only nodes running on the same machine will automatically form a cluster. To use unicast, you provide ElasticSearch with a list of nodes that it should try to contact. When a node contacts a member of the unicast list, it receives a full cluster state that lists all the nodes in the cluster. It then contacts the master and joins the cluster.
This means your unicast list does not need to include all the nodes in your cluster. It just needs enough nodes that a new node can find someone to talk to. If you use dedicated masters, just list your three dedicated masters and call it a day. This setting is defined in the elasticsearch.yml configuration file:
discovery.zen.ping.unicast.hosts: ["OtherElasticSearchHost1","OtherElasticSearchHost2"]
When finished, save the file and restart the ElasticSearch service:
systemctl restart elasticsearch.service
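To confirm that the node has joined, the standard _cat nodes API can be queried on any member; the host placeholder is an assumption:
curl -X GET "http://<CYBERQUEST-IP>:9200/_cat/nodes?v"
Every cluster member should appear in the output, with the elected master marked by an asterisk in the master column.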
Additional ElasticSearch documentation
Additional database documentation can be found here: https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
Horizontal scaling on MariaDB
Introduction
CYBERQUEST uses the MariaDB database for solution configuration data.
For High Availability architectures, the MariaDB nodes are configured in a cluster.
The purpose of this section is to technically describe the MariaDB cluster configuration process. For the clustering, MariaDB Galera Cluster is used.
MariaDB cluster configuration
1) Network interface configuration
Make sure all nodes have fixed IP addresses. After changing the network configuration, restart the interface:
ifdown ens192
ifup ens192
Add node records to /etc/hosts (on all nodes; in this case db1 and db2):
nano /etc/hosts
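Each record maps a node name to its fixed address. With the addresses used in this example, the entries are:
192.168.200.139 db1
192.168.200.140 db2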
2) Configuring the 60-galera.cnf file
The clustering configuration file 60-galera.cnf is node specific and must be created manually if it does not already exist in /etc/mysql/mariadb.conf.d/.
- Configuration file for machine 192.168.200.139 (db1)
# * Galera-related settings
#
# See the examples of server wsrep.cnf files in /usr/share/mysql
# and read more at https://mariadb.com/kb/en/galera-cluster/
[galera]
# Mandatory settings
#wsrep_on = ON
#wsrep_provider =
#wsrep_cluster_address =
#binlog_format = row
#default_storage_engine = InnoDB
#innodb_autoinc_lock_mode = 2
# Allow server to accept connections on all interfaces.
#bind-address = 0.0.0.0
# Optional settings
#wsrep_slave_threads = 1
#innodb_flush_log_at_trx_commit = 0
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="CQ_CLUSTER"
wsrep_cluster_address="gcomm://192.168.200.139,192.168.200.140"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="192.168.200.139"
wsrep_node_name="db1"
- Configuration file for machine 192.168.200.140 (db2)
# * Galera-related settings
#
# See the examples of server wsrep.cnf files in /usr/share/mysql
# and read more at https://mariadb.com/kb/en/galera-cluster/
[galera]
# Mandatory settings
#wsrep_on = ON
#wsrep_provider =
#wsrep_cluster_address =
#binlog_format = row
#default_storage_engine = InnoDB
#innodb_autoinc_lock_mode = 2
# Allow server to accept connections on all interfaces.
#bind-address = 0.0.0.0
# Optional settings
#wsrep_slave_threads = 1
#innodb_flush_log_at_trx_commit = 0
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="CQ_CLUSTER"
wsrep_cluster_address="gcomm://192.168.200.139,192.168.200.140"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="192.168.200.140"
wsrep_node_name="db2"
3) Final configuration
- Copy the node-specific configuration file to /home/superadmin on both nodes
- Stop the MariaDB service on both nodes (command shown after this list)
- Copy the configuration file into place on each node:
sudo cp /home/superadmin/60-galera.cnf /etc/mysql/mariadb.conf.d/
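The stop step assumes MariaDB runs as the same systemd unit used later in this guide:
sudo systemctl stop mariadb.service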
Checking the cluster
1) Checking configurations
Check the configuration file /etc/mysql/mariadb.conf.d/60-galera.cnf on each node.
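For example, the file can be displayed with:
cat /etc/mysql/mariadb.conf.d/60-galera.cnf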
Bootstrap the new cluster (on db1):
sudo galera_new_cluster
Start the service on db2 (which is already configured):
sudo systemctl start mariadb.service
Check that the second node has received the database state and joined the cluster.
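A quick way to verify this is through the standard Galera status variables; the credentials are an assumption:
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
With both db1 and db2 joined, the reported value should be 2.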
2) GUI checking
Check replication: when accessing the web interface, the configurations must be identical on the two CYBERQUEST machines.
On both db1 and db2, you can see the identical configuration of dashgroups: Events, Users, Asset Dashboards.