Thursday, 17 July 2014

MaxScale - from proxy to MySQL replication relay

During the first part of the year I spent a lot of time working on a proof of concept to use MaxScale as a way to distribute MySQL binlogs for large replication installations. I have to admit that when I first heard the idea from Booking.com my reaction was - "MaxScale is a proxy for client applications, it can't do this". However I was wrong, proving that making versatile, configurable software can throw up surprises even for the people that design it.

The Problem

There have been posts elsewhere about the problem this is trying to solve, so I will not go into too much detail. Suffice to say that with large numbers of slaves connected to a single master, the load on the master becomes too high. Using intermediate relay servers causes other issues, because MySQL replication re-executes the statements on the relay server and then sends the binlog records for that re-executed SQL rather than the original binlog records.

MaxScale's Structure

As followers of MaxScale are probably bored of hearing by now, MaxScale is built as a general purpose core that provides support facilities for a proxy, plus a number of plugins of different types. Hence the original idea of using it as the basis for a replication relay service came about. My problem was how to fit replication into something that was designed to act as a proxy for database applications and forward queries from those applications to a backend server. The most obvious feature that needs to be utilised is the query router within MaxScale. Normally these routers take requests in and forward them to one of a number of database servers - very much a push model of interaction. In replication, however, the slave servers each register with the master and the master then streams changes, in the shape of binlog records, to the slaves - not the classical request/response model.

The conventional structure of a query router plugin within MaxScale is to have an instance of the router per service, with each client that connects being allocated a new session within the router. Each request that then arrives at the router for a given session is routed to an outbound server using rules in the router. Replies back from that backend server are sent to the client connection that was opened when the session was created. Replication, however, calls for a somewhat different structure.

In replication we need the router module to register with the master server and request binlog records from it. This should be done once, and should probably not be in response to any event that comes in from the clients of MaxScale - in this case the clients are the slave servers. The other difference is that we don't get a single binlog record, the equivalent of a response in a more traditional router, and return it to a single client. Rather we may send it nowhere or to multiple connections; it depends on how many slave servers we have attached and the current binlog positions of those slave servers. We may also need to send a record to a slave at some undetermined time in the future, if the slave is lagging behind or not connected at the time. Therefore the router has to act as a store and forward relay rather than a mere proxy forwarding request and response packets.

So after spending a little time thinking about the problem and listening to Jean-Francois at Booking.com explain just why something like this would be useful, I decided that what seemed like a crazy idea at first was indeed a very good idea. A germ of an implementation plan began to form and I started to construct a proof of concept of a router plugin module that would route binlog records. The rest of this post is the story of how I put that proof of concept together.

The Prototype Router

The requirements for this new router plugin were now becoming clearer:
  • It must request and receive binlog records from the master autonomously of any slave activity.
  • Binlog records must be stored on permanent or semi-permanent storage to allow replaying of those records to the slave servers.
  • The slave servers must be able to request historical binlog records without sending any additional traffic to the master server.
  • Binlog records received from the master must be relayed to the slaves that are able to accept them, i.e. not lagging behind the master, in a timely fashion.
I was also very keen that while doing this I did not do anything that would specialise MaxScale, I wanted the binlog router to work with the same MaxScale core, with the same protocol plugins, monitors and filters, as any other plugin. It is fine to enhance the core to provide new generic services that are not already available, but wholesale adaptation of the core for this one requirement would not be desirable.

With these requirements and constraints in place I decided the best thing to do, as always, was to divide and conquer. Hence the first part of the implementation would concentrate on the interaction between MaxScale and the master: registering for the binlog records, retrieving them and storing them in a file. Once this was working, the next step would be to move to the slave side.

Master Interaction

The first step in the process was to examine a typical interaction between a slave and the master during registration. Although the protocol defines the registration process itself, the slaves run a number of queries in order to determine the server version and the settings in place on the master that might have an impact on the slave. Since MaxScale would need to act as a master, it must also be able to support these queries from slaves of MaxScale. The choice was that either MaxScale should know how to respond to these queries or it should merely proxy them to the real master. The problem with the latter approach is that it would require a different connection from the one that MaxScale uses to receive binlog records. It is not possible to use the same connection, since once a connection is receiving binlog records you cannot send another query on it without halting the flow of binlog records. The decision was therefore taken not to use this method, but rather to have MaxScale respond directly to these requests without forwarding them to the master.

The method of providing these query responses is fairly simple: a list of the requests that slaves may make was built up by observing the message traffic for a number of registration sessions. These queries are then saved in the binlog router; the router executes each of them itself during MaxScale's own registration process and the responses are stored within the router. When a slave makes a request some time later, the saved response the master gave to MaxScale is simply replayed to the slave.
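
To make the replay idea concrete, below is a minimal sketch in C of how a cached response might be played back. This is not the actual MaxScale source: the SLAVE_SESSION type, the saved-response table and the helper names are hypothetical, although GWBUF and gwbuf_clone() are genuine MaxScale buffer primitives.

    /* Hypothetical sketch: replay a cached master response when a slave
     * issues one of the known registration queries. */
    #include <strings.h>        /* strcasecmp() */

    typedef struct {
        const char *query;      /* registration query text                */
        GWBUF      *response;   /* response the master gave to MaxScale   */
    } SAVED_RESPONSE;

    static int
    blr_slave_query(ROUTER_INSTANCE *router, SLAVE_SESSION *slave, const char *sql)
    {
        for (int i = 0; i < router->n_saved; i++)
        {
            if (strcasecmp(sql, router->saved[i].query) == 0)
            {
                /* Clone the stored response so it can be replayed again
                 * for other slaves, then send it verbatim to this slave. */
                return slave->dcb->func.write(slave->dcb,
                                              gwbuf_clone(router->saved[i].response));
            }
        }
        return 0;   /* not a known registration query */
    }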

Registration Process

Having settled on the method of dealing with the registration queries from the slave servers, the next issue was how to get MaxScale to register with the master as a slave. The normal flow of router interactions is for a client session to be created, with traffic being forwarded to the backend databases only when that client session makes a request. In this case the client session is a slave server connection, and the backend server is the master from which MaxScale is receiving the binlog records. Ideally MaxScale should register with the master and start the process of receiving and storing binlog records before the first slave connects to MaxScale. Indeed, it should collect binlog records even if no slaves are connected; so the existing workflow of a router would not be acceptable.

Fortunately there was a solution: as part of the process of creating a listener for a service, MaxScale will load the router that the service uses and create an instance of it. This involves calling the createInstance() entry point of the router. In the case of most routers this will simply set up any router wide data structures, check any router options and so on. In the case of the binlog router, however, we use this as the trigger for MaxScale to connect to the master. The connection to the master is not a simple process; it requires several interactions with the master server. The rule of MaxScale implementation, that no thread should block, means that we cannot complete the process in the createInstance() call, as it would have to block waiting for responses from the master.

The solution to this problem was for the createInstance() call to create a dummy client and establish a MaxScale session to connect to the master. This allows the usual proxy calls to be used, but with an internal client rather than an external client application. Requests are sent via this dummy client, and responses received back. A finite state machine is built to execute the various queries required in the registration process.

The trigger to move from each state to the next is the reception of the response to the previous query or, in the case of the last two states, to the COM_REGISTER_SLAVE and COM_BINLOG_DUMP messages. This allows the registration process to be implemented as a non-blocking process, with the clientReply() entry point of the router triggering the transition to the next state and sending the next request. After each message is sent, control of the thread is returned to the MaxScale core, thus satisfying the "no blocking" rule.
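
To show the shape of this, here is a heavily simplified sketch of such a state machine in C. The state names and helper functions (send_query(), send_register_slave() and so on) are illustrative only and do not reflect the actual MaxScale source; the real router has many more states, one per registration query described earlier.

    /* Illustrative sketch of the non-blocking registration state machine. */
    typedef enum {
        BLRM_VERSION,       /* awaiting reply to SELECT VERSION()      */
        BLRM_SERVERID,      /* awaiting reply to SELECT @@server_id    */
        BLRM_REGISTER,      /* awaiting reply to COM_REGISTER_SLAVE    */
        BLRM_BINLOGDUMP     /* COM_BINLOG_DUMP sent, streaming records */
    } blr_state_t;

    /* Called from the router's clientReply() entry point each time a
     * response arrives from the master. It sends the next request and
     * returns to the core immediately, so no thread ever blocks. */
    static void
    blr_master_response(ROUTER_INSTANCE *router, GWBUF *reply)
    {
        switch (router->master_state)
        {
        case BLRM_VERSION:
            save_response(router, reply);   /* cache for later slave replay */
            send_query(router, "SELECT @@server_id");
            router->master_state = BLRM_SERVERID;
            break;
        case BLRM_SERVERID:
            save_response(router, reply);
            send_register_slave(router);    /* COM_REGISTER_SLAVE */
            router->master_state = BLRM_REGISTER;
            break;
        case BLRM_REGISTER:
            send_binlog_dump(router);       /* COM_BINLOG_DUMP */
            router->master_state = BLRM_BINLOGDUMP;
            break;
        case BLRM_BINLOGDUMP:
            /* every subsequent reply is a binlog record */
            store_binlog_record(router, reply);
            break;
        }
    }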

Upon completion of the state machine the master server will then start to stream binlog records to MaxScale. These messages arrive asynchronously at MaxScale, as replies to the COM_BINLOG_DUMP message that was sent by MaxScale during the final state transition. As far as MaxScale is concerned it is receiving an endless stream of responses to the COM_BINLOG_DUMP message that it sent to the master. MaxScale saves these into a local binlog file maintained by the router plugin. The router must examine each binlog record in order to determine whether it is a record that affects the management of the binlog file itself, i.e. a binlog rotate event, or a record that should not be stored in the file at all.

With the completion of this phase of the router we now had a MaxScale setup that could connect to a master server, register for replication updates and save binlog files locally on the MaxScale server. This enabled us to test the process and confirm that the contents of the MaxScale maintained binlog files matched those on the master server. The next phase in the implementation of the router was to create the slave side of the interaction, which will be covered in another post.

Friday, 11 July 2014

Using MaxScale to segregate load in a MySQL server cluster

The server weighting feature of MaxScale may initially be seen as a way to influence load balancing for unequal servers, but it can also be useful for other reasons. Here we look at how we can use it to segregate different workloads within a cluster of MySQL or MariaDB servers.

Concept

The concept is based around a scenario in which you might have a number of applications that share the same database, but that have different workload profiles on that data. For example an online order processing system, with a fulfilment application and a reporting application. Each of these has different demands it places on the database and each has different priorities within the business. It is probably more important for a business to accept orders and fulfil them than to have up-to-the-minute reporting if failures start to occur within the infrastructure.

The Implementation

There are a few things to understand about MaxScale which make the implementation here possible before delving into the configuration.

Services

An application accesses databases via MaxScale using services that are defined within the MaxScale configuration file. These services define a combination of the protocols used, the databases that support the service and the servers to which the service will route. A single instance of MaxScale can support multiple services and the same server can appear in multiple services within MaxScale.

Server Weighting

The server weighting is not based on a single fixed weight parameter for every server; instead, every service can define the parameter used by that service to weight the connections. This means that a server can have many weights associated with it, and each service that includes the server can weight the distribution on a different server parameter.

Example

Returning to the example in the introduction, assume we define three parameters for each server:
  • processing_weight - the order processing weight
  • fulfilment_weight - the fulfilment weight
  • reporting_weight - the reporting weight
We also define three services within MaxScale:
  • processing - used by the online order processing application
  • fulfilment - used by the fulfilment application 
  • reporting - used by the reporting application
Each of these services uses the weight associated with it from the above list.

[processing]
type=service
weightby=processing_weight

[fulfilment]
type=service
weightby=fulfilment_weight

[reporting]
type=service
weightby=reporting_weight

If we assume we have six servers, called db1 to db6, then we can define the set of servers for each service to be all six servers. Each server can be assigned a weight to be used by each service.

[db1]
type=server
processing_weight=1000
fulfilment_weight=1
reporting_weight=1

[db2]
type=server
processing_weight=1000
fulfilment_weight=1
reporting_weight=1

[db3]
type=server
processing_weight=1000
fulfilment_weight=1
reporting_weight=1

[db4]
type=server
processing_weight=1
fulfilment_weight=1000
reporting_weight=1

[db5]
type=server
processing_weight=1
fulfilment_weight=1000
reporting_weight=1

[db6]
type=server
processing_weight=1
fulfilment_weight=1
reporting_weight=1000

Using the values above, the order processing application, connecting to the processing service, would have 1000 connections routed to db1, db2 and db3 before any connections would be routed to db4, db5 or db6. The fulfilment application would have 1000 connections routed to db4 and db5 before any other servers are used, and the reporting application would have 1000 connections to db6 before any other servers are used. So what do you gain by doing this over simply having services that include only the subset of hosts you wish the particular application to connect to? The answer is high availability.

If in our example the server db6 fails, connections from the reporting application would be shared amongst the remaining servers. In this case we have made it equally likely that these go to the ordering system servers and the fulfilment system servers, but the weight can be adjusted to make reporting favour the fulfilment database servers if that is a more desirable failure mode.

In this example I have used the term connections, but the same mechanism is available for both connection routing and routing using the Read/Write Splitter. The segregation method can also be applied to both replication clusters and Galera Clusters. In the former only read operations (or connections) can be segregated; however in a Galera Cluster both read and write operations can be segregated.

Tuesday, 8 July 2014

MaxScale Modules - What's In 1.0 Beta?


MaxScale is a modular proxy application; the modules can be considered as the building blocks of your proxy implementation within your MySQL database environment, and it is important to know what building blocks you have at your disposal. The release of version 1.0 as a beta means that the number of available modules has grown once again. Normally I post about the incremental changes in what is available, but I thought that this was a good time to post a short summary of all the modules. This is not designed to be comprehensive documentation for the functionality available, merely a summary of what is available.

Routing Modules

Routers are perhaps the most important modules within MaxScale, since they make the decisions as to where to send requests. However they are incapable of functioning autonomously and require monitor and protocol modules in order to fulfil a useful function.

There are two classes of router within MaxScale: real query routers, and pseudo routers that are used as ways to expose internal MaxScale state. There are two important query routers included in the release, the readconnroute and readwritesplit modules.

Readconnroute

The readconnroute module should perhaps be renamed; it is a general purpose routing module that is used for all forms of connection routing, not just for read only connections. It will distribute connections amongst a set of servers using the constraints passed in the routing options. The following routing options are available:
  • master - only route to a server which has the master status bit set
  • slave - only route to a server that has the slave status bit set
  • synced - only route to a server that has the synced status bit set
If no routing options are given then the router will distribute the connections over all the servers that are currently running. Servers that are down or in maintenance mode will not have connections routed to them.

In a hierarchical MySQL replication tree the master node is treated as the node nearest the root of the tree that has the master bit set. Any intermediate nodes, which have both the master and slave bits set, will be treated as a slave node.

The synced option is reserved for use with Galera clusters and ensures that connections are not routed to servers that are not fully fledged members of the cluster. In addition the master and slave states may be used with Galera along with the special monitor module that elects one of the Galera nodes as a master.

The distribution of connections within a set of servers may not be even, but will honour any weighting bias that is introduced via the weightby setting of the service and the individual server weights.
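
Putting this together, a minimal connection routing service might be defined along the following lines. This is a sketch only: the server names, credentials and port are placeholders.

[Read Service]
type=service
router=readconnroute
router_options=slave
servers=db1,db2,db3
user=maxscale
passwd=maxscalepwd

[Read Listener]
type=listener
service=Read Service
protocol=MySQLClient
port=4006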

Read/Write Splitter

The Read/Write Splitter is a much more sophisticated routing module; it examines the incoming request and determines whether that request should be sent to a read/write server or whether it may be sent to a read only server, i.e. a master or a slave in a MySQL replication environment. The Read/Write Splitter is not, however, limited to MySQL replication; it may also be used with Galera Clusters.

When used with Galera the monitor module will elect one of the nodes in the Galera cluster as the master to which all writes will be routed. The read load may then be spread across all of the other nodes within the Galera Cluster. This provides a means to use Galera in a pure HA mode, with read load balancing, rather than a true multi-master configuration.

The Read/Write Splitter is able to balance the load on the slave servers using several different criteria, chosen by the use of a router option setting in the MaxScale service. The options available are:

  • least number of connections from all MaxScale services
  • least number of connections from this MaxScale service
  • least number of current operations currently in progress
  • least lagging behind the master
The server weighting mechanism may also be applied to the first three of these methods where a non-uniform distribution within the chosen balancing criterion is required.

In addition it is possible to define a maximum replication lag, in terms of seconds behind the master, to which a slave must adhere in order to be chosen as an acceptable destination for read only statements.
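
Putting these together, a Read/Write Splitter service might be configured along the following lines. The slave_selection_criteria values of the time - LEAST_GLOBAL_CONNECTIONS, LEAST_ROUTER_CONNECTIONS, LEAST_CURRENT_OPERATIONS and LEAST_BEHIND_MASTER - correspond to the four options listed above; the server names and credentials are placeholders and the lag cap is expressed in seconds.

[Split Service]
type=service
router=readwritesplit
servers=db1,db2,db3
user=maxscale
passwd=maxscalepwd
router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS
max_slave_replication_lag=30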

Pseudo Routers

MaxScale 1.0 beta also comes with a number of pseudo routers.

CLI

The CLI module is a simple information provider used by the maxadmin client interface in order to execute statistics gathering and administration commands. This is not a router in the true sense, but rather a way to expose internal MaxScale data.

Debugcli

The debugcli router is closely related to the CLI router and provides an interface for developers to access internal MaxScale data structures.

Monitors

Monitors provide data to the rest of the modules within MaxScale, which they use to determine the current state of the servers within the system and hence the best destination to which to send requests. It is possible to use MaxScale without any monitors, setting the server states manually; however this loses many of the advantages of using MaxScale. There are currently two monitors available in the 1.0 beta version of MaxScale.

Mysqlmon

A monitor module that is designed to monitor MySQL replication configurations. It will detect the master and slave status of each server it is monitoring and will build the relationships between these nodes in order to give MaxScale modules access to the replication hierarchy information. In addition the mysqlmon module will measure the replication lag for data written to the master before it is available on each of the slaves.
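
A monitor definition in the configuration file is very small; the sketch below uses placeholder server names and credentials.

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=db1,db2,db3
user=maxmon
passwd=maxmonpwd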

Galeramon

This monitor module is designed to monitor Galera clusters and provide information regarding the membership of the Galera cluster for each of the servers. In addition the Galera monitor is able to nominate a single server within a Galera cluster as a master node and the remainder as slaves. This allows Galera to be used as a highly available database with a single writable master and read only slaves with a very fast failover response.

Filters

Filters provide a means to extend the MaxScale functionality by adding small modules into the processing pipeline of a MaxScale service. They allow for examination and modification of SQL requests; four such modules are included in the 1.0 Beta release.

QLA

The qlafilter is a simple query logging filter that writes a copy of each query to a log file per client connection. The qlafilter has mechanisms to limit the queries that are logged, using regular expressions, the connection source address and the connection user name.
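
A minimal definition looks like the following, with the filter then attached to a service by adding filters=QLA to that service's section; the filebase path is just an example.

[QLA]
type=filter
module=qlafilter
filebase=/var/log/maxscale/QueryLog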

Regex

The regexfilter provides a means to alter SQL statements as they pass through MaxScale. It uses regular expressions to match and replace content as it traverses MaxScale.

Tee

A filter that will duplicate all or some of the requests to a second service within MaxScale. The tee filter also has mechanisms to limit the queries that are duplicated, using regular expressions, the connection source address and the connection user name.

Top

The top filter is a logging filter that will record the top N longest running SQL statements in your connection. When the connection is closed a log file is written for that connection. The number of statements retained by this filter is controllable via a configuration parameter. The top filter uses the same mechanisms to limit the queries that are measured as are available for the QLA and Tee filters: regular expressions, the connection source address and the connection user name.
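
For example, to keep the ten longest running statements per connection, a definition along these lines could be used (the filebase path is again a placeholder):

[Top10]
type=filter
module=topfilter
count=10
filebase=/var/log/maxscale/Top10Queries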

Protocol Modules

Protocol modules are, as the name suggests, responsible for the protocol aspects of the interfaces between MaxScale and external systems. The most important protocol modules are those used for the communication from the client application to MaxScale and for the MaxScale to database server communication path.

MySQLClient

The protocol module used by client applications, which would normally connect directly to a MySQL database, to connect to MaxScale.

MySQLBackend

The protocol module MaxScale uses to connect to the backend MySQL databases.

Telnet

The terminal protocol used to connect to the debug interface of MaxScale. This is only used for the debug interface and will never be used to carry database traffic.

Maxscaled

The protocol used by the MaxScale administration interface to connect to MaxScale.

HTTPD

An experimental HTTP protocol module, intended for REST style interfaces.

Bugzilla: https://bugs.skysql.com

IRC: #maxscale on freenode

Friday, 4 July 2014

MaxScale 1.0 Beta Highlights

MaxScale, the pluggable proxy for MySQL databases, has just gone beta. With this new release come a number of new features that improve the usability, functionality and versatility of MaxScale. Here are a few of the things that have now become available with MaxScale 1.0 Beta.

Read/Write Splitter Routing Module

A number of enhancements have been made to the Read/Write Splitter routing module for MaxScale:
  • Support for prepared statements
  • Fault tolerance for read operations via multiple slaves
  • Configurable load balancing methods
  • Integration with slave replication lag monitoring to cap the acceptable lag
  • Ability to weight the load distribution mechanism

Prepared Statements

The preparation stage is now correctly recognised and distributed to all servers within a session that might need to execute these statements at a later date. The execution is then routed to an appropriate server at the time the prepared statement is executed. This allows applications that use prepared statements to be used with the Read/Write Splitter routing module for the first time.

Slave Fault Tolerance

When multiple slaves are available the Read/Write Splitter can now be configured to use more than one of the slaves within a single session. Failure of a slave can thus be masked from the client application by switching from the failed slave server to another slave server. The router may even be configured to balance individual statement execution amongst the pool of slave servers, using the operation based balancing criteria described below.

Configurable Load Balancing Criteria

It is possible with the 1.0 beta release to define one of a set of criteria to be used to balance the load generated by the Read/Write Splitter routing module. There is a choice of four methods that may be used:
  • The total number of connections to the servers, from this MaxScale instance
  • The number of connections to the server for this particular MaxScale service
  • The number of statements currently being executed on the server on behalf of this MaxScale instance
  • Route statements to the slave that has the least replication lag

Replication Lag Cap

Support has been added to the Read/Write Splitter which allows a maximum replication lag value to be defined for a service. Any slave server that is more than the specified number of seconds behind the master server will be discounted from the set of slaves that data may be read from.

Weighted Load Distribution

It is possible to define parameters on each server that can then be used to weight the distribution of load between the servers. The router then applies a weighting algorithm based on these parameters to the load balancing. For example, let's assume you have three servers; one of them has 16Gb of RAM available for the MySQL server and the other two have 32Gb of RAM. Parameters can be added to the servers to specify how much RAM each server has, and the service can then be configured to weight the load balancing based on the amount of RAM each server has. Assuming all three servers are eligible to receive the load, the two 32Gb servers would each get 40% of the load, whilst the server with just 16Gb of RAM would get 20% of the load.
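
A sketch of that example in configuration terms follows. The parameter name memory is entirely our own choice; it is only the weightby reference in the service that gives it meaning.

[db1]
type=server
memory=16

[db2]
type=server
memory=32

[db3]
type=server
memory=32

[Some Service]
type=service
weightby=memory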

Improved Monitoring

The MaxScale MySQL replication monitor module is now able to understand complex replication implementations that involve the use of replication trees with intermediate master servers. This allows for correct identification of the server within the replication tree to which updates should be routed, and prevents intermediate masters being mistaken for valid servers to which to direct write statements.

This tree structured replication monitor may be used with either the Read/Write Splitter or the connection routing modules. Coupled with the replication lag heartbeat mechanism and the basic monitoring of server state that was already available within the monitor module, this provides the routing logic with a comprehensive view of the state of your MySQL replication cluster, supplying the raw data that the routers can then use to decide the optimal destination for your SQL statements.

Filter Enhancements

Filters first became available in version 0.7 of MaxScale; at that point only downstream filtering was available. This meant that only the queries from the client to the router could be examined and modified; a filter had no visibility of the results that came back from the server. In 1.0 both upstream and downstream filtering have been implemented, allowing a filter to see not only the request data but also the response to that request.

An example of a filter that uses this functionality is included within the 1.0 beta release, the "top" filter. The filter keeps track of the SQL queries that are sent, measures the time delay between the request being sent and the first response being received and stores those that take the longest to execute within a single session. When the session is closed the filter will write out a report file with the longest running queries, the time each took and statistics regarding the session duration and average execution time. The number of queries logged in this way is a configurable parameter of the filter.

Another new filter in the 1.0 beta release is the "tee" filter. This acts in a similar way to the tee command in Linux; it allows statements that are sent through the filtering pipeline for a MaxScale service to be duplicated and sent to a second service. This may be used to duplicate some or all of your SQL statements to another system, such as a reporting or analytics system, or to duplicate your SQL statements to a test system during upgrade testing.
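
Configuring the tee filter amounts to naming the second service to which statements should be duplicated; a sketch, where "Analytics Service" is a placeholder for another service defined in the same configuration file:

[Duplicate]
type=filter
module=tee
service=Analytics Service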

New configuration options have been added to all the example filters so that a filter can be enabled per source address or per user name for each session. In addition most filters provide a pair of regular expressions that are used to include or exclude statements that match those expressions.

Client Application

A new client application is included in the 1.0 beta version. This client application provides a command line interface to MaxScale for monitoring and administration purposes. Access may be either via an interactive shell mode or by passing individual MaxScale commands as command line arguments. The interactive mode supports command line history mechanisms for recalling and editing previous commands.

The client also supports command scripts via the interactive source command or by using the shell #! mechanism to develop executable scripts for use within the Linux shell.
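
As a sketch of the #! mechanism, an executable script might look like the following; the path to the maxadmin binary will depend on your installation.

    #!/usr/bin/maxadmin
    show servers
    show services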

Other Items

The 1.0 beta includes a number of bug fixes, support for running MaxScale under Pacemaker and Heartbeat control and various smaller enhancements. Binary distributions are available for download from the SkySQL downloads page in various formats, as is a tar ball of the 1.0 beta source. All MaxScale source is available via GitHub, and bugs may be logged in the SkySQL bugs site.

Wednesday, 11 June 2014

Pipes, Filters and MaxScale

It was back in the 1980's when I was first introduced to Unix, the concept of scripting and in particular pipes. At that time I remember being really impressed by the concept of small utilities that could be easily strung together to quickly create tools. It was a great way to implement those one off jobs or things that would not warrant the development of a specific tool, or so it seemed. I soon realised that actually, for some purposes, there was no need to go on and develop a bespoke tool to do the job; scripts were good enough. What does this have to do with MySQL and MaxScale?

The answer lies in a new feature that has been introduced in version 0.7 of MaxScale: filters. Filters are simple MaxScale plugins that can be inserted into the processing stream of a MaxScale service. One or more filters can be placed between the client application protocol module and the router module within MaxScale. The request data received from the client is passed to the first filter in the chain; that filter may examine and optionally modify the data before passing it on to the next filter in the chain. This is termed downstream filtering in MaxScale and allows for the implementation of such things as query modification, logging and filtering. In the next release of MaxScale support will also be added such that a filter may receive the upstream data, the result set returned by the database. Upstream filtering will be optional, as not all filters require access to that data.

The concept is to bring some of the power and flexibility of Unix pipes to the SQL world. Because the filter API is relatively simple, and MaxScale provides lots of utility functions to make life easier, the task of writing a filter is relatively easy. Also there are some filters already available and the hope is that more will be provided, both by us and via community contribution. 

The advantage of this approach is that, because the filters are in MaxScale, the way an application interacts with the database can be altered without the need to modify either the application or the database. As the catalogue of filters grows it is hoped that filters may be combined in order to allow MaxScale to not only be a useful proxy and scaling tool, but also to add to the utility of database programming.

In the 0.7 release one of the most interesting filters included is the regex filter; this can be thought of as sed for MySQL. It effectively allows SQL statements to be modified as they pass through MaxScale, matching a pattern and substituting a replacement string. Although a simple example of a filter, it does have some practical use. Imagine you have an application that uses syntax that is no longer supported, such as the MySQL create table syntax from the days of 5.1 which used the keyword TYPE and has now been superseded by ENGINE. Rather than modify your application you can merely put a filter in place that will match the TYPE keyword and replace it with ENGINE.
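
That example translates into a filter definition along the following lines, attached to a service with filters=fix_type. The section name is arbitrary, and in practice you would want a tighter match pattern than the bare keyword, to avoid rewriting TYPE where it appears in other contexts.

[fix_type]
type=filter
module=regexfilter
match=TYPE
replace=ENGINE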

Another filter included in the 0.7 release is a logging filter; it simply writes a copy of every SQL command sent through MaxScale into a log file. A new log file is created for each client connection through MaxScale. This is a simple way to see what an individual connection is doing without modifying your MySQL logging levels or your application. In the next version another filter will be available that uses the upstream filtering to monitor when results are returned. This filter keeps only the top N queries by execution duration and writes a log file when the connection is closed. This file lists the queries that took longest to execute within the session. The number of queries it reports can be configured via a filter parameter.

I wanted to bring some of the flexibility and power of Unix scripting and pipes to the MySQL world and hope that the filtering mechanism in MaxScale achieves at least some of this. There is still a lot of work to be done, but release 0.7 at least previews some of this functionality. Other features yet to come are the ability to have branches in the filter chain, essentially the Unix tee command for MySQL connections. This will open up a number of new areas, either diverting or duplicating statements into a second database server or, due to the way MaxScale allows for protocol plugins, into other non-MySQL data stores.

Friday, 6 June 2014

MaxScale 0.7 Feature Highlights


The latest set of updates for MaxScale has just been released, among the highlights of this alpha release are improved Galera support, the introduction of the first phase of the filter API and server maintenance mode.

Galera Support

MaxScale has always supported Galera, at least since the earliest downloadable release; now it has the ability to use Galera in a different way.

The initial Galera support simply involved using the Galera Monitor plugin to provide the routers with information as to the state of each of the nodes in the Galera Cluster. This meant that routing of queries could be controlled such that queries would only be sent to nodes that were fully fledged members of the cluster. No statements would be sent to a node whilst it was still joining the cluster or during state transfers.

The latest release provides a new monitor mechanism that, as well as being able to report the state of the nodes within the Galera cluster, will also elect one node within the cluster as a master and the remaining nodes as slaves. This means that the routers within MaxScale which have been designed to work with MySQL master/slave replication can now be used with Galera Cluster, treating one of the Galera Cluster nodes as the master and the others as slaves.

Using the Read/Write Splitter routing module with Galera and this monitor will then result in all the writes going to a single database node, the one that has been designated as the master, whilst all of the read operations are distributed over a number of the other nodes in the Galera Cluster. Since all the writes are sent to a single node there is no write contention. Because the backend databases are in a Galera Cluster, should the master fail, failover can take place almost instantly once the failure is detected. There is no need to do any work on the remaining nodes to re-establish replication; the MaxScale monitor merely detects the failure, elects a new master and write operations can continue on that master.


Filter API

One of the long planned APIs for MaxScale makes an appearance in this version: the first phase of the filter API has been included. This allows filters to be added, as plugin modules, that sit between the client interface to MaxScale and the routing modules. All data that arrives via the client protocol may be passed through the filter before it is delivered to the routing module for eventual routing onwards to a database server for execution.

The filters may either inspect the content of these requests and pass them on, or they may modify the request and pass this modified data on to the router and ultimately the database server that will execute the request. Filters may be chained together to form processing pipelines through which requests are passed before eventual routing to a server and execution.

The next phase of the filter API development will add the ability to filter the result set data as it comes back from the router and is sent to the client.

Included in the package are three example filters; these are unsophisticated and are really just included to demonstrate the functionality. One allows for the counting of statements, another provides basic logging functionality, whilst the third provides the means to use regular expressions to change the statement contents.

Server Maintenance Mode

Server maintenance mode is a very simple idea: it is just a way to tell MaxScale to ignore a particular database server for a period of time. The server remains in the MaxScale configuration; it is merely put into a dormant state in which MaxScale will not create any new connections to the server or issue any monitoring commands against it. Any connections that exist to the server when it is put into maintenance mode remain; they are not closed.

The idea is that a DBA wishing to carry out some maintenance on a database would first put that database into maintenance mode; over time the connections to the database node would be closed naturally, assuming an environment in which connections are not held permanently open. Once those connections had diminished to zero, or to an acceptable level, the DBA could take the database out of service and perform whatever actions are required.
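
Via the administrative interface this takes the form of a set server command; a sketch, using a placeholder server name, with the caveat that the exact command syntax may vary between releases:

    MaxScale> set server db1 maintenance

and, when the work is complete, the server is returned to normal operation with

    MaxScale> clear server db1 maintenance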

Tuesday, 27 May 2014

MaxScale debug CLI, software repurposing in action

It is strange the way code evolves and takes on a life that was not originally intended for it. The debug command line interface (CLI) of MaxScale, the MySQL pluggable proxy, is an example of this and something I have been thinking about and working on as part of an exercise to develop a true management interface to MaxScale.

Originally the debug CLI was meant as a debugging tool for the developers working on MaxScale. It gave us a mechanism to query the internal data structures of MaxScale and observe what was happening, without the need to run against a debugger and deal with the threaded nature of MaxScale within the debugger. Hence it was built so that it allowed the user to type in memory addresses and examine the contents of that memory as if it were a particular data structure. It was a deliberate decision not to validate these addresses by checking that they existed within the lists of known objects, since in a debug situation it is quite possible these lists are corrupted, and it is also advisable not to take out locks in order to walk linked lists and the like.

As a quick aside, it is probably also worth saying a few words about how the debug CLI got implemented in the first place, as this itself was another example of "repurposing". MaxScale was always designed to support proxy operations, with network protocol modules that could be plugged in at will and routing modules that could be used to determine how to direct incoming requests to one of a set of candidate backend servers. During the development it became obvious that we could "mis-use" this concept to provide a simple terminal protocol, telnet, to allow users to connect to MaxScale, and that we could provide a routing module that, instead of routing requests to a backend, could itself provide the result to the request that was made. This was how the debugcli routing module was born and the mechanism for providing a debug interface for the developers came about.

Moving on a few months in the story, we now have a group of users that have become interested in MaxScale and who are experimenting with it to see what it is capable of. They wanted some way to see what was going on and, to an extent, to be able to control MaxScale; the debug CLI seemed to offer precisely this. So the interface originally meant for developer use, and never intended to be released other than to other developers within the community, was suddenly the main interface for monitoring MaxScale. The major problem was of course the use of memory addresses to identify objects; this allowed arbitrary examination of memory, but could also cause MaxScale to crash if users gave invalid memory addresses. Clearly neither of these is good to allow, but the debug facility of the interface was still useful to developers, so my thoughts turned to what we could do to improve things in the short term whilst giving us time to create a really good administration interface to MaxScale.

The answer that came back to me was pretty simple: reuse the mechanism that we use in other router modules to control the routing decisions they make - router options. The result is that we now have a debug CLI with two modes, developer mode and user mode. In developer mode the debug CLI works as before, with memory addresses that are unchecked. However in user mode the arguments passed to the majority of commands are the names of objects as defined in the configuration file for MaxScale. So for example if a service is defined with a section name of "My Test Service", the show service command in the debug CLI can be passed this name, e.g.

    MaxScale> show service "My Test Service"

Likewise the service can be stopped with

    MaxScale> shutdown service "My Test Service"

and restarted with

    MaxScale> restart service "My Test Service"

Note that because these names may contain whitespace, the concept of quoting has been introduced in the debug CLI, either using the \ mechanism to escape whitespace or using double quotes.

Where objects are not named in the configuration file, such as the descriptor control blocks (DCBs) and sessions, memory addresses are still used; however, these are checked in user mode to ensure that they are valid objects of the type expected. The show dcb command is an example of this: it takes a memory address as an argument and, before dereferencing that address, checks that the address given matches a DCB in the internal list of all known DCBs within the system. This prevents the passing of an arbitrary address allowing inspection of memory or, worse still, crashing MaxScale itself.

Whilst not a perfect solution from a usability perspective these changes have at least given us a reasonably secure interface that can be used whilst the mainstream development of MaxScale continues and the eventual administration interface is developed.

The code that implements these changes has been developed and is currently in the public GitHub repository on the development branch. It is planned to merge this to the master branch and release it in the next binary release in the MaxScale alpha release path.

Even with these changes I would recommend that the interface is not made completely open and is restricted to connections from the local machine. The protocol used is telnet; hence the data is not encrypted and is therefore open to eavesdroppers on the network. To restrict the debug CLI in this way, ensure the listener defines an address parameter of localhost.

To add the debug CLI to your MaxScale configuration simply add the following two sections to your MaxScale.cnf file.

[Debug Service]
type=service
router=debugcli
router_options=user

[Debug Listener]
type=listener
service=Debug Service
protocol=telnetd
address=localhost
port=4442

Please note that if no router options are given, the default mode of the debug CLI is user mode. To revert to the previous developer mode, set the router options to developer.

Documentation on the debugcli is available as a PDF file in the Documentation directory of the GitHub repository.


IRC: #maxscale on freenode