Monday, 31 March 2014

MaxScale read/write splitting, authentication and networking updates

It is now 3 months since we put MaxScale out in the public domain in GitHub, we have not been sitting idle during that time, rather we have been adding some new features and fixing some bugs. We focused on three main areas of functionality; statement routing, authentication and network support.

Statement Routing

Our main focus has been on the read/write splitter routing module, as it stood when we put the code out into GitHub there were a number of limitations on its use. We wanted to remove as many of these as we could and refresh what we had made available in both GitHub and as a downloadable binary.

Session Commands

This might seem like a strange thing, but one of the considerations when doing statement based routing,  in which statements in a single client connections may be diverted to one of many backend servers, is managing statements that have an effect on the execution environment. It is important to ensure that the same effect is propagated to any server to which a future statement might be sent.

A simple example of this would be the USE DATABASE statement, although other statements, such as SET also modify the client session. Consider a simple environment in which we have two possible servers to which to route statements; server A and server B. Server A will handle any statement that modifies the database content and server B will be used to read data from the database when the statement does not update the database. If we route the USE statement to server A then we will potentially read the wrong data back when we send a SELECT statement, since this will be routed to server B. The opposite applies if we just send the USE statement to server B, INSERT, DELETE and UPDATE will get run on the wrong database. Therefore we must send the USE statement to both server A and server B.

The problem now is that the client has sent one statement to MaxScale, but MaxScale has sent two statements, albeit identical, to two different backend servers. MaxScale can not simply send the responses back the client, otherwise it will get two responses to the same statement. Therefore MaxScale must manage this, what it now does is send back the first reply it gets from either server. This solves the problem for the client, but gives MaxScale another issue.

As soon as the client receives the response it may send another statement, MaxScale has to be careful not to forward that statement to a server that is still executing a previous session command, such as the USE statement in the example above. There may also be multiple successive session modification commands in the incoming client stream, MaxScale must therefore ensure it maintains the ordering and overlapping of these commands to the various backend servers.

Transaction Support

Statement routing needs to be aware of transactional boundaries in the incoming stream of statements from a client and ensure that all the statements within the transaction are executed within a single transaction on a single backend database server. The approach the read/write splitter takes currently is to send the entire transaction to the master server. This is done as it is not possible to determine if the transaction will at any point cause a database update. Therefore the safest option is to send all transactions to the master server.

There is currently a limitation in the read/write splitter implementation, it does not take note of implicit commit operations caused by setting auto commit to true within a running transaction.

Authentication

MaxScale has been updated to handle authentication in the same way as is done within MySQL, it is not just a username and password that is used to authenticate, but rather a triple consisting of username, password and client host are used. This allows MaxScale to apply the same rules as the backend servers themselves apply. Therefore user A from host X is treated as different from user A at host Y. It should be noted however that the backend database sees all user as connecting from the host that runs MaxScale, therefore there must be permission for users to also connect from the MaxScale host. The password for the MaxScale host must also be the same as the password used to connect from the originating host.

One problem with previous versions of MaxScale was that it only loaded the user data at startup, this meant that any changes or additions to the users within the backend database was not reflected within MaxScale until the next restart. This restriction has been removed, with MaxScale now reloading the user data whenever there is an authentication failure. This reload facility is rate limited to prevent bogus authentication requests being used to launch a denial of service attack on MaxScale or on the backend servers themselves.

The other change related to authentication is the introduction of an explicit configuration option on a per MaxScale service basis that can be used to enable and disable access with the root user.

Networking Support

The two main changes in the networking area are the introduction of the ability to bind listeners for services to a particular address and the option to use Unix domain sockets for connections from clients to MaxScale. The use of Unix domain sockets allows for a more efficient implementation when installing MaxScale on the same host as the client software, e.g. an application server and also gives the option for improved security by blocking network access to those MaxScale instances.

Other Changes

A great many bugs and smaller fixes have been incorporate into MaxScale, details of the bug fixes can be found on the SkySQL bugzilla system (http://bugs.skysql.com), a release note is available within the GitHub repository for MaxScale, as is all of the code. For those of you that would rather download a binary rather than compile the source, this is also available on the SkySQL download site.


IRC: #maxscale on freenode

Tuesday, 11 February 2014

MaxScale - Just What Can It Do Today?

It seems that in all the writing around what MaxScale is, why I think we need it and what we plan to do with it, we have created some confusion as regards what it can do as of today, mid-February 2014. So I thought I would try to clear it up a little my writing this short post. The version that is described here is the one that is available from SkySQL download site and is labelled as version 0.4, later code is available in GitHub, with the changes we are working on, but I will not cause more confusion by referring to these.

Our design concept for MaxScale is a core with five plugin interfaces that provide different functionality, currently we have a core, that supports the event driven model we wanted, but we have only implemented three of the five plugin interfaces. The interfaces that have been implemented are the protocol plugin, the query router and the monitor.

The authentication mechanism we are using currently is built into the core, not the long term goal but something we have done in order to get a version that we can start sharing with everyone. The authentication offers the standard MySQL authentication protocol, the client applications authenticate themselves with MaxScale and then MaxScale will deal with the authentication between itself and the backend databases. The username and password used to authenticate with MaxScale are extracted from the backend database, in the 0.4 version the client host information is not checked, this is considered a bug which is being worked on.

The core also includes the MariaDB parser which is used by the routers or other modules via an interface we call the query classifier. This allows us to extract the read, write or session modification properties of the statement. The result is that we can create routers such as the read/write splitter that we have, but more of this in a moment.

Protocol Modules

Currently we have a number of protocol modules, the two most important of these are the MySQL Client protocol and the MySQL Backend protocol modules. The client protocol module is used by client applications that connect to MaxScale. In this case MaxScale acts as a server and offers a full implementation of the MySQL protocol with the exception that does not support compression or SSL. We currently turn off these capabilities in the handshake we perform with the clients.

The MySQL Backend protocol module is designed to allow MaxScale to attach to a MySQL, MariaDB or Percona Server database. It operates as the client side of the protocol, although it differs from a normal connector since it is designed to take data in a way that is more efficient for the proxy activities it is performing. In particular if we are doing connection based routing we want to take advantage of having the ready made packet formats.

The combination of these two protocol modules means that the only database clients and servers we can support with the version 0.4 are those that talk the MySQL protocol.

In addition to these two protocol modules used for database connections we have a telnet protocol module that we use for a debugging interface and an experimental HTTP protocol implementation.

Routing Modules

We have two routing modules currently, a connection based router and a statement based router. The connection based router is known as the readconnrouter, which is perhaps not the best name for it. What this router can do is evenly distribute connections between a set of servers, it keeps a count of the number of connections that are currently open to a server via MaxScale and uses this to find the server that has the least number of connections. If multiple servers have the same number of connections then it will round-robin between these servers.

The connection router can be configured with some options, these options are used to decrease the set of servers that it will consider when it looks for the server to which connections are routed. The options supported in the 0.4 version are;
  • master - Only route to servers that are marked as replication masters
  • slave - Only route to servers that are marked as replication slaves
  • synced - Only route to servers that are marked as synced, in a Galera cluster
This allows a service in MaxScale to be setup for read/write connections and have them sent to only the  master, by specifying the router option "master". Another service, on a different port, can use the same set of servers but by using the router option "slave" it will route to just the slave servers. This option allows for read only connections to scale the read load by making use of the slaves. The synced option is for Galera Clusters and will distribute connections across all the servers that are synced and part of the Galera cluster. If no router option is given then the router will distribute the connections across all the servers that are available.

The read/write splitter module is a statement based router, it uses the query classifier to parse the query and determine if it is a read statement, write statement or a session modification statement. Read statements are sent to a slave server whilst write statements are sent to the master server. Session modification statements are sent to all the servers, these are statements such as those that modify a variable or change the default database. Statements that may have unknown side effects, such as selects that use a stored procedure or user defined function are sent to the master. Any statement that fails to parse is also sent to the master rather than depend on the parser that is built into MaxScale.

The read/write splitter is only designed to work with MySQL replication and does not support any router options, however a bug in the current version means that it does not ignore the "master" and "slave" options, these should not be given as they cause the router to fail. There are also some limitations in the 0.4 version that we are working on. These limitations are:
  • Session modification commands may result in multiple returns begin sent to the client for a single statement
  • Very long statements that require multiple packets fail to parse and are thus always routed to the master
  • Prepared statements and the execution of prepared statements may not always occur on the right server
  • Transaction support is currently missing - a transaction should be execute on a single server
  • The choice of master and slave server is currently only done as connect time, if the master moves the connection to the client will be failed. Reconnection to the new master should be handled by the router without intervention by the client.
We also have one other module that implements the router API, the debugcli module. This is not a true router as it does not route queries, however it is used to obtain a debug hook into MaxScale in order to examine the internal state of MaxScale. It makes use of the telnet protocol in order to allow connections to MaxScale and a simple set of administrative commands are supported. This is documented in one of the PDF files that can be found in the Documentation directory in the GitHub repository.

Monitor Modules

Currently there are two monitor modules available in MaxScale, the mysqlmon and the galeramon. The MySQL monitor is designed for use with Master/Slave replication environments and provides MaxScale data to determine if each database server is up and whether the server is current master or a slave. This information is used by the routers when they determine the set of servers to which to route connections or queries.

The galeramon monitor is designed for Galera clusters and looks at the Galera status variables to determine if each database server is part of the cluster and eligible for statement execution to take place or not. It will set the "synced" property on only those servers that report as begin synced to the cluster.

Was I Clear?

Hopefully I have been clear here as to what we support and cleared up some of the confusion that I probably caused in my enthusiasm to share our ideas with you all. Things are changing all the time of course, there are already fixes and improvements in the GitHub repository and we are working on many more. In the next few weeks I will do another update of what we have achieved, I hope this all makes sense and please feel free to comment if there is still some confusion, you wish to point us in a particular direction or get more involved in what we are doing.

Monday, 27 January 2014

MaxScale - Do we need another MySQL proxy?

I have spent some time thinking about and working on a project that went public on GitHub at the beginning of this year. That project is called MaxScale and is primarily a proxy server for MySQL/MariaDB databases, although it can be something much more than this. The obvious and often asked question is why do we need another proxy? I want to try to give you a flavor for what MaxScale is and why I think there is a need for a tool like MaxScale.

The architecture of MaxScale makes it different from your average proxy
  • MaxScale has awareness of the content it is shipping.
  • As well as being aware of the content the proxy is shipping it is also aware of the configuration and state of the servers to which requests are proxied.
  • It provides plugin modules that can be used to implement the routing logic, supported protocols, authentication methods, monitoring and filters.
  • The implementation is a lightweight, event driven core executed by multiple threads,  efficient multiplexing of multiple requests on a single thread allows the best use to be made of the available threads.
Why content awareness helps

MaxScale is able to look inside of the requests it is forwarding and use data gathered by viewing the request content to help it decide the best location to send the request. This means that it is possible to look at the SQL statements, determine the read or write scope of the statement, the database objects that are manipulated and the keys involved in the statement. Using all of this data MaxScale can then make very well educated decisions as to which is the best destination database server to handle the request.

Configuration awareness

As well as content awareness MaxScale is also aware of the dynamic configuration of the database servers. This means that it can take into account the current state of each server and the role it is currently assigned within a cluster of servers. In a typical Master/Slave replication scenario this could mean knowing which server is currently the writeable master and which are slaves. It is possible to take this even further and have MaxScale aware of any replication lag in the system on a per server, per schema or per table basis. The result is that MaxScale has the information to be able to make the best use possible of the database servers that are behind the proxy. This can all be done transparently to the client applications that simply attach to MaxScale as if it were a single instance of MySQL or MariaDB.

Plugin Architecture

Rather than have MaxScale attempt to be all things for all men it has been designed with the concept of using plugin modules to allow the functionality to be tailored to individual implementation needs. This same architecture allows for very simple addition to the available functionality by third party developers. The plugin API and complexity has been deliberately designed to ease this task.

Plugin interfaces exist to allow new protocols to be added, currently MaxScale supports the MySQL protocol for both client connections and for connections to the underlying databases. It is possible to add a new protocol for client connections or for connection to a database that does not support the MySQL protocol. In addition to the protocol plugin modules there are also query routing plugins. These query routers are the modules that make the decision as to which backend database server a particular request is routed. They may either route on a connection or individual statement basis, the routing decision they make is dependent on the statement being routed, data collected by the MaxScale monitors as to the state of the underlying databases and the policy implemented by the router itself.

Monitors are the third plugin module type supported in MaxScale. These are responsible for gathering data from the underlying databases and storing it within MaxScale. This data is used as input to the various query routing modules in order to influence the routing decision of those modules based on the database state. Currently two monitors are available, one targeted to Master/Slave replication setups and the other to Galera clusters.

Two more plugin module types are planned but not yet implemented in MaxScale; authentication plugins and filters. Authentication plugins allow the use of non-native methods of authentication and are planned to follow the same model as the authentication plugins in MariaDB.

Filter plugins are designed to allow arbitrary operations and transformations on the actual SQL requests and the result sets returned. Filters will be able to be connected together to form a chain, with branching options, so that a single request can traverse a complex set of filtering modules and even be split and sent to multiple underlying or external systems. The filters will be able to attach hints to the request data that can be used in downstream filters or the final routing process for the request. A filter may also reject the request, causing it to send a failure notification back to the client, thus providing firewall capabilities for your underlying database. Filters may also modify the request itself or simply pass the request on unaltered, logging information from the request. As well as allowing filtering on the requests it is also planned to allow filters to be used on the result sets returned from these requests.

Event driven, light weight and multi-threaded

It was desirable that MaxScale should have as low a footprint as possible, to that end it uses an efficient polling core, based on the Linux epoll mechanism. All I/O operations are non-blocking, the event driven model allows for the easy sharing of a thread between multiple requests. Threads are occupied only for the time it takes to do the internal processing and not for the duration of any database activities.

Another important design decision is the internal buffering strategy. It has been designed to allow the minimum of data copies; with the ultimate goal of zero data copies provided the required routing and filtering operations can be achieved without the need to examine the request or the response. In practice zero data copies is difficult to achieve for the request packets in most practical cases, but can be achieved for the result sets in the majority of cases.

Decide for yourself

Hopefully I have given you a flavor for why I think there is a place for a proxy like MaxScale. There is a long way to go yet, the implementation that is available via GitHub has some limitations and features that have not yet been started, however it does illustrate the concept. Please feel free to take a look at MaxScale and give the developers your feedback via the MaxScale Google Group.