18.02.2016 // Back to events list // Русская версия // Comments

TECH: HandlerSocket protocol explained

So, you come to HandlerSocket. Nice. It is a damn fast voodoo. And the protocol it talks to its clients is really simple. And because in most cases you just use some connector library, you don't really care about the details of the protocol. Yet if you're curious, you find this page. And, if you spend a few hours there, you'll figure out how it goes. Or... you may spend just 15 minutes reading this article.



I will assume that you already have a working MySQL/MariaDB/Percona server installation with a handlersocket plugin. I used MariaDB, and it ships with a built-in handlersocket, you will just have to enable it and supply a little bit of configuration.

Check out my PHP Extension for HandlerSocket on GitHub: https://github.com/crocodile2u/zhandlersocket !!! and...

Let's start, huh! First, I will log in to MariaDB console and create a table:

CREATE TABLE movie ( id int not null auto_increment primary key, genre varchar(20) not null, title varchar(100) not null, view_count int default 0, key(genre) ) engine innodb;

Now, let's connect to HandlerSocket. I will connect to port 9999, which is the default write port, because with a write connection, we will be able to issue both read and write requests:

~/ telnet localhost 9999 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'.

Great! Now this is time when we have to talk. First thing you have to do when using HandlerSocket is to open index. For all the requests that actually find/modify data, you have to provide an opened index identifier. Lets check what the docs say.

The 'open_index' request has the following syntax.

P /indexid/ /dbname/ /tablename/ /indexname/ /columns/ [/fcolumns/]

  • /indexid/ is a number in decimal.
  • /dbname/, /tablename/, and /indexname/ are strings. To open the primary key, use PRIMARY as /indexname/.
  • /columns/ is a comma-separated list of column names.
  • /fcolumns/ is a comma-separated list of column names. This parameter is optional.

OK, that's pretty straightforward. Wait. What's that /indexid/ thing? You have to provide that by your application, it's not generated by HS. In fact, your app can send to HS a request consisting of a bunch of lines at a time:

Open index 1 Find row(s) in index 1 Find/Modify row(s) in index 1 ...

The example above is only possible if the index number is assigned by the application, otherwise we would have to wait for the result of "Open index" request, which would probably give us some number, and only after that you could use that index number.

Well, let's proceed to the HS console (telnet). BTW, values in a single HS request line are separated by TAB character. For this example, I specify the optional /fcolumns/ parameter for the first index, we'll need this later on.

~/ telnet localhost 9999 ... P   1   test    movie   id,genre,title,view_count   genre 2   1   idxnum

What? The docs say the response should be like "0 1"! Ahh.. My fault. Missed the /indexname/.

P   1   test    movie   PRIMARY id,genre,title,view_count   genre 0   1

Now it's OK. We are going to query the Primary Key, so the /indexname/ is PRIMARY. Let's open another one, for the genre index:

P   2   test    movie   genre   id,genre,title,view_count 0   1

That one is open successfully, too. Now what about populating the table with some rows? Check the docs...

The 'insert' request has the following syntax.

/indexid/ + /vlen/ /v1/ ... /vn/

  • /vlen/ indicates the length of the trailing parameters /v1/ ... /vn/. This must be smaller than or equal to the length of /columns/ specified by the corresponding 'open_index' request.
  • /v1/ ... /vn/ specify the column values to set. For columns not in /columns/, the default values for each column are set.

And we should get a response like this:

If 'insert' is succeeded, HanderSocket returns a line of the following syntax.

0 1

Let's try:

1   +   3   0   Sci-Fi  Star wars 0   1   1

The response is different (notice the extra "1"). Let's check what's going on, with the MariaDB console:

SELECT * FROM movie; +----+--------+-----------+ | id | genre  | title     | +----+--------+-----------+ |  1 | Sci-Fi | Star wars | +----+--------+-----------+

Huh, it worked! Now, let's do more of those...

1   +   3   0   Comedy  Dumb & Dumber 0   1   2 1   +   3   0   Thriller    The Silence of the Lambs 0   1   3

Back to MariaDB:

SELECT * FROM movie; +----+----------+--------------------------+ | id | genre    | title                    | +----+----------+--------------------------+ |  1 | Sci-Fi   | Star wars                | |  2 | Comedy   | Dumb & Dumber            | |  3 | Thriller | The Silence of the Lambs | +----+----------+--------------------------+

Now we see that there's a perfect match between the extra integer returned by HS and the AUTO_INCREMENT id! That's cool!

What if we specify the "id" column value manually instead of providing zero?

1   +   3   1   Sci-Fi  Star Trek 1   1   121

OK, the "1" in first place tells us there was an error. HS is not extremely verbose in its error messages though. All the extra information is just the "121" which means... Hell knows, but I guess it's a duplicate key error ;-) Another try...

1   +   3   4   Sci-Fi  Star Trek 0   1   0

A quick check in MariaDB console ensures us that the record is there, under id of 4:

SELECT * FROM movie; +----+----------+--------------------------+ | id | genre    | title                    | +----+----------+--------------------------+ |              ...                         | |  4 | Sci-Fi   | Star Trek                | +----+----------+--------------------------+

This is how we can get the AUTO_INCREMENT values for just-inserted rows. The docs do not mention this, probably because there were times when HS did not support AUTO_INCREMENT.

Be careful with columns that have dynamic default values like "created_at DATETIME DEFAULT CURRENT_TIMESTAMP" or "updated_at DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP". You have to set all of those values by hand! Otherwise, instead of having CURRENT_TIMESTAMP in the created_at column, you will get "0000-00-00 00:00:00". And if you simply skip "updated_at" column when doing an UPDATE, it will contain the old value.

OK, we're done with the INSERTs. Now, lets get some data from HS! Again, check the docs...

The 'find' request has the following syntax.

/indexid/ /op/ /vlen/ /v1/ ... /vn/ [LIM] [IN] [FILTER ...]

LIM is a sequence of the following parameters.

/limit/ /offset/

IN is a sequence of the following parameters.

@ /icol/ /ivlen/ /iv1/ ... /ivn/

FILETER is a sequence of the following parameters.

/ftyp/ /fop/ /fcol/ /fval/

  • /indexid/ is a number. This number must be an /indexid/ specified by a 'open_index' request executed previously on the same connection.
  • /op/ specifies the comparison operation to use. The current version of HandlerSocket supports '=', '/', '/=', '/', and '/='.
  • /vlen/ indicates the length of the trailing parameters /v1/ ... /vn/. This must be smaller than or equal to the number of index columns specified by the /indexname/ parameter of the corresponding 'open_index' request.
  • /v1/ ... /vn/ specify the index column values to fetch.
  • LIM is optional. /limit/ and /offset/ are numbers. When omitted, it works as if 1 and 0 are specified. These parameter works like LIMIT of SQL. These values don't include the number of records skipped by a filter.
  • IN is optional. It works like WHERE ... IN syntax of SQL. /icol/ must be smaller than the number of index columns specified by the /indexname/ parameter of the corresponding 'open_index' request. If IN is specified in a find request, the /icol/-th parameter value of /v1/ ... /vn/ is ignored.
  • FILTERs are optional. A FILTER specifies a filter. /ftyp/ is either 'F' (filter) or 'W' (while). /fop/ specifies the comparison operation to use. /fcol/ must be smaller than the number of columns specified by the /fcolumns/ parameter of the corresponding 'open_index' request. Multiple filters can be specified, and work as the logical AND of them. The difference of 'F' and 'W' is that, when a record does not meet the specified condition, 'F' simply skips the record, and 'W' stops the loop.

Everything is so clear... I think we can skip this. Just kidding ;-) First, lets find a record by ID. First goes /indexid/, which in our case will be 1 (the index we opened for PRIMARY KEY). Second, /op/: we want a row with id=1, so this is "=". Third, /vlen/. This is actually the length of the value. A key in InnoDB table may consist of more than one column, and we still want to be able to query these multi-column indexes. In our case, /vlen/ is obviously 1. Then goes the /v1/ ... /vn/ part. Those are values that we are going to query our index. For this example, it is a single column, and the value is 1:

1   =   1   1 0   3   1   Sci-Fi  Star wars

Cool! It works! The result is interpreted as follows:

0 - no errors; 3 - number of columns [1  Sci-Fi  Star wars] - the row that we've got from InnoDB.

What if we try a different /op/? For example, let us get rows with IDs greater than 1:

1   /   1   1 0   3   2   Comedy  Dumb & Dumber

Only one row? We have more! Well, its documented: "LIM is optional. /limit/ and /offset/ are numbers. When omitted, it works as if 1 and 0 are specified". Let's try to specify LIMIT high enough for us to see all the interesting rows.

1   /   1   1   10  0 0   3   2   Comedy  Dumb & Dumber   3   Thriller    The Silence of the Lambs    6   Sci-Fi  Star Trek

You see, all the results are returned in a single line:

0 - no errors; 3 - number of columns [2  Comedy  Dumb & Dumber] - row 1 [3  Thriller    The Silence of the Lambs] - row 2 [6  Sci-Fi  Star Trek] - row 3

The client app has to deal with this (well, its no problem since you know the number of columns in each row). Are you asking yourself a question "what if we actually have a TAB character in our data"? Everything will be screwed up! Well, HandlerSocket clients encode characters from 0x00 to 0x0f using a special algo (such characters are prepended with 0x01 and shifted by 0x40). And that is what you client library should take care of. When HS is generating response, it also encodes those chars in the same, and your client has to decode them. For the sake of simplicity, we will not use characters that need encoding here.

Now, we got the IN clause, which means that, having a bunch of IDs, you can get them all at once from HS! I made a few experiments. First, I did not really know what to do with /op/, /vlen/ and /v1/ ... /vn/. Actually, we are going to specify the IDs that we are going to get, in the IN clause. OK, lets try like we did for basic ID search

The IN part is "@ 0 1 2", which is: column number 0, 1 values in total, and this value is 2

1   =   1   1   @   0   1   2 2   1   modop

"modop". Thank you HandlerSocket! That makes everything clear! Lets try to modfy it a little bit...

1   =   1   0   @   0   1   2 2   1   modop

OK... What do you want? May be if I specify LIMIT things will be better?

1       =   1   0   1   0   @   0   1   2 0   3   2   Comedy  Dumb & Dumber

Yeah! It seems like for the IN query, you HAVE to explicitly specify LIMIT and OFFSET. What is also weird is that you have to specify /op/, /vlen/ and /v1/ ... /vn/ - and apparently this just leads to confusion. In the example below, I specify value of 1 instead of 0 in the /vn/ section, and the answer is quite the same:

1   =   1   1   1   0   @   0   1   2 0   3   2   Comedy  Dumb & Dumber

Then I try the change the /op/:

1   /   1   1   1   0   @   0   1   2 0   3   3   Thriller    The Silence of the Lambs

Hell! Where does this come from?! Let us now try to get more that one ID:

1   /   1   1   1   0   @   0   2   2   3 0   3   3   Thriller    The Silence of the Lambs

OK, forgot about the LIMIT, lets specify that we want 2 rows:

1   /   1   1   2   0   @   0   2   2   3 0   3   3   Thriller    The Silence of the Lambs    6   Sci-Fi  Star Trek

I do not know where the hell does the "6 Sci-Fi Star Trek" come from. In the end, I got it that it safe to these requests like this:

1   =   1   0   3   0   @   0   3   2   3   1 0   3   2   Comedy  Dumb & Dumber   3   Thriller    The Silence of the Lambs    1   Sci-Fi  Star wars

You may notice that the results come in the very same order as we specified the IDs for the IN clause. However, I do not know whether it is a coincidence or expected behavior. At least, I would not rely on this. Now, let's proceed with the FILTERs part of the query syntax? What if we take our last query and ask HS to apply filter by genre = 'Sci-Fi' ? Remember when we opened our first index, we specified /fcolumns/? This is when we need them - in order to be able to apply additional filtering. If we did not do that, any request containing FILTERs, would return somthing like "2 1 filterfld", which is soooo self-explanating...

1   =   1   0   3   0   @   0   3   2   3   1   F   =   0   Sci-Fi 0   3   1   Sci-Fi  Star wars

Worked!

Well, that's actually all about finding rows. Lets get into UPDATEs. Hey Doc?

The 'find_modify' request has the following syntax.

/indexid/ /op/ /vlen/ /v1/ ... /vn/ [LIM] [IN] [FILTER ...] MOD

MOD is a sequence of the following parameters.

/mop/ /m1/ ... /mk/

  • /mop/ is 'U' (update), '+' (increment), '-' (decrement), 'D' (delete), 'U?', '+?', '-?', or 'D?'. If the '?' suffix is specified, it returns the contents of the records before modification (as if it's a 'find' request), instead of the number of modified records.
  • /m1/ ... /mk/ specifies the column values to set. The length of /m1/ ... /mk/ must be smaller than or equal to the length of /columns/ specified by the corresponding 'open_index' request. If /mop/ is 'D', these parameters are ignored. If /mop/ is '+' or '-', values must be numeric. If /mop/ is '-' and it attempts to change a column value from negative to positive or positive to negative, the column value is not modified.

The first part seems to be familiar. We've just seen that in the FIND section, and it looks like a WHERE clause for our UPDATE, right? Lets update the "Star wars" row and set view_count to 100.

1   =   1   1   U   1   Sci-Fi  Star Wars   100 2   1   modop

"modop". So nice of you, HandlerSocket! What if I add explicit LIMIT?

1   =   1   1   1   0   U   1   Sci-Fi  Star Wars   100 0   1   1

Huh! Check in MariaDB:

SELECT * FROM movie WHERE id = 1; +----+--------+-----------+------------+ | id | genre  | title     | view_count | +----+--------+-----------+------------+ |  1 | Sci-Fi | Star Wars |        100 | +----+--------+-----------+------------+

OK. Let's do something a bit more complicated. Increment all comedies' view_count by 10?

1   /   1   0   1000    0   F   =   0   Comedy  +   0   0   0   10 0   1   1

Great, it worked! What does MariaDB say?

SELECT * FROM movie; +----+----------+--------------------------+------------+ | id | genre    | title                    | view_count | +----+----------+--------------------------+------------+ |  1 | Sci-Fi   | Star Wars                |        100 | |  2 | 0        | 0                        |         10 | |  3 | Thriller | The Silence of the Lambs |          0 | |  6 | Sci-Fi   | Star Trek                |          0 | +----+----------+--------------------------+------------+

Nope, it did not work. At least, it did not work good. It incremented view_count in the specified row by 10 but it also updated all the other columns. Because "id" column is numeric, and we said to increment it by zero, it was left untouched. I will update it back:

1   =   1   2   1   0   U   2   Comedy  Dumb & Dumber   10

I can tell you beforehand, that specifying string values for the "genre" and "title" columns in a increment request will not work. They will be updated to "0", just like our first increment did. For the increment to work as expected, we'll need to open a different index, which will only contain "id" and "view_count" columns:

P   3   test    movie   PRIMARY id,view_count   genre 0   1 3   /   1   0   1000    0   F   =   0   Comedy  +   0   10 0   1   1

Now it is really OK:

SELECT * FROM movie; +----+----------+--------------------------+------------+ | id | genre    | title                    | view_count | +----+----------+--------------------------+------------+ |  1 | Sci-Fi   | Star Wars                |        100 | |  2 | Comedy   | Dumb & Dumber            |         30 | |  3 | Thriller | The Silence of the Lambs |          0 | |  6 | Sci-Fi   | Star Trek                |          0 | +----+----------+--------------------------+------------+

And once again. Be careful with columns that have dynamic default values like "created_at DATETIME DEFAULT CURRENT_TIMESTAMP" or "updated_at DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP". You have to set all of those values by hand! Otherwise, instead of having CURRENT_TIMESTAMP in the created_at column, you will get "0000-00-00 00:00:00". And if you simply skip "updated_at" column when doing an UPDATE, it will contain the old value.

I will leave the rest of experiments up to you, but I hope this article was a little bit of a help if you wanted to know about the HandlerSocket protocol.


Order print