Base64 encoding library with Arduino

Base64 is an encoding scheme that uses 64 symbols grouped into messages whose length is a multiple of four. When the encoded payload comes out shorter, these messages (data packets) are completed with an extra padding symbol (so 65 symbols are actually used), usually the equals sign (=).

    With 64 symbols you can cover the 10 digits and the upper- and lower-case letters (26+26) of the ASCII code. The problem is that these add up to only 62 unambiguous symbols, so two more are needed, and those vary between implementations. Although the Base64 symbols are sometimes described as "printable ASCII characters", the truly printable ones are the 95 that range from code 32 (space) to code 126 (~).

    The most widely used Base64 implementation, that of PEM, which is also used by MIME, works with the extra symbols "+" and "/" and uses the "=" sign for padding so that packets have a length that is a multiple of four. The letters A-Z occupy positions 0-25, the letters a-z occupy positions 26-51, the digits 0-9 occupy positions 52-61, the plus sign (+) occupies position 62, and the slash (/) occupies position 63.
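    The position table described above can be sketched as a simple lookup string. This is a minimal illustration in plain C++ that should also compile inside an Arduino sketch; the names BASE64_ALPHABET and base64Symbol are hypothetical and are not taken from the library.

```cpp
#include <stdint.h>

// Base64 alphabet as used by PEM and MIME: A-Z, a-z, 0-9, '+' and '/'.
static const char BASE64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: returns the symbol for a 6-bit value (0-63).
char base64Symbol(uint8_t value) {
    return BASE64_ALPHABET[value & 0x3F];  // keep only the 6 low bits
}
```

    With this table, base64Symbol(0) is 'A', base64Symbol(26) is 'a', base64Symbol(52) is '0' and base64Symbol(63) is '/'.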

    To represent data in Base64, the original data is split into groups of 6 bits, each represented by the corresponding symbol. If bits are left over, they are padded with zeros on the right. If the resulting number of symbols is not a multiple of four, equals signs are appended on the right.
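    As a rough sketch of that procedure, the function below encodes a whole buffer that is already in memory, using a bit accumulator. It is not the streaming approach of the library discussed in this article; encodeBase64 and encodeBase64Text are hypothetical names chosen for illustration.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char BASE64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes `length` bytes of `data` into `out`.
// `out` must have room for 4 * ((length + 2) / 3) + 1 characters.
// Returns the number of characters written, not counting the final '\0'.
size_t encodeBase64(const uint8_t* data, size_t length, char* out) {
    size_t written = 0;
    uint32_t bits = 0;  // bit accumulator; only its low bits are meaningful
    int bitCount = 0;   // number of bits in the accumulator not yet emitted
    for (size_t i = 0; i < length; ++i) {
        bits = (bits << 8) | data[i];
        bitCount += 8;
        while (bitCount >= 6) {  // emit every complete group of 6 bits
            bitCount -= 6;
            out[written++] = BASE64_ALPHABET[(bits >> bitCount) & 0x3F];
        }
    }
    if (bitCount > 0) {  // leftover bits: pad with zeros on the right
        out[written++] = BASE64_ALPHABET[(bits << (6 - bitCount)) & 0x3F];
    }
    while (written % 4 != 0) {  // pad to a multiple of four symbols
        out[written++] = '=';
    }
    out[written] = '\0';
    return written;
}

// Convenience wrapper for zero-terminated text.
size_t encodeBase64Text(const char* text, char* out) {
    return encodeBase64((const uint8_t*)text, strlen(text), out);
}
```

    For instance, encodeBase64Text("ohmio", out) writes "b2htaW8=" into out: five bytes give six full groups of 6 bits plus one group padded with zeros, and a single equals sign completes the eighth position.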

    The following image shows the ASCII encoding of a text ("ohmio", Spanish for "ohm") and how it is converted to Base64. Since its five bytes produce 7 symbols, the final message has to be completed with one equals sign at the end. The ASCII text "ohmio" is therefore equivalent to "b2htaW8=" in Base64.

    Base64 encoding example

    Specific uses of Base64 usually also impose a maximum line length. The MIME implementation limits each line to 76 characters. Lines are normally separated by a carriage return (CR, value 0x0D in ASCII) followed by a new line (NL, ASCII code 0x0A).
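    The MIME line limit can be illustrated with a small post-processing step over an already encoded string. This is only a sketch: wrapMimeLines and MIME_LINE_LENGTH are hypothetical names, and a streaming encoder would more likely count symbols as it emits them.

```cpp
#include <stddef.h>
#include <string.h>

const size_t MIME_LINE_LENGTH = 76;  // maximum symbols per line in MIME

// Hypothetical helper: copies `encoded` into `out`, inserting CR (0x0D) and
// NL (0x0A) between lines of 76 characters.
// `out` must have room for strlen(encoded) + 2 * (strlen(encoded) / 76) + 1 characters.
// Returns the number of characters written, not counting the final '\0'.
size_t wrapMimeLines(const char* encoded, char* out) {
    size_t length = strlen(encoded);
    size_t written = 0;
    for (size_t i = 0; i < length; ++i) {
        if (i > 0 && i % MIME_LINE_LENGTH == 0) {
            out[written++] = '\r';  // CR, 0x0D
            out[written++] = '\n';  // NL, 0x0A
        }
        out[written++] = encoded[i];
    }
    out[written] = '\0';
    return written;
}
```

    Note that this version only separates lines; it does not append a line ending after the last one.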

    The added difficulty of implementing Base64 encoding on a device with few resources, as is often the case with a microcontroller, is that you have to encode as the information arrives, or with a minimal buffer. This also requires some way of indicating that the end of the original message has been reached, for example by adding a special code or by using a pin whose level (synchronized with reception) indicates the status of the message.

    The example code below is an Arduino library for Base64 encoding that follows both criteria: it encodes the information as it arrives (without a buffer) and waits for a signal to finish.

    The core of the Base64 code calculation is the expression:
    (valor_original>>(2+(numero_valor%3)*2))|resto_base64
    and the remainder is calculated with the expression:
    (valor_original&(MASCARA_B64>>desplazamiento))<<desplazamiento
    where desplazamiento is a value calculated with the expression:
    4-(numero_valor%3)*2

    These expressions are obtained by generalizing the calculation of each of the four Base64 codes that result from representing three bytes of the original value.

    Base64=((byte_1>>2)|resto)&0b00111111 resto=(byte_1&0b00000011)<<4
    Base64=((byte_2>>4)|resto)&0b00111111 resto=(byte_2&0b00001111)<<2
    Base64=((byte_3>>6)|resto)&0b00111111 resto=(byte_3&0b00111111)<<0
    Base64=((byte_3>>0)|resto)&0b00111111 resto=(byte_3&0b00111111)<<0

    In the pseudocode above, Base64 refers to the Base64 code being calculated, byte_n refers to the nth byte being encoded, and resto represents the leftover bits of the byte being encoded. At the beginning of the calculation the remainder is assumed to be zero.

    For clarity, the 6-bit mask has been included in the calculation of all the codes in the pseudocode above, although it is only needed to determine the last of them, since the others are shifted so that the two most significant bits are always lost.

    As can be seen, the fourth code is all remainder and there is no need to calculate a remainder afterwards; it is therefore only necessary to perform three steps, one per encoded byte. It is important to remember that, if the third byte of a packet were not encoded, the last Base64 code obtained would have to be padded with zeros on the right.
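    The three steps can be written out literally in C++ as follows. This is a sketch for one complete group of three bytes; encodeTriplet is a hypothetical name, and the function returns the 6-bit values (0-63) rather than the final symbols.

```cpp
#include <stdint.h>

// Hypothetical helper: turns three input bytes into four 6-bit Base64
// values (0-63), one step per input byte.
void encodeTriplet(uint8_t byte1, uint8_t byte2, uint8_t byte3, uint8_t codes[4]) {
    uint8_t resto = 0;                         // the remainder starts at zero

    codes[0] = ((byte1 >> 2) | resto) & 0x3F;  // top 6 bits of byte 1
    resto = (byte1 & 0x03) << 4;               // low 2 bits move to positions 5-4

    codes[1] = ((byte2 >> 4) | resto) & 0x3F;  // remainder plus top 4 bits of byte 2
    resto = (byte2 & 0x0F) << 2;               // low 4 bits move to positions 5-2

    codes[2] = ((byte3 >> 6) | resto) & 0x3F;  // remainder plus top 2 bits of byte 3
    resto = byte3 & 0x3F;                      // low 6 bits of byte 3

    codes[3] = resto;                          // the fourth code is all remainder
}
```

    For the bytes 'o', 'h' and 'm' this yields the values 27, 54, 33 and 45, which correspond to the symbols b, 2, h and t.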

    To generalize, the right shift in the expression that calculates the Base64 code can be written as 2+(numero_byte%3)*2, so that the part inside the parentheses cycles from zero to two, giving 2, 4 and 6 at each step. Of course this is not the only way to generalize, but I have chosen it for functionality and above all for clarity. Since the mask (AND) was only necessary for the fourth code, and it has already been shown that this code does not need to be calculated (it is all remainder), the mask is left out of the final expression to simplify it, although we must remember that only the 6 least significant bits of the data type used (byte) are taken.

    The left shift of the remainder can be generalized in an analogous way. It can also be seen that the mask that is applied (AND) undergoes the same bit shift but in the opposite direction. That is the reason for calculating the displacement with 4-(numero_valor%3)*2 before applying it in the direction that corresponds to each part of the expression.
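    To check the generalization, the shifts and the mask can be computed for each position within a group of three bytes. The sketch below assumes that MASCARA_B64 is the 6-bit mask 0b00111111, which is consistent with the pseudocode but is not stated explicitly; the function names are hypothetical.

```cpp
#include <stdint.h>

const uint8_t MASCARA_B64 = 0x3F;  // assumed value: 0b00111111

// Right shift applied to the incoming byte: 2, 4, 6, 2, 4, 6, ...
uint8_t rightShift(uint8_t numero_valor) {
    return 2 + (numero_valor % 3) * 2;
}

// Left shift applied to the remainder: 4, 2, 0, 4, 2, 0, ...
uint8_t desplazamiento(uint8_t numero_valor) {
    return 4 - (numero_valor % 3) * 2;
}

// Mask that selects the leftover bits: 0x03, 0x0F, 0x3F, ...
uint8_t remainderMask(uint8_t numero_valor) {
    return MASCARA_B64 >> desplazamiento(numero_valor);
}
```

    These values match the constants that appear in each of the three steps of the pseudocode.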

    The following example shows how to use the library to encode a text string (remember that Base64 can be used for any data set, such as an image, for example). In the following code there are a couple of details that are interesting to clarify. First, a special symbol (the ~ symbol) has been used to indicate the end of the text, instead of a hardware signal or indicating the length of the text. Logically, that symbol cannot be part of the data that is encoded.

    The second issue that must be considered, as important as it is obvious, is that the decoder at the destination must know how the information that reaches it is represented. The text includes characters that do not belong to the printable ASCII set (32 to 126), such as accented letters, and Arduino will use two bytes (UTF-8) to represent each of them. The usual \0 cannot simply be relied on as a terminator for arbitrary data: although UTF-8 never uses a zero byte inside a multi-byte character, encodings with fixed two-byte characters (such as UTF-16) and binary data often contain bytes that are exactly zero.
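    To make the difference between bytes and characters concrete, the following sketch counts the characters in a UTF-8 string by skipping continuation bytes (those of the form 0b10xxxxxx). The function name countUtf8Characters is hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <stddef.h>

// Hypothetical helper: counts the characters in a zero-terminated UTF-8
// string. Continuation bytes (0b10xxxxxx) do not start a new character.
size_t countUtf8Characters(const char* text) {
    size_t count = 0;
    for (; *text != '\0'; ++text) {
        if (((unsigned char)*text & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

    The word "cañón", written in UTF-8 as "ca\xC3\xB1\xC3\xB3n", has 5 characters but occupies 7 bytes, and all 7 bytes are passed to the Base64 encoder.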

    Line 26 of the library example described above shows the use of the Arduino library for Base64 encoding. You only need to pass each byte you want to encode to the method convertir, optionally indicating whether it is the last one; otherwise, stop the conversion with the method terminar when you reach the end.
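    The same idea, encoding byte by byte until a terminator symbol arrives, can be sketched independently of the library. The code below is not the library's implementation and does not use its methods; encodeUntil, its parameters and the value of MASCARA_B64 (0b00111111) are assumptions made for illustration. It applies the generalized expressions to each byte and stops at the terminator or after maxLength bytes.

```cpp
#include <stddef.h>
#include <stdint.h>

static const char BASE64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const uint8_t MASCARA_B64 = 0x3F;  // assumed value: 0b00111111

// Hypothetical sketch: encodes bytes from `input` until `terminator` is found
// or `maxLength` bytes have been read. `out` must have room for
// 4 * ((maxLength + 2) / 3) + 1 characters. Returns the characters written.
size_t encodeUntil(const uint8_t* input, size_t maxLength, uint8_t terminator, char* out) {
    size_t written = 0;
    uint8_t numero_valor = 0;  // position of the byte within its group of three
    uint8_t resto = 0;         // leftover bits from the previous byte
    for (size_t i = 0; i < maxLength && input[i] != terminator; ++i) {
        uint8_t valor = input[i];
        uint8_t desplazamiento = 4 - (numero_valor % 3) * 2;
        out[written++] = BASE64_ALPHABET[(valor >> (2 + (numero_valor % 3) * 2)) | resto];
        resto = (valor & (MASCARA_B64 >> desplazamiento)) << desplazamiento;
        if (numero_valor == 2) {  // third byte: the fourth code is all remainder
            out[written++] = BASE64_ALPHABET[resto];
            resto = 0;
        }
        numero_valor = (numero_valor + 1) % 3;
    }
    if (numero_valor != 0) {      // incomplete group: flush and pad
        out[written++] = BASE64_ALPHABET[resto];
        while (written % 4 != 0) {
            out[written++] = '=';
        }
    }
    out[written] = '\0';
    return written;
}
```

    Called on the bytes of "ohmio~" with '~' as the terminator, this writes "b2htaW8=" and never encodes the terminator itself.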

    As can be seen in the screenshot below, the library's example program first displays the text to be encoded, in this case the beginning of a famous song by the great Les Luthiers, and then the result of encoding it in Base64 using the MIME line length.

    Base64 encoding with Arduino. Text conversion output example
