Saturday, June 11, 2016

Converting a sixty four bit number into a string

This is one of the functions that I had the most trouble with, even though it is quite simple. The point here is to translate a sixty four bit number into a string using any base from two to sixty four.


long stringAppendInt64(typeString* str,unsigned long long value,long base)
  {
  static const char digits[] = "0123456789ABCDEFGHIGKLMNOPQRSTUVWXYZabcdefghigklmnopqrstuvwxyz-_";
  unsigned long     start,length;

  if ((base < 2) || (base > 64))
    return 0;

  start = str->Length;

  do
    {
    stringAppendCharacter(str,digits[value % base]);

    value = value/base;
    }
  while (value != 0);

  length = str->Length - start;

  stringReverseRange(str,start,length);

  return 1;
  }

long stringAppendInt64(typeString* str,unsigned long long value,long base)
This function takes a sixty four bit number, and appends a string with the text equivalent. The parameters are the string object, the number to be converted as a sixty four bit integer, and the base that we use to represent the number. The return value is just a Boolean value. I call it long because for some reason, my compiler does not like bool as value.

static const char digits[] = "0123456789ABCDEFGHIGKLMNOPQRSTUVWXYZabcdefghigklmnopqrstuvwxyz-_";
The first thing to notice after that is the digits array. In the video below, Tom Scott Explains how the YouTube video ID's use a sixty four bit number system with eleven digits. This code does not limit itself to eleven digits, it just converts any number you give it into a string, and the string is as long as it needs to be. Anyway, the characters used to represent the resulting number are, zero to nine, followed by uppercase A through Z, followed by lowercase a through z followed by two other characters. As it says in the video, the characters that YouTube uses are dash and underscore because that is what works best in URL's. If you want to change what symbols are used to build your number, just change the symbols, just make sure that there are enough of them. This function limits itself to sixty four, but you can just use any base you want up to infinity, or until you run out of symbols.

 if ((base < 2) || (base > 64))
    return 0;
The first thing the function does is check that the base is more than two and less than sixty four. A base of zero or one would not make any sense, and any base of more than sixty four would not work simply because we don't have enough symbols. If you don't understand what we mean by base with numbers, read this article on Wikipedia about Positional Notation to get the idea.

 start = str->Length;
 length = str->Length - start;
 stringReverseRange(str,start,length);
These three lines of code are important because of the way the string object is structured, they really have nothing to do with building the number string. We are appending the numbers to the end of an existing string. We also need to reverse the resulting string because of the way the math works. We need the start and length variables in order to know where the resulting numbers string is located inside the main string. The first line saves the current length of the string before we start adding digits, the second lines calculates how far we have gone into after we have added our digits. After that, the third line reverses that portion of the string. The reason we reverse is because we are subtracting digits out of the number starting with the least significant digits at the right, but we are adding characters to the string by starting at the left and going right. This results in the number being inverted. There is a way we could do this from left to right, but it would be slow and complicated.

 do
    {
    stringAppendCharacter(str,digits[value % base]);

    value = value/base;
    }
  while (value != 0);
This is the part were we actually convert the number into a string. First, the reason why it is a do while loop is for when the number we are converting is zero. In that case, we want to run through the loop at least once, otherwise nothing would be displayed. We just loop as long as the result is not zero. This method would work for negative numbers, but there is the problem of what to do with the minus sign. I don't want to include that here because with the various bases, that is best left with the client function to decide how to represent extra symbols like plus or minus signs as well as hexadecimal or bit filed markers. The math is, divide the number by the base, and use the remainder as in index into the list of symbols. Here we do the division twice, first with the modulo operator to get the remainder, and then again to set the value to the result of the division. I am wondering if there is a way to do this in one operation, I don't think there is. Each time we loop, the value last digit of the number is taken out. For example, with the number one hundred and twenty three in base ten, we would go through the loop three times. After the first time the digit to be printed would be three, and the value would become twelve. After the second loop the digit to be printed will be two, and the value will become one. And after the third loop , the digit we print will be one, and the value is now zero and the loop will not run again because it will hit the while part of the while loop. After all that we reverse the string to get it printed correctly and we are done. Note here that there is no way to know how long the final string will be. A number that is three digits long is bases sixty four may be dozens of digits long is base two, so make sure that there is enough space allocated before trying to print large numbers.

No comments:

Post a Comment