# Machine representation of data

## Converting decimal to binary or hex

We can use the *bin* function to convert a decimal integer to binary in Python, with the result stored as a string.

For example, 
```python
bin(10)
```

returns '0b1010'. The prefix '0b' denotes that this is a binary number.

In [None]:
bin(10)

Similarly, the *hex* function can be used to convert from a decimal to a hexadecimal value. The value returned will begin with '0x', which denotes a hexadecimal representation.

In [None]:
hex(10)
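
Both functions return strings that carry a prefix. As a minimal sketch, the built-in *format* function and f-strings can produce the same digits without the prefix, or padded to a fixed width. The value 26 is chosen only for illustration.

```python
n = 26  # illustrative value

print(bin(n))   # binary string with the '0b' prefix
print(hex(n))   # hexadecimal string with the '0x' prefix

# format() with 'b' or 'x' omits the prefix
binary_digits = format(n, 'b')
hex_digits = format(n, 'x')

# a width with a leading 0 pads to a fixed number of digits
padded_binary = f'{n:08b}'
padded_hex = f'{n:04X}'   # uppercase X gives uppercase hex digits

print(binary_digits, hex_digits, padded_binary, padded_hex)
```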

### Question

What is the number 47 in binary and hex?

## Conversion from binary/hex to decimal

To convert from binary or another representation to a decimal (integer) value, we can use

```python
int(s, b)
```

where

- s = the number stored as a string
- b = the base of the number system used for s

Example converting from binary to decimal (note that the second argument is 2):

In [None]:
int('1010',2)

Example converting from hexadecimal to decimal (note that the second argument is 16):

In [None]:
int('AA', 16)
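
To make the role of the base explicit, the following sketch re-implements the conversion by hand using positional notation. The helper name *to_decimal* is hypothetical and handles only the digits 0-9 and A-F; *int(s, b)* remains the practical choice.

```python
def to_decimal(s, base):
    """Convert a digit string in the given base (2 to 16) to an integer."""
    digits = '0123456789ABCDEF'
    value = 0
    for ch in s.upper():
        d = digits.index(ch)      # numeric value of this digit
        if d >= base:
            raise ValueError(f'digit {ch!r} is not valid in base {base}')
        value = value * base + d  # shift earlier digits one position left, then add
    return value

print(to_decimal('1010', 2))   # 1*8 + 0*4 + 1*2 + 0*1
print(to_decimal('AA', 16))    # 10*16 + 10*1

# int() also accepts the prefixes produced by bin() and hex()
print(int('0b1010', 2), int('0xAA', 16))
```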

### Question

What is the decimal value of the hex number 'FF'?

## Unicode representation of data

Each character has a Unicode *code point*, which is a number assigned to that character.

A computer stores this character by *encoding* this number, using an encoding method (such as ASCII, UTF-8, and others). The encoding method determines the binary representation of the number. ASCII is a 7-bit code and is therefore limited to 128 characters (8-bit extensions such as Latin-1 allow 256); UTF-8 is a variable-length encoding (different characters are represented by different numbers of bytes, from 1 to 4) and can encode more than 1 million code points. The first 128 Unicode characters have the same encoding in both ASCII and UTF-8.

We can use the *ord* function to get the Unicode code point (ordinal value) of a character, and *chr* to get the character for a given Unicode code point.

See this table as a reference: https://unicode-table.com/en/. Note that on this web page, the *Unicode number* is the hexadecimal representation of the code point, while the number in the *HTML code* is the corresponding decimal value.

The Unicode code point of 'A':

In [None]:
ord('A')

Convert from Unicode code point to character:

In [None]:
chr(65)
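
As a small illustration, the code point returned by *ord* can be shown in hexadecimal, which is the form the reference table lists as the *Unicode number*. The U+ formatting below is the customary notation, built here by hand.

```python
cp = ord('A')
print(cp)                 # decimal code point
print(hex(cp))            # same value in hexadecimal
label = f'U+{cp:04X}'     # customary notation, zero-padded to 4 hex digits
print(label)

# ord and chr are inverses of each other
round_trip = chr(ord('A'))

# code points for every character in a string
code_points = [ord(c) for c in 'Hi!']
print(code_points)
```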

Let us look at how the character 'A' is encoded, using the default encoding, which is UTF-8. The *encode* method returns a *bytes* object, which is a sequence of byte values (integers from 0 to 255) representing the encoded string. When printing or displaying a *bytes* object, Python shows us a *printable* representation of the object (not necessarily how it is stored in memory): bytes that correspond to printable ASCII characters are shown as those characters. In this case we see that the bytes object is displayed as b'A'.

In [None]:
b = 'A'.encode()
b

But looking at the first (and only) byte, we see that it contains the integer value 65.

In [None]:
b[0]

We can get the hexadecimal representation of *b* by using the *hex* method:

In [None]:
b.hex()

And we can get the binary representation of a bytes object, which is how the machine actually stores the data, by using the following (note that *bin* drops leading zeros, so fewer than 8 digits may be shown for a byte):

In [None]:
bhex = b.hex()       # convert to hex
bint = int(bhex,16)  # convert to an int
bin(bint)            # convert to binary

If we want, we can do this all in one statement:

In [None]:
bin(int(b.hex(),16))
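
Because *bin* drops leading zeros, the result above has fewer than 8 digits for the single byte. An alternative sketch formats each byte separately, so that byte boundaries stay visible:

```python
b = 'A'.encode()

# iterating over a bytes object yields integers in the range 0-255
bits = ' '.join(format(byte, '08b') for byte in b)
print(bits)

# the same idea for a character whose UTF-8 encoding needs two bytes
b2 = chr(233).encode()   # chr(233) is 'é'
bits2 = ' '.join(format(byte, '08b') for byte in b2)
print(bits2)
```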

We can *decode* a bytes object to get the *Unicode* character/symbol:

In [None]:
b.decode()

To further demonstrate what is happening, we can create our own bytes object and decode it. A bytes literal allows us to specify ASCII characters or hexadecimal values. For hexadecimal values, every two digits must be preceded by \x.

In [None]:
b1 = b'\x41'
b1.decode()
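
The same one-byte object can be built in several other ways. A brief sketch using the standard *bytes* constructors:

```python
b1 = b'\x41'                 # hexadecimal escape in a bytes literal
b2 = b'A'                    # printable ASCII character in a bytes literal
b3 = bytes([65])             # from a list of integer byte values
b4 = bytes.fromhex('41')     # from a string of hex digits

print(b1 == b2 == b3 == b4)
print(b4.decode())
```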

### Another example (😎)

Reference: https://unicode-table.com/en/1F60E/

In [None]:
s = '😎'
b = s.encode()
print('Unicode character:', s)
print('Unicode code point:', ord(s))
print('UTF-8 encoding:', b)
print('Encoded hex value:', b.hex())
print('Encoded decimal value:', int(b.hex(), 16))
print('Encoded binary value: ', bin(int(b.hex(),16)))

Let us now convert from the numeric representation to the character:

In [None]:
code_point = 128526
hex_encoding = b'\xf0\x9f\x98\x8e'

print('Unicode code point:', code_point)
print('Unicode character:', chr(code_point))
print('Hex encoded value: ', hex_encoding)
print('Decoded character: ', hex_encoding.decode())
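
To show where the four bytes come from, here is a simplified sketch of the UTF-8 bit layout. The function name *utf8_encode* is hypothetical, and the sketch skips validation (for example, it does not reject surrogate code points); in practice *str.encode* does this work.

```python
def utf8_encode(cp):
    """Encode one code point as UTF-8 bytes (simplified, no validation)."""
    if cp < 0x80:                       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# code points that need 1, 2, 3 and 4 bytes respectively
for cp in (65, 233, 8364, 128526):
    encoded = utf8_encode(cp)
    print(cp, encoded.hex(), len(encoded), 'byte(s)')
```

The loop also illustrates the variable-length property: the number of bytes grows with the size of the code point.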

## The encoding needs to match the decoding

The *encoding* method is important. Unless specified otherwise, *encode* and *decode* use the default UTF-8 encoding method. But what happens when the decoding method does not match the encoding one? You may get different characters/symbols, or an error (a *UnicodeDecodeError*), as the last line below shows for ASCII.

In [None]:
print('decoded using default UTF-8 method:', hex_encoding.decode())
print('decoded using UTF-16: ', hex_encoding.decode(encoding = 'utf-16'))
print('decoded using ASCII: ', hex_encoding.decode(encoding = 'ascii'))
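
The ASCII call above stops with an exception. A hedged sketch of two common ways to deal with a mismatch: catching the *UnicodeDecodeError*, or passing the standard *errors* argument to *decode* so that undecodable bytes are replaced or dropped rather than rejected.

```python
data = b'\xf0\x9f\x98\x8e'   # the UTF-8 bytes used in the example above

try:
    text = data.decode('ascii')
except UnicodeDecodeError as err:
    text = None
    print('could not decode as ASCII:', err.reason)

# errors='replace' substitutes U+FFFD (the replacement character) for each bad byte
replaced = data.decode('ascii', errors='replace')
print(len(replaced), [hex(ord(c)) for c in replaced])

# errors='ignore' silently drops the bad bytes
ignored = data.decode('ascii', errors='ignore')
print(repr(ignored))
```

Neither option recovers the original character; the reliable fix is to decode with the encoding that was used to encode.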

### Unicode in strings

In addition, we can specify characters by including their unicode code points in strings, using \U followed by 8 hex digits.

In [None]:
s = 'Check out this unicode code point: \U0001F44D'
s
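
Python string literals support two related escapes as well: \u followed by 4 hex digits for code points up to FFFF, and \N{...} with the official Unicode character name. A short sketch:

```python
thumbs_a = '\U0001F44D'          # 8 hex digits
thumbs_b = '\N{THUMBS UP SIGN}'  # official Unicode character name
e_acute = '\u00e9'               # 4 hex digits, for code points up to FFFF

print(thumbs_a == thumbs_b)
print(e_acute, hex(ord(e_acute)))
```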

### Question

Specify the "Face with Party Horn and Party Hat Emoji" (https://unicode-table.com/en/1F973/) using the following:

- a unicode string
- bytes object using its hexadecimal utf-8 value

## Colors

Colors are often specified as RGB values, corresponding to the intensity of Red, Green, and Blue. On the Web, each of the three color channels is represented by $1$ byte = $8$ bits = $2^8 = 256$ values. The total number of possible colors is therefore $256^3 = 16{,}777{,}216$. A color can be specified by name, by its rgb triplet (r,g,b), or as a hexadecimal value. Because Markdown cells allow for HTML/CSS, we can format text using colors as demonstrated in the following paragraphs. Note that rgb(0,0,0) is black and rgb(255,255,255) is white.

<p style = "background-color: yellow"> 
    Here we specify the color using its name
</p>

<p style = "background-color: rgb(255,105,180)">
    Here we specify the color using its rgb value
</p>

<p style = "background-color: #FF00FF">
   Here we specify the color using its hexadecimal value 
</p>

Note that FF is equivalent to $15\times 16^1 + 15\times 16^0 = 255$. In the last example (magenta), we therefore have the equivalent of rgb(255,0,255):

- R = FF (= 255 in decimal)
- G = 00 (= 0 in decimal)
- B = FF (= 255 in decimal)
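
The conversion between the hexadecimal and rgb forms can be sketched in a few lines. The helper names *hex_to_rgb* and *rgb_to_hex* are hypothetical, and the sketch assumes a 6-digit color of the form #RRGGBB.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of integers 0-255."""
    digits = color.lstrip('#')
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(r, g, b):
    """Convert integer channel values 0-255 to '#RRGGBB'."""
    return f'#{r:02X}{g:02X}{b:02X}'

print(hex_to_rgb('#FF00FF'))        # the magenta example
print(rgb_to_hex(255, 105, 180))    # the rgb example from the paragraphs above
```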

## Images

We will look at a rainbow image with size 2064 x 1026 pixels (https://www.freepnglogos.com/images/rainbow-12383.html). Each pixel has 4 values, corresponding to a red, green, blue, and alpha (transparency) value. In this case, each pixel value is a value between 0 (no color) and 1 (maximum color). 

### Memory considerations

The data for this image contains 2064 x 1026 x 4 = 8,470,656 values. If 1 byte is used for each value, then we require 8,470,656 bytes. This is equivalent to:

- 8,470,656 bytes / (1024 bytes per kB) = 8272.125 kB
- 8,470,656 bytes / (1024^2 bytes per MB) = 8.08 MB

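At 1 byte per value and without any optimization, this is the amount of memory the uncompressed pixel data occupies when the image is fully loaded into software; storing each value in a larger data type requires proportionally more. File formats such as PNG compress the image, so the file on disk requires less storage space.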
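
The arithmetic above can be reproduced directly. The second pass of the loop assumes each value is held as a 4-byte (32-bit) float, which is how matplotlib typically returns PNG data; that figure is an assumption added here, not part of the calculation above.

```python
width, height, channels = 2064, 1026, 4
n_values = width * height * channels

for bytes_per_value in (1, 4):   # 1 byte per value, or 4 bytes for a 32-bit float
    n_bytes = n_values * bytes_per_value
    print(bytes_per_value, 'byte(s) per value:',
          n_bytes, 'bytes =',
          n_bytes / 1024, 'kB =',
          round(n_bytes / 1024**2, 2), 'MB')
```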

In [None]:
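import io
import urllib.request
from matplotlib import pyplot as plt
# Newer matplotlib versions reject URLs in imread, so download the bytes first
with urllib.request.urlopen("https://www.freepnglogos.com/uploads/rainbow-png/rainbow-png-images-colors-the-sky-png-only-0.png") as response:
    rainbow = plt.imread(io.BytesIO(response.read()), format="png")
plt.imshow(rainbow)
None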

The data is stored in a multidimensional array (we can think of an array as a table; and a multi-dimensional array as multiple tables). For each of the 1026 x 2064 pixels, we have 4 values.

In [None]:
rainbow.shape

This is the first 'row' of the image, which holds the 4 color values for each of the 2064 columns of the image:

In [None]:
rainbow[0]

The image is defined by these numeric values, which determine the colors to display. To illustrate, let's draw a black bar across rows 300 to 399. We do this by setting the red, green, and blue values of each pixel in those rows to 0 and the alpha value to 1 (opaque).

In [None]:
r2 = rainbow.copy()
for row in range(300, 400):
    for col in range(r2.shape[1]):
        r2[row, col, 0] = 0  # set R
        r2[row, col, 1] = 0  # set G
        r2[row, col, 2] = 0  # set B
        r2[row, col, 3] = 1  # alpha value (transparent = 0; opaque = 1)
plt.imshow(r2)
None

Let's turn up the amount of blue in the rainbow:

In [None]:
r2 = rainbow.copy()
for row in range(r2.shape[0]):
    for col in range(r2.shape[1]):
        r2[row, col, 2] = 1  # set B
plt.imshow(r2)
None
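
The explicit loops make each step visible but are slow for large images. As a sketch of the same two operations using NumPy slicing, shown on a small synthetic array (the 6 x 4 size and the value 0.5 are arbitrary) rather than on the rainbow image:

```python
import numpy as np

# small stand-in for an RGBA image: 6 rows, 4 columns, 4 channels, values in [0, 1]
img = np.full((6, 4, 4), 0.5, dtype=np.float32)

# black, opaque bar across rows 2 and 3
bar = img.copy()
bar[2:4, :, :3] = 0   # R, G, B
bar[2:4, :, 3] = 1    # alpha

# maximum blue for every pixel
blue = img.copy()
blue[:, :, 2] = 1

print(bar[2, 0], bar[0, 0])   # a pixel inside the bar, and one outside it
print(blue[0, 0])
```

For the rainbow image, the equivalent slices would be *r2[300:400, :, :3]* and *r2[:, :, 2]*.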