UDP protocol with a header implementation in Python

Abdella Solomon
Oct 20, 2021

Prerequisites: This article assumes a basic understanding of networking, the UDP protocol, and the Python programming language.

Attention: To follow this tutorial, you need to know how the UDP protocol works and how UDP headers are used to calculate the checksum, among other things. If you don't meet this requirement, I suggest you first read my article explaining the UDP protocol and its headers. Use this link here.

In the previous article (UDP protocol and header explanation), we said that data can be lost or corrupted while communicating over the UDP protocol. We also said that the way to catch data corruption is to calculate the checksum of each packet on both the sender and the receiver side. So in this article, as the title says, we are going to experiment with the UDP protocol and its headers, including checksum calculation.

First of all, let's create a simple script that uses the UDP protocol without our own headers or checksum calculation. In this article, we are going to use Python's built-in "socket" module. Don't worry if you have never worked with this module; I will explain things briefly. First, let's create the socket object for the UDP protocol.

import socket
socket_object = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
socket_object.bind(("127.0.0.1", 1111))

In the code above, we imported the socket module, created the socket object (a UDP socket object) that we are going to use for sending and receiving data, and bound it to the local host and a port number. Binding to a specific address means that senders can send data to us by targeting that address. What makes socket_object a UDP socket is the arguments passed to socket.socket(). The first argument (socket.AF_INET) is the address family. It designates the type of addresses that our socket can communicate with (in this case, Internet Protocol v4 addresses). The second argument (socket.SOCK_DGRAM) is the one that makes the socket a UDP socket. If we had used socket.SOCK_STREAM instead, it would have been a TCP socket. So, to make things clear, SOCK_DGRAM is for the UDP protocol and SOCK_STREAM is for the TCP protocol.
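If you want to see the difference between the two socket types for yourself, here is a minimal sketch. The variable names (udp_socket, tcp_socket, is_udp, is_tcp) are only for illustration and are not used elsewhere in this article.

```python
import socket

# SOCK_DGRAM gives a UDP socket, SOCK_STREAM gives a TCP socket.
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The .type attribute reports which kind of socket we created.
is_udp = udp_socket.type == socket.SOCK_DGRAM
is_tcp = tcp_socket.type == socket.SOCK_STREAM
print(is_udp)  # True
print(is_tcp)  # True

udp_socket.close()
tcp_socket.close()
```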

data = "Hello world"
packet = data.encode()
receiver_addr = ("127.0.0.1", 1112)  # For now, it is just an arbitrary address
socket_object.sendto(packet, receiver_addr)

In the 4 lines of code above, we send data (a packet) to a receiver (client). As you can see, the data is "Hello world". We encode the data (convert it to bytes) because we can only send bytes over a socket. Then, on the last line, we finally send the packet to the receiver.
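The snippet above only covers the sending side. To try a full round trip on one machine, a minimal sketch could look like this. It assumes both sockets live in the same script and binds the receiver to port 0 so the operating system picks a free port; the names receiver, sender, and received_text are just for this demo.

```python
import socket

# Receiver: port 0 asks the OS for any free port (an assumption for this demo).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not wait forever if the datagram never arrives
receiver_addr = receiver.getsockname()

# Sender: no bind needed, the OS assigns a source port on the first send.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto("Hello world".encode(), receiver_addr)

# recvfrom returns the bytes and the address they came from.
received_bytes, sender_addr = receiver.recvfrom(1024)
received_text = received_bytes.decode()
print(received_text, "from", sender_addr)

sender.close()
receiver.close()
```

Because UDP gives no delivery guarantee, the timeout is there so the script fails with an exception instead of hanging if the datagram is lost.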

In the simple example above, we are just sending the data. We didn't build any header of our own, so we didn't calculate a checksum either. (The operating system still wraps every datagram in a real UDP header behind the scenes; the header we build in this tutorial is our own version of it, placed inside the data we send.) That means our data is in danger 😢. As we said earlier, we have to make a UDP header, put the checksum in it, and finally attach it to the data. But wait, how are we going to do that? Well, there are many modules we could work with. In this tutorial, we will use the zlib module to calculate the checksum and the struct module to make the UDP header. First, let's create a checksum calculator function.

import zlib
def checksum_calculator(data):
    # CRC32 of the bytes, returned as an unsigned 32-bit integer
    checksum = zlib.crc32(data)
    return checksum

Great! We have a checksum calculator function now. So, let's calculate the checksum for the previous data. A checksum is calculated over bytes, so we are going to use the encoded data.

data = "Hello world"
packet = data.encode()
checksum = checksum_calculator(packet)
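To get a feeling for how the checksum behaves, here is a small sketch. It only uses zlib.crc32; the variable names (first, second, changed) are made up for this example.

```python
import zlib

first = zlib.crc32("Hello world".encode())
second = zlib.crc32("Hello world".encode())
changed = zlib.crc32("Hello worle".encode())  # one character differs

print(first == second)       # True: same bytes always give the same checksum
print(first == changed)      # False: the changed byte gives a different checksum
print(0 <= first < 2 ** 32)  # True: the value fits in 4 bytes
```

The last check matters for the next step, because it tells us the checksum fits in a 4-byte header field.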

Booyah! We have the checksum calculated now. Let's continue and create the UDP header. As we said before, the UDP header has 4 fields: the source port number, the destination port number, the length of the data, and the checksum. For now, we are going to fill them with arbitrary data. In fact, if you just want to deal with data corruption, you can ignore everything except the checksum (fill the other fields with arbitrary data).

import struct
source_port = 1111
destination_port = 1112
data_length = len(packet)
checksum = checksum_calculator(packet)
udp_header = struct.pack("!IIII", source_port, destination_port, data_length, checksum)

In the code above, we create a UDP header with the 4 required fields. On the last line, we build the header with the struct module. The first parameter of struct.pack() is the format string, which tells struct what type of data each field holds and how many fields there are. The leading "!" selects network (big-endian) byte order with standard sizes. "IIII" (four I's) means we are going to pack 4 fields, and "I" itself means an unsigned integer with a standard size of 4 bytes. There are many other symbols (letters) that we can use. They are called format characters, and you can check them in the official docs using this link here. So, because one "I" takes 4 bytes and we have 4 fields, the total UDP header size is 4 * 4 = 16 bytes. (A real UDP header uses 2-byte fields and is 8 bytes long; we use 4-byte fields here so that the 32-bit CRC32 checksum fits.)
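You don't have to take the size arithmetic on trust. As a quick sketch, struct.calcsize() reports how many bytes a format string produces. The field values below (11 and 12345) are placeholders for illustration, not real checksums.

```python
import struct

header_format = "!IIII"  # network byte order, four 4-byte unsigned integers
header_size = struct.calcsize(header_format)
print(header_size)  # 16

# Pack four placeholder values and unpack them again.
header = struct.pack(header_format, 1111, 1112, 11, 12345)
fields = struct.unpack(header_format, header)
print(len(header))  # 16
print(fields)       # (1111, 1112, 11, 12345)

# For comparison: four 2-byte fields, the layout of a real UDP header.
real_header_size = struct.calcsize("!HHHH")
print(real_header_size)  # 8
```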

Combining this UDP header with the actual encoded data is easy. We just add them together, the same way we concatenate two strings (here, bytes).

packet_with_header = udp_header + packet

Great! Now we have a packet that contains a UDP header. The final thing left to do is send this packet to the receiver. Hmm, but how is the receiver going to extract the UDP header once it has the full packet? Well, here is the answer. I told you above that the UDP header size is 16 bytes, and as you saw, we put the UDP header in front of the data. So, we are going to divide the packet into two parts. The first part (the UDP header) is the first 16 bytes, and the second part (the data) is everything after the 16th byte. Finally, we unpack the first part (the UDP header) using the struct module with the same format we packed it with.

full_packet, sender_address = socket_object.recvfrom(1024)
udp_header = full_packet[:16]
data = full_packet[16:]
udp_header = struct.unpack("!IIII", udp_header)
correct_checksum = udp_header[3]

In the 5 lines of code above, we first receive the packet from the sender. Then we divide it into two parts: the UDP header and the data. After that, we unpack the UDP header, which returns a tuple with the 4 fields we packed. Because the checksum is the last (4th) field, we assign it to the correct_checksum variable.
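One thing the snippet above takes for granted is that the packet really is at least 16 bytes long. If it is shorter, struct.unpack() raises an error. Here is a sketch of a slightly more defensive version. The function name parse_packet and the two checks are my own additions for illustration, not part of the code above.

```python
import struct
import zlib

HEADER_FORMAT = "!IIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes

def parse_packet(full_packet):
    """Split a received packet into (header_fields, data). Hypothetical helper."""
    if len(full_packet) < HEADER_SIZE:
        raise ValueError("packet is shorter than the 16-byte header")
    header_fields = struct.unpack(HEADER_FORMAT, full_packet[:HEADER_SIZE])
    data = full_packet[HEADER_SIZE:]
    # The third field is the data length the sender wrote into the header.
    if header_fields[2] != len(data):
        raise ValueError("length field does not match the data received")
    return header_fields, data

# Build a packet the same way the sender does, then parse it.
payload = "Hello world".encode()
packet = struct.pack(HEADER_FORMAT, 1111, 1112, len(payload), zlib.crc32(payload)) + payload
fields, data = parse_packet(packet)
print(fields[:3])  # (1111, 1112, 11)
print(data)        # b'Hello world'
```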

Great! What's left now is calculating the checksum of the second part (the data). If the checksum we calculate is equal to correct_checksum, the data is not corrupted; otherwise, the data is corrupted.

correct_checksum = udp_header[3]
checksum = checksum_calculator(data)
is_data_corrupted = correct_checksum != checksum
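To watch the check catch a real problem, we can damage a packet on purpose. The sketch below flips one bit in the last byte to stand in for corruption on the wire. The function name is_corrupted and the bit-flip trick are assumptions made for this demo.

```python
import struct
import zlib

HEADER_FORMAT = "!IIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def is_corrupted(full_packet):
    """Return True if the data no longer matches the checksum in the header."""
    header = struct.unpack(HEADER_FORMAT, full_packet[:HEADER_SIZE])
    data = full_packet[HEADER_SIZE:]
    return header[3] != zlib.crc32(data)

payload = "Hello world".encode()
good_packet = struct.pack(HEADER_FORMAT, 1111, 1112, len(payload), zlib.crc32(payload)) + payload

# Simulate corruption in transit by flipping one bit in the last data byte.
damaged = bytearray(good_packet)
damaged[-1] ^= 0x01
bad_packet = bytes(damaged)

print(is_corrupted(good_packet))  # False
print(is_corrupted(bad_packet))   # True
```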

There is one more problem we haven't dealt with. As we said, if the checksums are equal, the data is not corrupted. But what if the checksums are not equal (which means the data is corrupted)? Don't worry! We have a good solution for that. If the data is corrupted, there are several ways to get it again. One way is for the receiver to ask the sender to send the data again by giving it some command. Another works from the sender side: let the sender send the packet again if it doesn't receive an acknowledgment within a specific time. Both of them work. It is up to you to decide which one to use. You can come up with your own way too.
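Here is one possible sketch that combines both ideas. The b"ACK" and b"NAK" replies, the function names, the retry count, and the timeout values are all my own assumptions, so treat it as a starting point rather than a finished protocol. The demo runs the receiver in a thread on the local host.

```python
import socket
import struct
import threading
import zlib

HEADER_FORMAT = "!IIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def send_with_retry(sock, packet, addr, retries=3, timeout=0.5):
    """Send packet until b"ACK" comes back; return attempts used, or None."""
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(packet, addr)
        try:
            reply, _ = sock.recvfrom(16)
        except socket.timeout:
            continue  # no reply in time, so send again
        if reply == b"ACK":
            return attempt
        # any other reply (such as b"NAK") also leads to a resend
    return None

def receive_until_intact(sock):
    """Answer NAK to corrupted packets and ACK to the first intact one."""
    while True:
        full_packet, sender_addr = sock.recvfrom(1024)
        header = struct.unpack(HEADER_FORMAT, full_packet[:HEADER_SIZE])
        data = full_packet[HEADER_SIZE:]
        if header[3] == zlib.crc32(data):
            sock.sendto(b"ACK", sender_addr)
            return data
        sock.sendto(b"NAK", sender_addr)

# Demo on the local host; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
results = []
worker = threading.Thread(target=lambda: results.append(receive_until_intact(receiver)))
worker.start()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = "Hello world".encode()
packet = struct.pack(HEADER_FORMAT, 1111, 1112, len(payload), zlib.crc32(payload)) + payload
attempts = send_with_retry(sender, packet, receiver.getsockname())
worker.join()
print(attempts, results)

sender.close()
receiver.close()
```

A real implementation would also need sequence numbers so the receiver can tell a resent packet from a new one, but that is beyond this experiment.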

That is all about implementing (experimenting with) UDP headers in Python. I hope you liked my explanation. If you have any feedback or suggestions, please share them in the comment section.

Please support my articles by sharing them with your friends or anyone who needs them, and by following me on Medium. You can also follow me on Twitter and GitHub.
