Writing a Python Program for String Compression

Introduction

In the world of computer programming, dealing with strings is a fundamental task. Strings are sequences of characters that represent text and are used in countless applications. Sometimes, the need arises to optimize the storage or transmission of strings, and one common technique is string compression.

String compression involves converting a longer string into a shorter representation while still retaining its original meaning. In this article, we’ll explore the concept of string compression and learn how to implement a string compression program in Python.

Learn Also: Check Permutation of Two Strings in Python

Understanding String Compression

String compression aims to reduce the length of a string by replacing repeated characters with a single instance of that character followed by the count of its consecutive occurrences. For instance, the string “aaabbbccc” can be compressed to “a3b3c3”. This reduction in length can lead to improved storage efficiency and faster data transmission, particularly in scenarios where long strings with repeated characters are encountered frequently.

The Python Program

Now, let’s implement a simple string compression program in Python. We’ll define a function called `compress_string` that takes an input string and returns its compressed version if the compression leads to a shorter representation. Otherwise, the function will return the original string.

Here’s the implementation:

def compress_string(s):
    compressed = []
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            compressed.append(s[i - 1] + str(count))
            count = 1
    
    # Adding the last character's count
    compressed.append(s[-1] + str(count))  
    
    compressed_str = ''.join(compressed)
    
    return compressed_str if len(compressed_str) < len(s) else s

# Test the function
input_string = input("Enter a string to compress: ")
compressed_result = compress_string(input_string)
print("Compressed string:", compressed_result)

Output

Enter a string to compress: aaabbbcccddd
Compressed string: a3b3c3d3

In this program, we iterate through the input string character by character. We keep track of the current character’s count using the `count` variable. If the current character is the same as the previous one, we increment the count. If they are different, we append the previous character along with its count to the `compressedlist and then reset the count to 1. This process continues until we’ve processed all characters.

Finally, we add the last character and its count to the `compressed` list, then join the elements of the list to create the compressed string. The program then compares the length of the compressed string with the original string. If the compressed string is shorter, it is returned; otherwise, the original string is returned.

Complexity Analysis

Let’s analyze the time and space complexity of the above string compression program.

Time Complexity

The time complexity of the string compression program is primarily determined by the loop that iterates through the input string. Let’s break down the key operations within the loop:

  1. Iteration through the Input String: The loop runs for `n-1` iterations, where `n` is the length of the input string. This is because we compare each character with the previous character (except the first character).
  2. Appending to `compressed` List: The appending operation to the `compressed` list takes constant time since it involves adding elements to the end of a list.
  3. Joining `compressed` List: After the loop, there’s an operation to join the elements of the `compressed` list to create the compressed string. The `join` operation takes `O(m)` time, where `m` is the length of the `compressed` list.

Overall, the dominant factor in the time complexity is the iteration through the input string. Therefore, the time complexity of the program is approximate `O(n)`, where `n` is the length of the input string.

Space Complexity

The space complexity of the program refers to the amount of memory required by the program as a function of the input size. Let’s analyze the key factors contributing to the space complexity:

  1. `compressed` List: The program maintains a `compressed` list to store the compressed characters and their counts. In the worst case, if there are no repeated characters, each character would have its own entry in the list. Therefore, the space complexity due to the `compressed` list is `O(n)`.
  2. Other Variables: The program uses a constant amount of extra memory for variables like `count`, `i`, and `compressed_str`. These variables contribute only a constant amount of space regardless of the input size.

Hence, the dominant factor in the space complexity is the `compressed` list. As a result, the space complexity of the program is `O(n)`.

Learn Also: URLify a Given String using Python

Conclusion

String compression is a useful technique in various scenarios where space efficiency is crucial. By representing repeated characters with a concise notation, we can significantly reduce the length of a string.

In this article, we’ve explored the concept of string compression and walked through the process of implementing a string compression program in Python. This simple program demonstrates how fundamental programming concepts can be used to solve real-world problems efficiently.

Share your love
Subhankar Rakshit
Subhankar Rakshit

Hey there! I’m Subhankar Rakshit, the brains behind PySeek. I’m a Post Graduate in Computer Science. PySeek is where I channel my love for Python programming and share it with the world through engaging and informative blogs.

Articles: 147