Inspiration

Our inspiration for creating DupliCheck and AttachAlert came from noticing how little attention people pay to the digital clutter that burdens their day-to-day lives, in both personal and professional contexts. Inefficient digital storage and forgotten email attachments waste time and resources. They also add to the digital carbon footprint: every redundant file has to be stored, and a forgotten attachment usually means sending a second email. Reducing the unnecessary digital waste and carbon-dioxide emissions associated with excessive data storage and transmission became a driving force for us. With these tools, we wanted to help individuals and organizations make more conscious, eco-friendly decisions about their digital habits and, ultimately, lessen their environmental impact.

What it does

DupliCheck is a tool that streamlines data management by identifying and removing duplicate files in your storage. Accumulated digital clutter consumes storage space and can slow down system performance. DupliCheck scans your filesystem and flags duplicates (the current version focuses on images). Its interface then lets you review each match and decide whether to delete or keep it, freeing up space in the process.
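A minimal sketch of the scan-and-flag step, assuming that "duplicate" means byte-for-byte identical content detected with a SHA-256 digest. This is an illustrative choice, not DupliCheck's documented method (which matches images by perceptual hash), and `find_exact_duplicates` is a hypothetical name:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: str) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Only groups with two or more members are duplicates worth flagging.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Each returned group is a set of files a user would review before deciding which copies to delete.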

AttachAlert is a Chrome extension for Gmail that helps prevent a common mistake: forgetting an email attachment. While you compose an email, AttachAlert watches for mentions of attachments in your text and alerts you if you try to send the email without attaching anything. It is useful in both professional and personal settings, helping make sure your messages are complete.
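The extension itself runs JavaScript in the browser; purely to illustrate the detection logic, here is a Python sketch of a keyword heuristic. The keyword list and the function names `mentions_attachment` and `should_warn` are hypothetical, and a real check would also need to read the attachment count from Gmail's compose window:

```python
import re

# Hypothetical keyword list; AttachAlert's actual list is not documented here.
ATTACHMENT_PATTERN = re.compile(
    r"\b(attach(ed|ment|ments|ing)?|enclosed|see the file)\b",
    re.IGNORECASE,
)


def mentions_attachment(body: str) -> bool:
    """Return True if the email body appears to refer to an attachment."""
    return ATTACHMENT_PATTERN.search(body) is not None


def should_warn(body: str, attachment_count: int) -> bool:
    """Warn only when the text mentions an attachment but none is present."""
    return mentions_attachment(body) and attachment_count == 0
```

Keeping the text check separate from the attachment count makes it easy to swap in a different way of detecting attachments when Gmail's interface changes.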

How we built it

We built DupliCheck in Python, chosen for its simplicity and rich library ecosystem. We used the ImageHash library to compute and compare perceptual hashes of images, which lets us identify duplicates based on visual content rather than exact bytes. To process large collections of images efficiently, we incorporated Apache Spark, which distributes the work in parallel across multiple machines.
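To show what an image-hash comparison involves, here is a dependency-free sketch of an average hash, one of the algorithms ImageHash provides, plus the Hamming distance used to compare two hashes. It assumes the image has already been converted to grayscale and downscaled to a small grid (ImageHash does this step with Pillow), and the threshold of 5 bits is a hypothetical default:

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of an already-downscaled grayscale image.

    `pixels` is a small grid (e.g. 8x8) of 0-255 intensities. Each bit is 1
    if the pixel is brighter than the grid's mean, else 0.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(h1 ^ h2).count("1")


def is_near_duplicate(h1: int, h2: int, threshold: int = 5) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming_distance(h1, h2) <= threshold
```

With the library itself, `imagehash.average_hash(Image.open(path))` returns an `ImageHash` object, and subtracting two such objects gives their Hamming distance, so the comparison reduces to `hash_a - hash_b <= threshold`.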

We built AttachAlert with JavaScript, HTML, and JSON, the standard building blocks of a Chrome extension. JavaScript handles event listening and DOM manipulation, which lets the extension monitor user actions and interact with Gmail's complex web interface. HTML structures the extension's user interface. A JSON manifest file (manifest.json) tells Chrome the extension's basic properties, how it should behave, and which permissions it requires.

Challenges we ran into

While developing DupliCheck, we ran into significant technical challenges, particularly when integrating the image-analysis features. A key hurdle was computing and comparing image hash values to judge similarity reliably. Images that are essentially the same can differ in subtle ways, so the comparison had to be tuned carefully to avoid both false matches and missed duplicates. We also found it challenging to implement parallel computation with Spark in Python, which we adopted to process large datasets efficiently. Configuring Spark to handle two inputs concurrently required careful tuning of our algorithms and our system architecture to achieve robust, scalable performance.
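The similarity decision described above can be written as a threshold on the Hamming distance between two n-bit hashes. The threshold τ is a tuning parameter whose value the project does not state: lowering it causes more misses, and raising it causes more false matches. The second line shows why parallelism helps, since comparing every image against every other grows quadratically (for N = 10,000 images that is 49,995,000 pairs):

```latex
d_H(h_a, h_b) = \sum_{i=1}^{n} \mathbf{1}\left[h_{a,i} \neq h_{b,i}\right],
\qquad
\text{duplicate}(a, b) \iff d_H(h_a, h_b) \le \tau

\text{pairs to compare for } N \text{ images: } \binom{N}{2} = \frac{N(N-1)}{2}
```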

While developing AttachAlert, one of the primary difficulties we encountered was accurately determining whether an attachment was included in an email. The challenge stemmed from the complexity of Gmail's dynamic user interface, which changes frequently and varies between users based on their settings and extensions. This variability made it difficult to reliably identify the attachment indicators across different user environments.

Accomplishments that we're proud of

As a team, we are proud of what we built with DupliCheck and AttachAlert. Despite significant technical challenges, we created DupliCheck, a tool that identifies and removes duplicate files, uses image hashing to match images by visual content, and uses Spark for parallel computing. AttachAlert, in turn, helps prevent the common mistake of sending an email without its attachment, and it shows our ability to adapt to the dynamic nature of Gmail's interface. Both projects reflect our technical growth and our commitment to better user experience and digital efficiency.

What we learned

Throughout the development of DupliCheck and AttachAlert, our team gained invaluable insights into advanced computational techniques and user interface dynamics. We deepened our understanding of image hashing algorithms and their applications in detecting duplicates, which challenged us to think critically about data accuracy and processing efficiency. Working with the Spark library enhanced our skills in parallel computing, teaching us how to optimize data processing for scalability and speed. Additionally, integrating AttachAlert into Gmail's constantly evolving interface provided a practical lesson in the complexities of developing browser extensions that interact with web applications. This experience sharpened our abilities in handling dynamic content. These projects not only broadened our technical expertise but also improved our problem-solving strategies and teamwork dynamics.

What's next for DupliCheck and AttachAlert

For DupliCheck, we are focusing on versatility and performance. The tool currently targets duplicate images; we plan to extend it to all file types so it can be used for digital cleanup across diverse data formats. We also want to scale up DupliCheck's parallel processing with Spark: by adding more machines to the cluster, we aim to speed up data handling so that DupliCheck can process larger datasets.
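One common way to generalize duplicate detection to arbitrary file types, offered here as an assumption rather than the team's stated design, is a two-stage filter: group files by size first, then hash only files whose size matches another file's. The function name `find_duplicates_any_type` is hypothetical:

```python
import hashlib
import os
from collections import defaultdict


def _digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates_any_type(root: str) -> list[list[str]]:
    # Stage 1: bucket by size; a file with a unique size cannot have a duplicate.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only files that share a size with at least one other file.
    duplicates: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[_digest(path)].append(path)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicates
```

Because sizes come from filesystem metadata, the expensive hashing step is skipped entirely for files with a unique size. Each size bucket is also independent of the others, which makes it a convenient unit of work to distribute across Spark workers.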

For AttachAlert, we plan to add more checks for email completeness. Beyond detecting forgotten attachments, the next version will check that a link is included when the email body mentions one, and that the email has a subject before it is sent. Together, these checks will guard against more of the common oversights that lead to incomplete emails.
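A sketch of how the planned pre-send checks might be combined, written in Python for readability even though the extension runs JavaScript. The `Draft` type, the keyword patterns, and the warning strings are all hypothetical:

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for the planned checks; not AttachAlert's actual code.
ATTACHMENT_WORDS = re.compile(r"\battach(ed|ment|ments)?\b", re.IGNORECASE)
LINK_WORDS = re.compile(r"\b(link|url)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+", re.IGNORECASE)


@dataclass
class Draft:
    subject: str
    body: str
    attachment_count: int = 0


def presend_warnings(draft: Draft) -> list[str]:
    """Collect every problem found in the draft, in a fixed order."""
    warnings: list[str] = []
    if not draft.subject.strip():
        warnings.append("missing subject")
    if ATTACHMENT_WORDS.search(draft.body) and draft.attachment_count == 0:
        warnings.append("attachment mentioned but none attached")
    if LINK_WORDS.search(draft.body) and not URL.search(draft.body):
        warnings.append("link mentioned but no URL found")
    return warnings
```

Returning a list rather than stopping at the first problem lets a single alert report every oversight at once.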
