Automated Data Synchronization & Integrity Solution
A Python-based automation solution developed to ensure data integrity and synchronization between centralized cloud spreadsheets (Google Sheets) and local file system documents. This tool automates comparison, filtering, and reporting to identify missing or mismatched files.
Developed during my student position at Ludan Group to address critical data integrity challenges between cloud-based master documents and local file systems.
- Position: Student Intern
- Organization: Ludan Group
- Duration: July 2024 – May 2025
- Focus: Data integrity, automation, and cloud integration
The core script compares the master document list in Google Sheets with the documents present in the local directory structure. It then filters the results and reports any files that are missing or mismatched between the two.
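A minimal sketch of the comparison step, assuming both sides have already been reduced to normalized file names. The function name `compare_documents` and the report keys are hypothetical and are not taken from `main.py`.

```python
from __future__ import annotations

from collections.abc import Iterable


def compare_documents(sheet_names: Iterable[str], local_names: Iterable[str]) -> dict[str, list[str]]:
    """Compare the master list from the sheet with the files found locally.

    Hypothetical helper: both inputs are assumed to be normalized already,
    so plain set differences are enough to find entries missing on either side.
    """
    expected = set(sheet_names)
    present = set(local_names)
    return {
        # Listed in the master sheet but not found in the local directory
        "missing_locally": sorted(expected - present),
        # Present in the local directory but not listed in the master sheet
        "not_in_sheet": sorted(present - expected),
    }


if __name__ == "__main__":
    report = compare_documents(["a.pdf", "b.pdf", "c.pdf"], ["b.pdf", "c.pdf", "d.pdf"])
    for category, names in report.items():
        print(f"{category}: {', '.join(names) or 'none'}")
```

With these inputs, `a.pdf` is reported as missing locally and `d.pdf` as absent from the sheet. Sorting the results keeps the report stable between runs.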
| Component | Technology | Role |
|---|---|---|
| Primary Language | Python 3.x | Core logic, data processing, and automation |
| Cloud API | Google Sheets API | Read access to master document sheets |
| Authentication | OAuth 2.0 / Service Account | Secure API access and authorization |
| Main Libraries | gspread, google-auth | Sheets interaction and authentication |
| Monitoring | JavaScript | Data monitoring and visualization scripts |
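Prerequisites:

- Python 3.x installed on your system
- A Google API credentials file, `credentials.json`

Install the dependencies:

```bash
pip install gspread google-auth
```

Edit `main.py` to configure your settings:

```python
root_folder = "C:\\path\\to\\local\\documents"
sheet_id = "YOUR_GOOGLE_SHEET_ID_HERE"
```

Run the script:

```bash
python main.py
```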
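The script reads the master list through the Google Sheets API. Below is a minimal sketch of that step with gspread. It assumes that `credentials.json` is a service-account key, that the master list lives on the first worksheet, and that file names sit in a column with a known header. The helper names `fetch_rows` and `column_values` are hypothetical.

```python
from __future__ import annotations


def fetch_rows(sheet_id: str, credentials_path: str = "credentials.json") -> list[list[str]]:
    """Return every row of the first worksheet as a list of strings.

    Assumes a service-account key; the sheet must be shared with that account's email.
    """
    import gspread  # imported lazily so the pure helper below works without it installed

    client = gspread.service_account(filename=credentials_path)
    worksheet = client.open_by_key(sheet_id).sheet1
    return worksheet.get_all_values()


def column_values(rows: list[list[str]], header: str) -> list[str]:
    """Return the non-empty, stripped cells of the column whose header matches `header`."""
    if not rows:
        return []
    index = rows[0].index(header)  # raises ValueError if the header is absent
    return [
        row[index].strip()
        for row in rows[1:]
        if len(row) > index and row[index].strip()
    ]
```

A call such as `column_values(fetch_rows(sheet_id), "File Name")` would yield the master list. Here `"File Name"` is a placeholder header, not one confirmed by the original sheet. If `credentials.json` is an OAuth client secret rather than a service-account key, gspread's `oauth()` helper is the counterpart to `service_account()`.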
The solution includes several key functions:
- Recursively scans the target directory and lists all of its files and folders.
- Filters out dictionary keys that end with specific suffixes, as part of data cleaning.
- Normalizes and cleans file names so that cloud and local entries match consistently.
- Normalizes folder names for accurate path matching and comparison.
- Retrieves data from Google Sheets through the API and prepares it for comparison.
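The scanning, key-filtering, and name-normalization helpers could look roughly like the following. This is an illustration under assumptions, not the code in `main.py`. The function names, the normalization rules (Unicode NFKC, lower-casing, collapsing whitespace and underscores, optionally dropping the extension), and the example suffixes are all hypothetical choices.

```python
from __future__ import annotations

import os
import re
import unicodedata


def list_files_and_folders(root: str) -> tuple[list[str], list[str]]:
    """Walk `root` recursively and return (file paths, folder paths)."""
    files: list[str] = []
    folders: list[str] = []
    # os.walk silently skips unreadable or missing directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        folders.extend(os.path.join(dirpath, d) for d in dirnames)
        files.extend(os.path.join(dirpath, f) for f in filenames)
    return files, folders


def drop_keys_with_suffixes(data: dict, suffixes: tuple[str, ...]) -> dict:
    """Return a copy of `data` without keys that end with any of `suffixes`."""
    return {k: v for k, v in data.items() if not str(k).endswith(suffixes)}


def _clean(text: str) -> str:
    """Unicode-normalize, lower-case, and collapse runs of whitespace/underscores."""
    text = unicodedata.normalize("NFKC", text).strip().lower()
    return re.sub(r"[\s_]+", " ", text).strip()


def normalize_file_name(name: str, keep_extension: bool = True) -> str:
    """Canonical form of a bare file name so sheet entries and local files compare equal."""
    name = _clean(name)
    if not keep_extension:
        name = os.path.splitext(name)[0].strip()
    return name


def normalize_folder_name(path: str) -> str:
    """Canonical folder path: '/' separators, no empty or trailing parts, cleaned names."""
    parts = re.split(r"[\\/]+", path)
    return "/".join(_clean(p) for p in parts if p.strip())
```

Running both sides through the same normalizer before comparing them is what lets an entry such as `Report_2024 Final.PDF` in the sheet match `report 2024 final.pdf` on disk. Suffix filtering, for example with `("_old", "_tmp")`, would drop auxiliary columns before the comparison.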