Property Data Works was a six-person team that aggregated and analyzed real estate data from South Florida (e.g. Miami-Dade, Broward, and Palm Beach counties) for real estate investors, mortgage loan officers, realtors, and others. The team built an impressive web app and an extensive data collection and processing system, but we failed to gain enough traction to stay afloat and folded in March 2021.
As a software engineer, I built an optical character recognition (OCR) pipeline and an information extraction system that together processed about 20 million PDFs (~10 TB). In essence, we started with a large collection of PDFs, and I was responsible for producing machine-readable text for each document and then extracting key fields (namely document titles, dates, and monetary values) to eventually display to the end user. I wrote an article on OCR in Python based on this experience.
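To give a rough idea of the extraction step, here is a minimal sketch of pulling a title, dates, and monetary values out of text that has already been through OCR. It is not the production code: the function name `extract_fields`, the regex patterns, the "first non-empty line is the title" heuristic, and the sample deed text are all assumptions made for illustration.

```python
import re
from datetime import date

# Assumed patterns for illustration only; real documents need far more variants.
# Dollar amounts such as "$350,000.00" or "$2450".
MONEY_RE = re.compile(r"\$\s?((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?)")
# US-style numeric dates such as "03/15/2019".
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_fields(text):
    """Return a dict with a guessed title, ISO dates, and dollar amounts."""
    # Heuristic (assumption): treat the first non-empty line as the title.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else None

    dates = []
    for match in DATE_RE.finditer(text):
        month, day, year = (int(part) for part in match.groups())
        try:
            dates.append(date(year, month, day).isoformat())
        except ValueError:
            # OCR noise can produce impossible dates like 13/45/2019; skip them.
            continue

    amounts = [
        float(match.group(1).replace(",", ""))
        for match in MONEY_RE.finditer(text)
    ]
    return {"title": title, "dates": dates, "amounts": amounts}


# Hypothetical OCR output for a recorded deed.
SAMPLE = """
WARRANTY DEED
Recorded 03/15/2019 in Official Records
Consideration: $350,000.00
Documentary stamps paid: $2,450
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Regexes like these are only a starting point. OCR output is noisy, so a real system also has to cope with misread characters, spelled-out dates, and layouts where the title is not the first line.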