Climate Tech Cities Event Collection Automation
Script Link: https://github.com/Raiidahmed/Event-Extractor-With-GUI
When I started working with the team at Climate Tech Cities, I was tasked with automating the process of event collection/curation.
This process was originally done through a workflow of the following:
Search Climate Tech on eventbrite
Click on Eventbrite Link
Use Airtable Web Clipper (provided that the format of the page hasn’t changed)
Polish information in Airtable
Copy/Paste output of a formatted Airtable formula into the Substack
This workflow was tedious, but why? It’s mainly due to the nonstandard nature of event sources, and the time it takes to physically click and read through each link.
We could use a webscraper, but this approach still creates issues when the web page format changes. In addition, the user still has to curate the events themselves in the output excel sheet.
I proposed a new workflow:
Collect URLs from Eventbrite using a GUI enabled webscraper (Webscraper.io)
Send collected URLs through a python script which will do the following:
Send the URLs to BeautifulSoup to extract the HTML body from the page
Send the HTML body over to GPT
Use GPT to section off parts of the page into the event information
Use GPT to apply a tag that shows if the event is relevant to Climate Tech
Save the event information in an output CSV
User reviews the events in the CSV, then uploads to Airtable to be used in the newsletter
This workflow automates the most time consuming parts of the script, while still giving the user the ability to check the results as needed.