What we did for the project
• Our developer manually feeds the system the links that need to be scraped.
• No two websites share the same structure, so each website's specification has to be entered into the system manually so that it can fetch the data accordingly.
• Each website has to be checked manually to confirm that it allows scraping (a helper sketch follows this list).
• For repeated scraping of a page, a program was written so that unchecked links, or links that returned errors, are crawled again after a set interval (see the retry sketch after this list).
• The system can crawl and scrape multiple websites; in our case it had to crawl 16 websites simultaneously and rank the results.
• Ranking can be configured in the system's pipeline.
• We developed a parser that breaks down the scraped data.
• If a link or the crawl itself fails, a notification is sent and the crawling process restarts from where it stopped (see the checkpoint sketch after this list).
• Forking was used to spawn subprocesses from a parent process (see the forking sketch after this list).
• An interface was created to review the collected data.
• The acquired data is stored in a database.
• We developed a program that emails the people concerned once a run has finished (see the mail sketch after this list).
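The allow-scraping check above was done manually, but a small helper can support it. Below is a minimal sketch, assuming Python and the standard-library urllib.robotparser; the URL and user-agent string are placeholders, not values from the project.

```python
from urllib.robotparser import RobotFileParser
from urllib.parse import urlparse

def is_scraping_allowed(url: str, user_agent: str = "*") -> bool:
    """Check the site's robots.txt to see whether `url` may be fetched."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()                      # fetch and parse robots.txt
    return parser.can_fetch(user_agent, url)

# Placeholder URL, not one of the 16 sites from the project.
if __name__ == "__main__":
    print(is_scraping_allowed("https://example.com/some/page"))
```

Note that robots.txt is only part of the picture; the site's terms of use still have to be read by hand, which is why the check stayed manual.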
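The retry program itself is not shown in the write-up, so the following is only a sketch of the idea, assuming Python; the interval, the crawl_link stand-in, and the list-based queue are illustrative assumptions, not the project's actual code.

```python
import time
import urllib.request

RETRY_INTERVAL_SECONDS = 15 * 60   # assumed interval; the real value is not stated

def crawl_link(url: str) -> bool:
    """Minimal stand-in for the real crawler: fetch the page and report success."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.status == 200
    except Exception:
        return False

def retry_failed_links(pending: list[str]) -> None:
    """Re-crawl unchecked or errored links after each interval until all succeed."""
    while pending:
        time.sleep(RETRY_INTERVAL_SECONDS)
        pending = [url for url in pending if not crawl_link(url)]
```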
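How the project persisted its position is not described; the sketch below shows one common way to restart a crawl from where it stopped, assuming a Python crawler and a JSON checkpoint file. The file name, structure, and crawl_one callback are assumptions.

```python
import json
import os

CHECKPOINT_FILE = "crawl_checkpoint.json"    # assumed location, not from the project

def load_checkpoint() -> int:
    """Return the index of the next link to crawl, or 0 on a fresh start."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as fh:
            return json.load(fh)["next_index"]
    return 0

def save_checkpoint(next_index: int) -> None:
    with open(CHECKPOINT_FILE, "w") as fh:
        json.dump({"next_index": next_index}, fh)

def crawl_with_resume(links: list[str], crawl_one) -> None:
    """Crawl links in order, persisting progress so a restart resumes where it stopped."""
    start = load_checkpoint()
    for i in range(start, len(links)):
        crawl_one(links[i])          # caller-supplied crawl function
        save_checkpoint(i + 1)       # record progress after each successful link
```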
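The forking bullet maps directly onto POSIX fork(). A minimal Python illustration follows, assuming one child process per site; the site names and the per-site crawl function are placeholders.

```python
import os
import sys

SITES = ["site-a", "site-b", "site-c"]      # placeholders for the 16 sites

def crawl_site(site: str) -> None:
    """Placeholder for the per-site crawl logic."""
    print(f"[pid {os.getpid()}] crawling {site}")

child_pids = []
for site in SITES:
    pid = os.fork()                          # POSIX only: duplicates the parent process
    if pid == 0:
        crawl_site(site)                     # child handles one site, then exits
        sys.exit(0)
    child_pids.append(pid)

for pid in child_pids:
    os.waitpid(pid, 0)                       # parent waits for every child to finish
print("all child crawlers finished")
```

The fork-per-site pattern is what lets the 16 sites be crawled simultaneously, with the parent process left free to collect results and trigger notifications.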
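Finally, the end-of-run mail could be sent with the standard smtplib module, as in the sketch below; the SMTP host, sender address, and credentials are placeholders, not the project's real configuration.

```python
import smtplib
from email.message import EmailMessage

def notify_run_finished(recipients: list[str]) -> None:
    """Email the people concerned once a crawl run has ended."""
    msg = EmailMessage()
    msg["Subject"] = "Crawl run finished"
    msg["From"] = "crawler@example.com"                     # placeholder sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The scheduled crawl has completed. "
                    "See the review interface for the collected data.")

    with smtplib.SMTP("smtp.example.com", 587) as smtp:     # placeholder SMTP host
        smtp.starttls()
        smtp.login("crawler@example.com", "app-password")   # placeholder credentials
        smtp.send_message(msg)
```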