Image Scraping with Symfony’s DomCrawler

Post by: hanhga   Category: Php&mySql

Introduction

A photographer friend of mine implored me to find and download images of picture frames from the internet. I eventually landed on a web page that had a number of them available for free but there was a problem: a link to download all the images together wasn’t present.

I didn’t want to go through the stress of downloading the images individually, so I wrote this PHP class to find, download and zip all images found on the website.

How the Class Works

It searches a URL for images, downloads and saves the images into a folder, creates a ZIP archive of the folder and finally deletes the folder.

The class uses Symfony’s DomCrawler component to search for all image links found on the webpage and a custom zip function that creates the zip file. Credit to David Walsh for the zip function.
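To make the search step concrete, here is a minimal sketch of the same idea using PHP's built-in DOM extension instead of DomCrawler, so it runs without Composer. The helper name extractImageSources and the sample HTML are assumptions made for illustration; they are not part of the class.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not Symfony's DomCrawler.
// extractImageSources() is a hypothetical helper, not part of the class.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = array();
    // '//img' selects every <img> element anywhere in the document.
    foreach ($xpath->query('//img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Made-up sample page with two images.
$html = '<html><body><img src="/a.png"><p>text</p>'
      . '<img src="http://example.com/b.jpg"></body></html>';
$sources = extractImageSources($html);
print_r($sources);
```

The result is a flat array of src values, which is the shape the download loop later in the article iterates over.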

Coding the Class

The class consists of five private properties and eight public methods including the __construct magic method.

Below is the list of the class properties and their roles. 
1. $folder: stores the name of the folder that contains the scraped images. 
2. $url: stores the webpage URL. 
3. $html: stores the HTML document code of the webpage to be scraped. 
4. $fileName: stores the name of the ZIP file. 
5. $status: saves the status of the operation, i.e. whether it succeeded or failed.

Let’s get started building the class.

Create the class ZipImages containing the above five properties.

<?php
class ZipImages {
    private $folder;
    private $url;
    private $html;
    private $fileName;
    private $status;

Create a __construct magic method that accepts a URL as an argument.
The method stores the URL, fetches the page's HTML with file_get_contents and calls setFolder to create the default image folder.

public function __construct($url) {
    $this->url = $url; 
    $this->html = file_get_contents($this->url);
    $this->setFolder();
}
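The constructor above stores whatever file_get_contents returns, which is false when the page cannot be fetched. One possible guard, sketched here as a hypothetical standalone helper named fetchHtml (not part of the original class), throws an exception instead. The usage part reads a temporary local file in place of a real web page so it can run offline.

```php
<?php
// Hypothetical helper: fetch a page and fail loudly instead of storing false.
function fetchHtml(string $url): string
{
    // @ hides the PHP warning; the exception below reports the failure instead.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

// Usage with a temporary local file standing in for a web page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<html><body><img src="a.png"></body></html>');
$html = fetchHtml($tmp);
unlink($tmp);

try {
    fetchHtml($tmp); // the file no longer exists, so this throws
    $failed = false;
} catch (RuntimeException $e) {
    $failed = true;
}
```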

The created ZIP archive has a folder that contains the scraped images. The setFolder method below configures this.

By default, the folder name is set to image, but the method lets you change it by passing a different folder name as its argument.

public function setFolder($folder="image") {
    // if folder doesn't exist, attempt to create one and store the folder name in property $folder
    if(!file_exists($folder)) {
        mkdir($folder);
    }
    $this->folder = $folder;
}

setFileName provides an option to change the name of the ZIP file, with a default name set to zipImages:

public function setFileName($name = "zipImages") {
    $this->fileName = $name;
}

At this point, we instantiate the Symfony crawler component to search for images, then download and save all the images into the folder.

public function domCrawler() {
    // instantiate the Symfony DomCrawler component
    $crawler = new Crawler($this->html);
    // create an array of all scraped image links
    $result = $crawler
        ->filterXPath('//img')
        ->extract(array('src'));

    // download each image and save it to the folder
    foreach ($result as $image) {
        $path = $this->folder."/".basename($image);
        $file = file_get_contents($image);
        if ($file === false) {
            throw new \Exception('Failed to download image');
        }
        // file_put_contents returns false on failure (0 is a valid byte count)
        if (file_put_contents($path, $file) === false) {
            throw new \Exception('Failed to write image');
        }
    }
}
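One caveat with the download loop: src attributes are often relative (for example /img/a.png or a.png), and file_get_contents cannot fetch those as they stand. The sketch below shows one way to turn them into absolute URLs before downloading. resolveUrl is a hypothetical helper that is not part of the original class, and it is deliberately simplified: it does not normalise ../ segments or honour a base tag in the page.

```php
<?php
// Hypothetical helper: make an image src absolute relative to the page URL.
// Simplified sketch: no "../" normalisation, no <base> tag support.
function resolveUrl(string $base, string $src): string
{
    // Already absolute (has a scheme such as http or https): use as-is.
    if (is_string(parse_url($src, PHP_URL_SCHEME))) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    $origin = $scheme . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative: //cdn.example.com/x.png
    if (substr($src, 0, 2) === '//') {
        return $scheme . ':' . $src;
    }
    // Root-relative: /img/a.png
    if (substr($src, 0, 1) === '/') {
        return $origin . $src;
    }
    // Path-relative: resolve against the directory of the page.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Made-up page URL used only for illustration.
$page = 'http://example.com/gallery/frames.html';
echo resolveUrl($page, 'a.png'), "\n";
echo resolveUrl($page, '/img/a.png'), "\n";
```

Inside the loop, the download line could then call file_get_contents(resolveUrl($this->url, $image)) instead of passing $image directly.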

After the download is complete, we compress the image folder to a ZIP archive using our custom create_zip function.

public function createZip() {
    $folderFiles = scandir($this->folder);
    if (!$folderFiles) {
        throw new \Exception('Failed to scan folder');
    }
    $fileArray = array();
    foreach($folderFiles as $file){
        if (($file != ".") && ($file != "..")) {
            $fileArray[] = $this->folder."/".$file;
        }
    }

    if (create_zip($fileArray, $this->fileName.'.zip')) {
        $this->status = <<<HTML
File successfully archived. <a href="$this->fileName.zip">Download it now</a>
HTML;
    } else {
        $this->status = "An error occurred";
    }
}
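The create_zip function itself is not reproduced in this article. As a rough idea of what such a helper does, here is a minimal sketch built on PHP's ZipArchive class, which requires the zip extension. The name zipFiles is hypothetical and this is not David Walsh's implementation. It stores each file under its parent folder's name, so the archive contains a folder holding the images.

```php
<?php
// Sketch only: zipFiles() is a hypothetical stand-in, not the create_zip
// function the article depends on. Requires PHP's zip extension.
function zipFiles(array $files, string $destination): bool
{
    $zip = new ZipArchive();
    if ($zip->open($destination, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    $added = 0;
    foreach ($files as $file) {
        // Keep the parent folder in the entry name, e.g. "image/a.png".
        $entry = basename(dirname($file)) . '/' . basename($file);
        if (is_file($file) && $zip->addFile($file, $entry)) {
            $added++;
        }
    }
    // close() is the step that actually writes the archive to disk.
    return $zip->close() && $added > 0;
}

// Usage with a throwaway folder in the system temp directory.
$dir = sys_get_temp_dir() . '/zipdemo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", 'hello');
$archive = "$dir.zip";
$ok = zipFiles(array("$dir/a.txt"), $archive);
```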

Lastly, we delete the created folder after the ZIP file has been created.

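public function deleteCreatedFolder() {
    $dp = opendir($this->folder) or die ('ERROR: Cannot open directory');
    // compare against false so a file named "0" does not end the loop early
    while (($file = readdir($dp)) !== false) {
        if ($file != '.' && $file != '..') {
            if (is_file("$this->folder/$file")) {
                unlink("$this->folder/$file");
            }
        }
    }
    // release the directory handle before removing the folder
    closedir($dp);
    rmdir($this->folder) or die ('could not delete folder');
}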

The getStatus method prints the status of the operation, i.e. whether it was successful or an error occurred.

public function getStatus() {
    echo $this->status;
}

The process method calls all of the methods above in order.

public function process() {
    $this->domCrawler();
    $this->createZip();
    $this->deleteCreatedFolder();
    $this->getStatus();
}

You can download the full class from GitHub.

Class Dependency

For the class to work, the DomCrawler component and the create_zip function need to be included. You can download the code for this function here.

Download and install the DomCrawler component via Composer simply by adding the following require statement to your composer.json file:

"symfony/dom-crawler": "2.3.*@dev"

Run $ php composer.phar install to download the library and generate the vendor/autoload.php autoloader file.

Using the Class

Make sure all required files are included, via autoload or explicitly.

Call the setFolder and setFileName methods and pass in their respective arguments. Only call the setFolder method when you need to change the folder name.

Call the process method to put the class to work.

<?php
    require_once 'zipfunction.php';
    require_once 'vendor/autoload.php';
    use Symfony\Component\DomCrawler\Crawler;

    // instantiate the ZipImages class
    $object = new ZipImages('http://sitepoint.com');
    // set the folder name
    $object->setFolder('pictureFrames');
    // set the zip file name
    $object->setFileName('myframes');
    // start the scrape, zip and clean-up process
    $object->process();

 

Posted on 01-04-2016

Codetitle.com - library of source code to share and download with the community
Copyright © 2015. All rights reserved. codetitle.com Developed by Vinagon Ltd.