When working on different sites, I sometimes need to know how many links their pages have: how many incoming links each page receives (other pages of the site link to it) and how many outgoing links it contains. This is useful both for internal site optimization and for SEO. For example, you can spot pages that few other pages link to.

The ready-made solutions I found on the Internet all had limitations: they were paid or lacked the functionality I needed. So I decided to write my own mini app in Laravel, the Link Counter Tool for Websites.

What can the Link Counter Tool do?

  1. Traverse all pages of the site and list these pages;
  2. Count the number of links on each page to other pages of the site;
  3. Count how many other pages on the site link to each page.

How does link counting work?

Link counting works as follows.

  • On the main page of the application, the user enters the site domain;
  • The program reads the main page of the site, saves all links to other pages of the site, and records that the main page links to those pages;
  • Then the program parses the next saved link and repeats the same steps: it saves newly found pages and records the links between pages;
  • Link Counter Tool repeats this step until it has crawled all the pages of the site;
  • The result is a list of all pages of the site with their incoming and outgoing link counts.
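The steps above amount to a breadth-first traversal of the site's link graph. Here is a minimal sketch of that idea in plain PHP, without Laravel or a database. The function crawlSite and the $fetchLinks callable are hypothetical names used only for illustration; in the real application, fetching a page means an HTTP request plus HTML parsing, and the visited pages live in the links table.

```php
<?php
declare(strict_types=1);

/**
 * Crawl a site breadth-first, starting from "/".
 *
 * $fetchLinks is a hypothetical callable: given a page path, it returns
 * the internal paths that page links to.
 *
 * @return array<string, string[]> page path => unique outgoing internal paths
 */
function crawlSite(callable $fetchLinks, string $start = '/'): array
{
    $outgoing = [];      // pages already visited, with their outgoing links
    $queue = [$start];   // pages discovered but not yet visited

    while ($queue !== []) {
        $page = array_shift($queue);
        if (isset($outgoing[$page])) {
            continue; // already visited
        }

        // Drop self-links and duplicates before recording.
        $targets = array_values(array_unique(array_filter(
            $fetchLinks($page),
            fn (string $target): bool => $target !== $page
        )));
        $outgoing[$page] = $targets;

        // Queue every page we have not visited yet.
        foreach ($targets as $target) {
            if (!isset($outgoing[$target])) {
                $queue[] = $target;
            }
        }
    }

    return $outgoing;
}

// Usage with a tiny in-memory "site" standing in for real HTTP requests.
$site = [
    '/'            => ['/about', '/blog', '/'],
    '/about'       => ['/'],
    '/blog'        => ['/about', '/blog/post-1'],
    '/blog/post-1' => ['/blog'],
];
$graph = crawlSite(fn (string $page): array => $site[$page] ?? []);
print_r(array_keys($graph));
```

The loop ends when the queue is empty, which plays the same role as "no unparsed links left" in the application.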

Link Counter Tool – Application on Laravel

To implement this, I used the familiar Laravel framework, which I had previously used for other projects such as Pizza shop on Laravel 8.

This project has only 3 routes:

Route::get('/', [MainController::class, 'index']);
Route::post('/get_links', [MainController::class, 'getLinks']);
Route::get('/domain/{domain_name}', [MainController::class, 'getSite']);

The first is the home page, where the user will enter the site’s domain.

The second one handles the POST request from the form.

The third is a page with the received link data for each domain. The single MainController has 3 methods:

public function index()
{
    return view('welcome', []);
}

public function getLinks(EnterDomainRequest $request, SiteService $siteService)
{
    $requestArray = $request->validated();
    $result = $siteService->getLinks($requestArray['domain']);
    if ($result) {
        return redirect('/domain/'.$requestArray['domain']);
    } else {
        return redirect('/?error=1');
    }
}

public function getSite(Request $request, SiteService $siteService)
{
    $links = $siteService->getSite($request->domain_name);
    if (empty($links))
        return redirect('/?error=1');

    return view('domain', compact('links'));
}

The index method simply returns the main page.

getLinks validates the request and passes the domain to the service. If the service completes successfully, the user is redirected to the information page for that domain. On failure, the user is redirected back to the main page with an error=1 query parameter.
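The rules inside EnterDomainRequest are not shown here, so the following is only a guess at the kind of check that is useful at this point: because the domain is later appended to "https://", it should be a bare host name. The helper isValidDomain is a hypothetical name, and the regular expression is deliberately strict (for example, it rejects punycode top-level domains).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: accept only a bare host name such as "example.com".
 * Schemes, paths, ports and spaces are rejected.
 */
function isValidDomain(string $input): bool
{
    $domain = trim($input);

    // Only host-name characters, at least one dot, alphabetic top-level domain.
    if (!preg_match('/^[a-z0-9.-]+\.[a-z]{2,}$/i', $domain)) {
        return false;
    }

    // Let PHP apply its own host-name syntax checks as well.
    return filter_var($domain, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

var_dump(isValidDomain('example.com'));         // bool(true)
var_dump(isValidDomain('https://example.com')); // bool(false)
var_dump(isValidDomain('example.com/path'));    // bool(false)
```

In a Laravel form request, a check like this would typically live in a custom validation rule or closure.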

getSite passes the domain to the service to get the site's links. If links are found, it displays them on the domain page; otherwise it redirects to the main page with the same error parameter.

The SiteService has the following functionality.

The getLinks method takes a domain as input and saves it to the sites table if it is not there yet. It then creates an HTTP client using the Guzzle library. Next, it creates a record for the main page of the site in the links table and iterates over all pages in this table that have not yet been visited, until the entire site has been crawled. Inside the loop, it calls the getLink method, which fetches a page, searches for links on it, and adds them to the database.

public function getLinks(string $domain): bool
{
    try {
        $this->site = Site::where('domain', $domain)->first();
        if (!$this->site) {
            $this->site = new Site();
            $this->site->domain = $domain;
            $this->site->save();
        }
        $mainUrl = 'https://' . $domain;
        $this->client = new Client([
            'base_uri' => $mainUrl,
            'timeout'  => 10.0,
        ]);

        $sourceLink = '/';
        $baseLink = Link::where('site_id', $this->site->id)->where('url', $sourceLink)->first();
        if (!$baseLink) {
            $baseLink = Link::createLink($this->site->id, $sourceLink);
        }

        $issetNotParsed = Link::where('site_id', $this->site->id)->where('is_parsed', Link::NOT_PARSED)->first();
        if ($issetNotParsed) {
            while ($issetNotParsed) {
                DB::table('links')
                    ->where('site_id', $this->site->id)
                    ->where('is_parsed', Link::NOT_PARSED)
                    ->orderBy('id')
                    ->chunk(100, function ($links) use ($mainUrl) {
                        foreach ($links as $link) {
                            $this->getLink($mainUrl, $link);
                        }
                    });
                $issetNotParsed = Link::where('site_id', $this->site->id)->where('is_parsed', Link::NOT_PARSED)->first();
            }
        }
        return true;
    } catch (\Exception $exception) {
        $errorLink = Link::where('site_id', $this->site->id)->where('url', $this->currentUtl)->first();
        if ($errorLink) {
            $errorLink->is_parsed = Link::PARSED;
            $errorLink->error = $exception->getMessage();
            $errorLink->save();
        }
        return false;
    }
}

public function getLink($mainUrl, $baseLink)
{
    $this->currentUtl = $baseLink->url;
    $response = $this->client->get($baseLink->url);
    preg_match_all("/\<a(.*?)href=(\"|\')(.*?)(\"|\')/i", $response->getBody()->getContents(), $matches);
    $linkIds = [];
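    foreach ($matches[3] as $link) {
        // Skip empty hrefs (indexing them would raise a warning) and protocol-relative URLs.
        if ($link === '' || strpos($link, '//') === 0) {
            continue;
        }
        if ($link[0] == '/') {
            $link = $mainUrl . $link;
        } else if (strpos($link, $mainUrl) !== 0 || $link === $mainUrl || $link === $mainUrl.'/' ) {
            // Not an absolute URL on this site (or it is the main page itself).
            continue;
        }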
        $link = strtok($link, '#');
        $link = strtok($link, '?');
        $link = explode('://'.$this->site->domain, $link)[1];
        $issetLink = Link::where('site_id', $this->site->id)->where(function ($query) use ($link) {
            $query->where('url', $link)
                ->orWhere('url', $link.'/');
        })->first();
        if (!$issetLink) {
            $issetLink = Link::createLink($this->site->id, $link);
        }
        if ($baseLink->id != $issetLink->id) {
            $linkIds[] = $issetLink->id;
        }
    }
    $baseLink = Link::where('site_id', $this->site->id)->where('url', $baseLink->url)->first();
    if ($baseLink) {
        $baseLink->sourceLinks()->syncWithoutDetaching($linkIds);
        $baseLink->is_parsed = Link::PARSED;
        $baseLink->save();
    }
    sleep(1);
}
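The link clean-up inside getLink can also be written as a small pure function, which makes it easy to test without HTTP or a database. The sketch below is not the project's code: normalizeInternalLink is a hypothetical helper that mirrors most of the same checks, and it uses explode instead of strtok because strtok skips leading delimiters and returns false on empty input. Unlike getLink, it maps the absolute main page URL with a trailing slash to "/" instead of skipping it.

```php
<?php
declare(strict_types=1);

/**
 * Turn an href into a site-relative path, or null if it is not an internal link.
 * Assumes the site is served over https, as in getLinks().
 */
function normalizeInternalLink(string $href, string $domain): ?string
{
    $href = trim($href);
    if ($href === '' || strpos($href, '//') === 0) {
        return null; // empty or protocol-relative
    }

    $mainUrl = 'https://' . $domain;
    if ($href[0] === '/') {
        $path = $href;
    } elseif (strpos($href, $mainUrl . '/') === 0) {
        $path = substr($href, strlen($mainUrl));
    } else {
        return null; // external, mailto:, "#top", or relative without a leading slash
    }

    // Strip the fragment first, then the query string.
    $path = explode('#', $path, 2)[0];
    $path = explode('?', $path, 2)[0];

    return $path;
}

var_dump(normalizeInternalLink('/about#team', 'example.com'));
var_dump(normalizeInternalLink('https://example.com/blog/?page=2', 'example.com'));
var_dump(normalizeInternalLink('https://other.com/', 'example.com'));
```

The three calls return "/about", "/blog/" and null respectively.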

The getSite method takes a site domain and returns its pages with the number of incoming and outgoing links for each.

public function getSite(string $domain): array
{
    $targetLinks = [];
    $site = Site::where('domain', $domain)->first();
    if ($site) {
        foreach ($site->links as $link) {
            if ($link->url == '/') continue;
            $targetLinks[] = [
                'url' => $link->url,
                'links' => count($link->sourceLinks),
                'linked' => count($link->targetLinks),
            ];
        }
        usort($targetLinks, function($a, $b)
        {
            return $a['linked'] - $b['linked'];
        });
    }

    return $targetLinks;
}
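To make the counting itself easier to follow, here is the same idea without Eloquent relations: a sketch that takes a map of pages to their outgoing links and produces rows in the same shape as getSite. The function countLinks is a hypothetical name. Unlike getSite, it keeps the main page in the result and breaks ties by URL so the order is predictable.

```php
<?php
declare(strict_types=1);

/**
 * @param array<string, string[]> $outgoing page => pages it links to
 * @return array<int, array{url: string, links: int, linked: int}>
 */
function countLinks(array $outgoing): array
{
    // Every known page starts with zero incoming links.
    $incoming = array_fill_keys(array_keys($outgoing), 0);
    foreach ($outgoing as $targets) {
        foreach (array_unique($targets) as $target) {
            $incoming[$target] = ($incoming[$target] ?? 0) + 1;
        }
    }

    $rows = [];
    foreach ($incoming as $url => $linked) {
        $rows[] = [
            'url'    => (string) $url,
            'links'  => count(array_unique($outgoing[$url] ?? [])), // outgoing
            'linked' => $linked,                                    // incoming
        ];
    }

    // Fewest incoming links first; ties are broken by URL.
    usort($rows, fn (array $a, array $b): int =>
        [$a['linked'], $a['url']] <=> [$b['linked'], $b['url']]);

    return $rows;
}

$rows = countLinks([
    '/'            => ['/about', '/blog'],
    '/about'       => ['/'],
    '/blog'        => ['/about', '/blog/post-1'],
    '/blog/post-1' => ['/blog'],
]);
print_r($rows);
```

Pages with the fewest incoming links end up at the top, which is the same ordering the results page uses.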

An example of how the Link Counter Tool works

To demonstrate how the application works, I analyzed the links on my blog. To do this, on the main page of the application, I entered the domain and clicked on the “Submit” button.

Link Counter Tool

After waiting for the Link Counter Tool to finish, I was redirected to the next page.

Counting links on the site

Here the Url column lists the pages of my site, the Output links column shows the number of links on each page, and the Input links column shows the number of pages on the site that link to that URL. The rows are sorted by the last column, so pages that I rarely link to are easy to find.

I wrote the Link Counter Tool mini app on Laravel very quickly, so it is still quite raw and there are things I would like to improve. I will do that when I find the time.

The full source code of the project is in the repository on my GitHub. If you want to help improve the application, send your pull requests and I will definitely consider them.

If you are not yet very familiar with the framework the project is built on, the Top 5 Laravel Best Practices article may be useful.

That’s all for today. Deploy the application on your own machine and count the links on your sites. And don’t forget to follow me on Twitter, where I post new articles.
