How to Find Broken Links Using Selenium WebDriver
What do you think when you run into a 404, a “Page Not Found” error, or a dead hyperlink on a website? Broken links are irritating for visitors, so you should continuously work on eliminating them from your web product (or website).
The search engine rankings of your web product (for example, on Google) can also suffer if it contains numerous links that lead to a 404 (page not found) error. Dead links can damage the credibility of your product because users cannot get the information they came to the page for.
In this blog, we will look at how to find and locate broken links using Selenium WebDriver. To learn more about what Selenium WebDriver is and how to use it, check out this detailed tutorial on Selenium WebDriver.
Table Of Contents
- 1 What are Broken links?
- 2 Common reasons for broken links.
- 3 Reasons to check for broken links
- 4 Different HTTP status codes
- 5 How to find broken links in the Selenium WebDriver?
- 6 Conclusion
- 7 Frequently Asked Questions
What are Broken links?
A broken link on a website leads to a page or resource that is no longer available. This can happen for various reasons, such as the page being deleted, moved to a different URL, or the website being temporarily down. Broken links can occur on any website and can be found on pages, images, videos, and other resources.
Broken links can be a problem for website visitors and search engines, leading to a poor user experience and negatively impacting search engine optimization (SEO). When a user clicks on a broken link, they will typically be taken to a “404 not found” page, which can be frustrating and cause them to leave the website. Additionally, search engines may view broken links as a sign of a poorly maintained website, which can lead to a lower search engine ranking.
To avoid these issues, it’s essential to regularly check your website for broken links and fix them as soon as they are discovered. This can be done manually by visiting each page of your website and clicking on each link to see if it leads to a “404 not found” page or by using automated tools such as Selenium or other link checker software.
Common reasons for broken links.
There are several reasons why broken links may appear on a website, including:
- Page Deletion: A previously linked page may have been deleted, causing the link to lead to a “404 not found” page.
- URL Changes: If a page’s URL is changed, any links pointing to the old URL will become broken.
- Website Redesign: During a website redesign, URLs may change, causing previously valid links to become broken.
- Moving to a New Domain: If a website is moved to a new domain, all links pointing to the old domain will become broken.
- Typos: Mistyping a URL or copying and pasting it wrongly can result in a broken link.
- Third-Party Content: If a website links to content hosted on another website, and that content is removed or moved, the link will become broken.
- Server Outages: If a website is temporarily down, links to that site will become broken.
By regularly checking for and fixing broken links, website owners can ensure that their website provides a good user experience and positively impacts their SEO.
Reasons to check for broken links
There are several reasons why it’s important to check for and fix broken links on a website:
- User Experience: Broken links can frustrate website visitors and lead to a poor user experience. If a user clicks on a broken link, they will be taken to a “404 not found” page, which can be confusing and cause them to leave the website.
- Search Engine Optimization (SEO): Search engines view broken links as a sign of a poorly maintained website, which can negatively impact the website’s search engine ranking.
- Maintenance: Broken links can signal that a website is not being regularly maintained and updated, which often goes along with other issues, such as outdated content and security vulnerabilities.
- Data Accuracy: Broken links can lead to incorrect or outdated information being displayed on a website, which can harm the website’s reputation.
- Link Equity: When a website links to another page, it passes on some of its link equity, a measure of the value that search engines place on a link. If the linked page is no longer available, the link equity is lost, which can negatively impact the linking website’s search engine ranking.
By regularly checking for and fixing broken links, website owners can ensure that their website provides a good user experience, positively impacts their SEO, and is regularly maintained and updated.
Different HTTP status codes
HTTP (Hypertext Transfer Protocol) status codes are numerical codes that are returned by a web server to indicate the status of a request made by a client (e.g., a web browser). The most commonly encountered HTTP status codes are:
- 200 OK: The request was successful, and the server has returned the requested content.
- 201 Created: The request was successful, and the server has created a new resource.
- 204 No Content: The request was successful, but there is no representation to return (i.e., the response body is empty).
- 300 Multiple Choices: The request has multiple options, and the server directs the client to choose one.
- 301 Moved Permanently: The requested resource has been moved to a new URL, and future requests should use the new URL.
- 302 Found (Previously “Moved temporarily”): The requested resource has been temporarily moved to a new URL.
- 304 Not Modified: The client’s cached copy of the resource is up to date, and the server indicates that the client should use its cached copy.
- 400 Bad Request: The server could not understand the request, for example because it was malformed or missing required parameters.
- 401 Unauthorized: The request lacks valid authentication credentials; authentication is required and has either failed or not been provided.
- 403 Forbidden: Authentication succeeded, but the authenticated user cannot access the requested resource.
- 404 Not Found: The requested resource could not be found on the server.
- 500 Internal Server Error: An error occurred on the server while processing the request.
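When a script handles these codes, it is often enough to look at the first digit, which identifies the family a code belongs to. The sketch below shows one way this could be done in Java. The class name `HttpStatusClass` and the `Kind` enum are made up for this illustration and are not part of Selenium or the JDK.

```java
final class HttpStatusClass {

    /** Broad families of HTTP status codes, keyed by the first digit. */
    enum Kind { INFORMATIONAL, SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR, UNKNOWN }

    /** Maps a numeric status code to its family; anything outside 100-599 is UNKNOWN. */
    static Kind of(int code) {
        if (code < 100 || code > 599) {
            return Kind.UNKNOWN;
        }
        switch (code / 100) {
            case 1: return Kind.INFORMATIONAL;
            case 2: return Kind.SUCCESS;
            case 3: return Kind.REDIRECTION;
            case 4: return Kind.CLIENT_ERROR;
            default: return Kind.SERVER_ERROR; // only 5xx can reach this branch
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 500};
        for (int code : samples) {
            System.out.println(code + " -> " + of(code));
        }
    }
}
```

For broken-link checking, the `CLIENT_ERROR` and `SERVER_ERROR` families are the ones that usually matter.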
How to find broken links in the Selenium WebDriver?
To find broken links in Selenium WebDriver, you can use the following steps:
- Retrieve all links from a web page: You can use Selenium WebDriver’s “findElements” method to find all the links (i.e., “a” elements) on a web page. You can store the links in a list for later processing.
- Verify each link: For each link, you can use the “getAttribute” method to retrieve the “href” attribute of the link, which represents the URL that the link points to. You can then use the “HttpURLConnection” class in Java to check the status code of the URL.
- Check the status code: You can use the “getResponseCode” method of the “HttpURLConnection” class to retrieve the HTTP status code of the URL. Treat the link as broken if the status code is 400 or above (that is, outside the 2xx and 3xx ranges) or if no response is received at all. You can store the broken links in a separate list for later reporting.
- Repeat for each link: Repeat the above steps on the page until all links have been checked.
- Report the results: After all links have been checked, you can use an assertion from your test framework (for example, TestNG or JUnit; Selenium WebDriver itself does not provide one) to check that the list of broken links is empty. If the list is not empty, you can report the broken links to the console or write them to a file for later analysis.
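The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. To keep it runnable without a browser, it takes the “href” values as a plain list; in a Selenium script they would come from “findElements” and “getAttribute” as described in the first two steps. The class name `BrokenLinkChecker`, the helper names, the use of HEAD requests, and the 5-second timeouts are all assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class BrokenLinkChecker {

    /** Sends a HEAD request and returns the HTTP status, or -1 if no response was obtained. */
    static int fetchStatus(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");  // ask for the status line and headers only
            conn.setConnectTimeout(5000);   // assumed timeouts, in milliseconds
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return -1; // malformed or non-HTTP URL, DNS failure, timeout, refused connection
        }
    }

    /** 2xx and 3xx count as fine; everything else, including -1, counts as broken. */
    static boolean isBroken(int status) {
        return status < 200 || status >= 400;
    }

    /** Returns the hrefs whose status counts as broken; non-HTTP hrefs (mailto:, javascript:) are skipped. */
    static List<String> findBroken(List<String> hrefs, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null || !(href.startsWith("http://") || href.startsWith("https://"))) {
                continue;
            }
            if (isBroken(statusOf.applyAsInt(href))) {
                broken.add(href);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Pass the URLs to check as command-line arguments.
        List<String> broken = findBroken(List.of(args), BrokenLinkChecker::fetchStatus);
        System.out.println("Broken links: " + broken);
    }
}
```

Two caveats apply to this sketch. “HttpURLConnection” follows most redirects by default, so the status returned is usually that of the final destination. Some servers also reject HEAD requests, in which case falling back to GET may be necessary.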
Let’s understand the code.
The output indicates all valid links except one broken link.
Conclusion
Broken links are a common issue in web development and can negatively impact the user experience. Regular checking for broken links is essential for maintaining a functional and professional-looking website. Using tools such as Selenium WebDriver, developers can automate the process of checking for broken links, saving time and effort while ensuring a higher level of website quality. Fixing broken links provides users with a seamless and error-free browsing experience. In addition, checking for broken links can help identify other issues, such as outdated or misconfigured URLs, which can be corrected to improve the website’s overall functionality.
Frequently Asked Questions
How do you handle broken links?
Handling broken links means finding and fixing the underlying causes of the broken links. Here are some steps you can follow to manage broken links:
- Identify the broken links: Use tools like Selenium WebDriver or online broken link checkers to identify all broken links on your website.
- Analyze the cause: Determine why the link is broken. Is it due to a misconfigured URL, a page that has been moved or deleted, or a server-side error?
- Fix or replace the link: If the cause of the broken link is a misconfigured URL or a page that has been moved, update the link to the correct URL. If the reason is a server-side error, contact the website owner or webmaster to resolve the issue.
- Monitor the links: Regularly check for broken links to ensure they do not reappear. You can automate this process using tools like Selenium WebDriver to save time and effort.
- Implement a redirect: If a page has been permanently removed, consider implementing a redirect to a related page to maintain a positive user experience.
What causes a broken link?
A variety of reasons can cause broken links. A few are listed below:
- Moved or deleted pages: If a page has been moved or deleted, the link pointing to it will become broken.
- Misconfigured URLs: Typing errors, incorrect link syntax, or incorrect casing can cause links to be misconfigured and lead to broken links.
- Server-side errors: Broken links can also be caused by server-side errors, such as a misconfigured server, a server that is down, or a server that is experiencing high traffic.
- Outdated links: Over time, some links may become outdated as the information they point to changes or becomes unavailable.
- Third-party services: If a third-party service that your website depends on is down or unavailable, links that point to it may become broken.
- Change in website structure: If the website’s structure changes, links that point to pages within the website may become broken.
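Some misconfigured URLs can be caught before any request is sent, simply by checking their syntax. The sketch below is one possible pre-check using the JDK’s “java.net.URI” class; the class name `HrefSyntaxCheck` and the method name are hypothetical. It only catches syntactic problems, such as a mistyped scheme or a stray space. A typo that still produces a valid URL will pass this check and has to be caught by the status code instead.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HrefSyntaxCheck {

    /** Returns true if href parses as an absolute http(s) URL with a host. */
    static boolean isWellFormedHttpUrl(String href) {
        if (href == null) {
            return false;
        }
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return http && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // illegal characters or other syntax errors
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/page",   // well formed
            "htps://example.com/page",    // mistyped scheme
            "https://exa mple.com/page",  // stray space in the host
            "https:/example.com/page"     // missing slash, so no host is parsed
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isWellFormedHttpUrl(s));
        }
    }
}
```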
How do I track broken links?
There are several ways to track broken links on your website, including:
- Manual checking: Check each link on your website to see if it works correctly. This method can be time-consuming but is useful for small websites.
- Online broken link checkers: Use online tools, such as W3C Link Checker, Dead Link Checker, or Broken Link Checker, to scan your website for broken links. These tools will identify broken links and report the issue for you.
- Automated testing with Selenium WebDriver: Use Selenium WebDriver to automate checking for broken links. You can write a script that visits each page on your website, follows all the links, and checks the response code of each link to determine if it is broken.
- Monitoring tools: Use monitoring tools, such as Google Analytics or Google Search Console, to monitor your website’s performance and receive notifications when broken links are detected.
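The automated approach of visiting each page and following its links amounts to a breadth-first crawl. The sketch below illustrates the idea under a few assumptions. `linksOf` is a hypothetical callback that returns the hrefs found on a page; in a Selenium script it would load the page and collect the “a” elements. Only pages on the starting host are expanded, and `maxPages` is an assumed safety limit. The class name `SiteCrawler` is invented for this example.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class SiteCrawler {

    /**
     * Returns every URL discovered from start, in discovery order.
     * External links are recorded but their pages are not visited.
     */
    static Set<String> discover(String start, Function<String, List<String>> linksOf, int maxPages) {
        String host = hostOf(start);
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        int visited = 0;
        while (!queue.isEmpty() && visited < maxPages) {
            String page = queue.poll();
            visited++;
            for (String link : linksOf.apply(page)) {
                if (link == null || !seen.add(link)) {
                    continue; // skip missing hrefs and links already recorded
                }
                if (host != null && host.equalsIgnoreCase(hostOf(link))) {
                    queue.add(link); // same host: visit this page later
                }
            }
        }
        return seen;
    }

    /** Extracts the host part of a URL, or null if it cannot be parsed. */
    static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Tiny in-memory "site" standing in for real pages.
        Function<String, List<String>> fakeSite = page ->
            page.equals("https://site.test/")
                ? List.of("https://site.test/a", "https://other.test/x")
                : List.of();
        System.out.println(discover("https://site.test/", fakeSite, 10));
    }
}
```

Each URL in the returned set can then have its response code checked, so that external links are verified without being crawled.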