Chromedriver - Under The Hood






Selenium WebDriver has been our companion in our test automation adventures for some time now. Over the years, the Selenium open source project has achieved what I can fairly describe as global domination. With Selenium, we can interact with different browsers through their WebDrivers. But how much do we really know about these drivers and the way they work?
In this article, we will take a closer look at ChromeDriver and explore a little of how it works.

So, what is the role of a WebDriver?

The concept of WebDriver was introduced with the release of Selenium 2.
ChromeDriver, for example, is an executable (chromedriver.exe on Windows) that acts as a server. Its role is to let us interact with the browser.
Eventually, the responsibility for developing and maintaining these drivers was transferred to the browser vendors. The idea was that vendors know their browsers best, so the resulting drivers would be much more stable and robust. In reality, this caused numerous problems: some methods were not properly invoked by the WebDrivers, and maintenance and bug fixing on the vendors' side were insufficient.

Let's start with a fairly simple example

Our example is a simple piece of Selenium code that:

- Opens a Chrome browser
- Navigates to Google.com



What happens when we execute this code?

When we add the Selenium library dependencies to our project, we get access to Selenium's client bindings: the classes and methods our code calls to drive the browser.

Once we execute this code, the terminal shows ChromeDriver starting up, along with the port it is listening on.


Here, we start a session with ChromeDriver through a local port (in this case, 17494).
Next, ChromeDriver acts as a server and receives the commands we want to perform on our browser (sent through the bindings we discussed).
All commands follow a REST-style HTTP protocol called the WebDriver protocol, which is standardized by the W3C.
Each WebDriver must implement every call made to it for our commands to be performed successfully.
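To make this concrete, here is a minimal Python sketch (standard library only) of how a client binding might map commands to WebDriver endpoints and build the HTTP requests. The endpoint paths follow the W3C WebDriver specification; the ROUTES table and the build_request helper are hypothetical names for illustration, not Selenium's actual implementation, and port 17494 is simply the one from the run above. The sketch only builds requests; sending one with urllib.request.urlopen would require a ChromeDriver actually listening on that port.

```python
import json
import urllib.request

# A few W3C WebDriver endpoints relevant to our example (hypothetical table).
# {session_id} is filled in once the New Session call has returned an ID.
ROUTES = {
    "status": ("GET", "/status"),
    "new_session": ("POST", "/session"),
    "navigate_to": ("POST", "/session/{session_id}/url"),
    "get_current_url": ("GET", "/session/{session_id}/url"),
    "delete_session": ("DELETE", "/session/{session_id}"),
}


def build_request(base_url, command, session_id=None, body=None):
    """Build (but do not send) the HTTP request for a WebDriver command."""
    method, path = ROUTES[command]
    if "{session_id}" in path:
        if session_id is None:
            raise ValueError(f"{command} requires a session id")
        path = path.format(session_id=session_id)
    data = None
    headers = {}
    if method == "POST":
        # POST commands carry a JSON body, even when it is just {}.
        data = json.dumps(body or {}).encode("utf-8")
        headers["Content-Type"] = "application/json; charset=utf-8"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )
```

For example, build_request("http://localhost:17494", "navigate_to", session_id="abc123", body={"url": "https://www.google.com"}) produces a POST to /session/abc123/url whose body is the JSON object {"url": "https://www.google.com"}.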



In this example...

- We start by sending a new-session request to the WebDriver, with a body that can include our DesiredCapabilities, if given (the DesiredCapabilities class holds key/value pairs that describe the browser properties we want).
The call is an HTTP POST to the /session endpoint, with the capabilities sent as JSON in the request body.



- The WebDriver responds with a session ID, a status code for the operation, and other details such as the capabilities of the created session.

- We then navigate to our desired URL.
This call is a POST to /session/{session id}/url, with the target URL in the JSON body.
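The bodies exchanged in these two steps can be sketched in Python as follows. This is a hedged illustration assuming the W3C request shape (capabilities under alwaysMatch, Chrome-specific options under the vendor-prefixed goog:chromeOptions key); older clients also sent a legacy desiredCapabilities object, which this sketch leaves out. The helper names are hypothetical. session_id_from_response accepts both the W3C response shape, where the session ID sits under "value", and the older JSON Wire Protocol shape, which put the session ID and a numeric status code at the top level.

```python
import json


def new_session_body(browser_name="chrome", chrome_args=None):
    """JSON body for POST /session, in the W3C shape (hypothetical helper)."""
    always_match = {"browserName": browser_name}
    if chrome_args:
        # Chrome-specific settings live under the vendor-prefixed key.
        always_match["goog:chromeOptions"] = {"args": list(chrome_args)}
    return {"capabilities": {"alwaysMatch": always_match}}


def navigate_body(url):
    """JSON body for POST /session/{session id}/url."""
    return {"url": url}


def session_id_from_response(raw):
    """Extract the session ID from a New Session response (hypothetical helper)."""
    payload = json.loads(raw)
    value = payload.get("value")
    if isinstance(value, dict) and "error" in value:
        # W3C error responses carry an error code and message under "value".
        raise RuntimeError(f"{value['error']}: {value.get('message', '')}")
    if isinstance(value, dict) and "sessionId" in value:
        return value["sessionId"]  # W3C shape
    if payload.get("status", 0) != 0:
        # Legacy JSON Wire responses signalled failure with a non-zero status.
        raise RuntimeError(f"new session failed with status {payload['status']}")
    return payload["sessionId"]  # legacy JSON Wire shape
```

Keeping the body builders and the response parser separate from any HTTP code makes them easy to check without a running driver.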


The server itself interacts with the Chrome browser via the Chrome DevTools Protocol, formerly known as the Remote Debugging Protocol (https://chromedevtools.github.io/debugger-protocol-viewer/1-2/).
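Before any DevTools commands can flow, a client has to find Chrome's debugging socket. When Chrome is started with the --remote-debugging-port flag, it serves a small HTTP endpoint, /json/version, whose JSON includes a webSocketDebuggerUrl field. The Python sketch below shows how a generic DevTools client could read that field. It is an assumption for illustration only: ChromeDriver's own way of locating the port may differ, and the helper names are hypothetical.

```python
import json
import urllib.request


def version_url(port):
    """URL of Chrome's /json/version endpoint on the given debugging port."""
    return f"http://127.0.0.1:{port}/json/version"


def browser_ws_url(version_json):
    """Pull the browser-level WebSocket endpoint out of /json/version output."""
    info = json.loads(version_json)
    ws = info.get("webSocketDebuggerUrl")
    if not ws or not ws.startswith(("ws://", "wss://")):
        raise ValueError("no WebSocket debugger URL in response")
    return ws


def fetch_browser_ws_url(port, timeout=5.0):
    # Only works if Chrome was started with --remote-debugging-port=<port>.
    with urllib.request.urlopen(version_url(port), timeout=timeout) as resp:
        return browser_ws_url(resp.read().decode("utf-8"))
```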

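It translates the WebDriver API calls into DevTools Protocol messages that Chrome's debugging socket (a WebSocket) can receive.
In our case, I assume the navigation call is translated into the DevTools command Page.navigate, with our URL as its parameter.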
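A DevTools command is a JSON message with an id, a method name and a params object, and the browser's reply echoes the same id. The sketch below builds and matches such messages in Python. CdpMessenger is a hypothetical helper, and actually sending the messages would need a WebSocket client, which Python's standard library does not include. Messages without an id are events (for example, Page.loadEventFired) rather than replies to a command.

```python
import itertools
import json


class CdpMessenger:
    """Builds DevTools Protocol commands and matches replies by id (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # id -> method name of commands awaiting a reply

    def command(self, method, **params):
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle_reply(self, raw):
        reply = json.loads(raw)
        if "id" not in reply:
            return None  # an event, not a reply to one of our commands
        method = self._pending.pop(reply["id"])
        if "error" in reply:
            raise RuntimeError(f"{method} failed: {reply['error'].get('message')}")
        return method, reply.get("result", {})
```

Matching on the id is what lets a client have several commands in flight on one socket while events arrive in between.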



On the other side of the socket, a dispatcher inside Chrome listens for these messages and executes the corresponding Chromium code.
I even tracked down the piece of code that is invoked on Chrome's side.
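The dispatcher pattern itself is simple: look up a handler by method name, call it with the params, and send back either a result or an error tagged with the original id. Here is a toy Python model of that idea. It is not Chromium's code (which is written in C++), and ToyDispatcher and fake_navigate are hypothetical names. The error code -32601 is JSON-RPC's "method not found" code, which, as far as I know, DevTools also uses for unknown methods.

```python
import json


class ToyDispatcher:
    """A toy stand-in for the browser-side DevTools dispatcher (hypothetical)."""

    def __init__(self):
        self._handlers = {}

    def register(self, method, handler):
        self._handlers[method] = handler

    def dispatch(self, raw):
        msg = json.loads(raw)
        handler = self._handlers.get(msg["method"])
        if handler is None:
            # JSON-RPC style "method not found" error (assumed to match DevTools).
            error = {"code": -32601, "message": f"{msg['method']} not found"}
            return json.dumps({"id": msg["id"], "error": error})
        result = handler(**msg.get("params", {}))
        return json.dumps({"id": msg["id"], "result": result})


visited = []


def fake_navigate(url):
    """Hypothetical handler: records the URL instead of loading a page."""
    visited.append(url)
    return {"frameId": "main"}
```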



This was a very simple introduction to ChromeDriver and the way it works.
Basically, a WebDriver makes direct calls to the browser using that browser's native support for automation and debugging. Both how these calls are made and which features they support depend on the browser you are using.



