# Selenium

Selenium (https://www.seleniumhq.org/) automates browsers. It is used primarily to automate web applications for testing, but it is not limited to that: boring web-based tasks can (and should!) be automated too.

In the code cell below, the statement
```python
from selenium import webdriver
```
imports the *webdriver* module, which is required for any browser automation.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By

### Using selenium to open a website

We first need to create a web driver object, which we use to open the page. In our class we use the Firefox webdriver, but others are available (see section 1.5 here: https://selenium-python.readthedocs.io/installation.html).

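Note that on a school computer, you will need to specify the path to the webdriver executable, but this (likely) will not be the case on your personal computer. Recent Selenium 4 releases no longer accept the *executable_path* argument directly, so the path is passed through a *Service* object; it is written as a raw string so that the backslashes are not treated as escape characters.

```python
from selenium.webdriver.firefox.service import Service
driver = webdriver.Firefox(service=Service(executable_path=r'C:\geckodriver\geckodriver.exe'))
```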

Create the web driver object that controls the browser; this will open a Firefox browser with an empty URL.

In [None]:
driver = webdriver.Firefox()

To browse to a page, simply use the *driver.get* method and specify the URL.

In [None]:
driver.get('http://www.easternct.edu')

### Locating elements

In order to find the *first* element matching a particular *id*, *tag name*, etc., use the *find_element* method:

```python
find_element(By.TAG_NAME, value)
```

In order to find multiple elements that match, use the *find_elements* method, which returns a list:

```python
find_elements(By.TAG_NAME, value)
```

These methods can be called using the *driver* or any selenium web element.

The first argument can be any one of the following, and the second argument is the corresponding *value* to search for:

- By.ID 
- By.XPATH
- By.LINK_TEXT
- By.PARTIAL_LINK_TEXT
- By.NAME
- By.TAG_NAME
- By.CLASS_NAME
- By.CSS_SELECTOR

Note: For CLASS_NAME, any elements with that class will be returned (even if the element contains multiple classes). 

Note: if no element matches, *find_element* raises a *NoSuchElementException*, whereas *find_elements* returns an empty list.
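
Because *find_elements* returns an empty list rather than raising, it can be used to check for an element without a try/except. The helper below is a minimal sketch of that idea; the name *find_or_none* is hypothetical (it is not part of Selenium), and *context* is assumed to be the driver or any web element.

```python
def find_or_none(context, by, value):
    """Return the first matching element, or None when nothing matches.

    `context` is assumed to be the driver or a web element. Its
    find_elements method returns an empty list when there is no match,
    so no exception handling is needed here.
    """
    matches = context.find_elements(by, value)
    return matches[0] if matches else None
```

For example, `find_or_none(driver, By.ID, 'search-button')` would give the button if it exists and `None` otherwise.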

The code below finds the first *ul* element on the page, which holds the list of menu items in the header of the page. The element is stored in a *webelement* object.

In [None]:
ul = driver.find_element(By.TAG_NAME, 'ul')
ul

### Extracting text from elements
To extract text from an element, simply access its *text* attribute.

In [None]:
list_items = ul.find_elements(By.TAG_NAME, 'li')
for li in list_items:
    print(li.text)

### Clicking on an element

You can click on an element using the *click* method. Note that you will get an error if the element cannot be clicked. For example, this happens if you run the cell below twice.

In [None]:
search_button = driver.find_element(By.ID, 'search-button')
search_button.click()

### Adding text to an input

The *send_keys* method can be used to add text to an input. Here we add "How are you?" to the search input, which is now visible because we clicked on the search icon.

In [None]:
elem = driver.find_element(By.ID, 'q')
elem.send_keys("How are you?")

We can clear input using the *clear* method.

In [None]:
elem.clear()

Let's search for "Computer Science", by entering the text and then pressing the *Enter* key.

The statement
```python
from selenium.webdriver.common.keys import Keys
```
is needed so that we can simulate a user pressing the ENTER (RETURN) key.

In [None]:
from selenium.webdriver.common.keys import Keys
elem.send_keys('Computer Science')
elem.send_keys(Keys.RETURN)

### Getting the value of an attribute

The method *get_attribute* can be used to get the value of an attribute of an element. Here we get all links on the page, and display the text of the link as well as the URL (the *href* attribute).

In [None]:
links = driver.find_elements(By.TAG_NAME, 'a')
for link in links:
    text = link.text
    # skip links that have no visible text
    if text != '':
        print(text, link.get_attribute('href'), sep=': ')

### Searching by link text

- use the *By.LINK_TEXT* option to search for elements whose link text is an exact match
- use the *By.PARTIAL_LINK_TEXT* option to search for elements whose link text *contains* the text

Note: *text* here refers to the text value of the element, which can contain the *text* from more than one tag, as is the case for the last link in the second example.

In [None]:
driver.find_element(By.LINK_TEXT, 'CONTACT US')

In [None]:
cs_links = driver.find_elements(By.PARTIAL_LINK_TEXT, 'Computer')
for cs in cs_links:
    print(cs.text)

### Close the driver

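Close the driver when you are done. Note that *close* closes the current browser window, while *quit* closes every window and ends the webdriver session.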

In [None]:
driver.close()

### Headless browsers and screenshots

It is possible to make a browser *headless* (meaning the browser no longer has a GUI and you therefore will not see it) by setting *options* as in the code below. You can also save a screenshot of the page, which is commonly done in testing.

In [None]:
# configure headless browser
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')  # options.headless = True is no longer supported in recent Selenium 4 releases
print('configuring headless browser ...')
driver = webdriver.Firefox(options=options)

# go to Google News and take a screenshot
print('opening http://news.google.com ...')
driver.get('http://news.google.com')

print('take a screenshot ...')
driver.save_screenshot('google_news.png')

# close the browser
print('close the browser...')
driver.close()

print('done!')

### Searching by xpath

XPath uses path expressions to select nodes in an XML (or HTML) document. For more information, see https://www.w3schools.com/xml/xpath_syntax.asp. In some cases, an *xpath* is more intuitive and/or more powerful than the other locator types.

In [None]:
driver = webdriver.Firefox()
driver.get('http://www.easternct.edu')

For comparison, we first use a CSS selector to get the 3rd list item inside of the *div* with class 'main-menu-bg'.

In [None]:
info_link = driver.find_element(By.CSS_SELECTOR, 'div.main-menu-bg li:nth-child(3)')
info_link.text

We can do the same thing using xpath. Note that we use the following:

- two slashes (//) search at any depth, starting from the current node (with a single slash, each step matches only direct children, so the path must match exactly)
- [@attribute="value"] selects elements by attribute (the attribute value must match exactly)
- element[n] matches the *nth* such element among its siblings (counting starts at 1)
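
These rules can be tried without opening a browser. The sketch below uses Python's standard *xml.etree.ElementTree* module, which supports only a small subset of XPath and requires paths to start with a dot; the markup is made up for illustration and is not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a page with a menu and a footer.
markup = """
<body>
  <div class="main-menu-bg">
    <ul>
      <li>Academics</li>
      <li>Admissions</li>
      <li>About</li>
    </ul>
  </div>
  <div class="footer">
    <ul><li>Contact</li></ul>
  </div>
</body>
"""
root = ET.fromstring(markup)

# '//' searches at any depth, [@class="..."] needs an exact attribute match,
# and li[3] picks the third li among its siblings (positions start at 1).
third = root.find('.//div[@class="main-menu-bg"]//li[3]')
print(third.text)  # About

# With single slashes each step follows direct children only. The li elements
# sit inside a ul, not directly inside the div, so nothing is found.
direct = root.find('./div[@class="main-menu-bg"]/li')
print(direct)  # None
```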

Note: @class= matches the entire class attribute exactly; if the element could have multiple classes, you should use the *contains* XPath function (see link below).
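
A common way to write such a *contains* test is sketched below. The helper name *xpath_with_class* is hypothetical; it only builds the XPath string, padding the class attribute with spaces so that a class name is matched as a whole word rather than as part of a longer class name.

```python
def xpath_with_class(tag, class_name):
    """Build an XPath matching `tag` elements whose class list includes class_name."""
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')]"
    )

print(xpath_with_class('div', 'main-menu-bg'))
```

The resulting string could then be passed to Selenium, e.g. `driver.find_element(By.XPATH, xpath_with_class('div', 'main-menu-bg'))`.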

In [None]:
info_link = driver.find_element(By.XPATH, '//div[@class="main-menu-bg"]//li[3]')
info_link.text

In general, anything you can match using a CSS selector can also be matched with an XPath. But XPath also allows matches, such as matches on an element's text, that a CSS selector cannot express. See https://devhints.io/xpath.

In [None]:
ugr = driver.find_element(By.XPATH, '//h3[(text() = "Undergraduate Research")]/..')
ugr.text

To get the summary as well, we first need to click on the element, since the summary is not currently displayed.

In [None]:
ugr.click()

In [None]:
print(ugr.text)

### Exercise: 
Search for a movie on IMDB and go to the page for the first result by *clicking* on the link. 
Can you extract the title and rating?

Note: It is important to sleep for a second or two between carrying out the search and going to the first result. 
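
A fixed sleep works, but it can be longer than needed or too short on a slow connection. An alternative is to poll until the results appear. The helper below is a plain-Python sketch of that idea under a hypothetical name, *wait_until*; Selenium also ships its own *WebDriverWait* class for the same purpose.

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_interval)
```

For example, `wait_until(lambda: driver.find_elements(By.PARTIAL_LINK_TEXT, 'Inception'))` would return the matching links as soon as at least one is on the page; the movie title is only a placeholder.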


In [None]:
import time