Search Engine Optimization

What is a search engine?
- user types a query
- search engine displays results
- results are displayed as a ranked list
→ results are always displayed in some kind of order
→ what factors go into a rank?
⇒ # of clicks plays a role
⇒ relevance to query
⇒ paid promotions
- definition: a software tool that retrieves and displays relevant information from the WWW

Types of search engine
- enterprise - uses company specific knowledge and tasks to process information
- desktop - searches various types of information to provide a list of sources (e.g. Windows Search)
- open source - several implementations exist, such as Lucene (Java), Lemur (C++), and Galago (Java)
→ search engines whose source code is publicly available
- web search engines

Steps performed by web search engines
- crawling
→ web crawlers: bots that traverse websites, jumping from one link to another within a site and between sites
→ indexes data on websites
→ use traversal algorithms to attempt to visit every node on the internet that they can reach, collecting data about website structure
→ “crawl through the web to collect information about a page, create an index and provide ranked responses to a search query”
- indexing
→ once web crawlers have collected information, an index is created. it stores references to the data efficiently
- ranking algorithm
→ pagerank
- user interface
→ how users interact with search engine
- query processing
→ how the search engine processes what the user enters
→ matches keywords
- search and retrieval
→ takes output of ranking algorithm and query processing and displays results
- user feedback and personalization
- web search front end and back end
- continuous updates and monitoring
→ webpages change and the index needs to stay up to date
→ when resources get updated, moved, or removed the index should stay up to date
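
The crawling step above can be sketched as a breadth-first traversal. This is only a minimal illustration, assuming the "web" is a small in-memory dictionary of links. The page names and the crawl function are hypothetical; a real crawler would fetch pages over HTTP and parse links out of the HTML.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/blog", "a.example/"],
    "b.example/blog": [],
}


def crawl(seed, web):
    """Visit every page reachable from `seed` breadth-first, each page once."""
    visited = []             # pages in the order they were crawled
    seen = {seed}            # pages already queued, so none is visited twice
    frontier = deque([seed])
    while frontier:
        page = frontier.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("a.example/", TOY_WEB)
print(order)
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, which they do in this toy graph.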


How would you classify whether a search engine is good or bad?
- effectiveness (quality of result)
- efficiency (response time and throughput)

Important parameters:
- response time
→ time between submitting a query and obtaining results
- query throughput
→ number of queries processed per unit of time
- speed
→ rate at which documents can be transformed into search indexes
- coverage
→ how much info has been indexed and stored in a search engine
- freshness
→ age of the information stored
- scalability
→ should still work as users and information grow
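
Response time and query throughput from the list above could be measured roughly as follows. This is a sketch under assumptions: toy_search is a hypothetical stand-in for a real search backend, and the numbers it produces only show how the two metrics are computed.

```python
import time


def measure(search, queries):
    """Return (mean response time in seconds, throughput in queries per second)."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search(q)
        latencies.append(time.perf_counter() - t0)  # response time of one query
    elapsed = time.perf_counter() - start
    mean_response = sum(latencies) / len(latencies)
    throughput = len(queries) / elapsed             # queries processed per second
    return mean_response, throughput


def toy_search(query):
    # Hypothetical stand-in for a search engine: substring match over three documents.
    docs = ["large cooking pan", "non-stick pan", "garden hose"]
    return [d for d in docs if query in d]


mean_response, throughput = measure(toy_search, ["pan", "hose", "cooking"] * 100)
print(f"mean response time: {mean_response:.6f} s, throughput: {throughput:.0f} queries/s")
```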

sitemap.xml: metadata and information about a website, e.g. which URLs it contains, how they are linked, and when each was last updated
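
As an illustration, a small sitemap can be read with Python's standard library. The URLs and dates below are made up for the example. The urlset, url, loc and lastmod element names and the namespace follow the sitemaps.org format.

```python
import xml.etree.ElementTree as ET

# Example sitemap content (hypothetical URLs and dates).
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-11-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2023-10-15</lastmod>
  </url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_text):
    """Return a list of (url, last modified date) pairs from sitemap XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod))
    return entries


entries = read_sitemap(SITEMAP)
for loc, lastmod in entries:
    print(loc, lastmod)
```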

“working of a search engine” defined by two phases
- indexing
→ text acquisition
⇒ identifies documents for searching and indexing
→ text transformation
⇒ transforms documents into index terms or features
→ index creation
⇒ takes output of text transformation and creates indexes that allow for fast searching
- querying
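
The two phases can be sketched with a tiny inverted index. This is a simplified illustration, not how any particular engine does it. The documents are made up, text transformation is reduced to lowercasing and splitting into words, and a query returns only documents that contain every query term.

```python
import re
from collections import defaultdict

# Text acquisition: a hypothetical, already-collected set of documents.
DOCS = {
    1: "Large non-stick cooking pan",
    2: "Cooking tips for beginners",
    3: "Garden hose, large",
}


def to_terms(text):
    """Text transformation: lowercase the text and split it into index terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Index creation: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in to_terms(text):
            index[term].add(doc_id)
    return index


def query(index, text):
    """Querying: return ids of documents that contain every term in the query."""
    terms = to_terms(text)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(DOCS)
print(query(index, "large cooking"))
```

Looking up terms in the index is fast because the work of scanning every document was done once, at indexing time.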



2023/11/02 - 10:16

Hundreds of Changes annually
- is it better to have big website with mediocre content or small websites with great content?
- no need for SEO tricks because google tells us what to do
- in 2020, Your Money or Your Life and new health sites got hit?
→ sites that handle money or health information are under a higher level of scrutiny for trustworthiness
- Expertise, Authoritativeness and Trustworthiness sites got rewarded?
→ sites that prove their credibility are rewarded with higher rankings

Core Web Vitals
- site speed, mobile, website usability (including design)
- Responsive web design

SEO
- the goal of SEO is to attract organic (non paid) traffic to your website

On-Page SEO
- techniques you can apply on your website itself so that it can increase your SEO rank

Off-page SEO
- building a strong online presence and attracting external links from reputable sources

White hat SEO
- ethical and legitimate techniques and strategies to optimize a website and improve its rank
Black hat SEO
- unethical and manipulative techniques used to improve a site's rank illegitimately

SEO in three phases
- keyword research
→ the challenge is that the search is competitive
→ long tail vs short tail keywords:
⇒ long tail keywords are denser and more descriptive
⇒ short tail keywords are shorter: two or three words
⇒ ex:
• Lucrative: cooking pan
• Long tail: large cooking pan
• more long tail: large, non-stick, cooking pan
• more long tail: large, non-stick, no toxins, safe for kids cooking pan, made in italy
→ have to find the most profitable keywords
→ wrong keywords mean no traffic or buying
→ timing and intent: what exactly people want... and they want it now
→ thinking in keywords: avoid technical terms in keywords, use words likely to be used by newbies and people who are not experts
→ keyword mapping
⇒ process of assigning or mapping keywords to specific pages on a website based on keyword research
⇒ write down all the possible keywords, perform a current relevancy check, and prepare a keyword mapping document
→ keyword density
⇒ number of times a keyword appears on a given webpage or within a piece of content, as a ratio/percentage of the overall word count
⇒ also referred to as keyword frequency
→ keyword stuffing
⇒ practice of inserting a large number of keywords into web content and meta tags in an attempt to artificially increase a page's ranking in search results and drive more traffic to the site
⇒ generally, a keyword density of 2-5% is considered legitimate; going well beyond that counts as stuffing
- on-page SEO
- process of ranking pages higher
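
The keyword density ratio described above can be computed as below. This is a minimal sketch that assumes a single-word keyword and a simple definition of "word". The keyword_density function and the sample text are made up for illustration.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = words.count(keyword.lower())
    return 100.0 * hits / len(words)


sample = "The large pan heats evenly and the pan cleans easily"
density = keyword_density(sample, "pan")
print(f"{density:.1f}%")  # 2 of 10 words are "pan"
```

For this very short sample the result is far above the 2-5% range mentioned in the notes; on a full page the same calculation gives a quick check for stuffing.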

Things we can do as a developer
- title tags
→ descriptive or relevant titles for each page. should accurately represent content in each page
→ <title>...</title>
- meta description tags
→ <meta name="description" content="detailed description of what's on the page"/>
→ should summarize page content and encourage users to click
- heading tags
→ use heading tags to highlight key points in the content
- image alt text
→ including descriptive alt text for images improves your accessibility score, which is a factor for SEO
- internal linking
→ include internal links within your content to connect related pages on your website
→ use descriptive anchor text to help search engines understand the context of the linked pages
- schema markup
- mobile friendly design
- page speed
→ fast loading pages enhance user satisfaction
→ improving page speed:
⇒ compress images and use appropriate web formats
⇒ minimize and optimize CSS and JS
⇒ leverage browser caching
- valid HTML markup
→ W3C validation. when HTML is valid, it reduces the risk of rendering issues, broken functionality, and accessibility problems
- canonical tags
→ placed in the head section
→ uses ‘rel=canonical’ attribute
→ used to address duplicate content issues on a website
→ help search engines understand which version of a page should be considered the authoritative or preferred version when multiple similar pages exist
→ when to use
⇒ duplicate content
⇒ product variation
⇒ pagination
• for multi page content like articles, you can specify the canonical url of the first page to consolidate search results
⇒ sorting and filtering options
• if you have multiple sorting or filtering options for product listings, specifying canonical urls can prevent search engines from indexing all variations
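
Several of the items above (title tag, meta description, canonical tag, image alt text) can be checked automatically. The sketch below uses Python's built-in html.parser. The SeoTagAudit class and the sample page are hypothetical and cover only these four checks.

```python
from html.parser import HTMLParser


class SeoTagAudit(HTMLParser):
    """Collects the title, meta description, canonical URL and images without alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content")
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1  # missing or empty alt attribute

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical sample page.
PAGE = """<html><head>
<title>Large Non-Stick Cooking Pans</title>
<meta name="description" content="Compare large non-stick cooking pans."/>
<link rel="canonical" href="https://www.example.com/pans"/>
</head><body>
<h1>Cooking pans</h1>
<img src="pan.jpg" alt="Large non-stick cooking pan">
<img src="banner.jpg">
</body></html>"""

audit = SeoTagAudit()
audit.feed(PAGE)
print(audit.title, audit.description, audit.canonical, audit.images_missing_alt)
```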

Index