Search Engine Optimization
What is a search engine?
- user types a query
- search engine displays results
- results are displayed as a ranked list
   → results are always displayed in some kind of order
   → what factors go into a rank?
      ⇒ # of clicks plays a role
      ⇒ relevance to query
      ⇒ paid promotions
- definition: software tools that retrieve and display relevant information from the WWW
Types of search engines
- enterprise - uses company-specific knowledge and tasks to process information
- desktop - searches various types of information on a computer to provide a list of sources (e.g. Windows Search)
- open source - examples include Lucene (Java), Lemur (C++), and Galago (Java)
   → search engines whose source code is publicly available
- web search engines
Steps performed by web search engines
- crawling
   → web crawlers: bots that traverse websites, following links from page to page within a site and between sites
   → indexes data on websites
   → they use traversal algorithms to try to visit every node (page) on the web that they can reach, collecting data about website structure
   → “crawl through the web to collect information about a page, create an index and provide ranked responses to a search query”
- indexing
   → once web crawlers have collected information, an index is created; it stores references to the data efficiently
- ranking algorithm
   → PageRank
- user interface
   → how users interact with search engine
- query processing
   → how the search engine processes what the user enters
   → matches keywords
- search and retrieval
   → takes output of ranking algorithm and query processing and displays results
- user feedback and personalization
- web search front end and back end
- continuous updates and monitoring
   → webpages change and the index needs to stay up to date
   → when resources get updated, moved, or removed the index should stay up to date
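A minimal sketch of the crawling and ranking steps. It assumes a hypothetical in-memory "web" (the TOY_WEB dict) in place of real HTTP fetching and HTML parsing. A breadth-first crawl discovers the link graph from a seed page, and a basic power-iteration PageRank then scores the discovered pages. The damping factor of 0.85 and the 50 iterations are illustrative choices, not values from these notes.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
# A real crawler would download each page and extract its links instead.
TOY_WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],  # nothing links here, so a crawl from a.html never finds it
}

def crawl(seed, web):
    """Breadth-first traversal from a seed page; returns the link graph it discovered."""
    graph = {}
    frontier = deque([seed])
    seen = {seed}
    while frontier:
        page = frontier.popleft()
        links = web.get(page, [])
        graph[page] = links
        for link in links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                frontier.append(link)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; pages with no outlinks spread their rank evenly."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outlinks = [q for q in graph[page] if q in rank]
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
            else:
                for q in pages:
                    new_rank[q] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = crawl("a.html", TOY_WEB)
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Pages that receive links from well-ranked pages score higher. d.html is never reached from the seed, so it is neither crawled nor ranked.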
How would you classify whether a search engine is good or bad?
- effectiveness (quality of result)
- efficiency (response time and throughput)
Important parameters:
- response time
   → time between submitting a query and obtaining results
- query throughput
   → number of queries processed per unit of time (e.g. queries per second)
- speed
   → rate at which documents can be transformed into search indexes
- coverage
   → how much info has been indexed and stored in a search engine
- freshness
   → age of the information stored
- scalability
   → should still work as users and information grow
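Response time and query throughput can be measured with a small timing harness. This is a sketch only. The toy_search function and DOCS list are hypothetical stand-ins for a real search backend, and the queries run one after another.

```python
import time
from statistics import mean

def measure(search_fn, queries):
    """Run queries sequentially; return (mean response time in s, throughput in queries/s)."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        latencies.append(time.perf_counter() - t0)  # response time of this query
    elapsed = time.perf_counter() - start
    return mean(latencies), len(queries) / elapsed

# Hypothetical stand-in for a search backend: a substring scan over a few documents.
DOCS = ["large cooking pan", "non-stick pan", "cooking tips"]

def toy_search(query):
    return [d for d in DOCS if query in d]

avg_latency, qps = measure(toy_search, ["pan", "cooking", "tips"] * 1000)
print(f"mean response time: {avg_latency * 1e3:.4f} ms, throughput: {qps:.0f} queries/s")
```

Because this harness is sequential, throughput is roughly 1 / mean response time. A real engine processes many queries in parallel, so its throughput can be much higher than that while the per-query response time stays the same.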
sitemap.xml: metadata and information about a website, e.g. its URLs, how they are linked, and the date each was last updated
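A minimal sketch of generating a sitemap.xml with Python's standard library. It follows the sitemaps.org format, where a urlset element holds one url entry per page, each with loc and lastmod children. The example.com URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: list of (url, last_modified) tuples, with dates written as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url          # page address
        ET.SubElement(entry, "lastmod").text = lastmod  # date last updated
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

xml_text = build_sitemap([
    ("https://example.com/", "2023-11-01"),
    ("https://example.com/about", "2023-10-15"),
])
print(xml_text)
```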
“Working of a search engine” is defined by two phases
- indexing
   → text acquisition
      ⇒ identifies documents for searching and indexing
   → text transformation
      ⇒ transforms documents into index terms or features
   → index creation
      ⇒ takes output of text transformation and creates indexes that allow for fast searching
- querying
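A minimal sketch of the indexing phase followed by querying, under simple assumptions. The docs dict stands in for text acquisition. Text transformation lowercases the text, splits it into word tokens, and drops a small hypothetical stop-word list. Index creation builds an inverted index that maps each term to the documents containing it. The query step uses boolean AND matching; this is one simple choice, and real engines rank the matches rather than just returning them.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real engines use larger, language-specific lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "for", "to", "is"}

def transform(text):
    """Text transformation: lowercase, split into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Index creation: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in transform(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Querying: ids of documents containing every query term (boolean AND)."""
    terms = transform(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Text acquisition stand-in: three tiny documents keyed by id.
docs = {
    1: "Large non-stick cooking pan",
    2: "Cooking tips for the beginner",
    3: "A large pan made in Italy",
}
index = build_index(docs)
print(sorted(search(index, "large pan")))
```

The index is built once, ahead of time. Each query then only looks up a few terms instead of scanning every document, which is what makes fast searching possible.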
2023/11/02 - 10:16
Hundreds of changes annually
- is it better to have big website with mediocre content or small websites with great content?
- no need for SEO tricks because Google tells us what to do
- in 2020, Your Money or Your Life and new health sites got hit?
   → sites that handle money or health information are under a higher level of scrutiny for trustworthiness
- Expertise, Authoritativeness and Trustworthiness (E-A-T) sites got rewarded?
   → sites that prove their credibility are rewarded with higher rankings
Core vitals
- site speed, mobile, website usability (including design)
- Responsive web design
SEO
- the goal of SEO is to attract organic (non paid) traffic to your website
On-Page SEO
- techniques you apply on the website itself to increase its SEO rank
Off-page SEO
- building a strong online presence and attracting external links from reputable sources
White hat SEO
- ethical and legitimate techniques and strategies to optimize a website and improve its rank
Black hat SEO
- unethical and manipulative techniques used to improve a site's rank illegitimately
SEO in three phases
- keyword research
   → the challenge is that the search is competitive
   → long tail vs short tail keywords:
      ⇒ long tail keywords are longer, denser and more descriptive
      ⇒ short tail keywords are shorter, two or three words
      ⇒ ex:
         • Lucrative: cooking pan
         • Long tail: large cooking pan
         • more long tail: large, non-stick, cooking pan
         • more long tail: large, non-stick, no toxins, safe for kids cooking pan, made in Italy
   → have to find the most profitable keywords
   → the wrong keywords mean no traffic and no sales
   → timing and intent: what exactly people want... and they want it now
   → thinking in keywords: avoid technical terms in keywords, use words likely to be used by newbies and people who are not experts
   → keyword mapping
      ⇒ process of assigning or mapping keywords to specific pages on a website based on keyword research
      ⇒ write down all the possible keywords, perform a current relevancy check, and prepare a keyword mapping document
   → keyword density
      ⇒ number of times a keyword appears on a given webpage or within a piece of content, as a ratio/percentage of the overall word count
      ⇒ also referred to as keyword frequency
   → keyword stuffing
      ⇒ practice of inserting a large number of keywords into web content and meta tags in an attempt to artificially increase a page's ranking in search results and drive more traffic to the site
      ⇒ generally, a keyword density of about 2-5% is considered legitimate; well above that, it starts to look like keyword stuffing
- on-page SEO
   → process of optimizing individual pages so they rank higher
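Keyword density can be computed directly from its definition: occurrences of the keyword divided by the total word count, as a percentage. This is a sketch. Conventions for multi-word keywords differ between tools, and this version counts each full-phrase occurrence once. The 5% cutoff in looks_stuffed is the upper end of the 2-5% rule of thumb in these notes, not an official threshold.

```python
import string

def keyword_density(text, keyword):
    """Occurrences of a keyword phrase as a percentage of the total word count."""
    # Split on whitespace and strip surrounding punctuation ("pan." -> "pan").
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    phrase = keyword.lower().split()
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Slide an n-word window over the text and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits / len(words)

def looks_stuffed(density_percent, upper=5.0):
    """Flag densities above the upper end of the 2-5% rule of thumb."""
    return density_percent > upper

SAMPLE = ("This cooking pan is a large cooking pan. "
          "Every cooking pan we sell is non-stick.")
density = keyword_density(SAMPLE, "cooking pan")
print(f"{density:.1f}% -> stuffed: {looks_stuffed(density)}")
```

In SAMPLE, "cooking pan" appears 3 times in 15 words, a density of 20%. That is far above the 2-5% range.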
Things we can do as a developer
- title tags
   → descriptive, relevant titles for each page; each title should accurately represent the content of its page
   → <title>...</title>
- meta description tags
   → <meta name="description" content="detailed description of what's on the page"/>
   → should summarize page content and encourage users to click
- heading tags
   → use heading tags to highlight key points in the content
- image alt text
   → including descriptive alt text for images improves your accessibility score, which is a factor for SEO
- internal linking
   → include internal links within your content to connect related pages on your website
   → use descriptive anchor text to help search engines understand the context of the linked pages
- schema markup
- mobile friendly design
- page speed
   → fast-loading pages enhance user satisfaction
   → improving page speed:
      ⇒ compress images and use appropriate web formats
      ⇒ minimize and optimize CSS and JS
      ⇒ leverage browser caching 
- valid HTML markup
   → W3C validation. When HTML is valid, it reduces the risk of rendering issues, broken functionality, and accessibility problems
- canonical tags
   → placed in the <head> section
   → uses a <link> element with the rel="canonical" attribute
   → used to address duplicate content issues on a website
   → helps search engines understand which version of a page should be considered the authoritative or preferred version when multiple similar pages exist
   → when to use
      ⇒ duplicate content
      ⇒ product variation
      ⇒ pagination
         • for multi-page content like articles, you can specify the canonical URL of the first page to consolidate search results
      ⇒ sorting and filtering options
         • if you have multiple sorting or filtering options for product listings, specifying canonical URLs can prevent search engines from indexing every variation
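A minimal sketch of deriving a canonical URL for sorted, filtered, or paginated variants of a listing page, then emitting the corresponding <link> tag. The VARIANT_PARAMS set and the shop.example.com URLs are hypothetical. Which query parameters actually change only the presentation depends on the site. Dropping "page" follows the notes' suggestion of pointing paginated content at the first page.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that only sort, filter, or paginate a listing.
VARIANT_PARAMS = {"sort", "order", "color", "size", "page"}

def canonical_url(url):
    """Remove variant-only query parameters and the #fragment so all variants share one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    # escape() turns "&" into "&amp;", as required inside an HTML attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'

print(canonical_link_tag("https://shop.example.com/pans?category=frying&sort=price&page=2"))
```

Every sorted, filtered, or paginated variant of the pans listing maps to the same canonical URL. Search engines can therefore consolidate them into one indexed page.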