Named Entity Recognition (NER)



Identifying People, Places & Organisations in Text — For Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: NER identifies and classifies named entities (people, organisations, locations, dates) in unstructured text.
  • Common entity types: PERSON, ORG, LOC, DATE, TIME, MONEY, PERCENT, PRODUCT, EVENT.
  • Tagging scheme: BIO — B=Beginning of entity, I=Inside entity, O=Outside entity.
  • Best tool for production: spaCy — fast, accurate, supports 60+ languages.
  • Best accuracy: Fine-tuned BERT — state-of-the-art on standard benchmarks.
  • Evaluated using: Precision, Recall, and F1 score at entity level.

1. What is Named Entity Recognition?

Named Entity Recognition (NER) is an NLP task that automatically identifies and classifies named real-world entities mentioned in text into predefined categories.

Example: Given the sentence — “Apple Inc. CEO Tim Cook announced the new iPhone in Cupertino on September 9th, valued at $999.”

NER would extract:

  • “Apple Inc.” → ORG (Organisation)
  • “Tim Cook” → PERSON
  • “iPhone” → PRODUCT
  • “Cupertino” → LOC (Location)
  • “September 9th” → DATE
  • “$999” → MONEY

NER is a foundational NLP task — it is a prerequisite for many downstream applications including information extraction, question answering, knowledge graph construction, and document summarisation.

2. Common Entity Types

Entity Type       Label         Examples
Person            PERSON / PER  Narendra Modi, Sundar Pichai, A.P.J. Abdul Kalam
Organisation      ORG           ISRO, Tata Motors, IIT Bombay, WHO
Location / Place  LOC / GPE     Mumbai, Himalayan range, Bay of Bengal
Date              DATE          15 August 1947, next Monday, Q3 2026
Time              TIME          3:30 PM, morning, within an hour
Money             MONEY         ₹50 crore, $1.2 billion, €500
Percentage        PERCENT       GDP growth of 7.3%, 45% market share
Product           PRODUCT       iPhone 16, Maruti Swift, Python 3.12
Event             EVENT         World Cup 2026, GATE 2026, COVID-19 pandemic
Quantity          QUANTITY      5 kilometres, 100 litres, 3.5 kg

Different NER systems use different entity type sets. spaCy’s English model uses 18 types; CoNLL-2003 (the standard benchmark) uses only 4: PER, ORG, LOC, MISC.

3. BIO Tagging Scheme

NER is typically framed as a sequence labelling problem — assign one label to each token in the input. The BIO (Beginning-Inside-Outside) scheme handles multi-word entities:

  • B-TYPE: Beginning of an entity of TYPE
  • I-TYPE: Inside (continuation) of an entity of TYPE
  • O: Outside any named entity

Example: “Jawaharlal Nehru University hosted the event”

Token       BIO Tag
Jawaharlal  B-ORG
Nehru       I-ORG
University  I-ORG
hosted      O
the         O
event       O

Extended schemes include BIOES (adding E for End and S for Single-token entity) which can improve model performance by providing more boundary information.
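Producing BIO tags from token-level entity spans is mechanical, and can be sketched in a few lines of plain Python (the helper name `bio_tags` is illustrative, not from any library):

```python
def bio_tags(tokens, entities):
    """Convert entity spans into BIO tags.

    entities: (start, end_exclusive, TYPE) tuples over token indices.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # continuation tokens
    return tags

tokens = ["Jawaharlal", "Nehru", "University", "hosted", "the", "event"]
print(bio_tags(tokens, [(0, 3, "ORG")]))
# → ['B-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O']
```

Going the other way (BIO tags back to spans) is equally simple, which is why BIO remains the default interchange format for NER datasets.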

4. Approaches to NER

Rule-Based

Handcrafted rules using regular expressions, dictionaries (gazetteers), and grammatical patterns. Example: “any word following ‘Mr.’ or ‘Dr.’ is likely a PERSON.” Fast, interpretable, no training data needed. But brittle — fails on unseen patterns and requires constant manual updates.
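The honorific rule above can be sketched with a regular expression; the pattern and function name here are illustrative, not taken from any library:

```python
import re

# Toy rule: a capitalised name following an honorific is likely a PERSON.
HONORIFIC = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")

def rule_based_persons(text):
    """Return candidate PERSON strings matched by the honorific rule."""
    return HONORIFIC.findall(text)

print(rule_based_persons("Mr. Ravi Kumar signed the deal."))
# → ['Ravi Kumar']

# Brittleness in action: the initial "S." breaks the pattern entirely.
print(rule_based_persons("Dr. S. Somanath led the mission."))
# → []
```

The second call shows exactly why rule-based systems need constant maintenance: one unanticipated surface form (an initial) and the rule silently misses the entity.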

Classical ML — CRF (Conditional Random Field)

CRF is the traditional ML approach for NER. It models the conditional probability of the entire label sequence given the input, considering dependencies between adjacent labels (e.g., I-ORG cannot follow B-PER). Features include: current word, POS tag, word shape, prefix/suffix, surrounding words. Requires feature engineering but achieves good results on small datasets.
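The feature-engineering step can be sketched as a plain dictionary per token. The function below is an illustrative sketch of the kind of feature dictionaries that CRF libraries (e.g., sklearn-crfsuite) consume; the feature names are made up for this example:

```python
def word_features(tokens, i):
    """Hand-crafted features for token i, of the kind fed to a CRF."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalised words hint at entities
        "word.isupper": word.isupper(),   # acronyms like ISRO, WHO
        "prefix3": word[:3],
        "suffix3": word[-3:],
        # Word shape, e.g. "ISRO" -> "XXXX"
        "shape": "".join(
            "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
            for c in word
        ),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

print(word_features(["ISRO", "launched", "Chandrayaan-3"], 0))
```

The CRF then learns weights over these features jointly with label-transition scores, which is how it enforces constraints like "I-ORG cannot follow B-PER".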

Deep Learning — BiLSTM-CRF

The dominant approach before transformers: a bidirectional LSTM processes the sequence and produces contextual representations, which are then fed into a CRF layer for sequence-level decoding. Learns features automatically from data — no manual feature engineering.

Transformer-Based — BERT Fine-Tuning

Current state-of-the-art: fine-tune BERT (or domain-specific models like BioBERT for medicine, LegalBERT for law) on NER-labelled data. BERT’s contextual embeddings capture nuanced entity semantics — “Apple” as a company vs “apple” as a fruit. Achieves F1 scores above 90% on standard benchmarks.

5. Evaluation Metrics

NER is evaluated at the entity level, not at the token level. A prediction counts as correct only if both the entity span (all tokens) and the entity type are exactly correct.

Precision = Correctly predicted entities / All predicted entities

Recall = Correctly predicted entities / All true entities

F1 = 2 × (Precision × Recall) / (Precision + Recall)

State-of-the-art models achieve F1 ~93% on CoNLL-2003 English NER. Domain-specific NER (medical, legal) typically achieves 80–88% F1 due to specialised terminology and ambiguity.
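Entity-level scoring reduces to set intersection over (span, type) tuples; a minimal sketch (the function name `entity_f1` is illustrative, not a library API):

```python
def entity_f1(gold, predicted):
    """Entity-level precision, recall, and F1.

    Entities are (start, end, TYPE) tuples; a prediction counts as correct
    only when both span AND type match exactly.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

gold = [(0, 2, "ORG"), (5, 6, "PERSON"), (9, 10, "LOC")]
pred = [(0, 2, "ORG"), (5, 6, "LOC")]   # right span, wrong type -> no credit
print(entity_f1(gold, pred))
# → precision 0.5, recall ≈ 0.333, F1 = 0.4
```

In practice, libraries such as seqeval implement this evaluation directly on BIO-tagged sequences, but the logic is the same.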

6. Applications

Domain          Application
News Media      Auto-tag articles with mentioned people, companies, locations for search and recommendation
Finance         Extract company names, monetary values, dates from earnings reports and news
Healthcare      Extract drug names, diseases, symptoms, dosages from clinical notes (MedNER)
Legal           Identify parties, dates, legal provisions in contracts and case documents
E-commerce      Extract product names, brands, specifications from user queries and product descriptions
Search Engines  Improve search by identifying query intent ("flights from Mumbai to Delhi" → LOC, LOC)

7. Python Code


# Install: pip install spacy transformers
# Download model: python -m spacy download en_core_web_sm

import spacy
from transformers import pipeline

# ============================================================
# Approach 1: spaCy (fast, production-ready)
# ============================================================
nlp = spacy.load("en_core_web_sm")

text = """
ISRO launched the Chandrayaan-3 spacecraft on July 14, 2023, from
Sriharikota, Andhra Pradesh. The mission cost approximately 615 crore rupees.
Dr. S. Somanath led the team at the Indian Space Research Organisation.
"""

doc = nlp(text)

print("Named Entities found:")
print("-" * 50)
for ent in doc.ents:
    print(f"  Text: {ent.text:<35} Label: {ent.label_:<10} ({spacy.explain(ent.label_)})")

# Visualise in Jupyter Notebook:
# from spacy import displacy
# displacy.render(doc, style="ent", jupyter=True)

# ============================================================
# Approach 2: HuggingFace Transformers (BERT-based, highest accuracy)
# ============================================================
ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple"  # Merge B/I tokens into full entities
)

text2 = "Sundar Pichai, CEO of Google, announced new AI features in San Francisco."
entities = ner_pipeline(text2)

print("\nBERT NER Results:")
print("-" * 50)
for entity in entities:
    print(f"  Entity: {entity['word']:<25} Type: {entity['entity_group']:<10} Score: {entity['score']:.3f}")

# ============================================================
# Approach 3: spaCy with custom entity training (outline)
# ============================================================
# For custom entity types not in the pre-trained model:
# 1. Annotate training data with your custom entities
# 2. Load blank or existing spaCy model
# 3. Add 'ner' component with your custom labels
# 4. Train with nlp.update() on annotated examples
# 5. Save model with nlp.to_disk()

# Example annotation format:
TRAIN_DATA = [
    # Spans are character offsets, end-exclusive: text[4:12] == "RTX 4090"
    ("The RTX 4090 GPU is powerful", {"entities": [(4, 12, "PRODUCT")]}),
    # text[0:11] == "Python 3.12"
    ("Python 3.12 was released last year", {"entities": [(0, 11, "PRODUCT")]}),
]
# Use spaCy's DocBin format and train with: python -m spacy train config.cfg

8. Common Mistakes Students Make

  • Evaluating at token level instead of entity level: A model that correctly tags "New" and "York" individually but misses that they form a single "New York" entity should not get full credit. Always evaluate at the entity span level.
  • Ignoring ambiguity: "Apple" can be a company (ORG) or a fruit (no entity). Context is essential — rule-based systems fail badly on ambiguous entities. Use contextual models (BERT) for production.
  • Using a general model for domain-specific text: A model trained on news text will perform poorly on medical reports or legal documents. Use domain-specific pre-trained models (BioBERT, ClinicalBERT, LegalBERT) and fine-tune on domain data.

9. Frequently Asked Questions

How much data do I need to train a custom NER model?

For fine-tuning BERT: 500–2,000 annotated examples per entity type can give reasonable results. For training a BiLSTM-CRF from scratch: typically 5,000–50,000 examples. Popular annotation tools include Label Studio and Prodigy (the annotation tool from spaCy's makers). For very small datasets, consider few-shot NER approaches, or use a large language model such as GPT-4 as an annotator to bootstrap your training data.

What is the best NER library for Indian language text?

For Indian languages: iNLTK supports Hindi, Gujarati, Punjabi, and several other Indian languages. IndicBERT and MuRIL (Multilingual Representations for Indian Languages) are BERT-based models pre-trained on Indian language corpora and can be fine-tuned for NER. Publicly available annotated NER datasets exist for Hindi and other Indic languages. spaCy has limited Indian language support in its standard models.
