The Large envelope protein (L), also termed L-HBsAg or L glycoprotein, is encoded by the HBV S open reading frame (ORF). It comprises three regions, pre-S1, pre-S2, and S, which mediate viral attachment to hepatocytes and envelopment of the nucleocapsid during virion assembly. Genotype C, predominantly of subtype adr, is associated with higher viral replication rates and more severe liver disease than other genotypes.
Genotype C exhibits higher intracellular HBV DNA accumulation than genotype B, which correlates with more severe liver damage.
Pre-S1 mutations in genotype C alter HBsAg antigenicity, complicating diagnostic detection.
Monoclonal antibodies targeting the pre-S1 domain show promise in blocking viral entry.
Recombinant S proteins are used to study immune evasion mechanisms, particularly in chronic HBV infection.
Antigenic Variability: Mutations in pre-S1 (e.g., T68N, A74T) reduce antibody binding.
Diagnostic Gaps: Current assays often miss mutated S proteins, necessitating genotype-specific reagents.
Vaccine Development: Pre-S1 epitopes are under investigation as a way to enhance existing HBsAg-based vaccines.
HBV produces three envelope proteins known as Large (L), Middle (M), and Small (S) surface antigens (collectively referred to as HBsAg). These proteins share their C-terminal regions but differ in their N-terminal domains. The S protein forms the basic structural component, while M contains an additional PreS2 domain, and L contains both PreS1 and PreS2 domains.
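The nested organization of L, M, and S can be expressed as a small data structure in which M and S are C-terminal portions of L. The Python sketch below is a minimal illustration. The domain lengths (PreS1 = 119, PreS2 = 55, S = 226 amino acids) are assumptions based on commonly reported genotype C values and should be checked against the sequence being analyzed. The names `DOMAIN_LENGTHS`, `protein_length`, and `start_in_l_numbering` are hypothetical.

```python
# Assumed domain lengths in amino acids, as commonly reported for genotype C.
# PreS1 length differs between genotypes (it is shorter in genotype D, for
# example), so these values should be checked against the actual sequence.
DOMAIN_LENGTHS = {"PreS1": 119, "PreS2": 55, "S": 226}

# L, M, and S start at different in-frame start codons of one ORF and end at
# the same C-terminus, so they differ only in their N-terminal domains.
PROTEIN_DOMAINS = {
    "L": ("PreS1", "PreS2", "S"),
    "M": ("PreS2", "S"),
    "S": ("S",),
}


def protein_length(name: str) -> int:
    """Length of an envelope protein as the sum of its domain lengths."""
    return sum(DOMAIN_LENGTHS[domain] for domain in PROTEIN_DOMAINS[name])


def start_in_l_numbering(name: str) -> int:
    """1-based residue of L at which the given protein begins."""
    return protein_length("L") - protein_length(name) + 1


l_length = protein_length("L")
for name in ("L", "M", "S"):
    start = start_in_l_numbering(name)
    print(f"{name}: {protein_length(name)} aa, spans L residues {start}-{l_length}")
```

With these assumed lengths, the script reports 400, 281, and 226 residues, with M and S beginning at L residues 120 and 175. Because the three proteins share a C-terminus, any S-domain position can be converted to L numbering by adding the combined PreS length.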
The S protein is an integral membrane protein with four predicted transmembrane-spanning helices, interrupted by an N-proximal cytosolic loop. After the second helix, there is an antigenic loop presenting a complex structure stabilized by multiple disulfide bridges. The PreS1 and PreS2 domains together (collectively termed PreS) are suspected to represent intrinsically disordered protein domains, though experimental confirmation for human HBV PreS has been limited compared to the avian counterparts.
HBV is classified into at least 10 genotypes (A through J) with distinct geographic distributions. Within genotype C, multiple subgenotypes (C1-C16) and recombinant variants have been identified. The classification system is based on phylogenetic analysis of full-length genome sequences, with a sequence divergence of >4% typically required to define a new subgenotype.
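The >4% criterion refers to nucleotide divergence over complete genomes. The comparison can be written as follows. This is a minimal sketch that assumes pre-aligned sequences and a simple uncorrected p-distance. Published analyses may instead use model-corrected distances and mean between-group divergence alongside phylogenetic support. The function and constant names are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing nucleotides between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N' are
    skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


SUBGENOTYPE_CUTOFF = 0.04  # >4% whole-genome divergence, as stated in the text


def exceeds_subgenotype_cutoff(distance: float) -> bool:
    """True if a divergence lies above the 4% subgenotype threshold."""
    return distance > SUBGENOTYPE_CUTOFF


print(p_distance("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))
```

For instance, `exceeds_subgenotype_cutoff(0.038)` returns `False`. This matches the CD1 versus C2 case in the recombination discussion, where a 3.8% divergence falls short of the threshold.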
Subtypes are designated based on antigenic determinants in the surface protein: the "a" determinant, shared by all subtypes, combined with the mutually exclusive determinant pairs d/y and w/r. Genotype C is predominantly associated with subtype adr, though rare cases of adw have been reported. The genetic divergence within the same subtype is approximately 8%, similar to that found between different subtypes.
The following table summarizes the relationship between genotype C and its serological subtypes:
| Genotype | Predominant Subtype | Alternative Subtypes |
|---|---|---|
| C | adr (adrq+, adrq-) | adw (rare) |
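Because the d/y and w/r determinants map to single residues of the S domain, subtype is often inferred from sequence data. The sketch below assumes the widely used rule that Lys or Arg at S-domain residue 122 specifies d or y, and Lys or Arg at residue 160 specifies w or r. Finer distinctions, such as the w1-w4 subdeterminants and the q determinant, depend on additional residues and are not modeled. All names are hypothetical.

```python
# 1-based residue positions within the 226-aa S domain (assumed numbering).
D_Y_POSITION = 122  # Lys -> d, Arg -> y
W_R_POSITION = 160  # Lys -> w, Arg -> r

D_Y_RESIDUES = {"K": "d", "R": "y"}
W_R_RESIDUES = {"K": "w", "R": "r"}


def serological_subtype(s_domain: str) -> str:
    """Infer a subtype label such as 'adr' from an S-domain protein sequence.

    Any residue other than Lys or Arg at the key positions is reported as '?'.
    """
    s_domain = s_domain.upper()
    if len(s_domain) < W_R_POSITION:
        raise ValueError("S-domain sequence is too short")
    d_or_y = D_Y_RESIDUES.get(s_domain[D_Y_POSITION - 1], "?")
    w_or_r = W_R_RESIDUES.get(s_domain[W_R_POSITION - 1], "?")
    return "a" + d_or_y + w_or_r
```

A genotype C sequence with Lys122 and Arg160 would therefore be reported as `adr`. Any other residue at these positions yields `?` rather than a guess.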
Genotype C exhibits several distinctive virological characteristics compared to other genotypes, particularly genotype B. These differences impact disease progression and treatment outcomes:
| Virological Parameter | Genotype B | Genotype C | Genotype A | Genotype D |
|---|---|---|---|---|
| Serum HBV DNA level | Lower | Higher | ND* | ND* |
| Frequency of precore A1896 mutation | Higher | Lower | Lower | Higher |
| Frequency of basal core promoter A1762T/G1764A mutation | Lower | Higher | Lower | Higher |
| Frequency of pre-S deletion mutation | Lower | Higher | ND* | ND* |
| Intracellular expression of HBV DNA | Lower | Higher | Lower | Higher |
| Secretion of HBeAg | Lower | Higher | ND* | ND* |

*ND, not determined.
In vitro studies have demonstrated that intracellular HBV DNA levels are higher for genotype C than for genotype B, which may explain the association of genotype C with more severe liver disease. This higher replication capacity likely contributes to the clinical impact of genotype C, and introducing the basal core promoter (BCP) A1762T/G1764A variants into genotype C strains has been shown to increase intracellular core protein expression.
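Screening sequences for the BCP and precore variants in the table above amounts to checking fixed genome positions. The following minimal sketch assumes the input genome string is already in conventional HBV nucleotide numbering (for example, after alignment to a numbered genotype C reference), so that character 1762 is nucleotide 1762. The dictionary and function names are hypothetical.

```python
# (1-based genome position, wild-type base, variant base). Positions assume the
# conventional HBV numbering; the input must already be in that coordinate
# system (for example, aligned to a numbered genotype C reference).
VARIANT_SITES = {
    "BCP A1762T": (1762, "A", "T"),
    "BCP G1764A": (1764, "G", "A"),
    "precore G1896A": (1896, "G", "A"),
}


def call_variants(genome: str) -> dict:
    """Classify each site as 'variant', 'wild type', or 'other (<base>)'."""
    genome = genome.upper()
    calls = {}
    for name, (position, wild_type, variant) in VARIANT_SITES.items():
        base = genome[position - 1] if position <= len(genome) else "missing"
        if base == variant:
            calls[name] = "variant"
        elif base == wild_type:
            calls[name] = "wild type"
        else:
            calls[name] = f"other ({base})"
    return calls


def has_bcp_double_mutation(genome: str) -> bool:
    """True when both A1762T and G1764A are present."""
    calls = call_variants(genome)
    return calls["BCP A1762T"] == "variant" and calls["BCP G1764A"] == "variant"
```

Bases other than the expected wild-type or variant nucleotide, including IUPAC ambiguity codes from mixed viral populations, are reported as `other` rather than forced into either category.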
Patients infected with genotype C HBV experience significantly different disease progression patterns compared to other genotypes:
Delayed HBeAg seroconversion: The estimated annual rates of HBeAg seroconversion are 7.9% for genotype C versus 15.5% for genotype B infections.
Longer duration of high viral replication: A long-term follow-up study of 460 Taiwanese children chronically infected with HBV showed that after 20 years, 70% of genotype C carriers remained HBeAg-positive compared with only 40% of genotype B carriers.
Higher risk of severe disease: Genotype C infection is associated with more severe liver disease, including an increased risk of cirrhosis and hepatocellular carcinoma (HCC), compared with genotypes A and B.
Higher frequency of mutations: Genotype C has a higher frequency of core promoter and pre-S mutations than genotypes A and B, factors that correlate with an increased risk of HCC.
These characteristics have important implications for monitoring and managing patients with genotype C infections, as they may require more intensive surveillance and potentially earlier therapeutic intervention.
Several methods have been developed for the recombinant expression of HBV surface antigens, with varying degrees of efficiency, authenticity, and applicability to research contexts:
Mammalian cell expression systems: Recent developments include streamlined methods for transient expression in mammalian cells, which can produce recombinant S-HBsAg virus-like particles (VLPs) with proper post-translational modifications. This approach aims to display uniform antigenic epitopes on the particle surface to improve serological detection of anti-HBs antibodies.
Yeast expression systems: Historically common, but they may yield glycosylation patterns that differ from those of native human HBV proteins.
Bacterial expression systems: Often used for producing non-glycosylated fragments for structural studies.
Comparative analysis has shown that mammalian cell-derived S-HBsAg VLPs achieved the highest sensitivity and specificity in multiplex serology compared with yeast-derived or serum-derived HBsAg, making them particularly suitable for analyzing HBV immunity through anti-HBs serostatus.
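Sensitivity and specificity in such comparisons are computed against a reference serostatus. The sketch below is a minimal illustration that assumes a single signal cut-off per antigen. The signal values, cut-off, and function names are hypothetical and are not taken from the cited comparison.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_pos: int = 0
    false_neg: int = 0
    true_neg: int = 0
    false_pos: int = 0

    @property
    def sensitivity(self) -> float:
        """Fraction of reference-positive sera that the assay calls positive."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Fraction of reference-negative sera that the assay calls negative."""
        return self.true_neg / (self.true_neg + self.false_pos)


def evaluate(signals, reference_positive, cutoff):
    """Call anti-HBs positive when signal >= cutoff and compare to reference status."""
    counts = ConfusionCounts()
    for signal, is_positive in zip(signals, reference_positive):
        called_positive = signal >= cutoff
        if is_positive and called_positive:
            counts.true_pos += 1
        elif is_positive:
            counts.false_neg += 1
        elif called_positive:
            counts.false_pos += 1
        else:
            counts.true_neg += 1
    return counts


# Hypothetical median fluorescence intensities and reference serostatus.
mfi = [1200, 950, 40, 300, 60, 80]
reference = [True, True, True, False, False, False]
result = evaluate(mfi, reference, cutoff=200)
print(f"sensitivity={result.sensitivity:.2f}, specificity={result.specificity:.2f}")
```

With these made-up values, the script prints `sensitivity=0.67, specificity=0.67`. A fair comparison runs the same evaluation for each antigen preparation against the same reference panel.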
Optimization of purification protocols for recombinant HBV Large envelope protein requires careful consideration of several factors:
Solubilization strategy: The membrane-associated nature of the Large envelope protein necessitates effective solubilization, typically using detergents that maintain protein structure and function.
Chromatographic approach: A combination of affinity chromatography (using tagged constructs), ion exchange, and size exclusion chromatography can achieve high purity.
Verification methods: Purified proteins should be characterized using:
For the Large envelope protein specifically, additional considerations include preserving the native conformation of the PreS domains. This may require specialized approaches because these domains are suspected to be intrinsically disordered.
A recently developed method based on purification from mammalian cell lysates yields VLPs with properly displayed epitopes, which is particularly valuable for serological applications.
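When affinity, ion exchange, and size exclusion steps are combined, each step is usually tracked with a purification table. The minimal sketch below assumes that total protein and L-protein content have been measured after each step. All quantities, step names, and helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    total_protein_mg: float  # total protein, e.g. from a colorimetric assay
    target_mg: float         # L protein, e.g. quantified by ELISA or densitometry


def summarize(steps):
    """Purity, cumulative yield, and fold purification relative to the first step."""
    start = steps[0]
    start_purity = start.target_mg / start.total_protein_mg
    rows = []
    for step in steps:
        purity = step.target_mg / step.total_protein_mg
        rows.append({
            "step": step.name,
            "purity_pct": 100 * purity,
            "yield_pct": 100 * step.target_mg / start.target_mg,
            "fold": purity / start_purity,
        })
    return rows


# Hypothetical values for illustration only.
steps = [
    PurificationStep("detergent lysate", 500.0, 5.0),
    PurificationStep("affinity", 10.0, 4.0),
    PurificationStep("ion exchange", 4.0, 3.0),
    PurificationStep("size exclusion", 2.5, 2.4),
]
for row in summarize(steps):
    print(f"{row['step']:<18} purity {row['purity_pct']:5.1f}%  "
          f"yield {row['yield_pct']:5.1f}%  fold {row['fold']:5.1f}")
```

For these illustrative numbers, the final step reaches 96% purity at 48% cumulative yield. A step that raises purity only marginally while costing substantial yield is a candidate for removal.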
Recombination events significantly impact HBV genotype C classification and potentially alter viral function:
Impact on phylogenetic analysis: Including recombinant sequences changes the topology of phylogenetic trees, although it generally does not affect the subgenotyping of non-recombinant sequences. For example, in trees built with all genotype C sequences, C2 appears closer to the root than C9, but in trees using only non-recombinant sequences, C9 becomes the closest subgenotype to the root.
Recombinant subgenotypes: Several recombinant subgenotypes have been identified, including the C/D recombinant subgenotypes CD1 and CD2.
Sequence divergence considerations: The sequence divergence between CD1 and C2 was found to be 3.8%, below the 4% threshold typically used to define a new subgenotype, which highlights challenges in classification.
Functional implications: Recombination may alter viral properties related to replication efficiency, immune escape, and pathogenesis, though these effects require further study specifically for genotype C.
Genotype C exhibits substantial subgenotypic diversity, with subgenotypes C1-C16 reported plus two C/D recombinant subgenotypes (CD1 and CD2). Recent phylogenetic analyses have corrected several misclassifications:
C11 (proposed by Utsumi and colleagues) was found to group with C12 (proposed by Mulyanto and colleagues).
Two sequences previously designated as C6 (GQ358157 and GU721029) have been re-designated as C12 and C7, respectively.
A "quasi-subgenotype C2" has been proposed, which includes the original C2, several previously unclassified sequences, and sequences previously designated as C14.
A novel subgenotype, tentatively named C14, has been identified based on phylogenetic analysis and sequence divergence >4%.
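Datasets annotated before these corrections can be updated programmatically, but only the accession-level changes can be applied mechanically. The sketch below, with hypothetical names, renames the two C6 sequences listed above. It also flags labels such as C11 and C14 whose meaning changed at the group level: an old C14 label may now correspond to quasi-subgenotype C2, while the same name is used for the novel subgenotype.

```python
from typing import Optional, Tuple

# Accession-level re-designations stated in the text (both previously labelled C6).
REASSIGNED_ACCESSIONS = {
    "GQ358157": "C12",
    "GU721029": "C7",
}

# Legacy labels whose meaning changed at the group level; these are flagged for
# re-analysis rather than renamed automatically.
LABELS_TO_REVIEW = {
    "C11": "reported to group with C12",
    "C14": "earlier C14 sequences fall in quasi-subgenotype C2, and C14 is also "
           "the tentative name of a novel subgenotype",
}


def reconcile(accession: str, label: str) -> Tuple[str, Optional[str]]:
    """Return an updated subgenotype label and an optional note."""
    base_accession = accession.split(".")[0]  # drop a version suffix such as ".1"
    if base_accession in REASSIGNED_ACCESSIONS:
        new_label = REASSIGNED_ACCESSIONS[base_accession]
        return new_label, f"re-designated from {label}"
    if label in LABELS_TO_REVIEW:
        return label, "review: " + LABELS_TO_REVIEW[label]
    return label, None
```

Flagging instead of renaming keeps ambiguous legacy labels visible until the affected sequences can be re-placed phylogenetically.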
Geographically, genotype C is predominant in East and Southeast Asia, with specific subgenotypes showing regional clustering patterns. The prevalence of specific mutations, such as basal core promoter mutations, also varies regionally within genotype C distributions.
Phosphorylation of the HBV Large envelope protein (L) is an important post-translational modification that influences viral function. Recent research has identified specific phosphorylation sites in the human HBV large envelope protein. These modifications may affect several aspects of the protein's function:
Conformational dynamics: Phosphorylation likely induces conformational changes in the PreS domains, which are suspected to represent intrinsically disordered protein regions.
Host interactions: Phosphorylation status may modulate interactions with host factors, potentially influencing cellular attachment and entry processes.
Immunogenicity: The phosphorylation pattern could affect epitope presentation and recognition by the host immune system.
Viral particle assembly and secretion: Phosphorylation state may regulate the incorporation of L protein into viral particles and subsequent secretion.
The PreS domains (PreS1 and PreS2) of the L protein are particularly important subjects for phosphorylation studies, as these regions mediate crucial virus-host interactions.
The HBV envelope consists of three proteins (L, M, and S) that share their C-terminal regions but differ in their N-terminal domains:
Shared structural elements: All three proteins share the S domain, which contains four transmembrane-spanning helices and an antigenic loop stabilized by multiple disulfide bridges.
Distinctive domains: M carries an additional N-terminal PreS2 domain, and L carries both PreS1 and PreS2.
Topological arrangements: The L protein can adopt multiple topological arrangements, with the PreS domains either exposed on the viral surface or located inside the virion, demonstrating conformational flexibility that is crucial for viral function.
Functional differences: The L protein, with its complete PreS region, plays critical roles in:
Viral assembly
Receptor binding (particularly through the PreS1 domain)
Host-cell entry
Viral maturation processes
The PreS domains in the L protein are particularly important for the virus life cycle, with the PreS1 domain containing the receptor-binding site for the sodium taurocholate cotransporting polypeptide (NTCP), the primary receptor for HBV entry into hepatocytes.
Optimization of recombinant HBV surface antigens for serological assays requires addressing several key considerations:
Expression system selection: Mammalian expression systems have demonstrated superior performance for producing S-HBsAg VLPs with appropriate antigenic properties compared to yeast-derived or serum-isolated antigens. A streamlined method for transient expression in mammalian cells followed by purification from cell lysates has been shown to produce antigens with high sensitivity and specificity for detecting anti-HBs antibodies.
Structural characterization: Comprehensive characterization using:
Comparative benchmarking: New preparations should be benchmarked against reference standards to evaluate their performance in detecting anti-HBs antibodies, with particular attention to sensitivity, specificity, and reproducibility.
Application-specific optimization: For multiplex serology applications, recombinant S-HBsAg VLPs produced in mammalian systems have demonstrated the highest performance, making them particularly suitable for analysis of HBV immunity through anti-HBs serostatus.
Recent studies have shown that serum-isolated and recombinant HBsAg VLPs assemble differently, highlighting the importance of appropriate production methods for specific research applications.
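Reproducibility is commonly summarized as a percent coefficient of variation (CV) across replicate measurements. The minimal sketch below assumes replicate signals for one control serum on two antigen lots. The values and lot names are hypothetical, and no acceptance threshold is implied.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Coefficient of variation (%) = 100 * sample standard deviation / mean."""
    if len(replicates) < 2:
        raise ValueError("at least two replicate measurements are needed")
    center = mean(replicates)
    if center == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(replicates) / center


# Hypothetical replicate signals for one control serum tested on two antigen lots.
replicates_by_lot = {
    "lot A": [100.0, 110.0, 90.0],
    "lot B": [100.0, 140.0, 60.0],
}
for lot, values in replicates_by_lot.items():
    print(f"{lot}: CV = {percent_cv(values):.1f}%")
```

Here lot A gives a CV of 10.0% and lot B a CV of 40.0%, which indicates markedly poorer reproducibility for lot B.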
Several sophisticated experimental approaches can be employed to study host-viral interactions mediated by the genotype C Large envelope protein:
Multiomics approaches: Integration of transcriptomics, proteomics, and ribosome profiling has revealed that HBV induces significant changes in both transcription and translation of host genes, including PPP1R15A, PGAM5, and SIRT6. These methods can identify differentially expressed genes orchestrated by HBV to remodel host proteostasis networks.
Protein interaction studies:
Co-immunoprecipitation followed by mass spectrometry to identify binding partners
Proximity labeling techniques (BioID, APEX) to capture transient interactions
Surface plasmon resonance or biolayer interferometry for kinetic analyses of binding interactions
Functional genomics:
CRISPR-Cas9 screens to identify host factors essential for viral entry and replication
RNA interference approaches to validate specific host-viral interactions
Structural biology techniques:
Cryo-electron microscopy for visualization of virus-receptor complexes
X-ray crystallography or NMR spectroscopy for atomic-level interaction details
Hydrogen-deuterium exchange mass spectrometry to map interaction interfaces
Cellular models: Development of physiologically relevant cellular systems, including primary human hepatocytes or differentiated hepatocyte-like cells derived from induced pluripotent stem cells, for studying genotype-specific aspects of viral entry and replication.
Research using these approaches has revealed that some host factors, such as SIRT6, bind directly to the HBV mini-chromosome and remove acetyl marks from histone H3 at lysine 9 (H3K9ac) and lysine 56 (H3K56ac). Chemical activation of endogenous SIRT6 suppresses HBV infection both in vitro and in vivo.
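Combining ribosome profiling with RNA-seq separates changes in translation from changes in transcription through translational efficiency (TE). TE is the ratio of ribosome-footprint abundance to mRNA abundance for each gene. The minimal sketch below assumes raw read counts per gene, counts-per-million normalization, and a small pseudocount. The gene names and counts are hypothetical placeholders, not data from the study.

```python
import math


def cpm(counts):
    """Counts-per-million normalization of a {gene: read count} mapping."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def translational_efficiency(rpf_counts, rna_counts, pseudocount=0.5):
    """Ribosome-footprint CPM divided by mRNA CPM for genes present in both."""
    rpf, rna = cpm(rpf_counts), cpm(rna_counts)
    shared = rpf.keys() & rna.keys()
    return {g: (rpf[g] + pseudocount) / (rna[g] + pseudocount) for g in shared}


def log2_te_change(te_infected, te_mock):
    """log2 fold change in translational efficiency, infected versus mock."""
    shared = te_infected.keys() & te_mock.keys()
    return {g: math.log2(te_infected[g] / te_mock[g]) for g in shared}


# Hypothetical read counts, for illustration only.
mock_te = translational_efficiency(
    {"gene_1": 120, "gene_2": 300, "gene_3": 80},
    {"gene_1": 200, "gene_2": 250, "gene_3": 90},
)
infected_te = translational_efficiency(
    {"gene_1": 400, "gene_2": 280, "gene_3": 60},
    {"gene_1": 210, "gene_2": 260, "gene_3": 95},
)
for gene, change in sorted(log2_te_change(infected_te, mock_te).items()):
    print(f"{gene}: log2 TE change = {change:+.2f}")
```

Published analyses typically fit count-based statistical models across biological replicates instead of taking single ratios. This sketch only shows the quantity being estimated.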
The nomenclature of HBV genes and their products has developed inconsistently, creating significant confusion even among experts. Several specific challenges exist:
Inconsistent gene and protein designations: One frequent misunderstanding involves the designation of the HBV "pre-core" gene. HBeAg is made from a pre-pro-protein containing a pre-sequence upstream of gene C, which should be designated "pre-C" rather than "pre-E". Similarly, the mRNA encoding the HBeAg precursor protein should be designated accordingly.
Confusion regarding PreS regions: The functions of the pre-S region were elucidated only years after its discovery. It was first identified through the observation that HBsAg purified from HBeAg-positive carriers contained additional minor protein pairs slightly larger than the well-known major proteins.
Historical terminology persistence: Terms like "pHSA-receptor" (based on polymerized human serum albumin binding) persist despite being misnomers.
Inconsistent subgenotype designations: Various research groups have independently proposed new subgenotypes, leading to overlapping or conflicting classifications that require reconciliation through comprehensive phylogenetic analyses.
Standardization efforts require consensus-building among the research community to establish consistent terminology that accurately reflects the biological relationships between viral components.
Several promising research directions could advance understanding of genotype C envelope proteins in viral pathogenesis:
Integrated multiomics approaches: Building on recent work that established the first multiomics landscape of host-HBV interaction, future research should focus on genotype-specific aspects of these interactions to better understand why genotype C is associated with more severe disease progression.
Structural biology of PreS domains: While PreS domains are suspected to represent intrinsically disordered protein regions, experimental data specifically for human HBV PreS are limited compared with those for avian counterparts. Advanced structural techniques could elucidate genotype-specific conformational properties.
Development of genotype-specific antivirals: Understanding the unique properties of genotype C envelope proteins could lead to tailored therapeutic approaches that target genotype-specific vulnerabilities.
Epigenetic regulation: Building on findings that proteins like SIRT6 interact with the HBV mini-chromosome, investigating genotype-specific differences in epigenetic regulation could reveal mechanisms underlying differential disease progression.
Immune evasion mechanisms: Investigating how genotype C-specific envelope protein variations contribute to immune evasion and persistent infection would provide valuable insights into pathogenesis.
Host factor interactions: Identifying genotype C-specific host factor interactions could explain the higher replication capacity and more severe disease outcomes associated with this genotype.
These research directions could ultimately lead to improved diagnostic, prognostic, and therapeutic approaches for managing HBV infections, particularly those caused by the more virulent genotype C.