BACKGROUND AND PURPOSE (AIMS): Measurement error of intraoral quantitative sensory testing (QST) has been assessed using traditional reliability methods, such as intraclass correlation coefficients (ICCs). Most studies reporting QST reliability have assessed one source of measurement error at a time, e.g., inter- or intra-examiner (test-retest) reliability, and have employed two examiners to test inter-examiner reliability. The present study used a complex design with multiple examiners, with the aim of assessing the reliability of intraoral QST while accounting for multiple sources of error simultaneously. METHODS: Four examiners of varied experience assessed 12 healthy participants at two visits separated by 48 h. Seven QST procedures were used to determine sensory thresholds: cold detection (CDT), warmth detection (WDT), cold pain (CPT), heat pain (HPT), mechanical detection (MDT), mechanical pain (MPT) and pressure pain (PPT). Mixed linear models were used to estimate variance components for reliability assessment; dependability coefficients were used to simulate alternative test scenarios. RESULTS: Most intraoral QST variability arose from differences between participants (8.8-30.5%), differences between visits within participant (4.6-52.8%), and error (13.3-28.3%). For QST procedures other than CDT and MDT, increasing the number of visits with a single examiner performing the procedures would improve dependability (dependability coefficient ranges: single visit, four examiners = 0.12-0.54; four visits, single examiner = 0.27-0.68). A wide range of reliabilities for QST procedures, as measured by ICCs, was noted for inter-examiner (0.39-0.80) and intra-examiner (0.10-0.62) variation. CONCLUSION: Reliability of sensory testing can be better assessed by measuring multiple sources of error simultaneously instead of focusing on one source at a time.
In experimental settings, large numbers of participants are needed to obtain accurate estimates of treatment effects based on QST measurements. This differs from clinical use, where variation between persons (the person main effect) is not a concern because clinical measurements are made on a single person. IMPLICATIONS: Future studies assessing sensory testing reliability in both clinical and experimental settings would benefit from routinely measuring multiple sources of error. Clinical researchers can use the methods and results of this study to improve assessment of measurement error related to intraoral sensory testing. This should lead to improved resource allocation when designing studies that use intraoral quantitative sensory testing in clinical and experimental settings.
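
ILLUSTRATION: The use of dependability coefficients to simulate alternative test scenarios can be sketched as follows. This is a minimal sketch, not the authors' model: it assumes a fully crossed participant x examiner x visit random-effects design and the standard generalizability-theory formula for the absolute dependability coefficient, and the function name, dictionary keys, and variance component values are all hypothetical placeholders rather than results from this study.

```python
def dependability(var, n_examiners, n_visits):
    """Absolute dependability coefficient for a crossed p x e x v design.

    `var` holds variance components (hypothetical key names):
    p = participant, e = examiner, v = visit, pe / pv / ev = two-way
    interactions, residual = three-way interaction confounded with error.
    """
    ne, nv = n_examiners, n_visits
    # Every component except the participant effect counts as error,
    # divided by the number of conditions it is averaged over.
    error = (
        var["e"] / ne
        + var["v"] / nv
        + var["pe"] / ne
        + var["pv"] / nv
        + var["ev"] / (ne * nv)
        + var["residual"] / (ne * nv)
    )
    return var["p"] / (var["p"] + error)


# Hypothetical variance components, chosen only for illustration.
components = {
    "p": 30.0, "e": 5.0, "v": 2.0,
    "pe": 8.0, "pv": 35.0, "ev": 1.0, "residual": 19.0,
}

# Alternative test scenarios: (number of examiners, number of visits).
scenarios = {
    "single visit, single examiner": (1, 1),
    "single visit, four examiners": (4, 1),
    "four visits, single examiner": (1, 4),
}

for label, (ne, nv) in scenarios.items():
    print(f"{label}: {dependability(components, ne, nv):.2f}")
```

When the participant-by-visit component is large, as in these placeholder values, averaging over more visits raises the coefficient more than adding examiners does, which is the kind of trade-off such a simulation is meant to expose.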