How are segments in MS Office files (Word, Excel, PowerPoint) extracted?
- 1 Introduction
- 2 What is and can be extracted for a particular MS Office file?
- 2.1 Word – .doc, .docx, .rtf
- 2.1.1 DoNotTranslate restriction
- 2.1.2 Hidden option
- 2.1.3 Alternate content
- 2.1.4 Text not exposed in the source file
- 2.1.5 What else is extracted?
- 2.2 Excel (normal) – .xls, .xlsx, .xlsm
- 2.2.1 General information
- 2.2.2 What cell formats are extracted?
- 2.2.3 Other formats
- 2.2.4 What else is extracted?
- 2.3 PowerPoint – .pptx
- 2.3.1 What is extracted?
- 2.4 Custom configuration
- 2.5