summaryrefslogtreecommitdiff
path: root/docs/contenthandler.txt
diff options
context:
space:
mode:
Diffstat (limited to 'docs/contenthandler.txt')
-rw-r--r--docs/contenthandler.txt184
1 files changed, 184 insertions, 0 deletions
diff --git a/docs/contenthandler.txt b/docs/contenthandler.txt
new file mode 100644
index 00000000..899554af
--- /dev/null
+++ b/docs/contenthandler.txt
@@ -0,0 +1,184 @@
+The ContentHandler facility adds support for arbitrary content types on wiki pages, instead of relying on wikitext
+for everything. It was introduced in MediaWiki 1.21.
+
+Each kind of content ("content model") supported by MediaWiki is identified by unique name. The content model determines
+how a page's content is rendered, compared, stored, edited, and so on.
+
+Built-in content types are:
+
+* wikitext - wikitext, as usual
+* javascript - user provided javascript code
+* css - user provided css code
+* text - plain text
+
+In PHP, use the corresponding CONTENT_MODEL_XXX constant.
+
+A page's content model is available using the Title::getContentModel() method. A page's default model is determined by
+ContentHandler::getDefaultModelFor($title) as follows:
+
+* The global setting $wgNamespaceContentModels specifies a content model for the given namespace.
+* The hook ContentHandlerDefaultModelFor may be used to override the page's default model.
+* Pages in NS_MEDIAWIKI and NS_USER default to the CSS or JavaScript model if they end in .js or .css, respectively.
+ Pages in NS_MEDIAWIKI default to the wikitext model otherwise.
+* The hook TitleIsCssOrJsPage may be used to force a page to use the CSS or JavaScript model.
+ This is a compatibility feature. The ContentHandlerDefaultModelFor hook should be used instead if possible.
+* The hook TitleIsWikitextPage may be used to force a page to use the wikitext model.
+ This is a compatibility feature. The ContentHandlerDefaultModelFor hook should be used instead if possible.
+* Otherwise, the wikitext model is used.
+
+Note that is currently no mechanism to convert a page from one content model to another, and there is no guarantee that
+revisions of a page will all have the same content model. Use Revision::getContentModel() to find it.
+
+
+== Architecture ==
+
+Two class hierarchies are used to provide the functionality associated with the different content models:
+
+* Content interface (and AbstractContent base class) define functionality that acts on the concrete content of a page, and
+* ContentHandler base class provides functionality specific to a content model, but not acting on concrete content.
+
+The most important function of ContentHandler is to act as a factory for the appropriate implementation of Content. These
+Content objects are to be used by MediaWiki everywhere, instead of passing page content around as text. All manipulation
+and analysis of page content must be done via the appropriate methods of the Content object.
+
+For each content model, a subclass of ContentHandler has to be registered with $wgContentHandlers. The ContentHandler
+object for a given content model can be obtained using ContentHandler::getForModelID( $id ). Also Title, WikiPage and
+Revision now have getContentHandler() methods for convenience.
+
+ContentHandler objects are singletons that provide functionality specific to the content type, but not directly acting
+on the content of some page. ContentHandler::makeEmptyContent() and ContentHandler::unserializeContent() can be used to
+create a Content object of the appropriate type. However, it is recommended to instead use WikiPage::getContent() resp.
+Revision::getContent() to get a page's content as a Content object. These two methods should be the ONLY way in which
+page content is accessed.
+
+Another important function of ContentHandler objects is to define custom action handlers for a content model, see
+ContentHandler::getActionOverrides(). This is similar to what WikiPage::getActionOverrides() was already doing.
+
+
+== Serialization ==
+
+With the ContentHandler facility, page content no longer has to be text based. Objects implementing the Content interface
+are used to represent and handle the content internally. For storage and data exchange, each content model supports
+at least one serialization format via ContentHandler::serializeContent( $content ). The list of supported formats for
+a given content model can be accessed using ContentHandler::getSupportedFormats().
+
+Content serialization formats are identified using MIME type like strings. The following formats are built in:
+
+* text/x-wiki - wikitext
+* text/javascript - for js pages
+* text/css - for css pages
+* text/plain - for future use, e.g. with plain text messages.
+* text/html - for future use, e.g. with plain html messages.
+* application/vnd.php.serialized - for future use with the api and for extensions
+* application/json - for future use with the api, and for use by extensions
+* application/xml - for future use with the api, and for use by extensions
+
+In PHP, use the corresponding CONTENT_FORMAT_XXX constant.
+
+Note that when using the API to access page content, especially action=edit, action=parse and action=query&prop=revisions,
+the model and format of the content should always be handled explicitly. Without that information, interpretation of
+the provided content is not reliable. The same applies to XML dumps generated via maintenance/dumpBackup.php or
+Special:Export.
+
+Also note that the API will provide encapsulated, serialized content - so if the API was called with format=json, and
+contentformat is also json (or rather, application/json), the page content is represented as a string containing an
+escaped json structure. Extensions that use JSON to serialize some types of page content may provide specialized API
+modules that allow access to that content in a more natural form.
+
+
+== Compatibility ==
+
+The ContentHandler facility is introduced in a way that should allow all existing code to keep functioning at least
+for pages that contain wikitext or other text based content. However, a number of functions and hooks have been
+deprecated in favor of new versions that are aware of the page's content model, and will now generate warnings when
+used.
+
+Most importantly, the following functions have been deprecated:
+
+* Revisions::getText() and Revisions::getRawText() is deprecated in favor Revisions::getContent()
+* WikiPage::getText() is deprecated in favor WikiPage::getContent()
+
+Also, the old Article::getContent() (which returns text) is superceded by Article::getContentObject(). However, both
+methods should be avoided since they do not provide clean access to the page's actual content. For instance, they may
+return a system message for non-existing pages. Use WikiPage::getContent() instead.
+
+Code that relies on a textual representation of the page content should eventually be rewritten. However,
+ContentHandler::getContentText() provides a stop-gap that can be used to get text for a page. Its behavior is controlled
+by $wgContentHandlerTextFallback; per default it will return the text for text based content, and null for any other
+content.
+
+For rendering page content, Content::getParserOutput() should be used instead of accessing the parser directly.
+ContentHandler::makeParserOptions() can be used to construct appropriate options.
+
+
+Besides some functions, some hooks have also been replaced by new versions (see hooks.txt for details).
+These hooks will now trigger a warning when used:
+
+* ArticleAfterFetchContent was replaced by ArticleAfterFetchContentObject
+* ArticleInsertComplete was replaced by PageContentInsertComplete
+* ArticleSave was replaced by PageContentSave
+* ArticleSaveComplete was replaced by PageContentSaveComplete
+* ArticleViewCustom was replaced by ArticleContentViewCustom (also consider a custom implementation of the view action)
+* EditFilterMerged was replaced by EditFilterMergedContent
+* EditPageGetDiffText was replaced by EditPageGetDiffContent
+* EditPageGetPreviewText was replaced by EditPageGetPreviewContent
+* ShowRawCssJs was deprecated in favor of custom rendering implemented in the respective ContentHandler object.
+
+
+== Database Storage ==
+
+Page content is stored in the database using the same mechanism as before. Non-text content is serialized first. The
+appropriate serialization and deserialization is handled by the Revision class.
+
+Each revision's content model and serialization format is stored in the revision table (resp. in the archive table, if
+the revision was deleted). The page's (current) content model (that is, the content model of the latest revision) is also
+stored in the page table.
+
+Note however that the content model and format is only stored if it differs from the page's default, as determined by
+ContentHandler::getDefaultModelFor( $title ). The default values are represented as NULL in the database, to preserve
+space.
+
+Storage of content model and format can be disabled altogether by setting $wgContentHandlerUseDB = false. In that case,
+the page's default model (and the model's default format) will be used everywhere. Attempts to store a revision of a page
+using a model or format different from the default will result in an error.
+
+
+== Globals ==
+
+There are some new globals that can be used to control the behavior of the ContentHandler facility:
+
+* $wgContentHandlers associates content model IDs with the names of the appropriate ContentHandler subclasses.
+
+* $wgNamespaceContentModels maps namespace IDs to a content model that should be the default for that namespace.
+
+* $wgContentHandlerUseDB determines whether each revision's content model should be stored in the database.
+ Defaults is true.
+
+* $wgContentHandlerTextFallback determines how the compatibility method ContentHandler::getContentText() will behave for
+ non-text content:
+ 'ignore' causes null to be returned for non-text content (default).
+ 'serialize' causes the serialized form of any non-text content to be returned (scary).
+ 'fail' causes an exception to be thrown for non-text content (strict).
+
+
+== Caveats ==
+
+There are some changes in behavior that might be surprising to users:
+
+* Javascript and CSS pages are no longer parsed as wikitext (though pre-save transform is still applied). Most
+importantly, this means that links, including categorization links, contained in the code will not work.
+
+* With $wgContentHandlerUseDB = false, pages can not be moved in a way that would change the
+default model. E.g. [[MediaWiki:foo.js]] can not be moved to [[MediaWiki:foo bar]], but can still be moved to
+[[User:John/foo.js]]. Also, in this mode, changing the default content model for a page (e.g. by changing
+$wgNamespaceContentModels) may cause it to become inaccessible.
+
+* action=edit will fail for pages with non-text content, unless the respective ContentHandler implementation has
+provided a specialized handler for the edit action. This is true for the API as well.
+
+* action=raw will fail for all non-text content. This seems better than serving content in other formats to an
+unsuspecting recipient. This will also cause client-side diffs to fail.
+
+* File pages provide their own action overrides that do not combine gracefully with any custom handlers defined by a
+ContentHandler. If for example a File page used a content model with a custom revert action, this would be overridden by
+WikiFilePage's handler for the revert action.