Demand for highly flexible and fast implementations for bitstream parsing and variable-length-decoding (VLD) arises, if applications are targeted that shall support either MPEG-4 or multiple standards like MPEG-2 H.263 or Dolby AC3. The paper shows that especially today's multimedia oriented RISC processors incorporating multiple parallel arithmetic units are slowed down by these kind of bit-level operations. Therefore, a new architecture is proposed, that adds function specific blocks into the data path of a RISC processor, that are highly adapted to the processing of variable-length coded bitstream data. The increased functional complexity of basic instructions results in a significant speedup over software implementations on standard RISC processors. Two typical functions, that are frequently used in bitstream parsing, ShowBits (reading a certain number of bits) and GetBits (reading and removing a certain number of bits from the incoming bitstream), are executed in a single clock-cycle with a 64 bit rotator circuit. Constant input-rate VLD of one, two or four bits per clock-cycle can be implemented using internal RAM. Lookup-tables can be used for word-parallel decoding and VLC. Optionally memory entries can be saved using content addressable memories (CAMs) in addition to a data RAM. The proposed architecture has been implemented as a functional extension to an existing RISC core with additional 9k gates of logic, 8k RAM and an interface to a CAM. Synthesis results show an estimate of 160 MHz achievable clock frequency using a 0.35 mu technology. The resulting performance is sufficient for MPEG-2 HDTV or MPEG-4 applications.