> For the complete documentation index, see [llms.txt](https://docs.workongpt.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.workongpt.com/copy-of-language-zhong-wen/guidance-book/voice-to-text.md).

# Voice to Text

Workongpt.com also offers Speech to Text, which lets you upload an audio file and turn it into a Word document. Using Whisper and other AI models developed by OpenAI, it not only transcribes the audio but also optimizes the readability of the result. You can additionally describe the scenario and your processing requirements to guide this secondary processing.

## Using Speech to Text

* **Upload an audio file:**
  * Click the ‘Upload’ button and select the desired audio file from your device.
  * Make sure the file format is supported. Supported formats include flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav and webm.
  * Only files smaller than 100 MB are supported.
* **Perform secondary processing (optional):**
  * If you want the AI to optimize the transcription for readability, select the "Secondary Processing" option. The AI then refines the content so that it reads naturally. <mark style="color:red;">For better transcription results, we highly recommend that you enable secondary processing.</mark>
  * If you enable Secondary Processing, enter a short description of the scenario to help the AI understand the context of the audio.
* **Start transcribing:**
  * After uploading the file and choosing your settings, click the ‘Convert’ button.
  * Depending on the length of the audio, the conversion may take a while.
* **Download results:**
  * Once the conversion is complete, a download button will appear.
  * Click the button to download the transcription as a Word document, along with the processed version (if available).
  * The secondary processed Word document is formatted and structured for readability.
  * You can also download transcribed documents later: click the triangle symbol next to your username in the navigation bar, then click History Files in the expanded menu.
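
The upload requirements above can be checked locally before you start an upload. The following is a minimal sketch, not part of Workongpt.com: the function name `check_audio_file` is hypothetical, and it assumes that the size limit means 100 × 1024 × 1024 bytes and that the format can be judged from the file extension.

```python
from pathlib import Path

# Formats listed in the upload step above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
# Assumption: the documented limit is read as 100 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 100 * 1024 * 1024


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Assumption: the file extension reflects the actual audio format.
    extension = Path(name).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("file is not smaller than the size limit")
    return problems
```

For example, `check_audio_file("meeting.MP3", 5_000_000)` returns an empty list, while a `.txt` file yields an "unsupported format" message.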

<mark style="color:blue;">**Billing**</mark>

The fee for the ‘Speech to Text’ feature has two parts: the duration of the audio file (billed at ￥0.08 per minute, with partial minutes rounded up) and the number of tokens consumed by secondary processing. Secondary processing incurs no fee if you use the ChatGLM model. Fees are deducted from your account balance, so please make sure your balance is sufficient before starting the conversion. This feature cannot currently be paid for with a member's GPT quota. For billing rates, please see the Site Policies section ([Click to jump to Pricing Policies](/copy-of-language-zhong-wen/policy-section/pricing-policy.md)).
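
As an illustration of the duration-based part of the fee, here is a minimal sketch of the arithmetic. The function name `transcription_fee` is hypothetical, and the sketch covers only the per-minute charge stated above; the token cost of secondary processing depends on the rates in the Pricing Policies and is left out.

```python
import math
from decimal import Decimal

# Rate stated above: ￥0.08 per minute of audio.
RATE_PER_MINUTE = Decimal("0.08")


def transcription_fee(duration_seconds: float) -> Decimal:
    """Estimate the duration-based fee in yuan (secondary processing tokens excluded)."""
    billed_minutes = math.ceil(duration_seconds / 60)  # partial minutes round up
    return RATE_PER_MINUTE * billed_minutes


print(transcription_fee(754))  # 12 min 34 s is billed as 13 minutes: prints 1.04
```

A recording of 12 minutes 34 seconds is therefore billed as 13 minutes, or ￥1.04, before any secondary processing cost.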

<mark style="color:blue;">**Safety & Privacy**</mark>

Your uploaded audio files and generated transcriptions (including secondary processed versions) are protected with the highest degree of privacy. They are stored on the server for a maximum of 30 days, after which they are automatically deleted. With user privacy in mind, your uploaded files are encrypted (SHA-256), so no one, not even the website administrator, can access your data.

For longer audio files, make sure you have a stable internet connection during the upload. If the transcription quality is poor, check that the original audio is clear and free of excessive background noise.