Questions tagged [tfrecord]
TensorFlow Record Format. A TFRecord file represents a sequence of (binary) strings. The format is not random access, so it is suitable for streaming large amounts of data but not suitable if fast sharding or other non-sequential access is desired.
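To make the "sequence of binary strings" description concrete, here is a minimal pure-Python sketch of the on-disk framing as it is commonly documented: each record is a little-endian uint64 length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The helper names (`crc32c`, `masked_crc`, `write_record`, `read_records`) are made up for this illustration and are not TensorFlow APIs; treat the layout as an assumption to verify against the TensorFlow documentation before relying on it.

```python
import io
import struct


def _make_crc32c_table():
    # Lookup table for the reflected CRC-32C (Castagnoli) polynomial.
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # Assumed masking step: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload: bytes) -> None:
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def _read_exact(stream, size: int) -> bytes:
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated record")
    return data


def read_records(stream):
    # Records can only be found by walking the stream from the start,
    # which is why the format is not random access.
    while True:
        header = stream.read(8)
        if not header:
            return
        if len(header) != 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = _read_exact(stream, length)
        (data_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


# Usage: write two records into an in-memory buffer and read them back.
buffer = io.BytesIO()
write_record(buffer, b"first")
write_record(buffer, b"second")
buffer.seek(0)
records = list(read_records(buffer))
```

Because the only way to locate record N is to read the N-1 length headers before it, there is no index to seek by, which matches the "not random access" caveat above.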
423
questions
73 votes
7 answers
51k views
How to inspect a Tensorflow .tfrecord file?
I have a .tfrecord but I don't know how it is structured. How can I inspect the schema to understand what the .tfrecord file contains?
All Stack Overflow answers or documentation seem to assume I know ...
61 votes
8 answers
69k views
How do I convert a directory of jpeg images to TFRecords file in tensorflow?
I have training data that is a directory of jpeg images and a corresponding text file containing the file name and the associated category label. I am trying to convert this training data into a ...
35 votes
7 answers
40k views
TensorFlow - Read all examples from a TFRecords at once?
How do you read all examples from a TFRecords at once?
I've been using tf.parse_single_example to read out individual examples using code similar to that given in the method read_and_decode in the ...
31 votes
5 answers
21k views
Obtaining total number of records from .tfrecords file in Tensorflow
Is it possible to obtain the total number of records from a .tfrecords file? Related to this, how does one generally keep track of the number of epochs that have elapsed while training models? While ...
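For readers with the same question, one possible approach that avoids TensorFlow entirely is to walk the length headers and skip over each payload. This is only a sketch under stated assumptions: the file is uncompressed, it uses the commonly documented framing (8-byte little-endian length, 4-byte length checksum, payload, 4-byte payload checksum), and checksums are not verified. `count_records` is a hypothetical helper name, not a TensorFlow function.

```python
import os
import struct


def count_records(path: str) -> int:
    """Count records by reading each length header and skipping the payload."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length checksum (4 bytes), the payload, and the
            # payload checksum (4 bytes) without reading them.
            # Note: seeking past the end does not raise, so a truncated
            # final record would still be counted by this sketch.
            f.seek(4 + length + 4, os.SEEK_CUR)
            count += 1
    return count
```

Since the file stores no record count, the cost is still one pass over the file, but only the headers are actually read.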
29 votes
3 answers
14k views
how to store numpy arrays as tfrecord?
I am trying to create a dataset in tfrecord format from numpy arrays. I am trying to store 2d and 3d coordinates.
2d coordinates are numpy array of shape (2,10) of type float64
3d coordinates are ...
20 votes
1 answer
27k views
TensorFlow strings: what they are and how to work with them
When I read a file with tf.read_file I get something with type tf.string. The documentation says only that it is "Variable length byte arrays. Each element of a Tensor is a byte array." (https://www....
18 votes
2 answers
19k views
Tensorflow TFRecord: Can't parse serialized example
I am trying to follow this guide in order to serialize my input data into the TFRecord format but I keep hitting this error when trying to read it:
InvalidArgumentError: Key: my_key. Can't parse ...
16 votes
1 answer
9k views
Numpy to TFrecords: Is there a simpler way to handle batch inputs from tfrecords?
My question is about how to get batch inputs from multiple (or sharded) tfrecords. I've read the example https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L410. ...
16 votes
3 answers
3k views
How to efficiently save a Pandas Dataframe into one/more TFRecord file?
First I want to quickly give some background. What I want to achieve eventually is to train a fully connected neural network for a multi-class classification problem under the TensorFlow framework.
The ...
13 votes
6 answers
5k views
Split .tfrecords file into many .tfrecords files
Is there any way to split a .tfrecords file into many .tfrecords files directly, without writing back each Dataset example?
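One way this could be approached without parsing any Example protos is to copy the raw framed records into shard files round-robin. The sketch below is an assumption-laden illustration rather than an accepted answer: it assumes an uncompressed file with the commonly documented framing (8-byte little-endian length, 4-byte checksum, payload, 4-byte checksum) and copies the checksums through unchanged. `split_tfrecord` is a hypothetical helper name.

```python
import struct
from contextlib import ExitStack


def split_tfrecord(src_path, shard_paths):
    """Copy raw records from src_path into shard_paths, round-robin.

    Returns the number of records copied.
    """
    copied = 0
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, "rb"))
        shards = [stack.enter_context(open(p, "wb")) for p in shard_paths]
        while True:
            header = src.read(8)
            if not header:
                break  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Length checksum + payload + payload checksum, kept as raw bytes.
            rest = src.read(4 + length + 4)
            if len(rest) != length + 8:
                raise ValueError("truncated record")
            shards[copied % len(shards)].write(header + rest)
            copied += 1
    return copied
```

Each shard produced this way is itself a sequence of complete framed records, so it should be readable on its own, provided the framing assumption holds for the input.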
13 votes
3 answers
4k views
How to visualize a TFRecord?
I was asked this on another forum but thought I'd post it here for anyone that is having trouble with TFRecords.
TensorFlow's Object Detection API can produce strange behavior if the labels in the ...
10 votes
1 answer
11k views
AttributeError: 'Tensor' object has no attribute 'numpy' in Tensorflow 2.1
I am trying to convert the shape property of a Tensor in Tensorflow 2.1 and I get this error:
AttributeError: 'Tensor' object has no attribute 'numpy'
I already checked that the output of tf....
10 votes
1 answer
2k views
How to download Sentinel images from Google Earth Engine using the Python API in tfrecord
While trying to download a Sentinel image for a specific location, the TIF file is generated by default in Drive, but it's not readable by OpenCV or PIL.Image(). Below is the code for the same. If I use ...
9 votes
2 answers
7k views
TensorFlow - Read video frames from TFRecords file
TL;DR: my question is how to load compressed video frames from TFRecords.
I am setting up a data pipeline for training deep learning models on a large video dataset (Kinetics). For this I am using ...
9 votes
1 answer
11k views
Proper way to iterate tf.data.Dataset in session for 2.0
I have downloaded some *.tfrecord data from the youtube-8m project. You can download a 'small' portion of the data with this command:
curl data.yt8m.org/download.py | shard=1,100 partition=2/video/...
9 votes
1 answer
13k views
Numpy array to TFrecord
I'm trying to train a custom dataset through tensorflow object detection api. Dataset contains 40k training images and labels which are in numpy ndarray format (uint8). training dataset shape=2 ([...
9 votes
1 answer
5k views
Tensorflow: Modern way to load large data
I want to train a convolutional neural network (using tf.keras from Tensorflow version 1.13) using numpy arrays as input data. The training data (which I currently store in a single >30GB '.npz' ...
9 votes
0 answers
2k views
How to decode Unicode string in Tensorflow's graph pipeline
I have created a TFRecord file to store data. I have to store Hindi text, so I saved it as bytes using string.encode('utf-8').
But, I am stuck at the time of reading the data. I am reading ...
8 votes
2 answers
5k views
How to use the Dataset API to read a TFRecords file of lists of variable length?
I want to use TensorFlow's Dataset API to read a TFRecords file of lists of variable length. Here is my code.
def _int64_feature(value):
    # value must be a numpy array.
    return tf.train.Feature(...
8 votes
3 answers
6k views
Tensorflow object detection API killed - OOM. How to reduce shuffle buffer size?
System information
OS Platform and Distribution: CentOS 7.5.1804
TensorFlow installed from: pip install tensorflow-gpu
TensorFlow version: tensorflow-gpu 1.8.0
CUDA/cuDNN version: 9.0/7.1.2
GPU model ...
8 votes
2 answers
7k views
Tensorflow/models uses COCO 90 class ids although COCO has only 80 categories
The labelmaps of TensorFlow's object_detection project contain 90 classes, although COCO has only 80 categories.
Therefore the parameter num_classes in all sample configs is set to 90.
If I now ...
8 votes
0 answers
672 views
TensorFlow Example vs SequenceExample
There's not that much information given in the TensorFlow documentation:
https://www.tensorflow.org/api_docs/python/tf/train/Example
https://www.tensorflow.org/api_docs/python/tf/train/SequenceExample
...
7 votes
1 answer
11k views
TensorFlow Dataset.shuffle - large dataset [duplicate]
I'm using TensorFlow 1.2 with a dataset in a 20G TFRecord file. There are about half a million samples in that TFRecord file.
Looks like if I choose a value smaller than the amount of records in the ...
7 votes
3 answers
3k views
Tensorflow: Count number of examples in a TFRecord file -- without using deprecated `tf.python_io.tf_record_iterator`
Please read post before marking Duplicate:
I was looking for an efficient way to count the number of examples in a TFRecord file of images. Since a TFRecord file does not save any metadata about the ...
7 votes
2 answers
6k views
tensorflow ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>
This is my code!
My tensorflow version is 1.6.0, python version is 3.6.4.
If I use the Dataset API to read the CSV file directly, training works without errors. But when I convert the CSV file to a tfrecords file, it fails. I ...
7 votes
1 answer
2k views
How to import tfrecord files in a pandas dataframe?
I have a tfrecord file and would like to import it in a pandas dataframe or numpy array.
I found tools to read tfrecords but they only work inside a tensorflow session, which is not the use case I ...
7 votes
2 answers
6k views
In TensorFlow 2.0, how to feed TFRecord data to keras model?
I've tried to solve a classification problem whose input data has 32 features and 16 labels using a deep neural network (DNN).
They look like,
# Input data
shape=(32,), dtype=float32,
np.array([-0....
7 votes
0 answers
160 views
Writing from TF Dataset to tfrecords file
It's really easy to read a TFRecords file into a TF Dataset by using a TFRecordDataset, but is there a similar way to write into a TFRecord file given that I already have the info in a Dataset, or do ...
6 votes
1 answer
3k views
How can I save string data to TFRecord?
When saving to TFRecord, I use:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=...
6 votes
1 answer
4k views
Read data from TFRecord file used in Object Detection API
I want to read the data stored in a TFRecord file that I've used as a train record in TF Object Detection API.
However, I get an InvalidArgumentError: Input to reshape is a tensor with 91090 values, ...
6 votes
0 answers
653 views
Tensorflow get_single_element not working with tf.data.TFRecordDataset.batch()
I am trying to perform ZCA whitening on a Tensorflow Dataset. In order to do this, I am trying to extract my data from my Dataset as a Tensor, perform the whitening, then create another Dataset after. ...
5 votes
1 answer
3k views
Shuffling tfrecords files
I have 5 tfrecords files, one for each object. While training I want to read data equally from all the 5 tfrecords i.e. if my batch size is 50, I should get 10 samples from 1st tfrecord file, 10 ...
5 votes
1 answer
6k views
Tensorflow 2.0: how to transform from MapDataset (after reading from TFRecord) to some structure that can be input to model.fit
I've stored my training and validation data on two separate TFRecord files, in which I store 4 values: signal A (float32 shape (150,)), signal B (float32 shape (150,)), label (scalar int64), id (...
5 votes
1 answer
4k views
Writing and Reading lists to TFRecord example
I want to write a list of integers (or any multidimensional numpy matrix) to one TFRecords example. For either a single value or a list of multiple values, I can create the TFRecord file without error. ...
5 votes
2 answers
302 views
Chunk tensorflow dataset records into multiple records
I have an unbatched tensorflow dataset that looks like this:
ds = ...
for record in ds.take(3):
    print('data shape={}'.format(record['data'].shape))
-> data shape=(512, 512, 87)
-> data ...
5 votes
4 answers
10k views
How to load tfrecord in pytorch?
How to use tfrecord with pytorch?
I have downloaded "Youtube8M" datasets with video-level features, but it is stored in tfrecord.
I tried to read some samples from these files to convert them to numpy ...
5 votes
3 answers
2k views
TFRecord format for multiple instances of the same or different classes on one training image
I am trying to train a Faster R-CNN on grocery dataset detection using the new Object Detection API, but I do not quite understand the process of creating a TFRecord file for that. I am aware of the ...
5 votes
3 answers
2k views
How to create multiple TFRecord files instead of making a big one and then splitting it up?
I'm dealing with quite a big time series dataset that is prepared as SequenceExamples and then written to a TFRecord. This results in a quite large file (over 100GB) but I'd like to have it stored in ...
5 votes
2 answers
2k views
How to add class to existing model?
I have trained a model using tensorflow object detection/SSD mobilenet. It works great!
I'd like to add a class to it - just to detect pens or something.
How can I do this?
I have created my image ...
5 votes
1 answer
607 views
Generating TFRecord format data from C++
I'm trying to use TFRecord format to record data from C++ and then use it in python to feed TensorFlow model.
TL;DR: Simply serializing proto messages into a stream doesn't satisfy the .tfrecord format ...
5 votes
0 answers
151 views
tfRecords shown faulty in TF2
I have a couple of tfrecord files that I made myself.
They work perfectly in tf1; I used them in several projects.
However, if I want to use them in the Tensorflow Object Detection API with tf2 (...
4 votes
2 answers
4k views
Tensorflow Object Detection, error while generating tfrecord [TypeError: None has type NoneType, but expected one of: int, long]
When checking across different solutions available on the net, most people (including datitran) pointed out that it might be a missing class or a misspelling of a class in the train csv file. I am not able ...
4 votes
2 answers
2k views
TFRecords occupy more space than original JPEG images
I'm trying to convert my JPEG image set into TFRecords. But the TFRecord file is taking almost 5x more space than the image set. After a lot of googling, I learned that when JPEGs are written into ...
4 votes
2 answers
2k views
Tensorflow: read variable length data, via Dataset (tfrecord)
Best
I would like to read some TF records data.
This works, but only for fixed-length data. Now I would like to do the same thing with variable-length data (VarLenFeature).
def load_tfrecord_fixed(...
4 votes
2 answers
3k views
Write and Read SparseTensor to and from a tfrecord file
Is it possible to do this elegantly?
Right now the only thing I can think of is to save the indices (tf.int64), values (tf.float32), and shape (tf.int64) of the SparseTensor in 3 separate Features (the ...
4 votes
3 answers
1k views
Output TFRecord to Google Cloud Storage from Python
I know tf.python_io.TFRecordWriter has a concept of GCS, but it doesn't seem to have permissions to write to it.
If I do the following:
output_path = 'gs://my-bucket-name/{}/{}.tfrecord'.format(...
4 votes
1 answer
4k views
How do I write an encoded jpeg as bytes to Tensorflow tfrecord and then read it?
I am trying to use TensorFlow's tfrecords format to store my datasets.
I managed to read in jpeg images and decode them to raw format and write them to a tfrecord file. I can then later read them ...
4 votes
1 answer
2k views
Unable to retrain the instance segmentation model
I'm trying to train the instance segmentation model. I'm using the following code to generate the tfrecord.
flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw pet dataset.'...
4 votes
2 answers
2k views
Writing tfrecord with multithreading is not as fast as expected
Tried to write tfrecord w/ and w/o multithreading, and found the speed difference is not much (w/ 4 threads: 434 seconds; w/o multithread 590 seconds). Not sure if I used it correctly. Is there any ...
4 votes
1 answer
2k views
create tfrecord for object detection task
I'm creating my dataset for a fine tuning task using tensorflow object detection api.
My directory structure is :
train/
-- imgs/
---- img1.jpg
-- ann/
---- img1.csv
where the csv, one per ...